Problems Load balance network - FC4t2


aadm
2005-05-07, 06:24 PM CDT
Hi,
I'm using FC4t2 as a gateway for a small network that should balance traffic between two ISPs.

This machine has three network interfaces:
eth0 -> Cable modem ISP
eth1 -> D-Link router - ADSL ISP
eth2 -> Internal network

I'm using Arno's Iptables Firewall Script, with protection enabled on both external interfaces. (Arno's site - http://rocky.molphys.leidenuniv.nl/)

When I use eth0 or eth1 on its own, it works fine.

The problem appears when I enable the load-balancing route, following the instructions in the Linux Advanced Routing & Traffic Control HOWTO (http://lartc.org/howto/lartc.rpdb.multiple-links.html).

I'm using this command:
ip route add default scope global nexthop via 10.1.1.1 dev eth1 weight 1 nexthop via xxx.xxx.xxx.xxx dev eth0 weight 1
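
For context, the LARTC page linked above pairs this multipath default route with one routing table per provider, plus `ip rule` entries so that replies leave through the link they arrived on. The following is only a rough sketch of that layout, not the poster's actual configuration. Every address except the 10.1.1.1 gateway is a hypothetical placeholder (the cable gateway is masked in the post, and the local addresses are not given), and the table IDs 101 and 102 are arbitrary. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the LARTC "split access" layout for two uplinks.
# Only the 10.1.1.1 gateway comes from the post; everything else is assumed.

IF1=eth0                 # cable modem interface
IP1=192.0.2.10           # hypothetical local address on eth0
P1=192.0.2.1             # hypothetical cable gateway (masked in the post)
P1_NET=192.0.2.0/24      # hypothetical cable network

IF2=eth1                 # ADSL router interface
IP2=10.1.1.2             # hypothetical local address on eth1
P2=10.1.1.1              # ADSL gateway, as given in the post
P2_NET=10.1.1.0/24       # assumed network behind the ADSL router

# Print each command by default; set DRY_RUN=0 (as root) to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_split_access() {
    # One routing table per provider (numeric IDs chosen arbitrarily).
    run ip route add "$P1_NET" dev "$IF1" src "$IP1" table 101
    run ip route add default via "$P1" table 101
    run ip route add "$P2_NET" dev "$IF2" src "$IP2" table 102
    run ip route add default via "$P2" table 102

    # Traffic sourced from each uplink address uses that uplink's table.
    run ip rule add from "$IP1" table 101
    run ip rule add from "$IP2" table 102

    # Multipath default route in the main table, same shape as the command above.
    run ip route add default scope global \
        nexthop via "$P2" dev "$IF2" weight 1 \
        nexthop via "$P1" dev "$IF1" weight 1
}

setup_split_access
```

Reviewing the printed commands before running them with DRY_RUN=0 is a cheap safeguard on a remote gateway. With DHCP-assigned addresses, as in this setup, the placeholders would need to be filled in again whenever a lease changes.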

At this point the route is enabled and it is possible to surf the web from the internal network.
After about 1 minute the server stops.
Looking in /var/log/messages, there are a lot (40-50 per second) of the kernel messages shown below, starting from the moment I ran the ip route command.

Does anyone have a clue?

best regards,
Augusto


May 7 18:38:38 concordiasrv kernel: =======================
May 7 18:38:38 concordiasrv kernel: [<c0105a80>] do_IRQ+0x51/0x82
May 7 18:38:38 concordiasrv kernel: [<c0103c1e>] common_interrupt+0x1a/0x20
May 7 18:38:38 concordiasrv kernel: [<c0372f37>] schedule+0x367/0x7b3
May 7 18:38:38 concordiasrv kernel: [<c028d590>] elv_next_request+0x12/0x154
May 7 18:38:38 concordiasrv kernel: [<c013fff2>] autoremove_wake_function+0x0/0x37
May 7 18:38:38 concordiasrv kernel: [<c0374949>] io_schedule+0xe/0x16
May 7 18:38:38 concordiasrv kernel: [<c017d8c5>] sync_buffer+0x2e/0x31
May 7 18:38:38 concordiasrv kernel: [<c0374b55>] __wait_on_bit+0x42/0x5e
May 7 18:38:38 concordiasrv kernel: [<c017d897>] sync_buffer+0x0/0x31
May 7 18:38:38 concordiasrv kernel: [<c017d897>] sync_buffer+0x0/0x31
May 7 18:38:38 concordiasrv kernel: [<c0374bd6>] out_of_line_wait_on_bit+0x65/0x6d
May 7 18:38:38 concordiasrv kernel: [<c0140029>] wake_bit_function+0x0/0x3c
May 7 18:38:38 concordiasrv kernel: [<c017d931>] __wait_on_buffer+0x29/0x2e
May 7 18:38:38 concordiasrv kernel: [<c0181db4>] sync_dirty_buffer+0x99/0xec
May 7 18:38:38 concordiasrv kernel: [<ce859c83>] journal_get_descriptor_buffer+0x86/0x96 [jbd]
May 7 18:38:38 concordiasrv kernel: [<ce852fe7>] journal_write_commit_re>] sys_ioctl+0x5d/0x6b
May 7 18:38:38 concordiasrv kernel: [<c0103a61>] syscall_call+0x7/0xb
May 7 18:38:38 concordiasrv kernel: Badness in dst_release at include/net/dst.h:154 (Not tainted)
May 7 18:38:38 concordiasrv kernel: [<c0301f19>] __kfree_skb+0x141/0x146
May 7 18:38:38 concordiasrv kernel: [<c034d441>] arp_process+0x7e/0x488
May 7 18:38:38 concordiasrv kernel: [<c036eb21>] packet_rcv_spkt+0xd7/0x371
May 7 18:38:38 concordiasrv kernel: [<c034d938>] arp_rcv+0xed/0x14e
May 7 18:38:38 concordiasrv kernel: [<ce846fa9>] init_stall_timer+0x6d/0x70 [uhci_hcd]
May 7 18:38:38 concordiasrv kernel: [<c0308eb7>] netif_receive_skb+0x1cf/0x274
May 7 18:38:38 concordiasrv kernel: [<c0308fc3>] process_backlog+0x67/0xe4
May 7 18:38:38 concordiasrv kernel: [<c03090fb>] net_rx_action+0xbb/0x2bf
May 7 18:38:38 concordiasrv kernel: [<c01282be>] __do_softirq+0x3e/0x8a
May 7 18:38:38 concordiasrv kernel: [<c0105b85>] do_softirq+0x3e/0x42
May 7 18:38:38 concordiasrv kernel: =======================
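
To put a number on the message rate mentioned above, the warning lines can be counted per timestamp. This is a minimal sketch, assuming the log uses the same "Mon D HH:MM:SS host ..." prefix as the excerpt and that entries are in chronological order. The function name is made up for illustration. It counts only the "Badness in dst_release" header lines, not the backtrace lines that follow each one.

```shell
#!/bin/sh
# Count "Badness in dst_release" warnings per second in a syslog-style log.
# Reads the files named as arguments, or standard input when none are given.
# Usage (hypothetical): count_badness_per_second /var/log/messages

count_badness_per_second() {
    awk '/Badness in dst_release/ {
             t = $1 " " $2 " " $3            # month, day, HH:MM:SS
             if (prev != "" && t != prev) {  # timestamp changed: flush the count
                 print n, prev
                 n = 0
             }
             prev = t
             n++
         }
         END { if (prev != "") print n, prev }' "$@"
}
```

Each output line is a count followed by the timestamp it belongs to, so a sudden jump right after the `ip route add` command would confirm the timing described in the post.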

desipher
2005-05-07, 07:20 PM CDT
That is pretty cool, you're running pretty much the same thing as multihomed BGP with load balancing. The first thing that comes to my mind is that you're advertising the wrong IP to the wrong ISP. I don't really use Fedora as a gateway, I prefer BSD with pf.

aadm
2005-05-08, 06:16 PM CDT
In fact I'm kind of a newbie on this topic, and I'm not sure if I'm using BGP. I have only issued the ip route command, nothing else. Is BGP part of the iproute2/kernel implementation?

The customer asked for Fedora. But I should have used FC3, not FC4.

In this case my IPs were assigned by DHCP.
Maybe the firewall rules are sending the wrong IP. I will take a look.

Thanks for the ideas ...
Augusto

desipher
2005-05-09, 09:30 AM CDT
Nah, I'm not saying you're running BGP, just that your network setup is a cool concept for routing traffic. You can't run BGP over a cable or DSL line; it has to be at least a T1 line or above. I just think your firewall is sending the wrong IP to the wrong ISP on the way out of your network.

AndyGreen
2005-05-09, 09:35 AM CDT
lol your kernel is spewing backtraces... I don't think it's anything you are doing wrong, more likely a kernel bug. I see this in the backtrace:

arp_process+0x7e/0x488

so I think maybe it is time to try older / newer kernel versions. If nothing fixes it, time to compile an unpatched kernel.org kernel. And if that does not fix it, time to file a bug at bugme.osdl.org.