[Asterisk-Users] Bonded ethernet ports and *

Rich Adamson radamson at routers.com
Wed Dec 14 08:07:30 MST 2005


> Rich - Even though I mentioned ethernet failover, I might have made it still
> a little too broad. The linux ethernet bonding module has been around for
> years, and there are several modes the linux bonding module can use which
> include:
> 
> mode=0 (balance-rr)
> Round-robin policy: Transmit packets in sequential order from the first
> available slave through the last. This mode provides load balancing and
> fault tolerance.
> 
> mode=1 (active-backup)
> Active-backup policy: Only one slave in the bond is active. A different
> slave becomes active if, and only if, the active slave fails. The bond's MAC
> address is externally visible on only one port (network adapter) to avoid
> confusing the switch. This mode provides fault tolerance. The primary option
> affects the behavior of this mode.
> 
> mode=2 (balance-xor)
> XOR policy: Transmit based on [(source MAC address XOR'd with destination
> MAC address) modulo slave count]. This selects the same slave for each
> destination MAC address. This mode provides load balancing and fault
> tolerance.
> 
> mode=3 (broadcast)
> Broadcast policy: transmits everything on all slave interfaces. This mode
> provides fault tolerance.
> 
> mode=4 (802.3ad)
> IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share
> the same speed and duplex settings. Utilizes all slaves in the active
> aggregator according to the 802.3ad specification.
> 
> What I was talking about was simply mode 1, active-backup. Some of our past
> equipment's network interfaces had issues with link up/down that could only
> be traced back to the ethernet port itself, so using bonding to pair two
> ports for active/backup failover works very smoothly. Our policy is a 500ms
> MII monitor for link status, then a wait of 500ms before actually failing
> over, for a total of about 1s of possible downtime. This also benefits us
> because we use redundant switches in our distribution layer: if one of the
> switches goes down, the bond automatically fails over. My question was
> really more about the bonding module than anything else, and how much
> overhead it adds. Most of the other modes (except 0) typically require
> trunk ports or special switch setup; since my issues are not bandwidth
> related, I've stayed away from them. I'd agree that NICs are the least
> concerning component, but if you have an extra eth port and aren't using
> it for something already, why not make it a failover port?
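[For reference, the active-backup policy described above can be configured
roughly as follows. This is a sketch, not a tested config: the file location
and interface names (eth0/eth1) are assumptions that vary by distribution,
and note that downdelay must be a multiple of miimon.]

```
# /etc/modprobe.conf -- hypothetical example of loading the bonding
# driver with the policy described above
alias bond0 bonding
# mode=1    : active-backup (one slave active, others standby)
# miimon=500: poll MII link status every 500 ms
# downdelay : wait a further 500 ms after link-down before failing over
#             (worst case roughly 1 s of downtime, matching the figures above)
options bonding mode=1 miimon=500 downdelay=500

# Then bring up the bond and enslave the ports, e.g.:
#   ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
#   ifenslave bond0 eth0 eth1
```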

Cool... just test the implementation to ensure that what you expect is truly
what happens, with no assumptions. The majority of my previous comments were
oriented around that thought process, having seen a large number of system
admins assume that all documentation, etc., is 100% accurate. Typically it's
not.

A fairly common assumption is that failover happens in xxx milliseconds, but
due to NIC design (etc.) a different MAC address is used in the failover
condition. That confuses the hell out of the layer-3 boxes and negates the
value of the failover. (All the documentation is correct, but the actual
implementation in this example is limited by the NIC's inability to use a
MAC address different from the one programmed into it. There are a large
number of current NICs like that.)
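[One quick sanity check for the MAC issue above: the bonding driver exposes
its state in /proc/net/bonding/bond0, and after a failover the bond's
externally visible MAC should still match one slave's permanent address.
A minimal sketch of that check, with a hypothetical sample of the proc file
inlined since the real file only exists on a host with bonding loaded:]

```python
import re

# Hypothetical contents of /proc/net/bonding/bond0 (active-backup mode).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up
Permanent HW addr: 00:11:22:33:44:55

Slave Interface: eth1
MII Status: up
Permanent HW addr: 00:11:22:33:44:66
"""

def active_slave_mac(status_text):
    """Return the permanent MAC address of the currently active slave."""
    active = re.search(r"Currently Active Slave: (\S+)", status_text).group(1)
    # Walk the per-slave sections and pick out the active one.
    for section in status_text.split("Slave Interface: ")[1:]:
        name, _, rest = section.partition("\n")
        if name.strip() == active:
            return re.search(r"Permanent HW addr: (\S+)", rest).group(1)
    return None

# Compare against the bond's externally visible MAC (hard-coded here; on a
# real host you would read it from the bond0 interface itself).
bond_mac = "00:11:22:33:44:55"
assert active_slave_mac(SAMPLE) == bond_mac
```

[If the bond's visible MAC no longer matches any slave's permanent address
after a failover, you are likely looking at the NIC limitation described
above.]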

I'd suggest that your comment about "...traced back to the ethernet port..."
and using the failover approach is sort of like saying rebooting the box
fixed the problem. No, it bypassed the problem; what was the root cause
of the problem?

I'd certainly agree with comments relative to high availability and redundancy,
and it sounds like you've done the technical research (and probably testing)
to validate the implementation. That's excellent, but I can assure you that's
not the norm for the majority of implementations that I've seen. (Then again,
we are not typically contracted by businesses whose network and system
resources are already working well. ;)

As far as the added overhead goes, I've never attempted to quantify it. But
it shouldn't be all that difficult to measure its impact from a throughput
and failover perspective. Best guess: the overhead is probably insignificant.
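[Measuring the failover side is straightforward in principle: run a steady
ping through the bond, pull the active link, and take the largest gap between
received replies as the downtime. A sketch of the gap calculation; the
timestamps below are hypothetical, standing in for per-reply timestamps you
would collect from a real ping run:]

```python
# Hypothetical reply timestamps (seconds) from a 200 ms-interval ping
# run across a forced failover.
timestamps = [0.0, 0.2, 0.4, 0.6, 1.55, 1.75, 1.95]

# The largest inter-reply gap approximates the failover downtime.
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
downtime = max(gaps)
print("observed failover gap: %.2f s" % downtime)
```

[With the 500 ms miimon plus 500 ms delay policy described earlier, a gap of
just under one second would be consistent with expectations.]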

Rich

More information about the asterisk-users mailing list