[Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

Sun Jan 4 22:51:46 MST 2004

On Sun, 2004-01-04 at 21:23, Rich Adamson wrote:
> Part of the point of many of the questions is that there really are a
> lot of dependencies on devices other than asterisk, and simply going down
> a path that says clustering (or whichever approach) can handle something
> is probably ignoring several of those dependencies which does not actually
> improve the end-to-end availability of asterisk. (Technically, asterisk
> is up, you just can't reach it because your phone (or whatever) doesn't
> know how to get to it.)
> 
> Using another load-balancing box (F5 or whatever) only moves the problem
> to that box. Duplicating it, moves the problem to another box, until
> the costs exponentially grow beyond the initial intended value of the
> solution. The weak points become lots of other boxes and infrastructure, 
> suggesting that asterisk really isn't "the" weakest point (regardless of 
> what it's built on).

Rich is hitting the main point in designing anything for high
reliability. So let's enumerate the failures, and then what, if
anything, can be done to eliminate them.
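To put Rich's point in numbers: when every box between a phone and the
outside line has to be working for a call to go through, their
availabilities multiply, so the whole chain is less available than even
its weakest link. A minimal Python sketch, assuming failures are
independent; the component names and availability figures are
hypothetical, chosen only for illustration.

```python
from math import prod


def serial_availability(components: dict[str, float]) -> float:
    """End-to-end availability of a chain where every component must be up.

    Assumes independent failures; each value is a fraction between 0 and 1.
    """
    return prod(components.values())


# Hypothetical per-component availabilities, for illustration only.
chain = {
    "phone": 0.999,
    "lan_switch": 0.999,
    "asterisk_server": 0.999,
    "pstn_line": 0.995,
}
print(f"{serial_availability(chain):.4f}")  # 0.9920
```

Putting a load balancer in the path just adds another factor below 1.0
unless it is itself duplicated, which is where Rich's cost argument
comes in.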

1. Line failures.
I'll lump these together, as they can occur anywhere from the CO to your
premises. I've experienced them in just about every section in my short
time in this part of the industry. I have had lines broken inside the
CO. I have had water get into the lines along the street during
construction, and it could just as easily have been the construction
crew cutting the line if they had been any more careless. Problems
inside the building are luckily less likely to crop up after install.
BTW, this is the same even if your incoming lines are VoIP.

2. Hardware failure. 
This can be drives, memory, cpu, NIC, or any other part that basically
renders the hardware unavailable or unstable.

3. Software failure.
This could be any number of bugs, either not yet found or introduced
later.

4. Phones.
This can be split into VoIP and analog sections, as the problems and
solutions are different.
a. VoIP
b. analog

5. Power.
This also splits into VoIP and analog parts, as it doesn't help to have
power on the switch if all your phones go dark. Think about cases where
there is a storm or other adverse conditions and you need to call the
authorities.

So now on to solutions.
1. Your solution here depends on budget, because the only solutions
cost a monthly fee. Also, for a truly good solution, the install fee
will go up too. Basically the solution comes via redundancy: not just
multiple lines, but lines coming from different locations and making
sure they don't follow the same paths. Most locations are not wired
from different paths unless your location attracted a fiber loop. So if
you have to have it, it might cost quite a bit or not be available at
all.
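Why path diversity matters can be sketched the same way. Redundant
lines only shrink the chance of an outage if they fail independently;
two lines sharing one conduit still go dark together when that conduit
is cut. A rough Python sketch, assuming independent failures within the
redundant group and modeling the shared conduit as a single
hypothetical component in series with it; all figures are made up.

```python
def parallel_availability(availabilities: list[float]) -> float:
    """Availability of a redundant group where any one member suffices.

    Only valid if members fail independently (e.g. lines on diverse paths).
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def shared_path_availability(path: float, availabilities: list[float]) -> float:
    """Redundant lines that all run through one common path (same conduit).

    The shared path is a single point of failure in series with the group.
    """
    return path * parallel_availability(availabilities)


two_lines = [0.995, 0.995]  # hypothetical per-line availability
print(f"{parallel_availability(two_lines):.6f}")            # 0.999975
print(f"{shared_path_availability(0.995, two_lines):.6f}")  # 0.994975
```

With a shared conduit, the second line buys almost nothing: the result
is no better than the conduit alone.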

2. RAID and hot-swap drives combined with hot-swap redundant power
supplies. This is about the limit of what is currently available on a
budget in the x86 world. Also, with RAID, make sure you have actual
redundancy. Having RAID doesn't mean you are always in a condition to
recover from a failure. If it is really important, you will also have
hot spares in the machine. As you can see, each drive you add to make a
system more resilient to failure adds cost.

During a recent presentation at our LUG, it was explained that even RAID
can fail. The presenter had several drives die together due to an AC
failure. They had hot spares, but as drives failed, extra stress was
put on the weakened remaining drives until they failed too. Soon they
exceeded their fault tolerance and had to rely on whatever they could
scrape together from backups to recover.

So if possible, look into RAID equipment that has some form of interface
for seeing what is going on, especially if you aren't in a monitored
environment. If your RAID is able to generate messages at the driver
layer and you can watch those messages, you can fix a problem before it
escalates.

While multiple machines are another way of handling a total system
failure, you are probably more likely to experience a line failure than
a hardware failure if you treat your hardware well. Some forms of this
solution also require software modification.
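For the RAID monitoring suggestion in item 2, here is a minimal sketch
of the idea using Linux software RAID (md) rather than a hardware
controller, since md reports array state in /proc/mdstat, where a
degraded mirror shows up as e.g. "[2/1] [U_]". The function name and
the print-based alert are illustrative assumptions; in practice mdadm's
own --monitor mode can mail you on these events, and a hardware
controller would need its vendor's tools instead.

```python
import re
from pathlib import Path

# Matches e.g. "[2/1] [U_]": configured members, working members, per-disk map.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat: str) -> list[str]:
    """Return names of md arrays running with fewer members than configured."""
    degraded = []
    current = None
    for line in mdstat.splitlines():
        if line.startswith("md"):
            current = line.split()[0]  # array name, e.g. "md0"
            continue
        m = STATUS_RE.search(line)
        if m and current is not None:
            total, working = int(m.group(1)), int(m.group(2))
            if working < total or "_" in m.group(3):
                degraded.append(current)
            current = None  # one status line per array
    return degraded


if __name__ == "__main__":
    path = Path("/proc/mdstat")
    if path.exists():
        bad = degraded_arrays(path.read_text())
        print(("DEGRADED: " + ", ".join(bad)) if bad else "all arrays healthy")
```

Run from cron, something like this catches the "one disk already gone"
state before a second failure turns it into a restore from backups.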

3. This one can basically only be combated by due diligence. Mark and
the other CVS committers do their best to review everything before it
goes in. Those who write patches try not to write buggy code. The
implementers should still spend some time testing all the components to
verify the functions work as needed.
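One cheap check worth having after any upgrade is a smoke test that
Asterisk is actually answering. A minimal Python sketch, assuming the
manager interface is enabled in manager.conf on its usual TCP port
5038, where Asterisk greets new connections with an "Asterisk Call
Manager/..." banner; the function name is hypothetical. Per Rich's
point, a passing check only shows Asterisk is reachable from wherever
the check runs, not that phones can reach it or that calls complete.

```python
import socket


def ami_banner_ok(host: str = "127.0.0.1", port: int = 5038,
                  timeout: float = 3.0) -> bool:
    """Return True if host:port answers with an Asterisk Manager banner.

    Assumes the manager interface is enabled; any connection error or
    timeout counts as a failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with sock.makefile("rb") as f:
                banner = f.readline()
    except OSError:
        return False
    return banner.startswith(b"Asterisk Call Manager")
```

Running it from a machine on the phone VLAN rather than on the server
itself gets a little closer to testing the path the phones actually use.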

4. Phones luckily have few failures, and when one does fail, it doesn't
usually take down any other phones. Analog phones can just be swapped
out, as there are few differences between them. Only ADSI would
complicate this, and not if you keep spares of the same ADSI phones.
VoIP is pretty much the same.

5. Power is important, as good clean power makes your hardware last
longer. Add to this that it is needed to ride out adverse weather
conditions. Analog phones let you centralize your power requirements.
VoIP phones either need to be powered with power over ethernet, or you
will need to provide power at every phone. A good enterprise solution
is a building-wide backup, plus small UPSs for the units that need to
be isolated from a generator coming online. I have witnessed fried
hardware during a generator test while working in a hospital. So this
is pretty high on the list of importance.
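For sizing those small UPSs, a back-of-the-envelope runtime estimate
helps, and it shows how PoE puts every phone's draw onto the switch's
UPS. A rough Python sketch; the 85% inverter efficiency and all the
wattages are assumptions for illustration, and this linear model
overstates runtime for real lead-acid batteries at high loads or as
they age.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float,
                        inverter_eff: float = 0.85) -> float:
    """Rough UPS runtime: usable stored energy divided by load, in minutes.

    Linear estimate only; treat the result as an upper bound.
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    return battery_wh * inverter_eff / load_w * 60.0


# Hypothetical load: PBX server + PoE switch + 24 phones at an assumed 6 W each.
server_w = 150.0
switch_w = 40.0
phones_w = 24 * 6.0
print(f"{ups_runtime_minutes(500.0, server_w + switch_w + phones_w):.1f} min")  # 76.3 min
```

With analog phones the phone load would instead sit on the PBX side
(line cards or channel banks), which is what makes centralizing it
straightforward.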

So, the places to work on to get at least high availability on a
budget for a small to medium company:
1. Code. Test it well before deployment.
2. Power. This is important to keep your hardware safe.
3. Hardware. Start with good quality, then add in redundancy.

-- 
Steven Critchfield <critch at basesys.com>