[Asterisk-Users] UPDATE - 512 Calls w/ Dig Rec: NFS Setup and Benchmarks

Matt Roth mroth at imminc.com
Mon Oct 3 14:54:16 MST 2005


List members,

My previous post "SUCCESS - 512 Simultaneous Calls with Digital Recording" documents using a RAM disk to eliminate the I/O 
bottleneck associated with digitally recording calls via the Monitor application. By recording directly to a RAM disk I was able to maintain good call quality on 512 simultaneous calls.

This post documents moving the calls from the RAM disk to a hard disk on a remote machine via NFS. The setup is not resource intensive on the Asterisk server and should not impact call quality. As always, I welcome suggestions for improvement and the identification of errors and omissions.

------------------------------------------------------------

Asterisk Configuration
======================

The two leg files of a call will be moved immediately after the call is complete via the MONITOR_EXEC and MONITOR_EXEC_ARGS variables. MONITOR_EXEC is generally used to replace soxmix as the application for mixing the raw leg files, but I'm using it as a hook to move them to an NFS mounted drive specified by MONITOR_EXEC_ARGS as follows:

-From the dialplan (extensions.conf):
 exten => _XXXX,1,SetVar(MONITOR_EXEC=mvdr)
 exten => _XXXX,2,SetVar(MONITOR_EXEC_ARGS=/digrec-nfs/)
 exten => _XXXX,3,Monitor(pcm||m)
 
mvdr is a shell script sitting in /usr/sbin/:
 #!/bin/bash
 # Asterisk calls MONITOR_EXEC with the raw leg files ($1, $2), the mixed
 # output file name ($3), and MONITOR_EXEC_ARGS ($4, the NFS mount point).
 /bin/nice -n 19 mv "$1" "$4" &
 /bin/nice -n 19 mv "$2" "$4" &

Digital recording via Monitor can also be initiated from agent channels and queues. The MONITOR_EXEC and MONITOR_EXEC_ARGS variables are still set from the dialplan, but you must tell Asterisk to mix the files for them to be used. This is accomplished as follows:

-Agents: The channels must be configured in agents.conf for recording:
 recordagentcalls=yes	; The leg files are always joined

-Queues: Each queue must be configured in queues.conf for recording and joining the leg files:
 monitor-format=pcm
 monitor-join=yes
 
Using this hook to trigger the moves of the leg files has two distinct advantages. First, the leg files are removed from the RAM disk as soon as possible, minimizing the amount of RAM needed to buffer the calls. Second, the RAM disk is volatile storage, so moving the leg files to stable storage as soon as possible minimizes the number of digital recordings that would be lost in the event of an Asterisk server crash.
 
NFS Configuration
=================

A fast NFS connection is needed for two reasons. First, the size of the RAM disk is limited by the amount of physical memory, so we have to move data off of it as quickly as possible to avoid filling it. Second, minimizing the amount of time needed to transmit the leg files prevents a large number of moves from building up on the system. Too many background processes lead to resource consumption that inhibits Asterisk's ability to maintain call quality.

To attain the needed speed, I chose asynchronous NFS (version 3) using UDP and 8K block sizes transmitted via a crossover Gigabit connection configured for jumbo frames. The Asterisk server is the NFS client in order to minimize resource consumption, and the Digital Recording server runs the NFS daemons. 

I decided to use an asynchronous NFS transfer because it allows the NFS server to reply to NFS client requests as soon as it has processed the request, without waiting for the data to be written to disk. This yields better performance at the cost of possible data corruption in the event of an NFS server crash. 

UDP was chosen because it is a stateless protocol and will not cause the NFS client to hang if the NFS server crashes in the middle of a packet transmission. The Asterisk server (NFS client) uses a soft, interruptible mount to prevent hanging if the Digital Recording server (NFS server) crashes, as well.

Jumbo frames are used to minimize the number of CPU interrupts and the general processing overhead for a given data transfer size. The 8K block sizes (plus packet headers) fit into the 9000 byte MTU, allowing for efficient transfers between the NFS client and server without packet fragmentation.

nfsd is started at boot on the Digital Recording server (NFS server) in runlevels 3, 4, and 5.
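
On a stock Fedora Core 3 install this can be done with chkconfig; the service name matches the init script referenced below (/etc/rc.d/init.d/nfs), but confirm it on your system:

 # Enable nfsd (and its helper daemons) at boot in runlevels 3, 4, and 5
 /sbin/chkconfig --level 345 nfs on
 /sbin/chkconfig --list nfs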

Note that throughout the configuration I am sacrificing data integrity in favor of speed and NFS client reliability. I feel this is an acceptable trade, given the realtime nature of Asterisk and the criticality of speed to this application.

This configuration involves hardware and software, so I'll review both. If you would like full copies of any configuration files, please contact me off list.

Hardware Profiles
-----------------

-Asterisk Server (NFS Client)
 -   Machine: Dell PowerEdge 6850
 -       CPU: Four Intel Xeon MP CPUs at 3.16 GHz - Hyperthreaded
 -       RAM: 20 GB (4 GB System / 16 GB RAM Disk)
 -       NIC: Intel Pro/1000 MT Dual Port Server Adapter
 - Exp. Slot: PCI-X, 64 bit, 133 MHz
 
-Digital Recording Server (NFS Server)
 -   Machine: Dell PowerEdge 1850
 -       CPU: One Intel Xeon CPU 2.80 GHz - Hyperthreaded
 -       RAM: 2 GB
 -       NIC: Intel Pro/1000 MT Gigabit Ethernet Adapter - Onboard
 -Hard Disks: Two 73 Gigabyte SCSI Drives Configured in a RAID 1 (Mirrored)
 -Controller: PERC 4e/Si RAID Controller - Onboard
 
-Physical Connection
 -Cat5e Crossover Cable
 
Software Profiles
-----------------

-Asterisk Server (NFS Client)
 -         OS: Fedora Core 3 - 2.6.12-1.1376_FC3smp Kernel
 -RAM Disk FS: ext2
 - NIC Driver: e1000
 
-Digital Recording Server (NFS Server)
 -          OS: Fedora Core 3 - 2.6.12-1.1376_FC3smp Kernel 
 -Hard Disk FS: ext3
 -  NIC Driver: e1000
 
RAM Disk Setup
--------------

-Asterisk Server (NFS Client)
 -Edit the bootloader configuration file (/boot/grub/grub.conf in my case) to pass the ramdisk_size parameter to the kernel:
  kernel /vmlinuz-2.6.12-1.1376_FC3smp ro root=LABEL=/ quiet ramdisk_size=16777216
 -Append lines to the /etc/rc.local init script to format and mount the RAM disk at boot time:
  # Formats and mounts the RAM disk used to buffer digital recordings.
  /sbin/mke2fs -q -b 1024 -m 0 /dev/ram0
  /bin/mount /dev/ram0 /digrec
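
 After a reboot, a quick sanity check (a sketch; adjust the mount point if yours differs) confirms the kernel picked up the parameter and the RAM disk mounted:
  # Confirm the ramdisk_size parameter reached the kernel
  cat /proc/cmdline
  # Confirm the RAM disk is mounted with roughly 16 GB available
  df -h /digrec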
 
Subnet Setup
------------

-Asterisk Server (NFS Client)
 -Edit the /etc/sysconfig/network-scripts/ifcfg-eth# file for the dedicated NIC so that it is on a different subnet than the other NICs in the system. No default gateway is necessary. The following values work on my network (a minimal example file is sketched after these steps):
  IPADDR=192.168.55.2
  NETMASK=255.255.255.0
   
-Digital Recording Server (NFS Server)
 -Edit the /etc/sysconfig/network-scripts/ifcfg-eth# file for the dedicated NIC so that it is on a different subnet than the other NICs in the system. No default gateway is necessary. The following values work on my network:
  IPADDR=192.168.55.1
  NETMASK=255.255.255.0
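
For reference, a minimal ifcfg file for the dedicated NIC on the Asterisk server might look like the following. The interface name eth3 matches my machine (see the route output below); substitute your own:

  # /etc/sysconfig/network-scripts/ifcfg-eth3
  DEVICE=eth3
  BOOTPROTO=static
  ONBOOT=yes
  IPADDR=192.168.55.2
  NETMASK=255.255.255.0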
   
After rebooting both machines you can confirm the setup using the route command:

[root at asterisk ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.55.0    *               255.255.255.0   U     0      0        0 eth3
192.168.51.0    *               255.255.255.0   U     0      0        0 eth2
XXX.XXX.X.X     *               255.255.0.0     U     0      0        0 eth2
default         XXX.XXX.XX.XX   0.0.0.0         UG    0      0        0 eth2

Note that the eth3 interface is only used for the 192.168.55.0 subnet. If you see similar behavior on both of your machines, your subnet setup was successful.

Jumbo Frame Setup
-----------------

-Both Servers (NFS Client and NFS Server)
 -Edit the /etc/sysconfig/network-scripts/ifcfg-eth# file for the dedicated NIC so that it has a 9000 byte MTU:
  MTU=9000

After rebooting both machines you can confirm the setup using the tracepath command:

[root at asterisk ~]# tracepath 192.168.55.1
 1:  192.168.55.2 (192.168.55.2)                            0.147ms pmtu 9000
 1:  192.168.55.1 (192.168.55.1)                            1.852ms reached
     Resume: pmtu 9000 hops 1 back 1
     
Note that the pmtu (path MTU) is 9000. If you see similar behavior on both of your machines, your jumbo frame setup was successful. 
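
You can also verify that full-sized jumbo frames pass without fragmentation by pinging with the don't-fragment bit set. A payload of 8972 bytes plus 28 bytes of IP/ICMP headers exactly fills the 9000 byte MTU:

 # Run from the Asterisk server; replies confirm 9000 byte frames end to end
 ping -M do -s 8972 -c 3 192.168.55.1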

Gigabit Ethernet Confirmation
-----------------------------

As long as both of the NICs you configured are Gigabit NICs they should auto-negotiate a speed of 1000 Mbps. You can confirm this by using the ethtool command on both machines:

[root at asterisk ~]# ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes
        
NFS Server Optimizations
------------------------

-Digital Recording Server (NFS Server)
 -Edit the NFS init script (/etc/rc.d/init.d/nfs in my case) so that it starts 20 nfsd threads. This number is an estimate; increase it if netstat -s reports UDP socket overflows and decrease it if uptime reports a rising load average (a quick check for both is sketched at the end of this section):
  # Number of servers to be started by default
  [ -z "$RPCNFSDCOUNT" ] && RPCNFSDCOUNT=20  
 -Edit the start section of the NFS init script (/etc/rc.d/init.d/nfs in my case) to increase the socket input queue for nfsd. This is important for NFS servers with heavy write loads, because each instance of nfsd uses this queue to store write requests while processing them:
  start)
     # Save the original size of the socket input queue - MJR
     rmem_default=`cat /proc/sys/net/core/rmem_default`
     rmem_max=`cat /proc/sys/net/core/rmem_max`

     # Increase the size of the socket input queue (32K per nfsd) - MJR
     rmem_nfsd=`echo "32768*$RPCNFSDCOUNT" | bc -l`
     echo $rmem_nfsd > /proc/sys/net/core/rmem_default
     echo $rmem_nfsd > /proc/sys/net/core/rmem_max

     # Start daemons.  
     ...
     
     # Return the socket input queue to its original size - MJR
     echo $rmem_default > /proc/sys/net/core/rmem_default
     echo $rmem_max > /proc/sys/net/core/rmem_max
     ;;     
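
Here is a quick way to check both conditions mentioned above on the Digital Recording server (NFS server). The exact wording of the netstat counters varies by kernel, so treat the grep pattern as an assumption:

  # UDP receive errors/overflows suggest too few nfsd threads or too
  # small a socket input queue
  netstat -su | grep -i error
  # The "th" line reports the nfsd thread count and how often the
  # threads were busy; compare it against the load average from uptime
  grep ^th /proc/net/rpc/nfsd
  uptime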

Export Options
--------------

-Digital Recording Server (NFS Server)
 -Edit /etc/exports so that the desired directory is exported asynchronously for reading and writing, without squashing root:
  /var/Recorded   192.168.55.2(rw,async,no_root_squash)
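
After editing /etc/exports, the export can be applied and confirmed without a full restart of the NFS daemons (paths are the Fedora Core 3 defaults):

  # Re-export everything in /etc/exports and verify the export list
  /usr/sbin/exportfs -ra
  /usr/sbin/showmount -e localhost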

Mount Options
-------------

-Asterisk Server (NFS Client)
 -Edit /etc/fstab so that the exported directory is mounted with the appropriate options for a fast connection (see NFS Configuration):
  192.168.55.1:/var/Recorded  /digrec-nfs  nfs  rw,soft,intr,bg,nolock,rsize=8192,wsize=8192,nfsvers=3,udp  0 0
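
Before relying on the fstab entry, you can mount the export by hand from the Asterisk server and confirm the negotiated options (a sketch; the mount point must already exist):

  mkdir -p /digrec-nfs
  mount -t nfs -o rw,soft,intr,nolock,rsize=8192,wsize=8192,nfsvers=3,udp \
      192.168.55.1:/var/Recorded /digrec-nfs
  # nfsstat -m lists each NFS mount with the options actually in effect
  nfsstat -m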

NFS and RAM Disk Confirmation
-----------------------------

After rebooting both machines, you can confirm the setups using the df command on the Asterisk server (NFS client):

[root at asterisk ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda6              5162796    535948   4364592  11% /
/dev/sda2               124443     20067     97950  18% /boot
none                  10286816         0  10286816   0% /dev/shm
/dev/sda5              5162796   1645400   3255140  34% /home
/dev/sda7             10317828   4500676   5293036  46% /usr
/dev/sda8             47524452   1687956  43422332   4% /var
/dev/ram0             16510000      3855  16506145   1% /digrec
192.168.55.1:/var/Recorded
                      47564616   1257512  43890936   3% /digrec-nfs
                      
Note the last two entries. The RAM disk is shown as /dev/ram0 mounted on /digrec and the NFS mount is shown as 192.168.55.1:/var/Recorded mounted on /digrec-nfs. If you see similar behavior, your NFS and RAM disk setups were successful.

Further Optimization Options
----------------------------

The disks of my NFS server are configured as a RAID 1 (mirrored). This offers redundancy at the cost of slower writes, since each write is made to both disks. Configuring the disks as a RAID 0 (striped) would yield better NFS performance by decreasing write times if disk I/O on the NFS server proves to be a bottleneck.

The e1000 drivers used by my Intel NICs can be configured via command line parameters set in /etc/modprobe.conf. By increasing the number of RxDescriptors and TxDescriptors the drivers could buffer more incoming packets and queue more transmits, respectively. As per Intel's driver guide, the default settings are generally the recommended settings.
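
If you do want to experiment, the options line looks roughly like this. The values below are hypothetical and must fall within the limits given in Intel's driver guide for your adapter (one comma-separated value per port):

 # /etc/modprobe.conf - hypothetical descriptor counts for a dual-port e1000
 options e1000 RxDescriptors=1024,1024 TxDescriptors=1024,1024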

NFS Benchmarking
================

My first benchmarks were performed by creating files of zeroed bytes from /dev/zero across the NFS mount. The commands used for each set of benchmarks follow:

   2 GB - time dd if=/dev/zero of=/digrec-nfs/testfile bs=16k count=131072
   1 GB - time dd if=/dev/zero of=/digrec-nfs/testfile bs=16k count=65536
 256 MB - time dd if=/dev/zero of=/digrec-nfs/testfile bs=16k count=16384
  30 MB - time dd if=/dev/zero of=/digrec-nfs/testfile bs=16k count=1920
   3 MB - time dd if=/dev/zero of=/digrec-nfs/testfile bs=16k count=192
   
For reference, a 3 MB file is roughly 6 minutes of PCM audio and a 30 MB file is roughly 1 hour of PCM audio. These sizes represent one leg of an average call and one leg of a long call for my application. The results of testing on these files are of the most interest to me, and the larger files are used simply to stress test the NFS configuration under heavier loads than should ever be seen.

I used sar on the Asterisk server (NFS client) to monitor CPU usage and iostat on the Digital Recording server (NFS server) to monitor CPU and disk usage. The NFS mount was unmounted and remounted after each test to clear any caches. Without further ado, here are the numbers:

                           Client CPU  Server CPU  Server Max. Blocks
Filesize  Runtime    Mbps  Min. Idle%  Min. Idle%  Written / Sec.
--------  ---------  ----  ----------  ----------  ------------------
  2 GB    0m54.756s  299   84.00       03.50       134768.00
  2 GB    0m45.843s  357   84.65       06.44       121317.65
  2 GB    0m48.347s  339   77.13       07.04       126800.00
  1 GB    0m23.355s  351   87.30       07.46       119041.58
  1 GB    0m22.110s  371   87.41       07.46       114035.64
  1 GB    0m24.061s  340   83.41       08.42       138233.66
256 MB    0m03.146s  651   90.91       40.50        75896.00
256 MB    0m03.150s  650   92.54       41.00         7045.65 *
256 MB    0m03.145s  651   93.40       41.79       106488.00
 30 MB    0m00.374s  642   97.38       80.50         6579.06 *
 30 MB    0m00.373s  643   97.62       83.00        60966.34
 30 MB    0m00.373s  643   98.13       83.50         5328.15 *
  3 MB    0m00.041s  585   99.62       95.17         4902.11
  3 MB    0m00.041s  585   99.75       95.48         4584.44
  3 MB    0m00.041s  585   99.63       95.60         4456.03
  
* These numbers look a little off. I think I stopped iostat before the NFS server's write buffers had been flushed to disk. Ooops!
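
For reference, here is a sketch of how runs like these can be monitored with sar and iostat and how the NFS client cache is cleared between runs; the sampling interval, count, and log file names are illustrative assumptions:

  # Asterisk server (NFS client): drop client-side caches between runs,
  # then sample CPU usage once per second for the length of the test
  umount /digrec-nfs && mount /digrec-nfs
  sar -u 1 60 > client_cpu.log

  # Digital Recording server (NFS server): sample CPU usage and blocks
  # written per second (the Blk_wrtn/s column) once per second
  iostat 1 60 > server_io.log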

My second benchmarks were performed by creating 120 files, each of which was 3 MB in size (360 MB total). They were named in pairs to simulate sixty pairs of leg files, each containing 6 minutes of PCM audio. The goal of this test was to measure the stress on both machines when sixty average calls end and are transferred using the mvdr script over a given interval. The first set of tests had a 1 second sleep between each call termination (typical call volume), the second set had a 0.1 second sleep (high call volume), and the third set of tests had no sleep (unrealistically high call volume/stress test). The script used to conduct these benchmarks had the following format:

 #!/bin/bash
 mvdr 001a.txt 001b.txt dummy /digrec-nfs/ &
 sleep 1		# 0.1	# 0.0
 mvdr 002a.txt 002b.txt dummy /digrec-nfs/ &
 sleep 1		# 0.1	# 0.0
 mvdr 003a.txt 003b.txt dummy /digrec-nfs/ &
 sleep 1		# 0.1	# 0.0
 ...

The approximate run times that I reported are just that. On both the NFS client and the NFS server these times were approximated by reviewing the sar and iostat logs and noting when the CPU began to be utilized and when it returned to idle. The maximum number of simultaneous processes on the NFS client (Asterisk server) is also an approximation. I simply ran ps repeatedly during the tests and counted how many instances of the mvdr script were running.

It is important to note that the leg files were being transferred from the RAM disk of the Asterisk server (NFS client) to the hard disk of the Digital Recording server (NFS server) in order to simulate the actual scenario of the production environment. Once again, the NFS mount was unmounted and remounted after each test to clear any caches.

                    Client    Client    Client     Server    Server    Server
Total     Sleep     Approx.   CPU Min.  Max. Sim.  Approx.   CPU Min.  Max. Blocks 
Filesize  Interval  Run Time  Idle%     Processes  Run Time  Idle%     Written / Sec.
--------  --------  --------  --------  ---------  --------  --------  --------------
360 MB    1.0s      60s       97.63       2        60s       55.28      61624.00
360 MB    1.0s      60s       92.39       0        60s       75.00      62865.31
360 MB    1.0s      60s       95.13       2        60s       67.33      62214.14
360 MB    0.1s      07s       90.75       2        16s       18.91      97615.69
360 MB    0.1s      07s       90.51       2        15s       24.88     110518.81
360 MB    0.1s      07s       90.75       8        14s       23.38      95264.00
360 MB    0.0s      05s       75.00      95        14s       20.10      93259.41
360 MB    0.0s      05s       80.75     103        14s       24.50     106352.00
360 MB    0.0s      05s       81.40      99        14s       21.00     114091.09


If you would like a zip file containing the raw test data or copies of my test scripts and files, please contact me off list.

Conclusions
-----------

The speed of the NFS connection is more than adequate for this application, and at realistic call volume levels the added strain on the Asterisk server should not affect call quality. The bottleneck looks to be CPU and possibly disk I/O on the NFS server. This is not unexpected, since the NFS server code runs in the kernel and must be executed on the host CPU, and large amounts of data are being transferred across the connection in short periods of time.

I will monitor memory usage (to see how large the RAM disk actually needs to be) and CPU usage (for stress) on the production system once it is up, and I will report my findings to the list.

Future Plans and Unresolved Issues
==================================

I wrote Windows software for another project that mixes leg files, indexes them by call time, and archives them after a given period of time. I plan to port that code to a set of shell scripts that will be run on the Digital Recording server out of cron. If anyone knows of an existing project that has accomplished this already, please let me know.

Before writing these scripts, I have two questions that need to be answered:

1) How can I tell when a file is complete on the NFS server?

2) What will happen on the NFS client if the NFS server crashes (I expect the leg files to be written to the local mount point until the mount is reestablished)?

If you can answer either of these questions, I would greatly appreciate your contribution.

NFS Test Tools
==============

df
ethtool
ifconfig
lspci
netstat
nfsstat
showmount
tracepath
uptime
/proc/net/rpc/nfsd

Sources
=======

Linux Administration Handbook
by Evi Nemeth, Garth Snyder, Trent R. Hein
Chapter 17 - The Network File System
http://www.amazon.com/exec/obidos/tg/detail/-/0130084662/103-5601970-3037440?v=glance

Linux NFS-HOWTO
by Tavis Barr, Nicolai Langfeldt, Seth Vidal, Tom McNeal
Chapter 5 - Optimizing NFS Performance
http://nfs.sourceforge.net/nfs-howto/

Gigabit Ethernet Jumbo Frames and Why You Should Care
by Phil Dykstra
http://64.233.161.104/search?q=cache:_XJNulIGbj0J:sd.wareonearth.com/~phil/jumbo.html+%22gigabit+ethernet+jumbo+frames%22+%22and+why+you+should+care%22&hl=en

Linux* Base Driver for the Intel® PRO/1000 Family of Adapters
by Intel Corporation
http://www.intel.com/support/network/sb/cs-009209.htm

Linux Ramdisk mini-HOWTO
by Van Emery
http://www.vanemery.com/Linux/Ramdisk/ramdisk.html

------------------------------------------------------------

Thank you for taking the time to read this post. I will try to create a wiki page from it (with all of your additions, of course) when time permits. If anyone would like to perform this task, feel free to do so, but please contact me first. I want to avoid multiple pages with the same content being created.

Yours truly,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer


