Maxwell Network Emulator Products and Solutions
Results of Testing Voice Quality on Several VoIP Phones and Devices

End-to-End VoIP Product Comparison Testing with Maxwell

Overview
Summary
How We Tested

Table 1: Voice Source, Cisco-7905-to-Cisco-7905
Table 2: Voice Source, Cisco-7905-to-Cisco-7960
Table 3: Voice Source, Cisco-7960-to-Cisco-7935-ConfPhone
Table 4: Voice Source, Cisco-7960-to-Cisco-ATA186
Table 5 - Cisco ATA186 POTS-Adapter
Table 6 - Pingtel xpressa PX-1
Table 7 - Pingtel xpressa Software Phones
Appendix A - Notes

Overview

This report summarizes the results of comparison testing between six Voice-over-IP products:

  • Pingtel Instant Xpress (a software phone on a Dell PC)
  • Pingtel xpressa PX-1 (a "hard" phone)
  • Cisco 7960 (a full-featured VoIP phone)
  • Cisco 7905 (a streamlined-functionality phone)
  • Cisco 7935 "ConfPhone" (a conferencing phone)
  • Cisco ATA186 "POTS Adapter" (a device that allows a plain ordinary telephone to be connected to the Internet for VoIP use)

All hardware and software phones were tested under a variety of network conditions nearly identical to those they might encounter on the open Internet. Although it is possible to upgrade network infrastructure to provide QoS guarantees, this will not help in most cases: in the general case of "anyone-talking-to-anyone" communications, product performance can be affected by network disturbances such as jitter, packet drop, reordering, and duplication. We used the Maxwell® Network Emulator to impose these real-world network conditions. The report includes sound clips so you can hear and judge the voice quality under representative conditions. The test methodology is also described.

Summary

Maxwell makes it very easy to perform side-by-side product, product-version, regression, and interoperability comparisons [1] under controlled and realistic network conditions.

Some VoIP phone manufacturers' documentation recommends that the user's network be designed to stay within certain quality-of-service limits, such as jitter not exceeding 30 ms, so we used those limits as a starting point for these measurements. We picked 25 ms and 30 ms. The Pingtels performed well at jitter levels far in excess of these, as you can hear from the sound clips.

The sound clips also demonstrate how combinations of network disturbances or impairments affect the phones. An individual impairment may not affect voice quality on its own, yet degrade it when combined with other impairments. For example, at an average jitter of 25 ms, we found no audible distortion in the Cisco 7960 phones unless we also added reordering.

The sound clips are in WAV format, which most desktop computers can play. All are digitized at 8 kHz, 16-bit resolution, monophonic. For reference, CD-quality sound is 44.1 kHz, 16-bit resolution, stereo. You can listen to the recordings and judge for yourself. Reference recordings are also included for "best case" network conditions. You will need a PC with a sound card. It is best to listen with good headphones rather than typical desktop PC speakers. You will judge the sound quality more accurately if the sound comes from a source near your ears, just as it does with a regular phone. Good headphones also block ambient noise, allowing you to hear just the recording.
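To confirm that a downloaded clip really is in the stated format, something like the following could be used. This is a minimal sketch using Python's standard wave module; the helper names (describe_wav, bytes_per_second, make_silent_clip) are hypothetical and not part of any Maxwell tooling, and the silent clip exists only so the example runs without a real recording.

```python
import io
import wave

def describe_wav(source):
    """Return (channels, bits_per_sample, sample_rate_hz, duration_s)."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, bits, rate, duration

def bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate for the given format."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

def make_silent_clip(seconds=1, rate=8000):
    """Build an in-memory WAV in the clips' stated format (demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # monophonic
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf

print(describe_wav(make_silent_clip()))
print(bytes_per_second(8000, 16, 1), bytes_per_second(44100, 16, 2))
```

By this arithmetic the clip format carries 16,000 bytes of audio per second against 176,400 for CD audio, roughly one-eleventh of the data.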

The following tables show network conditions at which the phones were tested. Only three of the many kinds of possible network impairments were tested. These three are:

Jitter: uniformly-distributed random amounts of delay added to voice data packets. Maxwell keeps track of the arrival time and the exit time of each packet, automatically calculating an average delay which is displayed and updated in real-time by the graphical user interface and also shown in the tables below.

Drops: voice data packets are randomly selected to be dropped. The distribution function is uniform. The mean is given; e.g., at 1% drop, on average 1 out of every 100 packets is dropped. This number applies to both directions, which means that the effective packet loss in each direction is about half that number (e.g., when the drop rate was set to 3%, each direction showed a drop rate of 1.5%). The table column for drops has been adjusted for this fact.

Reorder: the order in which packets arrive can be changed. The higher the number, the more reordering takes place. In real networks, packet reordering can take place occasionally when routes are adjusted, and consistently over tandem links (a commonplace solution when a quick bandwidth upgrade is needed). You can think of the reorder number as being the number of extra data links: e.g., reorder 0 -> one data link, reorder 1 -> two tandem data links, reorder 2 -> three tandem data links, etc.

We did not duplicate, modify or corrupt packets, though Maxwell can do those things too.
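The three impairments described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a description of how Maxwell works internally: the function name impair, the 20 ms packet interval, and the link_skew_ms parameter (how much slower each extra tandem link is) are all hypothetical. Note that in this simple model, jitter larger than the packet interval will also change arrival order on its own.

```python
import random

def impair(num_packets, mean_jitter_ms, drop_pct, reorder,
           interval_ms=20.0, link_skew_ms=30.0, seed=0):
    """Return the sequence numbers of delivered packets, in arrival order."""
    rng = random.Random(seed)
    arrivals = []
    for seq in range(num_packets):
        # Drop: each packet is discarded independently with
        # probability drop_pct / 100.
        if rng.random() < drop_pct / 100.0:
            continue
        sent_ms = seq * interval_ms
        # Jitter: uniform delay in [0, 2 * mean], so the average
        # added delay is mean_jitter_ms.
        delay_ms = rng.uniform(0.0, 2.0 * mean_jitter_ms)
        # Reorder: pick one of (reorder + 1) tandem links; link k is
        # assumed to be k * link_skew_ms slower than link 0.
        link = rng.randrange(reorder + 1)
        arrivals.append((sent_ms + delay_ms + link * link_skew_ms, seq))
    arrivals.sort()  # the receiver sees packets ordered by arrival time
    return [seq for _, seq in arrivals]

delivered = impair(1000, mean_jitter_ms=25, drop_pct=1, reorder=1)
lost = 1000 - len(delivered)
out_of_order = sum(1 for a, b in zip(delivered, delivered[1:]) if b < a)
print(lost, out_of_order)
```

With all three parameters set to zero the packets come back untouched, which corresponds to the reference recordings.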

The tables below provide a sound clip for each phone under each condition, along with a text notation of the voice quality. Click on the sound clip to hear for yourself exactly what the indicated network conditions do to the tested equipment (i.e., "what that sounds like"). These clips were digitized at 8 kHz, 16-bit monophonic.

How We Tested

Other than the effects introduced by Maxwell, the network was a quiet internal 10/100 switched LAN, i.e., almost perfect.

Two kinds of audio source material were used: a snippet from a local radio station's news reporting [2], and a 1000 Hz test tone. Both were recorded onto CD-R media and played using an RCA portable CD player. The headphone jack was connected via an adapter to the RJ11 connector on the phone. The CD player's volume control was adjusted so that, with no impairments from the Maxwell, the signal was loud, clear and undistorted.

For all measurements except the ones to the Cisco 7935 ConfPhone (which has no handset), the receiving phone's handset cord was connected through an RJ11 adapter to a mini-stereo plug and fed directly into a PC sound card. The purpose in doing so was to avoid speaker-to-microphone distortion and background noise pickup. The recording volume control was adjusted for maximum clarity and volume without distortion when the Maxwell was set to no impairments.

Since the Cisco 7935 ConfPhone has no handset and no way to directly record the output signal, for these measurements, an AudioTechnica ATR20 cardioid low-impedance microphone was suspended one inch above the Cisco 7935 ConfPhone speaker. These measurements were taken in a separate and quiet (though not anechoic) room, away from our lab's equipment, RF emissions, and fan noises.

For each set of tests, a reference recording was made. Listen [3] to the reference recording to hear what "best case" sounds like.

For reference recordings, Maxwell was set to 0 ms jitter, 0% drop, no reordering. In other words, Maxwell did not impair any of the traffic. These reference files contain some noise picked up by the sound card and cabling, not introduced by either Maxwell or its effect on VoIP traffic. It is recognizable as 60 Hz "hum" and also hiss. You can hear it in all samples. Network effects on VoIP tend to be heard as gaps and dropouts, or in some cases as if the person is gargling or talking underwater. For the test-tone measurements, instead of a steady tone, it sounds more like you're listening to Morse code.
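The "Morse code" effect on the test tone suggests one simple way to put a number on dropouts: count the stretches where the tone goes quiet. The sketch below is an illustration only and was not part of our test procedure; the function names, the 10 ms frame size, and the silence threshold are assumptions. A real recording would need the threshold tuned above the hum and hiss floor, and a phone that conceals lost packets with something other than silence would not be caught this way.

```python
import math

def tone(seconds, freq_hz=1000, rate=8000, amplitude=0.5):
    """Generate a sine test tone as a list of float samples."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(n)]

def count_gaps(samples, rate=8000, frame_ms=10, silence_rms=0.05):
    """Count runs of consecutive frames whose RMS level is below silence_rms."""
    frame_len = rate * frame_ms // 1000
    gaps = 0
    in_gap = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        silent = rms < silence_rms
        if silent and not in_gap:
            gaps += 1  # a new dropout begins at this frame
        in_gap = silent
    return gaps

clean = tone(1.0)
gapped = list(clean)
for lo, hi in ((1600, 2400), (4000, 4800)):  # two 100 ms dropouts
    gapped[lo:hi] = [0.0] * (hi - lo)
print(count_gaps(clean), count_gaps(gapped))
```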

  • Tables 1 through 4 below show results when the audio sources are human voices, a woman's and a man's, speaking clearly.
  • Table 5 shows the POTS Adapter.
  • Table 6 shows the Pingtel "hard" phone.
  • Table 7 shows the Pingtel "software" phone.


Table 1: Voice Source, Cisco-7905-to-Cisco-7905

Jitter (ms) | Drop (%) | Reorder | Recording / Comments
0           | 0        | 0       | Reference file
25          | 1        | 1       | Gargling/"underwater sound". Annoying.
25          | 3        | 1       | Gargling, echoing, extreme distortion. Very annoying, intelligible only for slowly speaking talkers.
25          | 4        | 1       | Very distorted. Unacceptable.
30          | 5        | 2       | Very distorted


Table 2: Voice Source, Cisco-7905-to-Cisco-7960

Jitter (ms) | Drop (%) | Reorder | Recording / Comments
0           | 0        | 0       | Reference file
25          | 1        | 1       | Slightly distorted
37          | 3        | 0       | Distorted but understandable
37          | 5        | 0       | Distorted, noise pops
37          | 7        | 1       | Very distorted

Table 3: Voice Source, Cisco-7960-to-Cisco-7935-ConfPhone

Jitter (ms) | Drop (%) | Reorder | Recording / Comments
0           | 0        | 0       | Reference
30          | 1        | 0       | Some distortion
30          | 2        | 1       | Distortion
30          | 4        | 1       | Very distorted
30          | 5        | 1       | Very distorted

Note: These measurements were taken using a microphone and are thus subject to some speaker-to-microphone distortion.

Table 4: Voice Source, Cisco-7960-to-Cisco-ATA186

Jitter (ms) | Drop (%) | Reorder | Recording / Comments
0           | 0        | 0       | Reference file
30          | 0        | 0       |
30          | 2        | 1       | Slightly noticeable gargling sound
30          | 4        | 0       | Clear
30          | 4        | 1       | Very distorted

Tests started from no impairments, then increased the drop percentage at one-percentage-point intervals. At each interval, jitter started at 0 ms (the "set-point") and then increased, and the same was done with reordering. This order may be meaningful depending upon how the receiving units compensated for these impairments.
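The ordering described above can be written down as a small generator. This is a sketch of one plausible reading of that procedure (drop in the outer loop, with jitter and reorder each restarting from zero at every drop level); the function name and the specific step values are hypothetical, since the exact steps used are not listed here.

```python
from itertools import product

def sweep_conditions(max_drop_pct, jitter_steps_ms, reorder_steps):
    """Yield (jitter_ms, drop_pct, reorder) tuples in test order."""
    # Outer loop: drop rises in one-percentage-point steps from 0.
    for drop_pct in range(max_drop_pct + 1):
        # Inner loops: jitter and reorder restart from their first
        # (lowest) value at every drop level.
        for jitter_ms, reorder in product(jitter_steps_ms, reorder_steps):
            yield jitter_ms, drop_pct, reorder

for condition in sweep_conditions(2, (0, 25, 30), (0, 1)):
    print(condition)
```

Because the receiving units may adapt to conditions over time, running the same tuples in a different order could give different results.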

Table 5 - Cisco ATA186 POTS-Adapter

Jitter (ms) | Drop (%) | Reorder | Recording / Comments
0           | 0        | 0       | Reference file, no impairments
30          | 0        | 0       |
30          | 2        | 1       |
30          | 4        | 0       |
30          | 4        | 1       |

Table 6 - Pingtel xpressa PX-1

Jitter (ms) | Drop (%) | Reorder | Recording / Comments [4]
0           | 0        | 0       | Reference, no impairments
155         | 12.5     | 1       | Some warbling
155         | 15       | 1       | Warbling
200         | 5        | 2       | Toggles between no impairments and the listed conditions. Can hear some distortion, echo and warbling at the transitions. Very garbled in spots.
247         | 2.5      | 2       | Definite warbling, unacceptable quality

Table 7 - Pingtel xpressa Software Phones

Jitter (ms) | Drop (%) | Reorder | Recording / Comments [4]
200         | 0        | 0       | Clear
200         | 0        | 0       | Toggles between no impairments and 200 ms jitter. Some warbling audible at the transitions.
200         | 5        | 0       | Toggles with no impairments. No discernible changes at the transitions.
200         | 10       | 0       | Toggles with no impairments. Some echo audible at the transitions.
200         | 10       | 1       | Very bad, unacceptable. Warbling.

Appendix A - Notes


The Cisco 7905-to-Cisco 7960 jitter25msdrop3pctreorder1 measurement had to be redone. The phone had dropped its connection and no audio was present in the recorded file. Cause unknown.

When the Cisco 7935 ConfPhone is taken off HOLD (un-muting the speaker), even with no impairments set by the Maxwell, the resulting voice quality is sporadic for a period of time (about 30 seconds).

During most Cisco VoIP phone tests, the connection to the call set-up director software would be lost, but audio data continued to be sent and the source audio was still audible. Hanging up the phone in some cases did not restore this connection. Reducing the impairment parameter "reorder" back down was not enough to make the phones work: I could dial, but the call would not complete when I picked up. The called phone kept ringing even after its handset was picked up.

Why this matters: Security vulnerability -- Denial of Service

This would be a security vulnerability, at the very least in the sense of a denial-of-service attack: by manufacturing "bad" network conditions, it would be possible to prevent the phones from switching between calls or placing another call. This occurs under conditions where the voice quality is bad but still intelligible. In principle, the network could be installed so that the phones were on a separate physical network from PCs, which as we know tend to be vulnerable to email-borne and other kinds of viruses. However, since the Cisco 7960s have a PC LAN connector, in practice this might not be so easy to enforce.



Footnotes

[1] Within the limits of which firmware versions are allowed to coexist.

 

[2] In the real world, one is much more likely to hear and converse daily with many different people with many different speaking rates and accents. We kept these measurements relatively simple, but bear in mind when you listen to the recordings that the effects of small distortions are magnified greatly when the speaker is speaking rapidly using a thick accent.

 

[3] With all these recordings, it is best to use good headphones for the following reasons: (1) You will hear the recording better, (2) You will hear the specific distortion, (3) Good headphones block ambient noise, and (4) You will be less distracted by other noises.

 

[4] The rows containing a comment indicating "toggling" mean that the recording contains a few seconds (about four) where the network conditions are as indicated for that row, followed by a few seconds (also about four) in which all impairments were set to zero. This repeats several times. By switching between "perfect" network conditions and impaired conditions, you can hear how the tested equipment's compensation algorithms adapt to sudden changes in conditions. Some compensation algorithms adapt to poor conditions gradually, and can continue to adjust as conditions degrade, but sudden changes from "good" to "very bad" can cause momentary service disruptions as they adapt.
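The toggling pattern in note [4] amounts to a square-wave schedule. As a rough sketch (the function name is hypothetical, the four-second period is the approximate figure given above, and starting with the impaired phase is an assumption), the conditions in force at any moment of such a recording could be computed like this:

```python
def impairments_at(t_seconds, impaired, period_s=4.0):
    """Return the (jitter_ms, drop_pct, reorder) tuple in force at time t."""
    clean = (0, 0, 0)  # all impairments set to zero
    # Even-numbered periods use the row's listed conditions,
    # odd-numbered periods use no impairments.
    phase = int(t_seconds // period_s)
    return impaired if phase % 2 == 0 else clean

row = (200, 5, 0)  # e.g. 200 ms jitter, 5% drop, no reordering
for t in (0, 3, 4, 7, 8):
    print(t, impairments_at(t, row))
```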
 
©2001 - 2009 InterWorking Labs, Inc. ALL RIGHTS RESERVED.
For more information, please contact InterWorking Labs.