| Causes and Correlation of Network Impairments |
|
The purpose of this paper is to discuss the ways in which networks may be imperfect and how we evaluate and deal with those limitations.

What do we mean by the "imperfect network"?

There is no such thing as a perfect network. The laws of physics and mathematics impose limitations that are random (such as noise on copper or fiber optic cables, or hardware or software failures in routers and switches) or predictable (such as the speed of electrical pulses on wires or light on fiber optics).

There are other sources of imperfection: Packets may be lost or delayed by transient congestion in switching elements of the net. Packets may be lost, replicated, or reordered by changes in routing or transitions between slow-path and fast-path routing mechanisms. Packet reordering may also be caused by "load balancing" of traffic between a pair of routers using a set of parallel links.

These conditions tend to occur in bursts that span periods of time ranging from a few milliseconds to a few minutes. However, longer periods are not atypical - congestive losses can last as long as the competing packet flows fight over some scarce resource, typically buffers, in a switching device. Instability in internet routing can cause bursts of lost or reordered packets as route tables are adjusted. There are times when there is no usable route for packets to flow from some point A to some other point B on the net. And reordering caused by parallel telecommunications links can last for as long as those links are in place.

Many people tend to think of these as unusual or rare conditions. In the core of the Internet, a place of large data pipes, high powered switches and routers, and (usually) good traffic engineering, these conditions are infrequent, but they do occur.
However, if one considers packet paths that pass through the periphery of the net - as most packet paths do - then one encounters overloaded exchanges and links, older and underprovisioned equipment, and a lack of 24x7 monitoring and operational coverage.

The Ways That Networks Err

There are many sources of the imperfect network. The following table defines various ways in which networks err.
Causes of the Network Errors
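
The bursty character of these conditions can be illustrated with a small simulation. The sketch below is an illustration only. It assumes a two-state model (often called a Gilbert-Elliott model) in which a path is either in a quiet state or in a burst state, with a different loss probability in each. The function names and every probability shown are hypothetical values chosen for illustration, not measurements.

```python
import random


def bursty_loss(n_packets, p_enter_burst, p_leave_burst,
                loss_in_burst, loss_outside, seed=0):
    """Return a list of booleans, True where a packet is lost.

    Assumed two-state model: before each packet the path may switch
    between a quiet state and a burst state, then the packet is dropped
    with the loss probability of the current state.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    in_burst = False
    lost = []
    for _ in range(n_packets):
        # State transition first, then the loss decision for this packet.
        if in_burst:
            if rng.random() < p_leave_burst:
                in_burst = False
        elif rng.random() < p_enter_burst:
            in_burst = True
        p_loss = loss_in_burst if in_burst else loss_outside
        lost.append(rng.random() < p_loss)
    return lost


def longest_loss_run(lost):
    """Length of the longest run of consecutive lost packets."""
    longest = current = 0
    for flag in lost:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Hypothetical parameters: bursts are rare, but lossy while they last.
    lost = bursty_loss(10_000, p_enter_burst=0.01, p_leave_burst=0.2,
                       loss_in_burst=0.5, loss_outside=0.0)
    print("overall loss rate:", sum(lost) / len(lost))
    print("longest run of lost packets:", longest_loss_run(lost))
```

Two paths with the same average loss rate can differ greatly in the length of their loss runs, which is one reason an average alone may say little about how an application will behave.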
How do impairments manifest themselves?

There is no single way in which network impairments make themselves visible. For example, in applications that move a lot of data over a long-distance TCP connection, packet loss, jitter, and reordering tend to trigger TCP's congestion avoidance algorithms and thus considerably reduce throughput. In VoIP applications, jitter and delay combine so that people trying to converse end up speaking over one another. VoIP voice quality can degrade in the face of any impairment. Even the perceived responsiveness of web browsing can degrade significantly if DNS query packets are lost or delayed.

What about your own network?

Even small networks can be impaired. Any net with more than a few switches and routers, and particularly any net with out-of-campus connections, is likely to experience impaired service. In many cases, smaller networks may be more subject to impairments than larger networks that are monitored by 24x7 Network Operations Centers (NOCs). It isn't that the larger networks are more immune; it's just that on the larger networks somebody (other than the users) is watching and might be able to notice problems, isolate the causes, and initiate a repair.

How can I tell if my network has a problem with these things?

As we mentioned earlier, the sensitivity of applications to network impairments varies widely with the nature of the application and, to a lesser degree, with the quality of the implementation of the protocol stacks used by the application. So, there are really two questions:
Since nearly every network has some level of impaired service, perhaps the pragmatic approach is to inventory the applications running on your network in order to create some kind of service level definition. Impairments that don't rise to the level where they erode that level of service are impairments that may be safe to ignore. It would, however, be necessary to review that service level as new applications are added, old ones removed, and as the overall traffic demands and patterns change. Even the introduction of a new router or switch may change the behavior of the net. Let's assume for the moment that you do come up with a service level definition. For example, for VoIP applications you might come up with the following:
How would you measure whether your network meets these service levels? But, more importantly, how would you even know in the first place whether these numbers are actually useful and whether they represent the kinds of services your applications actually need? It can cost a great deal of time and money to over-engineer a network to provide service levels that your applications do not need. And it can be more than simply embarrassing to discover after the fact that your shiny new (and expensive) network, even though it meets the service definitions, doesn't do the trick.
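
One way to approach the measurement question is to send timestamped probe packets across the network and compare the observed loss, delay, and jitter against the service level definition. The sketch below is a minimal illustration under stated assumptions: the `Probe` and `ServiceLevel` types, the function names, and the threshold values in the usage example are all hypothetical placeholders, not figures from this paper. The jitter calculation uses a smoothed estimator in the style of the one described in RFC 3550, and one-way delay is only meaningful if the sender and receiver clocks are synchronized.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Probe:
    seq: int
    sent: float                 # send timestamp, in seconds
    received: Optional[float]   # receive timestamp, in seconds; None if lost


@dataclass
class ServiceLevel:
    max_loss: float     # largest acceptable fraction of lost packets
    max_delay: float    # largest acceptable mean one-way delay, in seconds
    max_jitter: float   # largest acceptable smoothed jitter, in seconds


def measure(probes: Sequence[Probe]) -> dict:
    """Summarize loss, mean one-way delay, and smoothed jitter."""
    arrived = [p for p in probes if p.received is not None]
    loss = 1.0 - len(arrived) / len(probes) if probes else 0.0
    delays = [p.received - p.sent for p in arrived]
    mean_delay = sum(delays) / len(delays) if delays else float("inf")
    # Smoothed jitter: each change in delay between consecutive arrivals
    # moves the estimate by 1/16 of the difference (RFC 3550 style).
    jitter = 0.0
    for prev, cur in zip(delays, delays[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    return {"loss": loss, "delay": mean_delay, "jitter": jitter}


def meets(measured: dict, level: ServiceLevel) -> bool:
    """True only if every measured value is within its threshold."""
    return (measured["loss"] <= level.max_loss
            and measured["delay"] <= level.max_delay
            and measured["jitter"] <= level.max_jitter)


if __name__ == "__main__":
    # Hypothetical probe results: four probes sent, one lost in transit.
    probes = [
        Probe(0, 0.00, 0.05),
        Probe(1, 0.02, 0.07),
        Probe(2, 0.04, None),
        Probe(3, 0.06, 0.11),
    ]
    # Placeholder thresholds, for illustration only.
    level = ServiceLevel(max_loss=0.01, max_delay=0.15, max_jitter=0.03)
    summary = measure(probes)
    print(summary, "meets service level:", meets(summary, level))
```

A check of this kind answers only the first question. Whether the thresholds themselves reflect what the applications need is a separate matter that measurement alone cannot settle.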
With such a testbed you could have more confidence that your service level definitions are, in fact, representative of what you need from your production networks.

There are a number of ways one could go about building a testbed. One can build a miniature version of a proposed network and hope that it adequately reflects the behavior of the full scale network. This approach is expensive and inflexible. Another approach is to use mathematical simulations and models. These methods take a great deal of expertise to design, implement, and evaluate. And in many instances the results may be rather detached from reality. The software found inside network devices frequently, indeed almost always, does not act with mathematical precision, or with anything that even approximates that kind of precision.

The approach that we advocate is to use tools that actually produce, under controlled and repeatable conditions, a variety of network impairments so that the proposed applications can be tested and evaluated under near real-life conditions. There are several tools available for inducing impairments into networks. Most share an ability to create some or all of the types of impairments described above. Most of these tools either manipulate all packets or classify packets into flows, most frequently defined by a simple 5-tuple scheme (source IP, destination IP, IP protocol type (UDP, TCP), source port, destination port). Those kinds of tools are often adequate when one is testing general classes of equipment.

However, experience with networks has taught us that many network applications are sensitive to certain patterns of impairments. In fact, it is this kind of sensitivity that frequently allows crackers to break into equipment or to create denial of service attacks. However, there is only one tool that is able to dig more deeply into flows and allow one to create impairments that might expose these pattern-based flaws.
That tool is InterWorking Labs' Maxwell Network Emulator.

What can be done about network impairments?

We can deal with network impairments in two ways - we can make the network better or we can make the applications better. David Isenberg's now famous paper, "Rise of the Stupid Network" (the actual paper is at http://www.rageboy.com/stupidnet.html and http://www.hyperorg.com/misc/stupidnet.html), could be construed as an argument for pushing the burden onto the applications while leaving the underlying network as simple as possible. There is much merit in this approach - in fact, it is only in the applications that most of us, including application vendors, have any ability to control the quality of our internet experience.
At the same time, it is possible to consider the design of devices and applications at the edge of the network and optimize the ability of these applications to compensate for and handle impaired network situations.

Best Approach: Pre-Deployment Testbeds

The best approach with either the stupid network or a network engineered for high quality packet delivery is to create pre-deployment testbeds. A pre-deployment testbed allows us to evaluate the range of impairments that our existing and future applications can tolerate. With that knowledge we can better understand the engineering and investment tradeoffs between building more sophisticated applications versus demanding (and obtaining) improved service levels from our network infrastructures. |