The most critical failure point in most networks is the backbone.
Asynchronous transfer mode (ATM) has attracted considerable attention recently as a network backbone. This is understandable: it offers bandwidths of 155 Mbps and up and supports a wide variety of services. ATM also offers built-in support for constant bit rate (CBR) services such as voice and video, and its quality of service (QoS) capabilities let voice, video, and data coexist over a common network fabric.
For all its advantages, however, ATM does not offer built-in redundancy. This is strange, particularly because other backbone networks support redundancy in their topology. For example, Fiber Distributed Data Interface (FDDI) has dual concentric, counter-rotating rings. As the figure "FDDI Fault Tolerance" illustrates, should a fiber in the FDDI backbone break, or a device on the FDDI link fail, the devices on either side of the break sense this and automatically wrap network signals, leaving one contiguous ring. This means that network services are usually unaffected. Similarly, connectionless networks, such as TCP/IP over Ethernet, can retransmit and redirect packets that go astray across the network.
When it comes to redundancy, ATM is different. Unlike Ethernet or FDDI, ATM is connection-oriented, not connectionless. It is usually configured in a point-to-point or point-to-multipoint fashion. Failures result in lost cells, lost packets, and dropped connections. This can mean lost services to a single switch, multiple switches, or even the entire network. The severity of the loss depends on which cables and switches the outage affects.
Therefore, fault-tolerance design should be foremost in our minds when planning or implementing ATM networks. Fault tolerance means that the network can survive the loss of one or more connections, the loss of one or more switches, and the loss of a source of LAN emulation (LANE) services. Each one of these cases requires some special attention during the design phase if you desire fault-tolerant network operation.
Cable Redundancy and Switch Failures
The effects of a lost cable and a failed switch are similar. In each case, the network loses necessary services. The ATM network configuration shown in the figure "Full Mesh and Dual Homing" has designed-in redundancy that helps overcome the loss of a switch or a cable. In doing so, it provides decent fault tolerance for ATM networks.
Notice that every ATM switch in the figure directly connects to every other switch. This is a full-mesh configuration. If an interswitch cable fails or a switch crashes, calls to and from the affected switch are automatically rerouted. Suppose that a construction worker cuts fiber B. Suddenly, switch A becomes disconnected from switch D. The route from switch A to switch D can be reestablished through either switch C (fibers D and F) or switch B (fibers A and E). The device chooses its route either by preconfiguration or by taking whichever switch it hears from first after the outage. Most ATM devices can handle this kind of call rerouting in a few seconds.
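The rerouting scenario above can be sketched as a small route search. This is only an illustration, not how any particular switch computes routes. The text gives fiber B as the A-to-D link and names the two detours; the assignment of fibers D and F (and A and E) to individual hops, and fiber C as the B-to-C link, are assumptions, as is the function name `shortest_routes`.

```python
from collections import deque

# Full mesh of four switches. Fiber B joining switches A and D, and the two
# detours (fibers D+F via switch C, fibers A+E via switch B), come from the
# text. Which fiber covers which hop of a detour, and fiber C joining
# switches B and C, are assumptions made for illustration.
FIBERS = {
    "A": ("A", "B"),
    "B": ("A", "D"),
    "C": ("B", "C"),
    "D": ("A", "C"),
    "E": ("B", "D"),
    "F": ("C", "D"),
}


def shortest_routes(src, dst, failed=frozenset()):
    """Return every minimum-hop route from src to dst as lists of fiber names."""
    # Build the adjacency list from the fibers that are still in service.
    neighbors = {}
    for name, (u, v) in FIBERS.items():
        if name in failed:
            continue
        neighbors.setdefault(u, []).append((v, name))
        neighbors.setdefault(v, []).append((u, name))

    routes = []
    best = None  # hop count of the shortest route found so far
    queue = deque([(src, [], {src})])
    while queue:
        switch, fibers, seen = queue.popleft()
        if best is not None and len(fibers) > best:
            break  # breadth-first order: every remaining route is longer
        if switch == dst:
            best = len(fibers)
            routes.append(fibers)
            continue
        for nxt, name in neighbors.get(switch, []):
            if nxt not in seen:
                queue.append((nxt, fibers + [name], seen | {nxt}))
    return sorted(routes)


if __name__ == "__main__":
    print(shortest_routes("A", "D"))                # direct link in service
    print(shortest_routes("A", "D", failed={"B"}))  # after fiber B is cut
```

With fiber B cut, the search returns the two detours named in the text. A real switch would then pick one by preconfiguration or by whichever neighbor answers first.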
Once the primary link returns to service, ATM devices typically place new calls on the restored link. Calls that the device reroutes after the failure aren't restored to the primary link, because doing so would disrupt them again.
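A minimal sketch of that restoration behavior follows, assuming a simplified model in which each call is carried by either a primary or a backup link. The `Switch` class and its method names are hypothetical and do not correspond to any vendor's interface.

```python
class Switch:
    """Tracks which link carries each call (hypothetical model, not a vendor API)."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True
        self.calls = {}  # call id -> link currently carrying the call

    def place_call(self, call_id):
        # New calls use the primary link whenever it is in service.
        link = self.primary if self.primary_up else self.backup
        self.calls[call_id] = link
        return link

    def link_failed(self):
        # Calls on the failed primary link are rerouted to the backup.
        self.primary_up = False
        for call_id, link in self.calls.items():
            if link == self.primary:
                self.calls[call_id] = self.backup

    def link_restored(self):
        # Existing calls stay where they are, since moving them back would
        # disrupt them a second time. Only new calls use the restored link.
        self.primary_up = True


if __name__ == "__main__":
    sw = Switch(primary="fiber B", backup="fibers D+F")
    sw.place_call(1)
    sw.link_failed()
    sw.place_call(2)
    sw.link_restored()
    sw.place_call(3)
    print(sw.calls)
```

Calls 1 and 2 remain on the backup route after the primary returns, while call 3 is placed on the restored link.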
End-Point Redundancy
ATM end-point devices can also take advantage of redundant connections. An ATM end-point device is an ATM client or device such as a workstation, server, switch, or router. Often, the end-point device supports the interconnection of legacy networks such as Ethernet or Token Ring. If it were to lose connections to the ATM network, multiple legacy workstations and servers could lose network services. Therefore, it is advisable to provide some form of redundancy to the end-point device.
You can provide redundancy by connecting the end-point device in a dual-homed fashion. Dual homing simply connects a single end-point device to multiple ATM switches. The figure "Full Mesh and Dual Homing" shows an end-point device that connects to both switches C and D. Should one of the switches or cables fail, the other switch would automatically serve the end-point device.
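As a rough illustration of dual homing, the sketch below picks the first uplink whose switch and cable are both healthy. The function name `active_uplink` and the cable labels are hypothetical; the text says only that the end-point device connects to switches C and D.

```python
def active_uplink(uplinks, failed_switches=(), failed_cables=()):
    """Return the first usable (switch, cable) uplink, or None if all are down."""
    for switch, cable in uplinks:
        if switch not in failed_switches and cable not in failed_cables:
            return switch, cable
    return None


# The end-point device in the figure is homed to switches C and D.
# The cable names are made up for this example.
UPLINKS = [("C", "uplink-1"), ("D", "uplink-2")]

if __name__ == "__main__":
    print(active_uplink(UPLINKS))                              # switch C serves
    print(active_uplink(UPLINKS, failed_switches={"C"}))       # switch D takes over
    print(active_uplink(UPLINKS, failed_cables={"uplink-1"}))  # a cable cut has the same effect
```

The order of the list expresses a preference for switch C. A device without such a preference could just as well try the uplinks in any order.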
Keep in mind that today's ATM devices are statically routed. Therefore, when setting up a full-mesh or dual-homed configuration, you often must manually provide each switch with the static routes to every adjacent switch. You do this by entering the Network Service Access Point (NSAP) ATM address of each adjacent switch or by defining multiple default routes.
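To give a feel for how much manual entry a full mesh implies, the sketch below builds the list of static routes each switch would need. The addresses are invented placeholders, not real NSAP assignments; the only property they borrow from real ATM addresses is the 20-byte length. The function name `static_routes` is likewise hypothetical.

```python
# Placeholder 20-byte (40 hex digit) addresses, invented for illustration only.
SWITCH_ADDRESSES = {
    "A": "47" + "00" * 18 + "0A",
    "B": "47" + "00" * 18 + "0B",
    "C": "47" + "00" * 18 + "0C",
    "D": "47" + "00" * 18 + "0D",
}


def static_routes(addresses):
    """Map each switch to the addresses of every other switch in a full mesh."""
    for name, addr in addresses.items():
        if len(bytes.fromhex(addr)) != 20:
            raise ValueError(f"{name}: expected a 20-byte address")
    return {
        name: {peer: addr for peer, addr in addresses.items() if peer != name}
        for name in addresses
    }


if __name__ == "__main__":
    for switch, routes in static_routes(SWITCH_ADDRESSES).items():
        for peer, addr in routes.items():
            print(f"switch {switch}: static route to switch {peer} at {addr}")
```

For n switches in a full mesh this yields n × (n − 1) entries, or 12 for the four switches in the figure, which is why the manual configuration grows quickly as the mesh does.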
Cooperating LES/BUS Services
There is more to bulletproofing an ATM network than simply providing full-mesh or dual-homed configurations. ATM legacy networks often depend on LANE, which supports transmission of unicast and broadcast packets using a LANE client (LEC). The LEC learns the ATM NSAP addresses of the other stations by consulting the LANE Server (LES).
The Broadcast and Unknown Server (BUS) handles standard broadcast traffic, such as a TCP/IP ARP. The LES/BUS pair handles connections to the emulated LAN (e-LAN) and the outside world. The LEC learns the address of the LES/BUS from the LANE Configuration Server (LECS).
The problem is that the LES, LECS, and BUS services often reside on the same physical ATM device. If that device fails or loses its connections to the network, the e-LAN can lose all three services at once.
One method of ensuring the constant availability of LANE services is to create redundant LES and BUS servers in multiple ATM devices. You configure the LES/BUS servers to operate as mirror images of each other in the same e-LAN. Should one fail, the other LES/BUS can take over and supply LANE services to the clients. (For more information on how you can do this, see "Reliable ATM Networking" in the April BYTE.)
Failures in the LECS are usually less of a concern. A LANE client contacts the LECS only when it is looking for the NSAP address of the LES/BUS. This generally occurs during power-up or initial connection of the client. Once the LEC knows the LES/BUS address, it does not need to access or consult the LECS further.
If the LECS fails or is cut off from the network, it will affect only new client connections. Already-established clients will continue to operate as before, because they have obtained the LES/BUS NSAP address. Therefore, while a LECS failure can prevent a new client from joining the e-LAN, it will not adversely affect stations currently participating on the e-LAN. Additionally, you can preconfigure the NSAP address of the LES/BUS directly into some clients, negating the need for the LECS entirely.
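The join behavior described above can be modeled in a few lines. This is a simplified sketch of the logic only, not of the LANE signaling itself. The class names, the `join` method, and the address string `les-bus-1` are all hypothetical.

```python
class LecsUnavailable(Exception):
    """Raised when a client cannot reach the configuration server."""


class Lecs:
    """Hands out the LES/BUS address to clients that ask for it."""

    def __init__(self, les_bus_address):
        self.les_bus_address = les_bus_address
        self.up = True

    def lookup(self):
        if not self.up:
            raise LecsUnavailable("LECS unreachable")
        return self.les_bus_address


class LaneClient:
    def __init__(self, name, preconfigured_les_bus=None):
        self.name = name
        self.les_bus = preconfigured_les_bus  # None means "ask the LECS"
        self.joined = False

    def join(self, lecs):
        # The LECS is consulted only when the LES/BUS address is unknown.
        if self.les_bus is None:
            self.les_bus = lecs.lookup()
        self.joined = True
        return self.les_bus


if __name__ == "__main__":
    lecs = Lecs("les-bus-1")
    a = LaneClient("A")
    a.join(lecs)                 # joins while the LECS is up

    lecs.up = False              # the LECS fails
    b = LaneClient("B")
    try:
        b.join(lecs)             # a new client cannot join
    except LecsUnavailable as err:
        print(f"client B: {err}")

    c = LaneClient("C", preconfigured_les_bus="les-bus-1")
    c.join(lecs)                 # a preconfigured client never asks the LECS
    print(a.joined, b.joined, c.joined)
```

Client A keeps its LES/BUS address through the outage, client B is locked out, and client C joins anyway because it never needed the LECS.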
Redundancy can offer an additional benefit to the ATM network as well. You can configure redundant BUS and LES services to provide load sharing along with reliability. Each cooperating LES/BUS operates in a round-robin fashion: LES/BUS pair one serves client A, LES/BUS pair two serves client B, and so forth. This shares the load relatively equally. Load sharing is particularly valuable to the BUS, because it can become extremely busy handling broadcast traffic in the e-LAN.
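The round-robin assignment can be sketched as follows. The first function mirrors the description above; the second shows one plausible way to move clients off a failed pair, which is an assumption rather than something the text specifies. The function names and the `pair-1`/`pair-2` labels are hypothetical.

```python
from itertools import cycle


def assign_clients(clients, les_bus_pairs):
    """Assign each client to a LES/BUS pair in round-robin order."""
    rotation = cycle(les_bus_pairs)
    return {client: next(rotation) for client in clients}


def fail_over(assignment, failed_pair, les_bus_pairs):
    """Move clients of a failed pair onto the surviving pairs (assumed policy)."""
    survivors = [pair for pair in les_bus_pairs if pair != failed_pair]
    if not survivors:
        raise RuntimeError("no LES/BUS pair left to serve the e-LAN")
    rotation = cycle(survivors)
    return {
        client: next(rotation) if pair == failed_pair else pair
        for client, pair in assignment.items()
    }


if __name__ == "__main__":
    pairs = ["pair-1", "pair-2"]
    assignment = assign_clients(["A", "B", "C", "D"], pairs)
    print(assignment)
    print(fail_over(assignment, "pair-1", pairs))
```

With two pairs and four clients, each pair serves two clients. After a failure the survivor carries all four, so each pair should be sized to handle the full broadcast load on its own.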
This covers some of the basic issues regarding how to provide ATM network redundancy. Next month, I'll describe in more detail how to provide redundant LES/BUS services. I will also consider fault-recovery times and show some disadvantages to providing redundancy.
Figure "FDDI Fault Tolerance": Unlike ATM, FDDI devices use a mechanism that establishes a backup route when a line fails.

Figure "Full Mesh and Dual Homing": The full-mesh network provides multiple routes if there is a line failure.
Jeffrey Fritz is responsible for advanced network technology development for West Virginia University. He is the author of Remote LAN Access: A Guide for Networkers and the Rest of Us (Manning Publications/Prentice-Hall PTR). You can contact him at jfritz@wvu.edu.