Features

November 2008


Beyond Testing

Route analytics keeps power on

End-to-end visibility into its critical Layer 3 infrastructure protects utility’s data.

by Alex Henthorn-Iwane

New reliability standards emerged from the disastrous 2003 Northeast blackout, when 40 million Americans lost power and outage-related financial losses topped $6 billion. Utilities have realized that meeting those standards means the networks that carry critical information about the state of the grid must be equally reliable. In the blackout, a misconfiguration led to a management-system outage, which coincided with an electrical grid issue that could not be addressed because of a lack of visibility into the system.

So, when a network engineer at a large Southern public utility realized that critical data from his company’s electric power grid monitoring systems was not available over one of the network’s failover links, he was justifiably concerned. The cause turned out to be a simple error: an electrician had left a cable unplugged following maintenance.

The utility’s traditional SNMP-based network-management tools, however, had failed to alert him to the downed link, because they could detect failures only on specifically configured devices. Instead, the problem was found by a recently deployed route-analytics system, which shows the network engineer exactly how data is being routed across the network and alerts him to any routing-path changes.

The utility, which supplies power to millions of users in a multistate area, has an Open Shortest Path First (OSPF)-based network deployed across two locations, with approximately 70 Layer 3 devices (routers and switches) in four distinct networks: a reliability and control network, which runs internal IT applications; a supervisory control and data acquisition (SCADA) network, which monitors the power grid and brings in data measurements from across the coverage area; and two firewall zones.

The SCADA network gathers tens of thousands of measurements per second, passing them to the internal network where grid operators do monitoring, analysis and planning. This data, however, is useful only if it is accessible. If a natural disaster knocks out a power line or a substation, and a routing failure occurs at the same time, the engineers have no way of looking into the grid to see what happened.

To determine whether data was flowing on the correct paths through the network, the utility needed end-to-end visibility into the critical Layer 3 infrastructure. After some research, the utility learned about route analytics, a technology that lets users visualize, monitor and analyze an IP network’s logical (routing) operation. Route-analytics solutions work by monitoring routing protocol exchanges to create an end-to-end view of the routing topology, then learning in real time when routing changes occur. This information is invisible to SNMP tools.
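As a rough illustration of the idea, the sketch below compares two snapshots of a link-state topology and reports what changed. It is not how any particular route-analytics product works; the snapshot format, the router names and the `diff_topology` helper are all assumptions made for the example.

```python
# Hypothetical link-state snapshots: {(router_a, router_b): ospf_cost}.
# A real system would build these from routing protocol exchanges.

def diff_topology(before, after):
    """Return human-readable routing changes between two snapshots."""
    changes = []
    for link in sorted(before.keys() | after.keys()):
        old, new = before.get(link), after.get(link)
        if old is None:
            changes.append(f"link up: {link[0]}-{link[1]} (cost {new})")
        elif new is None:
            changes.append(f"link down: {link[0]}-{link[1]}")
        elif old != new:
            changes.append(f"cost change: {link[0]}-{link[1]} {old} -> {new}")
    return changes


# Illustrative data only: the failover link to the SCADA gateway disappears.
before = {
    ("core1", "core2"): 10,
    ("core1", "scada-gw"): 10,
    ("core2", "scada-gw"): 20,
}
after = {
    ("core1", "core2"): 10,
    ("core1", "scada-gw"): 10,
}

for change in diff_topology(before, after):
    print(change)
```

Because the comparison works on the routing topology itself rather than on a list of polled devices, a lost failover link shows up as a change even if nobody configured monitoring for it.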

The utility first used route analytics to find the downed failover link caused by the electrician’s error. On a day-to-day basis, the utility uses route analytics to get visual feedback on how the network is forwarding data, allowing the network operations staff to proactively discover routing errors and prevent downtime.



In one instance, when a firewall was down for maintenance, an outage occurred on one of the Multiprotocol Label Switching (MPLS) links the company uses to connect to external agencies. Even though the fully redundant network architecture meant that removing a firewall should have had no impact, a partner’s traffic kept fruitlessly trying to get through the downed firewall. With route analytics, the failure of a routing device or link can be simulated before maintenance, to validate that redundant paths are working as intended.
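A pre-maintenance check of this kind can be sketched as a shortest-path computation with the device under maintenance removed from the graph. The topology, the node names and the `shortest_path` helper below are hypothetical, and plain Dijkstra stands in for whatever a commercial tool does internally.

```python
import heapq


def shortest_path(links, src, dst, failed=frozenset()):
    """Dijkstra over an undirected cost graph, skipping failed nodes.

    Returns (total_cost, path), or None if dst is unreachable.
    """
    graph = {}
    for (a, b), cost in links.items():
        if a in failed or b in failed:
            continue  # links touching a failed device are unusable
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Illustrative topologies: a partner reaches the core through two firewalls.
redundant = {
    ("partner", "fw1"): 10, ("fw1", "core"): 10,
    ("partner", "fw2"): 10, ("fw2", "core"): 10,
}
# Same design, but the second firewall's link to the core is missing.
broken = {
    ("partner", "fw1"): 10, ("fw1", "core"): 10,
    ("partner", "fw2"): 10,
}

for name, links in (("redundant", redundant), ("broken", broken)):
    result = shortest_path(links, "partner", "core", failed={"fw1"})
    print(name, "->", result if result else "no path with fw1 down")
```

Running the check against both topologies before taking `fw1` offline would show that only the first design actually provides the redundancy the architecture promises.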

The utility maintains multiple links between its locations, but occasionally, due to OSPF link-metric misconfigurations, traffic would choose a slower, less-efficient path over an available faster one. With route analytics, network engineers can quickly find a problem’s cause and correct it immediately.
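One way to picture a link-metric misconfiguration is to compare each link’s configured OSPF cost with the cost its bandwidth would suggest. The sketch below assumes the common convention of deriving cost as reference bandwidth divided by link bandwidth; the reference value, the link names and the `audit_costs` helper are illustrative assumptions, not details from the utility’s network.

```python
# Assumed reference bandwidth in Mbps; real networks set their own value.
REFERENCE_BW_MBPS = 10_000


def expected_cost(bandwidth_mbps, reference_mbps=REFERENCE_BW_MBPS):
    """Cost implied by bandwidth under the assumed convention (minimum 1)."""
    return max(1, reference_mbps // bandwidth_mbps)


def audit_costs(links):
    """Return (name, configured, expected) for links whose cost looks wrong."""
    mismatches = []
    for link in links:
        want = expected_cost(link["bandwidth_mbps"])
        if link["cost"] != want:
            mismatches.append((link["name"], link["cost"], want))
    return mismatches


# Illustrative data: the fast primary link carries a mistyped cost, so OSPF
# would prefer the slower backup link (cost 100 beats cost 200).
links = [
    {"name": "site-a/site-b primary", "bandwidth_mbps": 1000, "cost": 200},
    {"name": "site-a/site-b backup", "bandwidth_mbps": 100, "cost": 100},
]

for name, configured, want in audit_costs(links):
    print(f"{name}: configured cost {configured}, expected {want}")
```

Since OSPF prefers the path with the lowest total cost, a single mistyped value like the one above is enough to push traffic onto the slower link.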

Alex Henthorn-Iwane is vice president of product marketing at Packet Design, Palo Alto, Calif.



Comments
Posted by: Giel Oberholster on Wednesday, November 19, 2008
Although better functionality is available with IPT, I personally do not think it is cheaper in the long run. Where you previously had 3 or 4 variables that could go faulty per phone in a TDM solution, you now have at least 13 to 15. Smaller companies do not have the network expertise to fault-find problems related to QoS. We still see a trend where the "IT" and the "PABX" divisions are divided in a converged communication environment, which leads to an inability to properly manage and control the infrastructure and, in turn, to unforeseen failures.

