Network Management Performance - Tips and Tools
By Eddie Rabinovitch,
UniNews Online Network Technology Columnist
"Managing network performance? That's really easy - simply add bandwidth when the response time is getting slow". Well, that used to be one of the paradigms way back, when LAN applications were first extended to the WAN (and the term "paradigm" was not in use as frequently as it is now.) Did this model work? Well, if it did, only in very few and marginal cases. Simply, because bandwidth is an important, but certainly not the only factor affecting network performance. In this column we will touch upon these factors and describe some tools and techniques for measuring network performance.
First, let's define what "Network Performance Management" actually means. Simply stated, it refers to how well the network serves its users. It requires analysis and control of throughput and error rates. Network performance management includes the processes of quantifying, measuring, reporting, and controlling the responsiveness, availability, and utilization of the different network components. It's important to emphasize that network performance has to be measured end-to-end. What truly matters is the performance perceived by end users, in other words the performance of the network as a whole. The performance of each individual network component, while important, is less critical to measure than end-to-end results.
Let's examine some of the key questions that have to be addressed by network performance managers:
- What is happening on the network, in other words how much data is traveling across each segment, at what percentage utilization?
- How well does the physical topology of the network handle traffic requirements?
- Do some users need more bandwidth for their tasks than others?
- How successfully are users' performance requests handled by each network device?
- Which users are using the most resources and what types of data are they sending?
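The first of these questions, percentage utilization per segment, can be answered from two readings of an interface octet counter taken a known interval apart. The following is a minimal sketch, assuming a 32-bit counter that may wrap once between readings; the function name and the sample figures are illustrative and do not come from any particular tool.

```python
def utilization_percent(octets_start, octets_end, interval_seconds, link_bps,
                        counter_bits=32):
    """Return average link utilization (percent) over one polling interval."""
    if interval_seconds <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    modulus = 2 ** counter_bits
    # Octet counters wrap at 2**counter_bits; modular subtraction
    # gives the right delta as long as the counter wrapped at most once.
    delta_octets = (octets_end - octets_start) % modulus
    bits_sent = delta_octets * 8
    return 100.0 * bits_sent / (interval_seconds * link_bps)


# Hypothetical readings: 22,500,000 octets in 60 seconds on a 10 Mbps segment.
sample = utilization_percent(1_000_000, 23_500_000, 60, 10_000_000)
```

Shorter polling intervals show bursts that a long average hides, at the cost of more management traffic.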
The Performance Management Challenge
With the rapid growth and modernization of communication technology, the task of performance management becomes increasingly challenging. Due to the ever-growing demand for powerful on-line applications and expanding data-processing networks, network topologies and communication equipment become more and more complex. Performance management in general, and end-user response time management in particular, become more difficult. Although each individual network element may be thoroughly understood, combining the elements into a complete network and understanding the interrelationships and end-to-end network effects of these combinations is a formidable task for which there are no easy answers or quick references. For intelligent management of network performance, it is important to capture and maintain historical information in order to identify trends and deviations from baselines. Here, then, are the major steps necessary for implementing network performance management:
- Identification of performance needs
- Definition of time intervals and data to be captured
- Capture and maintenance of performance database metrics and measurements
- Packaging performance data for modeling tools
- Report definition: establishing thresholds, alarm rates, and granularity
- Performance analysis, including:
  - Network traffic analysis
  - Recommendations for changes and upgrades
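To make the idea of deviations from a baseline concrete, here is a small sketch that derives a baseline from historical samples and flags recent values that stray too far from it. The three-standard-deviation tolerance and the sample values are assumptions chosen for illustration; a real installation would pick thresholds to suit its own traffic.

```python
from statistics import mean, stdev


def find_deviations(history, recent, tolerance=3.0):
    """Return (index, value) pairs from `recent` that deviate from the baseline.

    The baseline is the mean of `history`; a value is flagged when it lies
    more than `tolerance` sample standard deviations away from that mean.
    """
    baseline = mean(history)
    limit = tolerance * stdev(history)
    return [(i, v) for i, v in enumerate(recent) if abs(v - baseline) > limit]


# Hypothetical response times in milliseconds.
history = [100, 110, 90, 105, 95]
flagged = find_deviations(history, [102, 130, 98, 70])
```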
In modern dynamic client/server networks, where network topology changes frequently, it is usually difficult to estimate and predict the exact route taken by a particular message. And it's even more difficult to monitor the response time for this message. The number of parameters that have to be taken into account for managing network performance of modern open multi-vendor multi-protocol networks is large. The number of links and their use, latency and use of each node, propagation delay, and many other factors can affect network performance. A wealth of information, with detailed data on use and performance of each of the individual segments and components in the network, has to be collected and analyzed for network performance management and capacity planning.
Tools and Techniques
Tools and techniques for performance management of hierarchical legacy networks have been around for quite some time. However, this is a callow subject in the new paradigm of client/server networks. (I just caught myself speaking of "legacy" and "new paradigm". Have you realized how computers have not only changed the way we do business today, but have enriched our lexicon as well? It is quite interesting that in addition to new words many new terms and expressions have been created as well, many to the point of cliche. It is almost impossible to find any paper discussing a modern computer environment without a reference to "legacy" or the "new paradigm".) Anyway, back to the subject of network performance management. The approach used in traditional ("legacy") systems is quite simple but very powerful: for any given network segment and component it is important to collect and keep as much information as possible. Subsequently, by using powerful data manipulation tools, reports on network behavior, utilization, performance, response times, etc. can be produced for any given period of time. For example, in an SNA environment, special System Management Facility (SMF) records are cut not only for network-related events but also for performance data and statistics. To help network managers understand the health and well-being of their networks, custom-tailored baseline, trend, and exception analysis reports based on SMF data are periodically created with powerful data manipulation tools, the most popular one probably being the SAS System from SAS Institute (Cary, NC). SAS Institute's data manipulation products can also be used in modern open systems networks. However, for such networks, the major problems and challenges relate more to the definition of processes and techniques for performance data collection than to data manipulation tools.
In contrast to the proprietary legacy networks, with their abundance of both real-time and historical performance data, SNMP-based open network management tools usually collect and present information for only a limited period of time. This approach is sufficient for real-time problem determination and fault management. However, baselining, trend analysis, performance management, and capacity planning require more data, collected over much longer periods of time. Since the network topology of modern client/server networks is much more complex, this is quite a hefty task.
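One common way to keep data for much longer periods without storing every raw poll is to roll samples up into coarser summaries. The sketch below assumes samples arrive as (epoch seconds, value) pairs and condenses them into hourly minimum, average, and maximum figures; the hourly granularity and the field names are illustrative choices, not something prescribed by SNMP or by any product mentioned here.

```python
from collections import defaultdict


def roll_up_hourly(samples):
    """Condense (epoch_seconds, value) samples into per-hour summaries.

    Returns a dict keyed by the epoch second at which each hour starts.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour_start = timestamp - timestamp % 3600
        buckets[hour_start].append(value)
    return {
        hour: {
            "min": min(values),
            "avg": sum(values) / len(values),
            "max": max(values),
            "count": len(values),
        }
        for hour, values in sorted(buckets.items())
    }


# Hypothetical utilization samples: two in the first hour, one in the second.
summary = roll_up_hourly([(0, 10.0), (1800, 30.0), (3600, 50.0)])
```

Keeping the minimum and maximum alongside the average preserves peaks that matter for capacity planning.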
One of the tools specifically developed to address network performance management for SNMP-based networks is TRENDsnmp+ by DeskTalk Systems, Inc. (Torrance, CA). TRENDsnmp+ allows deployment of distributed SNMP managers to critical locations throughout the network. Each of the remote managers polls local SNMP agents, processes the information, and stores it in an SQL-based distributed database. It supports data based on SNMP MIBs and RMON (Remote MONitoring) extensions. Because of the close proximity of SNMP managers to their agents, the wide area network is not overloaded with SNMP-based performance traffic. However, due to the nature of the distributed SQL-based relational database, all data can be accessed for analysis, allowing presentation of a total picture of overall network performance. TRENDsnmp+ supports database replication and direct access to the SQL database for data manipulation, and includes a variety of standard baseline, trend, and exception reports.
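The general pattern of site-local collection feeding an SQL store that can be queried as a whole might look like the following sketch. It is purely illustrative: the table layout, column names, and site labels are hypothetical, an in-memory SQLite database stands in for a distributed relational database, and nothing here reflects the actual TRENDsnmp+ schema.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (site TEXT, device TEXT, ts INTEGER, util_pct REAL)"
    )
    return conn


def store_samples(conn, site, rows):
    """Store (device, timestamp, utilization) rows polled at one site."""
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [(site, device, ts, util) for device, ts, util in rows],
    )
    conn.commit()


def site_averages(conn):
    """Return (site, average utilization) pairs across all sites."""
    cursor = conn.execute(
        "SELECT site, AVG(util_pct) FROM samples GROUP BY site ORDER BY site"
    )
    return cursor.fetchall()


store = open_store()
store_samples(store, "nyc", [("r1", 0, 20.0), ("r1", 60, 40.0)])
store_samples(store, "sfo", [("r2", 0, 10.0)])
averages = site_averages(store)
```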
A Digression into the World of Standards
The first RMON (RFC 1271), published in November 1991, focused specifically on Ethernet. In 1993, with RFC 1513, the RMON Working Group extended the standard with Token Ring extensions. Due to high market demand and increasing customer interest, RMON-compliant vendor implementations were rapidly developed and brought to market. The first RMON products were often developed by independent LAN monitoring vendors. For example, in 1992 AXON began shipping their RMON-compliant LANServant Manager and Probes to OEMs, and in 1993 to end-users. Wide acceptance and adoption of the RMON standard by network infrastructure vendors followed. With proven, interoperable vendor implementations, the RMON MIB moved to Draft Standard status in December 1994 and was assigned the new RFC number 1757, making the original RFC 1271 obsolete. With the RMON MIB, network managers can collect information from remote network segments both for network troubleshooting and performance management. The information in the RMON MIB includes:
- current and historical traffic statistics for a network segment, for a specific host on a segment, and between hosts (matrix);
- a versatile alarm and event mechanism for setting thresholds and notifying the network manager of changes in network behavior;
- a powerful, flexible filter and packet capture facility which can be used to deliver a complete, distributed protocol analyzer.
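The alarm mechanism in the list above relies on a pair of thresholds so that a value hovering near a single limit does not trigger a flood of notifications. The sketch below is a simplified illustration of that hysteresis idea, not a faithful implementation of the RMON alarm group; the function name, the threshold values, and the sample data are assumptions.

```python
def alarm_events(samples, rising, falling):
    """Return (index, kind) events for samples crossing two thresholds.

    A "rising" event fires when a sample reaches the rising threshold, and
    does not fire again until a "falling" event has occurred, and vice versa.
    """
    if falling > rising:
        raise ValueError("falling threshold must not exceed rising threshold")
    events = []
    rising_armed = True
    falling_armed = True
    for index, value in enumerate(samples):
        if value >= rising and rising_armed:
            events.append((index, "rising"))
            rising_armed = False
            falling_armed = True
        elif value <= falling and falling_armed:
            events.append((index, "falling"))
            falling_armed = False
            rising_armed = True
    return events


# Hypothetical utilization samples with thresholds at 80% and 30%.
events = alarm_events([50, 85, 90, 70, 88, 20, 15, 95], rising=80, falling=30)
```

Note that the dip to 70 and the return to 88 produce no event, because the value never reached the falling threshold in between.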
Most of the tools for open network performance management are based on data from SNMP and RMON probes. However, RMON data is not sufficient for thorough network performance analysis. Since it covers only the lower layers of the OSI model, the data cannot be mapped to specific applications. That was one of the reasons for development of the RMON2 standard.
The RMON2 Working Group began their efforts in July 1994. Their top priority was to go up the protocol stack and provide statistics on network- and application-layer traffic. This is the most notable change from, and improvement on, the MAC-layer statistics provided by the RMON standard. By monitoring at the higher protocol layers, RMON2 provides the information that network managers need to see beyond the segment and get an internetwork or enterprise view of network traffic (although RMON vendors were already delivering some of these higher-layer capabilities, in the form of protocol distribution graphs, MAC-to-IP address translations, and application traffic analysis, through proprietary extensions to their RMON products). It's important to emphasize that RMON2 is in no way a superset of, or replacement for, RMON. Both MIBs are required, with RMON providing the data for segment monitoring and protocol analysis and RMON2 providing the data for network and application monitoring. RMON2 provides the technology to perform business management of the network, enabling network managers to view their network in terms of application and network resource use rather than as individual devices. This provides insight into traffic patterns and application usage that enables the network administrator to optimize current network resources; make sound business decisions regarding network growth and capacity planning; and monitor traffic flow, an important capability when designing and managing corporate intranets. For instance, based on the information gathered, rather than simply adding more networking equipment to alleviate bottlenecks, network managers can decide to tune and redesign traffic flow to make better use of existing resources, or add resources more intelligently. Using RMON2, network managers can see who is talking to whom on the network and what applications they're using, such as business applications or Web surfing.
This helps establish policies regarding the proper use of the network. "As network managers become more responsible for ensuring service levels of applications, they are demanding tools that provide that new perspective", said Steve Waldbusser, principal architect at International Network Services, and author of the RMON MIB and RMON2 MIB. "RMON2's application level monitoring provides a crucial view into today's business applications." Just recently, in March 1997, RMON2 was approved by the Internet Engineering Task Force (IETF) as proposed standards RFC 2021 - Remote Network Monitoring MIB Version 2, and RFC 2074 - Remote Network Monitoring MIB Protocol Identifiers. In this significant decision, the IETF recognized that the RMON2 MIB is stable and open for product development and interoperability.
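The "who is talking to whom, and with which application" view amounts to totalling traffic per source, destination, and application. The following sketch shows that aggregation over already-decoded traffic records; the record layout, addresses, and application labels are hypothetical and are not taken from the RMON2 MIB.

```python
from collections import Counter


def top_conversations(records, limit=3):
    """Total octets per (source, destination, application) and rank them.

    `records` is an iterable of (source, destination, application, octets).
    """
    totals = Counter()
    for source, destination, application, octets in records:
        totals[(source, destination, application)] += octets
    return totals.most_common(limit)


# Hypothetical decoded traffic records.
records = [
    ("10.0.0.1", "10.0.0.9", "http", 500),
    ("10.0.0.2", "10.0.0.9", "sql", 900),
    ("10.0.0.1", "10.0.0.9", "http", 700),
]
ranked = top_conversations(records)
```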
Performance management for switched media, such as ATM, introduces additional challenges. RMON was developed for shared media, such as LANs, and is not very suitable for ATM. Therefore, in July of 1995 a group of manufacturers of ATM switches, test equipment, and RMON products announced formation of the AMON (ATM MONitoring) group to develop an AMON agent. Such an agent will route copies of troublesome virtual circuits to a test port for monitoring ATM layers. While some ATM-specific MIBs already exist, the proposed ATM Circuit Steering MIB expands the contents higher in the ATM stack to include virtual paths, virtual circuits, destination ATM addresses, and timers. This, however, does not address all requirements and additional standards will be needed to define what data should be collected and how to present that data to a network management system. It is quite challenging to capture the data at full rate merely because of ATM's 155 to 650Mbps fabric and the limited amount of available storage. Therefore, the specifications also discuss data sampling methods that allow capture of frames, rather than random cells, which would yield little useful information.
More Tools and Techniques
Not all network performance management tools are SNMP based, however. EcoNET from Compuware (Farmington Hills, MI) uses a proprietary protocol for collecting data from distributed Super Monitors. As opposed to the specialized hardware-based RMON probes, Super Monitors are Windows-based applications. Super Monitors communicate with the Windows-based Single View central data visualization and reporting console. Single View correlates and merges data from typically 5-10 Super Monitor agents and then stores it in a relational database. Data collected by one Single View can be shared with additional Single View consoles, and the relational database is ODBC compliant. EcoNET also includes powerful reporting capabilities, with numerous standard reports. The most impressive feature offered by EcoNET is the ability to measure and track application performance across client/server networks, quantifying the network load cost of supporting each application.
Last month an interesting announcement came from 3Com Corp. (Santa Clara, CA), whose TranscendWare architecture allows monitoring and managing networks end-to-end. Transcend Networking is a three-part framework for:
- Scaling performance of the campus LANs;
- Extending the reach to remote sites and users on the WANs;
- Managing the growth of enterprise networks.
The framework specifies ways to construct networks that are easy-to-use, or "transparent" to users, as well as simpler and more cost effective to design and manage. TranscendWare modules are the software that powers the Transcend Networking framework.
The Case for Service Level Agreements
One of the critical criteria established in many installations for performance measurements is the Service Level Agreement (SLA). An SLA is a contract that defines the information technology and network services to be provided and the acceptable levels of performance. In the SLA model, the IT department becomes a service provider within the enterprise, and end-users become the consumers. The SLA contract defines minimum and maximum levels of performance, reliability, security and cost. The contract contains metrics that determine how success or failure is to be measured and details on how report information will be presented to users and management for verification.
Some of the typical SLA metrics include network uptime, application availability, and network and application response time. Beyond just providing a way to measure the performance of the department, SLAs present the opportunity for IT professionals to convey the value they are delivering to their organization for the money spent. SLAs allow IT organizations to measure their effectiveness against a set of defined and agreed-upon performance-based metrics. SLAs provide a clear and unbiased picture of the IT organization's effectiveness, so that it can spend less time justifying the current cost of ownership and value to users and management and have an easier time justifying network additions and upgrades. Furthermore, when network management is outsourced to an external service provider, SLA management allows the IT organization to verify compliance with the SLAs between their company and the outsourcer.
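As an illustration of how such metrics could be checked against agreed targets, the sketch below computes availability for a reporting period and a 95th-percentile response time, then compares both with contracted limits. The choice of the 95th percentile, the nearest-rank method, and all figures are assumptions made for the example; an actual SLA would define its own metrics and measurement rules.

```python
import math


def availability_percent(period_minutes, downtime_minutes):
    """Percentage of the reporting period during which service was up."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def percentile(values, pct):
    """Nearest-rank percentile of `values` for an integer `pct` (1-100)."""
    if not values:
        raise ValueError("no measurements supplied")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_sla(period_minutes, downtime_minutes, response_times,
              min_availability, max_response_p95):
    """Compare measured availability and response time with SLA targets."""
    availability = availability_percent(period_minutes, downtime_minutes)
    p95 = percentile(response_times, 95)
    return {
        "availability": availability,
        "p95_response": p95,
        "availability_met": availability >= min_availability,
        "response_met": p95 <= max_response_p95,
    }


# Hypothetical 30-day period with 216 minutes of downtime
# and twenty response-time measurements in seconds.
report = check_sla(43200, 216, [0.5] * 19 + [3.0],
                   min_availability=99.5, max_response_p95=2.0)
```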
An effective SLA solution requires three components to be implemented:
- comprehensive, enterprise-wide data collection;
- measurement and intuitive reporting;
- the ability to enforce the determined SLAs.
InfoVista Corp's (Redwood City, CA) SLA conformance management system, combined with the pervasive data collection modules of 3Com's TranscendWare, empowers SLA providers (i.e., MIS management, service providers, etc.) to specify the behavior of their network according to business needs and to deliver policy-based adaptive networking. TranscendWare software enables the network to give individual users and applications their contracted level of service. 3Com's service level agreement products support the ability to set, manage and enforce network policies. Policy-based management capabilities within TranscendWare software allow an SLA provider to set up and implement underlying policies in the network to meet the SLA metric objectives. For example, when creating an SLA to meet response time requirements for a diagnostic imaging application, the SLA provider must set network policies to deliver this application traffic with the desired level of service over the network. 3Com's TranscendWare software provides a global mechanism for setting the required policies, together with the network control needed to implement them.
The system's pre-defined reports will cover high-function and boundary switches, routers, hubs, network interface cards and remote access products. In addition, the solution, which is based on industry standards, will work in multi-vendor environments, creating an SLA solution that will work industry-wide. The InfoVista system can tap into all of 3Com's collected data, as well as multi-vendor data sources, to provide a single view across networks, servers and applications. The software consolidates and measures historical and real-time information to efficiently manage quality of service and service level agreements. The reports within InfoVista identify if, when and where service level agreement targets are being achieved. InfoVista software's reports include:
- exception reports to quickly and easily identify SLA exceptions;
- summary reports that provide a historical view of service or performance indicators;
- detail reports, providing a view of network equipment for troubleshooting and resource optimization.
Furthermore, InfoVista provides for customized reports and, an important feature for many network managers, Web access to all reports.
Network performance management is one of the critical and challenging subjects in managing modern client/server networks. Since most network managers follow the simple rule of "better safe than sorry", quite often we find over-engineered and, obviously, over-priced network infrastructures. However, network performance management seems to have gained momentum recently, and some tools, as described above, can be used today to help network managers handle performance management and capacity planning not only for their hierarchical legacy networks, but also for modern distributed client/server networks.
RFC 1271 ftp://ds.internic.net/rfc/rfc1271.txt
RFC 1513 ftp://ds.internic.net/rfc/rfc1513.txt
RFC 1757 ftp://ds.internic.net/rfc/rfc1757.txt
RFC 2021 ftp://ds.internic.net/rfc/rfc2021.txt
RFC 2074 ftp://ds.internic.net/rfc/rfc2074.txt