It was a Friday afternoon when the alarms began flashing at ZF Steering Systems in Schwäbisch Gmünd: level red.
Stefan Zeul, who is in charge of network and infrastructure at ZF Steering Systems, quickly recognized a virus attack on a Chinese factory. He cut off replication from the Chinese Active Directory domain controller to the company's other domain controllers and told his colleagues in China that if they had not dealt with the problem within two hours, he would disconnect the factory from the network.
“Fortunately, it did not come to that, since our colleagues on site were able to quickly identify the weak point,” Zeul recalled in a recent interview with automotiveIT.
That Friday did not turn into a black Friday for ZF Steering Systems, and Zeul credits the company's global monitoring system.
“Without these tools, the virus attack would have succeeded,” he said. “As a result, no one in the company would likely have been able to log on the following Monday.”
This would have affected not only development and administration but also production in 15 factories.
That is because all incoming and outgoing goods are controlled through a centralized SAP system. Production could have picked components manually for a while,
but complete steering systems, including the electronic data transmission they require, cannot be delivered without the SAP system.
“A proactive, centralized monitoring system for our entire IT infrastructure is absolutely necessary,” said Thilo Helmig, the manager of computing and communication systems at ZF Steering Systems, which is part of the German automotive supplier ZF Friedrichshafen.
“This now applies to our production systems, too, because they are increasingly based on standard IT components,” Helmig added.
Helmig's department monitors about 300 servers, plus a network of 600 Alcatel-Lucent and Cisco components and a range of applications and systems.
Besides the SAP system, the company uses Oracle databases, Microsoft Exchange servers, Microsoft SQL Server and VMware as its virtualization platform.
The monitoring was implemented with several HP tools that feed their data into a central event management platform, HP Operations Manager.
Thilo Helmig's team has installed monitoring agents on some systems, while others, including production computers, are monitored without agents.
These agentless systems include servers that control the latest generation of production lines from PA-ATMO, a Robert Bosch business unit.
Standard TCP/IP protocol used
These lines use the standard TCP/IP Internet protocol for network communication, which means that every connected robot and every production line has its own IP address. The central component is a production computer in the form of a high-availability standard server.
The operating system, the database, network components and special applications, such as robot control, all run on it.
When the first of these advanced production lines were built in several factories last year, production IT went looking for a monitoring system for its production computers.
Thilo Helmig proposed taking a solution that had already proven itself in production-related IT and using it directly in production.
“This was only logical given the importance of these systems,” he said. “That’s because large portions of the factory come to a stop when a computer fails.”
Thanks to the monitoring, however, things do not get that far, because oversight happens proactively.
For example, factory employees receive early feedback if a robot does not deliver the required torque when tightening a screw, even before the component moves on to the next station in the production line.
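The torque check described above can be illustrated with a minimal sketch. It assumes that each tightening operation reports a measured torque and that the station has a required torque window; the record fields, station names and the 23 to 27 Nm window are hypothetical, and the article does not say how ZF or PA-ATMO actually implement the check.

```python
from dataclasses import dataclass


@dataclass
class Tightening:
    """One screw-tightening result reported by a station (hypothetical record)."""
    station: str      # hypothetical station identifier
    screw_id: str     # hypothetical screw position identifier
    torque_nm: float  # measured torque in newton-metres


def torque_ok(t: Tightening, min_nm: float, max_nm: float) -> bool:
    """True if the measured torque lies inside the required window."""
    return min_nm <= t.torque_nm <= max_nm


def failed_tightenings(results: list[Tightening], min_nm: float, max_nm: float) -> list[Tightening]:
    """Return every tightening outside the window, so the part can be held
    back and staff alerted before it moves to the next station."""
    return [t for t in results if not torque_ok(t, min_nm, max_nm)]


# Example with made-up values: a 23-27 Nm window for one station.
results = [
    Tightening("ST10", "S1", 24.8),
    Tightening("ST10", "S2", 19.2),  # too loose
]
for t in failed_tightenings(results, min_nm=23.0, max_nm=27.0):
    print(f"ALERT {t.station}/{t.screw_id}: {t.torque_nm} Nm out of range")
```

The point of checking at the station, rather than at final inspection, is the one the article makes: the fault surfaces before the part travels further down the line.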
In this role, Thilo Helmig's team acts as an internal service provider. For example, it alerts production to looming problems in the assembly process and to the IT-related causes of recurring faults.
IT also keeps adding rules to the monitoring system for maintenance staff, so that they are notified when a specific error pattern occurs and can keep the production process running.
At the same time, IT and production staff can be confident that when an alarm goes off, it genuinely calls for action.
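Rules of the kind described above can be sketched as predicates over incoming events, each mapped to the people who should be notified. This is a minimal Python illustration under stated assumptions: the event fields, rule names and recipient groups are hypothetical, and this is not how rules are actually written in HP Operations Manager.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    source: str    # hypothetical: host or production-line name
    kind: str      # hypothetical: error type reported by the monitor
    severity: str  # e.g. "warning" or "critical"


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]                 # predicate describing the error pattern
    notify: list[str] = field(default_factory=list)  # who should be alerted


def route(event: Event, rules: list[Rule]) -> list[str]:
    """Collect the recipients of every matching rule, without duplicates."""
    recipients: list[str] = []
    for rule in rules:
        if rule.matches(event):
            for person in rule.notify:
                if person not in recipients:
                    recipients.append(person)
    return recipients


# Hypothetical rules: one for a specific line fault, one catch-all for criticals.
rules = [
    Rule("line-db-critical",
         lambda e: e.source.startswith("line") and e.kind == "db_unreachable",
         ["maintenance-line", "production-it"]),
    Rule("any-critical",
         lambda e: e.severity == "critical",
         ["production-it"]),
]

print(route(Event("line3-pc", "db_unreachable", "critical"), rules))
```

Keeping each rule as a named predicate makes it easy to add a new error pattern without touching the routing logic.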
Old system sent too many error messages
That wasn’t always the case.
The previous monitoring system at ZF Steering Systems sent an error report every minute for each inaccessible system or component. As a result, the monitoring console was practically flooded with identical messages.
Said Zeul: “Many an emergency was submerged in the volume of alarms.”
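One common remedy for this kind of flooding is deduplication: identical reports from the same source are folded into a single alarm that carries a repeat count. The article does not say that the HP tools work exactly this way; the following Python sketch only illustrates the technique, assuming hypothetical reports of the form (minute, source, message).

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    source: str
    message: str
    first_seen: int  # minute of the first report
    last_seen: int   # minute of the latest repeat
    count: int = 1   # how many identical reports were folded in


def deduplicate(reports):
    """Fold identical (source, message) reports into one alarm each.

    `reports` is an iterable of (minute, source, message) tuples. Dicts keep
    insertion order, so alarms come out in order of first appearance."""
    alarms: dict[tuple[str, str], Alarm] = {}
    for minute, source, message in reports:
        key = (source, message)
        if key in alarms:
            alarms[key].last_seen = minute
            alarms[key].count += 1
        else:
            alarms[key] = Alarm(source, message, minute, minute)
    return list(alarms.values())


# Hypothetical data: an unreachable server reported once a minute for an hour,
# plus one unrelated fault that would otherwise be buried in the noise.
reports = [(m, "srv-db01", "not reachable") for m in range(60)]
reports.append((30, "srv-app02", "disk 95% full"))
for a in deduplicate(reports):
    print(a)
```

Sixty console lines collapse into one, and the second fault stays visible instead of being submerged.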
Today, one factor in particular improves monitoring accuracy: the company can define its own rules in the HP system that relate events to one another in time and by cause.
One example: if a server fails, the agents for the file system, processor and response time do not all raise separate alarms at the same time.
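The server-failure example can be sketched as a simple causal correlation: when a host-down event exists for a server, symptom events from the same host within a short time window are suppressed instead of being raised as separate alarms. The article does not describe how these rules are expressed in the HP system; this Python sketch assumes hypothetical event records (host, check name, minute) and a hypothetical five-minute window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    host: str
    check: str   # hypothetical names: "host_down", "filesystem", "cpu", "response_time"
    minute: int  # time of the event in minutes


def correlate(events, window=5):
    """Split events into (alarms, suppressed).

    A non-host_down event is suppressed if the same host has a host_down event
    within `window` minutes; the host_down event itself is kept as the alarm."""
    down_times: dict[str, list[int]] = {}
    for e in events:
        if e.check == "host_down":
            down_times.setdefault(e.host, []).append(e.minute)

    alarms, suppressed = [], []
    for e in events:
        is_symptom = e.check != "host_down" and any(
            abs(e.minute - t) <= window for t in down_times.get(e.host, [])
        )
        (suppressed if is_symptom else alarms).append(e)
    return alarms, suppressed


events = [
    Event("srv1", "host_down", 10),
    Event("srv1", "filesystem", 10),
    Event("srv1", "cpu", 11),
    Event("srv1", "response_time", 12),
    Event("srv2", "cpu", 11),  # unrelated host, must still alarm
]
alarms, suppressed = correlate(events)
for e in alarms:
    print("ALARM", e.host, e.check)
```

Operators see one root-cause alarm for srv1 and the independent srv2 alert, rather than four simultaneous alarms for a single failure.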
“Furthermore, IT infrastructure monitoring offers us strategic advantages,” said IT manager Helmig.
For example, the auto supplier uses its monitoring results for service level management: The system is used to monitor service level agreements that the group has concluded with its individual national companies.
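Service level monitoring of this kind usually comes down to comparing measured availability with an agreed target. The article gives no SLA figures; the following Python sketch assumes hypothetical outage intervals in minutes, a 30-day period and a hypothetical 99.5 percent target, and merges overlapping outages so that downtime is not counted twice.

```python
def downtime(outages):
    """Total downtime in minutes from (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(outages):
        if cur_end is None or start > cur_end:
            # Gap before this outage: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching outage: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def availability(period_minutes, outages):
    """Availability in percent over the period."""
    return 100.0 * (period_minutes - downtime(outages)) / period_minutes


def sla_met(period_minutes, outages, target_percent):
    return availability(period_minutes, outages) >= target_percent


# Hypothetical month of 30 days with two overlapping outages and one separate one.
period = 30 * 24 * 60  # 43,200 minutes
outages = [(0, 60), (30, 120), (1000, 1100)]
print(f"{availability(period, outages):.3f}%", sla_met(period, outages, 99.5))
```

Here 220 minutes of downtime yields roughly 99.491 percent, just short of the assumed 99.5 percent target.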
This still largely involves the monitoring of systems and components. In the foreseeable future, however, ZF Steering Systems wants to monitor entire business processes from end to end.
One of them is an accounting procedure in the SAP system, together with all the systems that support it. “In the end, with proactive monitoring, we want to reach the point where IT problems are practically eliminated for the departments.”
By Sabine Koll