Artificial intelligence is one of the top five topics on the agenda of many CIOs and CDOs in the auto industry. It’s not unusual for them to sit down with Wolfgang Wahlster, the head of the German Research Center for Artificial Intelligence (DFKI), and get first-hand information on the latest technologies. Then they jointly develop ideas for innovative controls, autonomous driving and Industry 4.0. DFKI, with locations in Saarbruecken, Kaiserslautern, Bremen and Berlin and more than 700 scientists, is considered the world’s largest research institute in the field.
Mr. Wahlster, based on revenues and the size of its research staff, DFKI is the world’s largest research center for artificial intelligence. But where does it rank for output?
This year, DFKI is celebrating its 30th anniversary. Over this period, more than 80 companies have spun off from DFKI and are making their way in the German market and internationally. SemVox is an example in the automotive field – it developed a high-performance platform for the Audi A8 enabling proactive AI-based interactions that can be used to control more than 200 functions with voice input. It goes far beyond what we see with Siri and Alexa. Its most exciting feature is an entire tool chain that allows Audi engineers to adapt dialog control systems to new vehicle services without the help of AI technical experts.
What makes this solution so extraordinary?
The user talks to an assistance system in a natural voice, using the phrases that he chooses to use, dispensing with rigid menu structures and limited commands, without diverting his attention from traffic. As in inter-human dialog, references to previously addressed content are possible. This new generation of AI-based dialog system captures the user’s intent and has the capacity to independently pose follow-up questions when the input is incomplete and to integrate context. That’s what distinguishes a good AI-based voice system: it can recognize, understand and act appropriately.
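The behavior described here, capturing the user’s intent and posing a follow-up question when the input is incomplete, can be sketched as a simple slot-filling loop. This is a hypothetical illustration, not SemVox’s actual architecture; the intents and slot names are invented:

```python
# Hypothetical slot-filling sketch: the assistant captures an intent and,
# when a required piece of information (slot) is missing, asks a
# follow-up question instead of failing or falling back to a rigid menu.

REQUIRED_SLOTS = {"navigate": ["destination"], "call": ["contact"]}

def respond(intent, slots):
    """Return either a follow-up question or a confirmation of the action."""
    missing = [s for s in REQUIRED_SLOTS[intent] if s not in slots]
    if missing:
        return f"Follow-up: please specify {missing[0]}."
    return f"Executing {intent} with {slots}."

print(respond("navigate", {}))                         # asks a follow-up
print(respond("navigate", {"destination": "home"}))    # acts on full input
```

A context component would additionally carry slots forward from earlier turns, which is what makes references to previously addressed content possible.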
There is already a range of standard programs that recognize speech and reliably convert it into writing…
Yes, the recognition error rate has declined dramatically over the last few years. A robust voice signal analysis is standard today, even without special system training, for the multimodal approaches in which free hand gestures, body position and gaze input play a role. At this year’s CES in Las Vegas, we teamed up with our partner Nuance to show how artificial intelligence can support this technologically, and we won an innovation award for the work.
Do you mean gaze input in combination with AI-based voice dialog?
Exactly. This solution facilitates access to site-specific information in real time even while driving. Sensor fusion and a multimodal approach allow voice, gazes and other sources of information to be evaluated simultaneously, and the appropriate information is delivered via the car’s infotainment system as a dialog and in natural speech. For example, as the driver, I can ask questions about the surrounding buildings, hours of operation or attractions, such as “How long is that seafood restaurant over there open today and what are its daily specials?” I can access recommendations or contact the restaurant directly to reserve a table.
How does that work?
The system evaluates gazes, head position and spoken questions and can provide what is called a reference resolution. It knows exactly which building is meant. We carried out the advance work on this at DFKI in 2010, and the first functioning prototype was shown at CeBIT in 2012.
Doesn’t complexity rise with the synchronous evaluation of different modalities?
Speech, gestures and gazes are always ambiguous. It is only the combination that allows us to mutually eliminate the ambiguity. To put it another way, the more modalities you combine, the clearer the results. Many engineers have a hard time with this basic principle of human communication. But people should not be forced to use just one particular modality in a certain situation. Depending on the situation and personal preferences, the user should have the option of phrasing his request by voice in combination with gestures, or with gaze in combination with speech, or just with speech or gestures. We make the case for the maximum degree of freedom in vehicle controls. Incidentally, this applies to all the occupants of the vehicle, not just the driver, to the same degree.
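The principle that combining modalities eliminates ambiguity can be sketched in a few lines. This is a deliberately minimal illustration, not DFKI’s fusion architecture: each modality alone yields several candidate referents, and intersecting the candidate sets performs the reference resolution mentioned earlier:

```python
# Hypothetical sketch of multimodal disambiguation: each modality alone
# is ambiguous, but intersecting their candidate sets often leaves
# exactly one referent ("that seafood restaurant over there").

def resolve_referent(speech_candidates, gaze_candidates):
    """Return the objects consistent with both the utterance and the gaze."""
    return speech_candidates & gaze_candidates

# The phrase "that seafood restaurant" matches two nearby restaurants...
speech = {"Fish Haus", "Ocean Grill"}
# ...while the driver's gaze cone covers one restaurant and a bakery.
gaze = {"Ocean Grill", "Stadt Bakery"}

print(resolve_referent(speech, gaze))  # {'Ocean Grill'}
```

In a real system each candidate would carry a probability and the fusion would weight the modalities, but the underlying logic is this intersection of hypotheses.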
Changing the subject: You coined the term “Industry 4.0” at Hannover Messe in 2011. It has now become established worldwide. How well has German industry come to grips with the new manufacturing logic?
Based on global comparisons, Germany has developed a two- to three-year head start. The penetration has not progressed nearly as far in other European countries, the United States, Korea, Japan and China as it has in Germany. Just last year, we counted 89,000 publications that included the term Industry 4.0. In particular, it is making a powerful impact on the thoughts and actions of automakers and suppliers. There is no German automaker that isn’t committing to these technologies for its assembly operations. Companies like Bosch are in the forefront of the effort to bring the Internet of Things and artificial intelligence into factories.
Have manufacturers and suppliers found a viable way to migrate from Industry 3.0 into this new world of manufacturing?
Yes. Facilities and robots are no longer rigidly controlled by a centralized system. They can be used flexibly. In an era of volatile market demand, these multi-adaptive factory concepts are a true blessing. In a mixed manufacturing operation, automakers can produce several models on one line – without refitting it. Or new machines will arrive to handle innovative production steps, and factories will be able to integrate them immediately into the ongoing manufacturing operations. The buzzword is “Plug and Produce.” Every product and every tool will bring along a self-description in its semantic product memory as a digital twin, allowing textual as well as technical communication. That is the key advantage of Industry 4.0 that American providers of IT infrastructure, in particular, did not initially see. They just wanted to replace the zoo of outmoded bus systems and standardize the communication protocols. But that approach falls short.
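The plug-and-produce idea can be sketched as follows. This is a hypothetical toy model, with invented machine and capability names, of how a self-description lets a line integrate a new machine without refitting:

```python
# Hypothetical plug-and-produce sketch: each machine carries a semantic
# self-description of its capabilities; the line assigns a production
# step to any machine whose description covers it, so a newly plugged-in
# machine is usable immediately.

class Machine:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)  # the machine's self-description

def assign_step(step, machines):
    """Pick the first machine whose self-description covers the step."""
    for m in machines:
        if step in m.capabilities:
            return m.name
    return None  # no machine on the line can perform this step

line = [Machine("RobotArm-1", {"weld", "screw"}),
        Machine("Press-7", {"stamp"})]

# A new machine is plugged in and is immediately part of the line:
line.append(Machine("Laser-3", {"engrave"}))
print(assign_step("engrave", line))  # Laser-3
```

A real semantic product memory would describe capabilities in a shared ontology rather than as strings, which is exactly the "one level higher" coding of engineering know-how discussed below.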
So the keyword is M2M communication?
Yes, the networking of all devices in a factory down to the level of individual sensors represents 10 to 20 percent of Industry 4.0. The real action is one level higher, where the coding of mechanical engineering know-how takes place. It enables devices and systems to understand one another and enter into a dialog with technical experts, who will continue to be needed.
How is all this reflected in your research?
In 2015, working with the Center for Mechatronic and Automation Technology (ZeMa), DFKI began to establish a center for innovative production technologies in a former ABB production facility in Saarbruecken. We call it Power4Production, or P4P, for short. It focuses on the smart networking of products and production environments that creates value in industry, especially in the auto sector.
What issues are on the research agenda there?
Some examples are the direct communication between the workpiece and the machine as well as human-robot communication in combination with what are known as cyber-physical production systems. We are working closely with BMW, Daimler and Volkswagen. We want to show what the auto manufacturing of the future might look like in the era of artificial intelligence. We assemble teams of skilled workers and robots from different manufacturers, give them various tasks, and then see how they coordinate among themselves. Social behavior and empathy, that is, the capacity to recognize intentions of others and be responsive to them, play an important role if you want to achieve true teamwork between people and robots.
Do you envision fully robotic manufacturing and factories devoid of people?
The human being continues to be at the center of Industry 4.0. We have recognized that the human sensory-motor system is unbeatable at this point. No robot plays football as well as Ronaldo and no robot can install the interior paneling on a door with the precision and flexibility of an experienced skilled worker. Achieving a comparable level of performance would require programming expenditures that would break any budget dedicated to rationalizing operations. But it absolutely makes sense to divide up tasks in assembly operations: Robots can take over everything that is monotonous or physically demanding. Humans will continue to exploit their capabilities wherever common sense, expertise and sensitivity are required. For example, human perception will beat any robot on visual, aesthetics-based inspections.
Why is that?
Because the worker examines the product holistically from the customer’s standpoint. He doesn’t just compare individual pixels woodenly.
Artificial intelligence is expected to increasingly take over administrative tasks, such as examining queries on customs duties, taxation or financing…
It will come to that. The key word here is cognitive intelligence – a dimension where deep learning systems such as AlphaGo from Google DeepMind or the AI-based poker system Libratus have already shown that they are superior to human understanding in specialized areas. I think the potential for automation seems to be the greatest by far in this environment, and it is even easier and faster for car companies to maximize than in the factory.
How seriously should we take the fear of losing control that artificial intelligence often elicits?
We have just received support from the German Federal Ministry of Education and Research for a project that focuses on this. It is important to have clearly regulated transfers of control in both directions if you want to strengthen passengers’ trust in autonomous trains, ships, aircraft and, of course, cars. Let’s take Tesla as an example: if a Tesla drives a section of road semi-autonomously and then suddenly insists on an immediate transfer of control, without any visual indication of the reason, the driver feels a loss of control. That really won’t work. In our research, we are working toward a proactive, explanation-based transfer of control.
How should we envision that?
Since semi-autonomous cars require high-resolution roadmaps, they could inform the driver at an early stage that he will have to take over the wheel in two or three minutes – perhaps because the stored road data is no longer adequate for continued semi-autonomous driving. Or because the mobile wireless network is faltering and the connectivity needed for constant updates from the cloud threatens to break off. Transparency creates trust and strengthens the new technology.
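A proactive, explanation-based handover of the kind described could be sketched like this. The route representation and the lead time are invented for illustration; this is not DFKI’s actual system:

```python
# Hypothetical sketch of a proactive, explanation-based control transfer:
# the system looks ahead along the planned route and warns the driver
# early, with a stated reason, instead of demanding an immediate takeover.

def plan_handover(route_segments, warn_ahead_s=150):
    """route_segments: list of (seconds_from_now, map_ok, connectivity_ok),
    ordered by time. Returns (warn_time_s, reason) or None."""
    for t, map_ok, conn_ok in route_segments:
        if not map_ok:
            return (max(0, t - warn_ahead_s),
                    "stored road data no longer adequate ahead")
        if not conn_ok:
            return (max(0, t - warn_ahead_s),
                    "cloud connectivity expected to drop ahead")
    return None  # no handover needed on this route

route = [(60, True, True), (180, True, False), (300, False, True)]
print(plan_handover(route))  # warn at t=30s: connectivity will drop
```

The point of the early warning with a reason is exactly the transparency argument above: the driver is told both when and why control will return.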
In March, DFKI teamed up with testing company TÜV Süd to develop an open platform for the validation of AI modules used in autonomous vehicles. Where is this project headed?
Our goal is to test all the AI modules in networked, autonomous vehicles. Consumers should be certain that the industry’s products are suitable and safe for road use – just as the body, the engine and all other physical components are. Aside from validation scenarios, the Genesis platform will prepare material covering a multitude of critical driving situations to target and train neural networks. It uses synthetic data along with real-life data gathered during test drives. The synthetic data is needed for many traffic scenarios that are too rare and too varied to be covered by real data. There is a huge demand for these simulations.
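The role of synthetic data here, topping up scenarios that are too rare in real test drives, can be sketched as a simple augmentation step. The scenario labels and counts are invented for illustration:

```python
# Hypothetical sketch: augment rare critical driving scenarios with
# synthetic samples so every scenario reaches a minimum count in the
# training set, while abundant real data is left as-is.

from collections import Counter

def augment(real_samples, synthetic_pool, min_per_scenario=3):
    """real_samples: scenario labels observed on test drives.
    synthetic_pool: dict scenario -> number of synthetic samples available."""
    counts = Counter(real_samples)
    dataset = list(real_samples)
    for scenario, available in synthetic_pool.items():
        shortfall = min_per_scenario - counts.get(scenario, 0)
        dataset += [scenario] * min(max(shortfall, 0), available)
    return dataset

real = ["lane_keep"] * 5 + ["pedestrian_crossing"]        # rare in real data
synth = {"pedestrian_crossing": 10, "wrong_way_driver": 10}
print(Counter(augment(real, synth)))
```

In practice the samples would be sensor recordings or simulator renderings rather than labels, but the balancing logic is the same.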
Do you think many automakers’ stated objective of driving autonomously at stage five as early as 2025 is realistic?
There is certainly still a lot to do before we reach that point. We do see many automakers and suppliers vigorously building up their expertise in artificial intelligence. The German auto industry has geared its budgets and organizations for the challenge and is working hard with DFKI on these issues.
Finally, please give us your best assessment: Where will artificial intelligence be headed in the next two years?
We are still living with a paradox in AI research. The highly intellectual tasks for which we humans need high intelligence are often easy for AI systems if they have the right algorithms. But the simple tasks that are easy for us in daily life are very difficult for them. One example is walking quickly through a bustling, crowded shopping area without bumping into anyone. This is not just a matter of coordinating movements. All kinds of experiential knowledge and social rules have to be processed. Mobile systems will only build up the everyday intelligence shaped by experience and individual episodic memory if we never shut them off in the evening. This is something the auto industry needs to deal with. As soon as the ignition key comes out, the intelligence of any autonomous automobile, however well it functions, falls to exactly zero. Computers and sensors in cars always need to be “on.”
Interview by Ralf Bretting and Hilmar Dunker