The auto industry is undergoing a massive technological upheaval. Artificial intelligence (AI) gives automotive players an opportunity to rethink everything, and some believe that AI is the key differentiator in the auto industry. This is inaccurate. It’s not the AI, it’s the data that is the source of differentiation. As NVIDIA CEO Jensen Huang was quoted as saying, “in the future your data is your company’s source code.”
The data generated by vehicles is critically important to the future of the auto industry, and the benefits of this data are generally understood. There is, however, a tremendous volume of untapped data that is not being leveraged. This is predominantly due to data silos and a lack of tightly integrated solutions.
The auto industry has adopted many of the most popular open source tools for handling big data. These tools, which include Hadoop, Kafka and other products, span storage, batch processing, real-time data, machine learning frameworks and analytics. The problem with the way these tools are currently integrated is that they require far too much data movement to process the data quickly enough for AI use cases. The implication is that, even as the auto industry generates data at an unprecedented pace, that data will remain largely untapped for lack of an integrated infrastructure to support the plethora of use cases.
Untapped data is key
People and processes must be taken into consideration to gain value from this data. While AI is a toolset being used for solving a problem, it can hardly be considered the differentiator. As Ted Dunning, MapR chief application architect and Apache Software Foundation board member, says: “In the new cheap learning age, developers will avail themselves of the powerful and easy-to-use machine learning frameworks — and do so without having to understand all the complex mathematics driving the predictions and recommendations and optimizations under the covers.”
Consider the amount and variety of data in the context of the most important business asset: people. Data scientists are the employees charged with figuring out how best to leverage the data and apply it to AI-based solutions. The problem for them is that approximately 80% of their time is spent on data wrangling tasks such as collecting and moving data. That leaves a mere 20% of their time for building models.
A single use case can potentially involve hundreds or thousands of models, all of which need to be tested and then analyzed against each other. There simply isn’t enough time in the day for data scientists to solve the business problems in a timely manner. Moving large volumes of data between big data systems and an AI environment is too expensive. Moreover, versioning the data and the models used to address AI problems takes huge effort, and open source technologies lack the integration capability needed to simplify the process and give people the time to solve the problems. This is a hurdle that must be overcome for the auto industry to make substantial advances with this untapped data.
Systems must run in a coherent, shared manner, with no need to move data between environments. Only then can workloads be handled in real time, and only then can large-scale analytics and applied AI use cases be implemented on the same underlying data.
Moving data between systems is the silent killer of productivity. Given the immense volume of data, it is important to eliminate time-wasting processes as early as possible. Every time data has to be moved, organizational agility is reduced. With the auto industry’s current focus on AI, it’s worth reminding automotive companies that, without a plan for the untapped data, there’s no future for AI in the industry.
By Jim Scott
(Jim Scott is director of enterprise strategy and architecture at MapR Technologies, a business software company headquartered in Santa Clara, California. MapR provides access to a variety of data sources from a single computer cluster. Scott is an experienced leader who has worked across various industries in the course of his career. He is a cofounder of the Chicago Hadoop Users Group (CHUG) and helped grow a now-flourishing community around next-generation technologies. Jim is on Twitter as @kingmesal.)