Is your data ready for Artificial Intelligence?
Dorien Jorissen
In the near future, Artificial Intelligence (AI) will bring your company to the next level, increasing productivity, improving the use of resources, maintainability and staffing efficiency, and much more. But before that can happen, you need to collect data and provide enough examples to train your AI algorithms. Whether your company is active in the financial or the medical sector, whether you’re focused on warehousing or garbage disposal, every company has one thing in common: data already flows through the organization.
This blog post aims to make you aware of the importance of data collection as a stepping stone to Artificial Intelligence. Only when your data is visible, adequate, complemented with external data and representative of your demographic can you profit from the opportunities that present themselves in today’s world and make better business decisions.
Unhide your data
Accessible data can be put to good use. Surely somebody knows how many people are working for your company, how much inventory you keep, how much stock you’ve been moving over the last couple of months, and how your factory scores on efficiency and productivity. But what happens with this data once it has been acquired? A nice presentation to the board? Are these numbers stored somewhere in the cloud? Perhaps they are available in a centralized database? Or, worst of all, are they sitting in an Excel file on a private drive, collecting dust?
In many companies, only a limited number of people have access to certain data assets. Because this data is isolated from the rest of the organization, we call such stores information silos. Not only do silos signal distrust within the organization, they also limit the team or application processing the data. Different teams might interpret the same data differently, or a correlation between features might remain hidden because the data is spread over different silos.
There’s a big advantage when data is generally available in a standardized way. Not only can you rely on the trustworthiness of the source, you can also guarantee a minimum of quality and completeness. If you build a company culture centered around data and start collecting that data in a uniform way today, it will fuel your artificial intelligence tomorrow.
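As a minimal sketch of what collecting data “in a uniform way” can mean in practice, assume two teams record the same stock movements with different conventions: one uses ISO dates and kilograms, the other day/month/year dates and tonnes. The Python snippet below maps both into one agreed record and skips rows that fail a basic completeness check, so both teams end up with the same interpretation of the same data. The `StockMovement` record, the field names and both team formats are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical standard record that every team agrees to deliver.
@dataclass(frozen=True)
class StockMovement:
    day: date
    product_id: str
    quantity_kg: float

# Fields each (hypothetical) team export must contain.
REQUIRED_A = ("date", "product", "qty_kg")
REQUIRED_B = ("when", "sku", "tonnes")

def is_complete(row: dict, required: tuple) -> bool:
    """Minimal completeness check: every required field is present and non-empty."""
    return all(row.get(f) is not None and str(row[f]).strip() != "" for f in required)

def from_team_a(row: dict) -> StockMovement:
    # Assumed team A convention: ISO dates (YYYY-MM-DD), quantities in kilograms.
    return StockMovement(
        day=date.fromisoformat(row["date"]),
        product_id=row["product"].strip().upper(),
        quantity_kg=float(row["qty_kg"]),
    )

def from_team_b(row: dict) -> StockMovement:
    # Assumed team B convention: DD/MM/YYYY dates, quantities in tonnes.
    return StockMovement(
        day=datetime.strptime(row["when"], "%d/%m/%Y").date(),
        product_id=row["sku"].strip().upper(),
        quantity_kg=float(row["tonnes"]) * 1000.0,
    )

if __name__ == "__main__":
    team_a = [{"date": "2023-01-05", "product": "ab-12", "qty_kg": "250"}]
    team_b = [
        {"when": "05/01/2023", "sku": "AB-12 ", "tonnes": "0.25"},
        {"when": "06/01/2023", "sku": "", "tonnes": "1"},  # incomplete: no product
    ]
    records = [from_team_a(r) for r in team_a if is_complete(r, REQUIRED_A)]
    records += [from_team_b(r) for r in team_b if is_complete(r, REQUIRED_B)]
    for record in records:
        print(record)
```

Both complete rows describe the same movement and end up as identical records, while the incomplete row is dropped. In a real setting you would probably log rejected rows rather than silently skip them.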
Keep more than just YOUR data
Although predicting the future is never certain, you can avoid surprises by incorporating external factors. For instance, when you’re selling electric cars, a rising oil price might have a positive influence on your sales, while a change in government policy might have a negative one. A heat wave might mean that your employees need more breaks to prevent exhaustion, which affects productivity. Even annotating data with company initiatives can be beneficial: marketing campaigns (hopefully) result in increased visibility of your organization and solutions, which leads to more sales. That’s why your organization’s numbers should be stored together with the external facts and figures that affect the processes that matter to your business.
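To illustrate storing your own numbers together with external facts, here is a small Python sketch that combines hypothetical daily electric-car sales with a hypothetical oil price series, heat-wave days and marketing-campaign days into one row per day. All names and values are made up for illustration; a real setup would pull these from your own systems and from external sources.

```python
from datetime import date

# Hypothetical internal data: electric cars sold per day.
sales = {date(2023, 6, 1): 12, date(2023, 6, 2): 15, date(2023, 6, 3): 9}

# Hypothetical external data: oil price per barrel (no value known for June 3)
# and the days on which a heat wave was reported.
oil_price = {date(2023, 6, 1): 70.1, date(2023, 6, 2): 72.4}
heat_wave_days = {date(2023, 6, 3)}

# Hypothetical company initiative: days on which a marketing campaign ran.
campaign_days = {date(2023, 6, 2)}

def build_rows(sales, oil_price, heat_wave_days, campaign_days):
    """Combine internal figures and external factors into one row per day."""
    rows = []
    for day in sorted(sales):
        rows.append({
            "day": day,
            "units_sold": sales[day],
            # None marks a missing external value instead of guessing one.
            "oil_price": oil_price.get(day),
            "heat_wave": day in heat_wave_days,
            "campaign": day in campaign_days,
        })
    return rows

for row in build_rows(sales, oil_price, heat_wave_days, campaign_days):
    print(row)
```

Missing external values are kept as `None` rather than filled in, so a later analysis can decide explicitly how to handle the gap.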
A machine learning algorithm can take external factors into account alongside your own figures to find relationships between multiple data sets. It can, for example, distinguish between seasonal effects, the effect of climatic conditions and a general trend of increasing sales. Centralizing decision-making around company data is important, but so is external data: the world around us changes constantly. Be prepared to collect a LOT of data.
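A machine learning model learns such distinctions from the data itself, but the underlying idea of separating a trend from seasonality can be shown with a much simpler, classical technique: additive decomposition using a centered moving average. The Python sketch below assumes a hypothetical, noise-free quarterly sales series built from a linear upward trend plus a fixed seasonal pattern, and recovers both parts. It illustrates the concept only; it is not how any particular ML algorithm works internally.

```python
def centered_moving_average(y, period):
    """Trend estimate: centered moving average (a 2 x period MA when period is even).

    Returns None for the first and last period // 2 positions, where the
    window does not fit.
    """
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # period + 1 values; the two outer ones get half weight.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    if len(y) < 2 * period:
        raise ValueError("need at least 2 * period values")
    trend = centered_moving_average(y, period)
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(y, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    raw = [sum(b) / len(b) for b in buckets]
    mean = sum(raw) / period
    seasonal = [s - mean for s in raw]  # seasonal effects sum to zero
    return trend, seasonal

# Hypothetical quarterly sales over three years: upward trend plus a fixed
# seasonal pattern (strong Q1, weak Q3).
pattern = [10.0, -5.0, -10.0, 5.0]
sales = [100 + 2 * t + pattern[t % 4] for t in range(12)]

trend, seasonal = decompose(sales, period=4)
print([round(s, 2) for s in seasonal])  # [10.0, -5.0, -10.0, 5.0]
print(trend[2:10])
```

Because the example series is synthetic, the recovered seasonal effects match the pattern exactly. Real data leaves a residual on top, and factors such as climatic conditions would enter a model as additional inputs rather than through the moving average.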
Be wary of biased data
There are many examples of data mining wrongly attributing significance to a certain input feature. Having a complete representation of your inventory or customer base is vital for meaningful data analysis. Besides that, normalizing your input can prevent your model from ever picking up on unwanted features. A neural network designed to detect skin cancer in pictures learned to associate the presence of a ruler next to a tumor with malignancy. In an experiment on classifying wolves and huskies, researchers deliberately selected training images in which the wolves appeared against a snowy background, and the resulting model learned to recognize snow rather than wolves. Both cases show that biased data leads to an inaccurate machine learning model. This is a difficulty that even experienced data scientists face. No wonder experts say they spend more time preparing the data than designing and training models…
It makes more sense to worry about the data and be less picky about what algorithm to apply.
– Artificial Intelligence: A Modern Approach (S. Russell and P. Norvig)
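One simple way to check whether a dataset gives a complete representation of, say, your customer base is to compare the category shares in the data with shares you know from another source. The Python sketch below does this for a hypothetical region attribute; the function name, the 5-percentage-point tolerance and all figures are assumptions chosen for illustration.

```python
from collections import Counter

def representation_gaps(sample, population_shares, tolerance=0.05):
    """Flag categories whose share in `sample` deviates from a known population share.

    Returns {category: (sample_share, population_share)} for every category whose
    absolute difference exceeds `tolerance`.
    """
    if not sample:
        raise ValueError("sample is empty")
    counts = Counter(sample)
    gaps = {}
    for category, expected in population_shares.items():
        observed = counts[category] / len(sample)
        if abs(observed - expected) > tolerance:
            gaps[category] = (round(observed, 3), expected)
    return gaps

# Hypothetical training data: the region of each customer record.
sample = ["north"] * 42 + ["south"] * 48 + ["east"] * 10
# Hypothetical shares of the full customer base, e.g. from a CRM export.
population = {"north": 0.40, "south": 0.35, "east": 0.25}

print(representation_gaps(sample, population))
# {'south': (0.48, 0.35), 'east': (0.1, 0.25)}
```

Here the south is over-represented and the east under-represented. A flagged gap does not make the data unusable, but it shows where a model may learn patterns that do not hold for your whole customer base.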
Even though the data you’ve collected is very valuable for your company, you probably didn’t collect it with AI applications in mind. It therefore probably contains disruptive features that will influence the learning process. If you want to prepare your data for use in AI applications, it’s vital to reflect on and assess your data collection from here on out.
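A quick and admittedly crude way to look for disruptive features, like the snowy background in the wolf-and-husky example, is to check how well each feature predicts the label on its own. The Python sketch below uses a one-rule baseline (predict the most common label for each feature value) and flags features that score suspiciously high. The feature names, the data and the 0.95 threshold are hypothetical.

```python
from collections import Counter, defaultdict

def one_rule_accuracy(rows, labels, feature):
    """Training accuracy of predicting the majority label for each value of one feature."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[feature]][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(labels)

def suspicious_features(rows, labels, threshold=0.95):
    """Features that, on their own, (almost) perfectly separate the labels."""
    if not rows:
        return {}
    scores = {f: one_rule_accuracy(rows, labels, f) for f in rows[0]}
    return {f: s for f, s in scores.items() if s >= threshold}

# Hypothetical, deliberately biased data in the spirit of the wolf/husky example.
rows = [
    {"snow_background": True,  "pointed_ears": True},
    {"snow_background": True,  "pointed_ears": False},
    {"snow_background": True,  "pointed_ears": True},
    {"snow_background": False, "pointed_ears": True},
    {"snow_background": False, "pointed_ears": False},
    {"snow_background": False, "pointed_ears": True},
]
labels = ["wolf", "wolf", "wolf", "husky", "husky", "husky"]

print(suspicious_features(rows, labels))  # {'snow_background': 1.0}
```

A flagged feature is a prompt for human review, not proof of a spurious correlation, since a genuinely informative feature can also score high. Features with many distinct values, such as IDs or timestamps, score close to 1.0 on training data almost by construction, which is another reason to look at them closely.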
More and more companies are making their processes data-driven to gain a competitive advantage. To understand how certain aspects influence your productivity, it’s important to collect high-quality data. When your sources are reliable and you have a suitable application to reveal insightful patterns, you can use those patterns to support business decisions.
Today, the hard part is not collecting the data; there are enough tools that will help you do just that. The real challenge lies in structuring the data and capturing the right data. Finding a solution that fits your specific case isn’t easy, but you can start by setting up a database or data warehouse, thinking about how you’ll structure your data, and then applying that structure. If you need help or have questions, contact us and shoot us a message!
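As a possible starting point for setting up a database and deciding how to structure your data, the sketch below uses Python’s built-in `sqlite3` module to define a small schema that keeps daily sales next to external factors and lets the database itself reject incomplete or implausible rows. The table and column names are hypothetical, and a production data warehouse would look different, but the ideas (explicit types, `NOT NULL` and `CHECK` constraints, a shared date key) carry over.

```python
import sqlite3

# Hypothetical schema: daily sales per product next to external factors, with
# constraints that reject incomplete or implausible rows at insert time.
SCHEMA = """
CREATE TABLE daily_sales (
    day        TEXT    NOT NULL,               -- ISO date, e.g. '2023-06-01'
    product_id TEXT    NOT NULL,
    units_sold INTEGER NOT NULL CHECK (units_sold >= 0),
    PRIMARY KEY (day, product_id)
);
CREATE TABLE external_factors (
    day       TEXT NOT NULL PRIMARY KEY,
    oil_price REAL,                            -- NULL when unknown
    heat_wave INTEGER NOT NULL DEFAULT 0 CHECK (heat_wave IN (0, 1))
);
"""

def create_store(path=":memory:"):
    """Open (or create) the database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def sales_with_factors(conn):
    """Total units per day, joined with whatever external data is known for that day."""
    query = """
        SELECT s.day, SUM(s.units_sold) AS units, e.oil_price, e.heat_wave
        FROM daily_sales AS s
        LEFT JOIN external_factors AS e ON e.day = s.day
        GROUP BY s.day, e.oil_price, e.heat_wave
        ORDER BY s.day
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    conn = create_store()
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)",
                     [("2023-06-01", "EV-1", 7), ("2023-06-01", "EV-2", 5),
                      ("2023-06-02", "EV-1", 15)])
    conn.execute("INSERT INTO external_factors (day, oil_price) VALUES (?, ?)",
                 ("2023-06-01", 70.1))
    try:
        # A negative sales figure violates the CHECK constraint.
        conn.execute("INSERT INTO daily_sales VALUES (?, ?, ?)",
                     ("2023-06-03", "EV-1", -4))
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
    print(sales_with_factors(conn))
```

The negative sales figure is rejected with an `sqlite3.IntegrityError`, and the `LEFT JOIN` keeps days for which no external data is available yet.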
Take action today, because learning how to structure and capture the right data takes time and practice. Prepare your company for a data-driven culture and start building knowledge of machine learning to leverage the potential of your data.