30MHz is participating in the autonomous greenhouse challenge: growing tomatoes without entering the greenhouse. They manage the greenhouse from behind their laptops and have to base their decisions on the real-time data they receive about the indoor climate, outside conditions and weather forecasts. To do this, they’ve been developing multiple machine learning applications. These applications guide the cultivation strategy and, subsequently, the actions taken to reach the desired climate.
Machine learning challenges
However, developing and operationalising large-scale machine learning applications comes with many challenges. One reason is the inherent nature of machine learning: data are ever-evolving and models are stochastic, so you cannot know in advance exactly how a model will behave.
In software engineering, code is version controlled to manage changes over time (think of the numbered software updates on your smartphone). In machine learning, there is no standardised solution for managing changes in code, data and model characteristics at the same time, largely because the field is still immature. Many initiatives are trying to solve this problem, for example MLflow and Data Version Control (DVC), but these have their own limitations, which are outside the scope of this blog.
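To make the versioning problem a little more concrete, here is a minimal Python sketch of one way to tie code, data and model settings to a single identifier. It is an illustration only, not how 30MHz, MLflow or DVC work: the function name run_fingerprint, the version string and the sample readings are all hypothetical.

```python
import hashlib
import json


def run_fingerprint(code_version: str, data_rows: list, params: dict) -> str:
    """Return one hash that changes when code, data or model settings change.

    Hypothetical helper for illustration; real tools track far more detail.
    """
    payload = json.dumps(
        {"code": code_version, "data": data_rows, "params": params},
        sort_keys=True,  # stable key order, so equal inputs give equal hashes
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up example inputs: a code version, two sensor rows and one model setting.
rows = [[21.5, 60], [22.0, 58]]
base = run_fingerprint("v1.2.0", rows, {"max_depth": 3})

# Same code and settings, but one extra row of data: the fingerprint changes.
more_data = run_fingerprint("v1.2.0", rows + [[22.4, 57]], {"max_depth": 3})

# Same data, different model setting: the fingerprint changes as well.
new_params = run_fingerprint("v1.2.0", rows, {"max_depth": 5})
```

A trained model stored under such a fingerprint can be traced back to the exact combination of code, data and settings that produced it, which is the kind of bookkeeping that plain code version control does not provide on its own.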
AWS project & solutions
To solve some of these problems, 30MHz has been fortunate to receive help from two machine learning engineers at Amazon Web Services (AWS). AWS is a cloud provider, and 30MHz uses its services to host servers, databases and machine learning models, among other things. 30MHz has been working closely with AWS for quite some years. For this reason, and because they’re excited about the work, 30MHz had the opportunity to learn from and work with AWS engineers at their own office in Amsterdam for more than two weeks.
The goals of the project were twofold:
- Isolate the machine learning process for every grower/customer of 30MHz. They’ve been developing a solution that automatically trains machine learning models when new data is collected, for potentially thousands of growers. The entire model training process is isolated for every single grower. This solution kills two birds with one stone. First, it honours customers' wish that their data be used solely for their own needs: they own the data and therefore decide what happens with it. Second, the company is able to learn growing patterns that are unique to each grower.
- Develop a framework to support machine learning at scale. Machine learning code constitutes only a fraction of the entire solution; the rest consists of supporting components. Together with AWS, the company has been developing and automating many of these components. This gives them a couple of advantages. First, it improves the quality and reliability of their machine learning applications, because everything is tracked and monitored closely. Second, they can accelerate future projects, because many of the components in the framework are reusable.
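The first goal, isolated per-grower training that is triggered by new data, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not 30MHz's actual implementation: the class GrowerModels, its method names and the grower IDs are hypothetical, and the "model" is simply the mean temperature, standing in for a real machine learning model.

```python
from collections import defaultdict
from statistics import mean


class GrowerModels:
    """Keeps data and models strictly separated per grower (hypothetical sketch)."""

    def __init__(self):
        self._data = defaultdict(list)       # grower_id -> that grower's readings only
        self._models = {}                    # grower_id -> fitted stand-in "model"
        self._trained_on = defaultdict(int)  # grower_id -> readings used at last training

    def add_reading(self, grower_id, temperature):
        self._data[grower_id].append(temperature)

    def retrain_stale(self):
        """Retrain only the growers that received new data; return their IDs."""
        retrained = []
        for grower_id, readings in self._data.items():
            if len(readings) > self._trained_on[grower_id]:
                # Stand-in for real training: uses this grower's readings only.
                self._models[grower_id] = mean(readings)
                self._trained_on[grower_id] = len(readings)
                retrained.append(grower_id)
        return retrained

    def predict(self, grower_id):
        return self._models[grower_id]


models = GrowerModels()
models.add_reading("grower_a", 20.0)
models.add_reading("grower_a", 22.0)
models.add_reading("grower_b", 18.0)
first_run = models.retrain_stale()   # both growers have untrained data

models.add_reading("grower_b", 20.0)
second_run = models.retrain_stale()  # only grower_b received new data
```

Because each model is fitted on one grower's readings alone, no data crosses between customers, and a new reading for one grower never triggers retraining for another.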
Improve and automate
With AWS' knowledge and experience, 30MHz has been able to improve and automate a large part of their machine learning infrastructure. The result is a scalable and robust framework for machine learning applications on the 30MHz platform.