React as a way to act
First step of the process: scanning whatever is in the fridge. For that part, we opted for a continuous camera stream and object detection. With this approach, users can freely move their phone around in the fridge and scan all the products one after the other. What if a product is not detected? We added an interface that allows users to manually add missing products after scanning. This way, no product is left out.
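To make the scanning flow more concrete, here is a minimal TypeScript sketch of how detections from successive camera frames and manual additions could be merged into one product list. The `Detection` shape, the `ScanSession` class and the 0.5 confidence threshold are assumptions made for illustration, not the app's actual code.

```typescript
// Hypothetical shape of one detection; the real model output may differ.
interface Detection {
  label: string;
  score: number; // confidence between 0 and 1
}

// Collects every product seen during one scan of the fridge.
class ScanSession {
  private readonly products = new Set<string>();
  private readonly minScore: number;

  // minScore is an assumed confidence threshold, not a value from the article.
  constructor(minScore: number = 0.5) {
    this.minScore = minScore;
  }

  private static normalize(label: string): string {
    return label.trim().toLowerCase();
  }

  // Call once per camera frame with that frame's detections.
  addFrame(detections: Detection[]): void {
    for (const detection of detections) {
      if (detection.score >= this.minScore) {
        this.products.add(ScanSession.normalize(detection.label));
      }
    }
  }

  // Lets the user add a product the model missed.
  addManually(label: string): void {
    const cleaned = ScanSession.normalize(label);
    if (cleaned.length > 0) {
      this.products.add(cleaned);
    }
  }

  // Returns the de-duplicated product list in alphabetical order.
  list(): string[] {
    return Array.from(this.products).sort();
  }
}
```

Because the products live in a set, an item that shows up in many frames is only counted once, and a manual entry for something already detected changes nothing.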
Identifying the food to save
For the object detection, we needed a lightweight model that could run in the browser. We therefore chose SSD MobileNet. As a pre-trained model, it already recognizes objects based on the large set of images it has been trained on. We could thus simply retrain it on a small data set, which didn’t require a lot of processing power or time.
By using IBM Watson Machine Learning, we could leverage the power of cloud computing through a simple interface. Additionally, we used multiple high-end Graphics Processing Units (GPUs) to train the model.
Preparing the ground
Before using the app for good, we needed to train it with data that was as close as possible to what the model would eventually be fed with. In our case: smartphone camera images. So, we tried to shoot a minimum of 50 images per product beforehand. We took those pictures from multiple angles in order to increase the chances of recognition. On top of that, we used different backgrounds to further improve the model's accuracy. In general, it is crucial to train the model for different use cases. Where and how the model will be used will surely differ from user to user.
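A quick way to keep an eye on the 50-images-per-product target is to count how many images contain each label. The sketch below assumes a hypothetical `LabeledImage` record per annotated picture; the names and the data layout are illustrative and do not reflect any particular annotation tool's export format.

```typescript
// Hypothetical record for one annotated picture.
interface LabeledImage {
  file: string;
  labels: string[]; // one entry per labeled object in the image
}

interface LabelCount {
  label: string;
  count: number; // number of images that contain the label
}

// Lists every label that appears in fewer than minImages images.
// Labels found in the images but missing from expectedLabels are counted too.
function labelsBelowTarget(
  images: LabeledImage[],
  expectedLabels: string[],
  minImages: number = 50
): LabelCount[] {
  const counts = new Map<string, number>();
  for (const label of expectedLabels) {
    counts.set(label, 0); // makes products without any image show up
  }
  for (const image of images) {
    // Count each label once per image, even if it was drawn several times.
    const unique = Array.from(new Set(image.labels));
    for (const label of unique) {
      counts.set(label, (counts.get(label) ?? 0) + 1);
    }
  }
  const result: LabelCount[] = [];
  counts.forEach((count, label) => {
    if (count < minImages) {
      result.push({ label, count });
    }
  });
  // Least-covered products first.
  return result.sort((a, b) => a.count - b.count);
}
```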
When selecting images for training, we made sure that a person would be able to recognize and label the product within 2 seconds. After all, if a human can’t identify a product, a computer program is unlikely to manage either.
After capturing all the necessary images, the most time-consuming task turned out to be labeling them. For each object in an image, we had to draw a bounding rectangle. Quite the effort! To simplify this tedious task, we used a free cloud storage solution from IBM. With their cloud annotation tool, IBM provides a web interface to easily label images.
Training the model
So, how did we concretely proceed to build Wattoo? To start training our model, we installed IBM’s Cloud Annotations command line interface (CLI), cacli. On macOS, you can install it with Homebrew using the following command:
$ brew install cloud-annotations/tap/cacli
By default, the model is only trained for 500 steps. To boost the accuracy of the model, we increased the number of steps in the config.yaml file. That’s also where we could choose which GPU to use.
To start training, run the following commands:
$ cacli login
$ cacli train
The progress can be checked with:
$ cacli progress <model_id>
When it's finished, the model can be downloaded with the following command:
$ cacli download <model_id> --tfjs --coreml --tflite
Linking model and app
When the model recognizes an object, it also communicates where it is located. We used that location information and added an illustrative animation letting the user know the recognition was successful.
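To draw such an animation over the camera view, the reported location has to be translated into on-screen pixels. Here is a minimal sketch, assuming the location arrives as a box with coordinates normalized between 0 and 1. The actual output format depends on the model wrapper, and `NormalizedBox`, `PixelRect` and `toPixelRect` are hypothetical names.

```typescript
// Assumed box format: fractions of the frame size, origin at the top-left.
interface NormalizedBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Rectangle in pixels, ready to position an overlay element.
interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Scales a normalized box to the displayed video size and
// clips it so the overlay never leaves the visible area.
function toPixelRect(
  box: NormalizedBox,
  viewWidth: number,
  viewHeight: number
): PixelRect {
  const clamp = (value: number): number => Math.min(1, Math.max(0, value));
  const x0 = clamp(box.x);
  const y0 = clamp(box.y);
  const x1 = clamp(box.x + box.width);
  const y1 = clamp(box.y + box.height);
  return {
    left: Math.round(x0 * viewWidth),
    top: Math.round(y0 * viewHeight),
    width: Math.round((x1 - x0) * viewWidth),
    height: Math.round((y1 - y0) * viewHeight),
  };
}
```

The resulting rectangle could, for example, be applied as inline styles to an absolutely positioned element layered on top of the video.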
The model assigns a label (e.g. "salt") to every recognized ingredient. The system then searches the database for recipes that match those labeled ingredients, and the results are presented to the user. If the user has most of the remaining ingredients for a recipe, they’ll be able to prepare a waste-free meal. And that’s really what this was all about in the end.
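The matching step can be sketched in a few lines of TypeScript. In this illustration, an in-memory array stands in for the recipe database, and recipes are ranked by the share of their ingredients that were scanned. The `Recipe` shape, the `matchRecipes` function and the 50% cut-off are assumptions, not the actual implementation.

```typescript
// Hypothetical recipe record; a real database schema will look different.
interface Recipe {
  name: string;
  ingredients: string[];
}

interface RecipeMatch {
  recipe: Recipe;
  matched: string[]; // ingredients found in the fridge
  missing: string[]; // ingredients the user still needs
  coverage: number; // matched / total, between 0 and 1
}

// Ranks recipes by how many of their ingredients were recognized.
// minCoverage is an assumed cut-off for what counts as a useful suggestion.
function matchRecipes(
  labels: string[],
  recipes: Recipe[],
  minCoverage: number = 0.5
): RecipeMatch[] {
  const available = new Set(labels.map((label) => label.trim().toLowerCase()));
  const matches: RecipeMatch[] = [];
  for (const recipe of recipes) {
    if (recipe.ingredients.length === 0) {
      continue; // nothing to match against
    }
    const isAvailable = (ingredient: string): boolean =>
      available.has(ingredient.trim().toLowerCase());
    const matched = recipe.ingredients.filter(isAvailable);
    const missing = recipe.ingredients.filter((i) => !isAvailable(i));
    const coverage = matched.length / recipe.ingredients.length;
    if (coverage >= minCoverage) {
      matches.push({ recipe, matched, missing, coverage });
    }
  }
  // Best-covered recipes first.
  return matches.sort((a, b) => b.coverage - a.coverage);
}
```

Returning the missing ingredients alongside each match makes it easy to show the user what they would still need for a given recipe.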
The icing on the cake
An optimal user experience is paramount to conveying the benefits of your service or product in your app. By seamlessly bridging the gap between creativity and technology, we make sure that users don’t run into any obstacles and can experience the platform as you intended. Are you looking to translate an innovative business proposition into a best-in-class (mobile) application? Get in touch with us! We’re there to make it happen.