The WWDC Scholarship 2019 - Our summary

We were at Apple's annual developer conference and report on it in this blog post.

What is the WWDC?

The WWDC ("Worldwide Developers Conference"), is Apple's annual developer conference. New features of iOS and macOS as well as new technologies are presented there. Furthermore you can talk to Apple developers, get tips and meet other developers.

How do you get to WWDC?

To participate in WWDC, you need a good deal of luck: tickets are raffled off. Even if you are selected, you still have to pay 1,599 CHF to attend.
But there is another way to participate. As a student, you can submit a self-written program, and Apple selects up to 350 people who get to attend for free. The competition is fierce, though: last year there were eight to nine thousand applicants, and this year there were even more.

My application

When I heard that the "WWDC Scholarships" program existed, I was thrilled and decided to participate. I quickly knew that I wanted to do something with machine learning. I looked at Apple's latest releases and came across CreateML. With CreateML you can train a model on a Mac, which can later be used with CoreML and Vision on macOS and iOS. I started the project during the Christmas holidays in December, experimenting with CreateML and CoreML to see what they can do. At first I classified animals, but then I moved on to handwriting recognition and played around with the MNIST dataset, recognizing some digits. But since scholarship submissions should not use public resources, I could not use MNIST. I decided to recognize digits and operators in order to create a handwritten calculator. For this I needed as many images as possible for each character (0-9, +, -, *, /, =).

My first idea was to let the user draw each character, train the model on the spot and use it directly. Unfortunately, writing with a mouse is rather awkward, so I decided not to pursue this idea.


The other option is to train the model on a Mac, then copy the model to an iPad and do the handwriting recognition there. I chose this option because drawing on a touch screen works much better, and because training can take a while and is therefore better done in advance.


The first version I developed consisted of a drawing area, a "recognize" button and a label for the result. You could draw a digit or an operator, and the recognized character was written into the label.
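Roughly sketched, such a drawing area can be built with a plain UIView that collects touch points into a path. This is a minimal illustration of the idea, not the exact code from the project:

```swift
import UIKit

// Minimal sketch of a drawing area: a UIView that collects touch
// points into a UIBezierPath and strokes it on every redraw.
class DrawingView: UIView {
    private let path = UIBezierPath()

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        guard let point = touches.first?.location(in: self) else { return }
        path.move(to: point)
    }

    override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
        guard let point = touches.first?.location(in: self) else { return }
        path.addLine(to: point)
        setNeedsDisplay() // redraw with the extended path
    }

    override func draw(_ rect: CGRect) {
        UIColor.black.setStroke()
        path.lineWidth = 8
        path.stroke()
    }
}
```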


Then I added a "Calculate" button. When this button was pressed, the expression in the label was evaluated and the result was displayed.
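One simple way to evaluate such an expression string in Swift is NSExpression; this is just an illustrative option, not necessarily how the project does it:

```swift
import Foundation

// Sketch: evaluating the expression string from the label with NSExpression.
// Note two caveats: NSExpression uses integer division for "5/2", and a
// malformed format string raises an Objective-C exception, so real code
// would validate the input first.
func evaluate(_ expression: String) -> Double? {
    let expr = NSExpression(format: expression)
    return (expr.expressionValue(with: nil, context: nil) as? NSNumber)?.doubleValue
}

print(evaluate("3+5") ?? "invalid") // prints 8.0
```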
The first version worked, but the program was neither accurate in its recognition, nor very performant, nor very user-friendly. Therefore I wanted to make some changes. I found out that Vision can do much more than just classify images: it can also detect where text appears in an image and where the individual characters are (see image below).

Adam Giesinger's application for the WWDC Scholarship

This way, you could write all the digits directly one after the other without having to wait or press a button. Unfortunately, Vision only worked in about 70% of the cases; it had the most problems with short expressions like 3+5, so I had to discard this idea.
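The Vision API in question is VNDetectTextRectanglesRequest, which can report the bounding boxes of text regions and, optionally, of the individual characters. A rough sketch of its use (the image and function names are placeholders):

```swift
import UIKit
import Vision

// Sketch: detect text regions and per-character boxes in a drawn image.
func detectCharacterBoxes(in drawnImage: UIImage) {
    guard let cgImage = drawnImage.cgImage else { return }

    let request = VNDetectTextRectanglesRequest { request, error in
        guard let observations = request.results as? [VNTextObservation] else { return }
        for text in observations {
            // characterBoxes holds one bounding box per detected character.
            print("text region:", text.boundingBox,
                  "characters:", text.characterBoxes?.count ?? 0)
        }
    }
    request.reportCharacterBoxes = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```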


My next idea was to have the user pause briefly after each character until it was recognized; if the character is an equals sign, the calculation is performed. After I implemented this idea, you could draw whole calculations with digits and operators, you just had to pause briefly between characters. Since I am working on this project alone, I don't have much data available and had to draw all the training data myself. As a result, the accuracy of the recognition was not very high (more on this in the "Problems" section).
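The "pause briefly" behavior can be implemented with a timer that restarts on every pen lift; a sketch, with classifyCurrentDrawing() standing in for the actual recognition step:

```swift
import Foundation

// Sketch: restart a short timer whenever a stroke ends; when it fires
// without being interrupted by a new stroke, classify the drawing.
class RecognitionController {
    private var timer: Timer?

    // Call this from touchesEnded of the drawing view.
    func strokeEnded() {
        timer?.invalidate()
        timer = Timer.scheduledTimer(withTimeInterval: 0.8, repeats: false) { [weak self] _ in
            self?.classifyCurrentDrawing()
        }
    }

    private func classifyCurrentDrawing() {
        // Placeholder: run the Core ML classification here; if the
        // recognized character is "=", evaluate the whole expression.
    }
}
```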


I found out that the accuracy increased greatly when I removed the operators from the model, so that is what I did. The operators now have to be entered using buttons.


But the performance was still pretty bad. After some adjustments to the image processing and moving the recognition onto multiple threads, the performance became very good.
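The post does not detail the threading changes; a common pattern is to perform the Vision request on a background queue so the UI stays responsive, roughly like this:

```swift
import Foundation
import Vision

// Sketch of the threading fix: run the Vision request off the main thread.
// `handler` and `request` are assumed to be set up as in the recognition
// section below; the request's completion handler should hop back to the
// main queue itself before touching the UI.
func performRecognition(handler: VNImageRequestHandler, request: VNCoreMLRequest) {
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request]) // heavy work stays off the main thread
    }
}
```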

How the program works

Video showing how the WWDC Scholarship program works

My training data (about 5,000 images) is on my Mac. With CreateML I then create the MLModel (the file format for machine learning models). Then I test the accuracy with a separate set of images to track the progress. Finally, I drag the MLModel into the actual project.
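Sketched in code, this workflow looks roughly as follows (folder paths and file names are illustrative placeholders, not the actual setup): CreateML infers the labels from the folder names, a separate test set measures the accuracy, and the result is exported as an .mlmodel file for the Xcode project.

```swift
import CreateML
import Foundation

// Training images sorted into one folder per label, e.g. .../train/0, .../train/1, ...
let trainingDir = URL(fileURLWithPath: "/Users/me/characters/train")
let testDir = URL(fileURLWithPath: "/Users/me/characters/test")

// Train the image classifier on the labeled folders.
let classifier = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir))

// Check the accuracy against images the model has never seen.
let metrics = classifier.evaluation(on: .labeledDirectories(at: testDir))
print("Test accuracy: \((1 - metrics.classificationError) * 100)%")

// Export the .mlmodel file, ready to be dragged into the Xcode project.
try classifier.write(to: URL(fileURLWithPath: "/Users/me/Characters.mlmodel"))
```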

When you start the program, you see the view shown on the right. The big field is the drawing area, on the right are various buttons, and below is the label with the calculation.

If you draw a digit, it is sent to the model. A short time later, the recognition finishes, the drawing turns green and the result appears below.

How the recognition works

I have already created my MLModel, which I load as a VNCoreMLModel. As soon as you stop drawing, the drawing is converted into an image with the function UIGraphicsGetImageFromCurrentImageContext() and passed to another function. There, a VNCoreMLRequest is created with my MLModel and a completion handler (which is executed as soon as the request has finished). In addition, a VNImageRequestHandler is created with the image; with this handler a VNCoreMLRequest can be executed for a single image. The request is then executed with handler.perform([request]). Now the completion handler runs: it receives the results along with their confidence (how sure the algorithm is that the answer is correct) and reads them out. If the confidence is below 60%, an alert with the two most probable digits is displayed. Then the digit is written into the label and you can draw the next one.
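Put together as code, this flow looks roughly like the sketch below. `Characters` stands in for the Xcode-generated model class, and the exact code in the real project differs:

```swift
import UIKit
import Vision
import CoreML

// Sketch of the recognition flow: wrap the MLModel in a VNCoreMLModel,
// run a VNCoreMLRequest on the drawn image, and report the best label.
func recognize(_ drawnImage: UIImage, completion: @escaping (String) -> Void) {
    guard let cgImage = drawnImage.cgImage,
          let model = try? VNCoreMLModel(for: Characters().model) else { return }

    // The completion handler runs once the request has finished.
    let request = VNCoreMLRequest(model: model) { request, error in
        guard let results = request.results as? [VNClassificationObservation],
              let best = results.first else { return }

        if best.confidence < 0.6, results.count > 1 {
            // Below 60% confidence: offer the two most probable digits.
            print("Did you mean \(best.identifier) or \(results[1].identifier)?")
        }
        DispatchQueue.main.async { completion(best.identifier) }
    }

    // A VNImageRequestHandler performs requests for a single image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```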

Problems

The recognition accuracy is low

To solve this problem, more training images are needed. I measured the accuracy several times during the training process and came to the following results:

Chart showing the accuracy of the project
Source: Namics

The project

The whole project is on GitHub: github.com/adamgiesinger/wwdc-2019-scholarship

Conclusion of the project

I had a lot of fun with this project and learned a lot of new things. In the end, my submission was even accepted, and now I can attend WWDC live in San Jose from June 3 to 7.
I would recommend that anyone interested apply as well, because it is a great experience!