Machine learning and computer vision problems

Machine learning is one of the hottest topics in computer science today. It has found diverse real-world applications, and its impact keeps growing.

Recently I was approached by a client who had lost some money playing blackjack online. He claimed that the online casino was cheating and wanted to prove it. To do so, he wanted an automatic way to record and review the games that had been played. We decided to build a computer vision system to track the most recent games.

After researching the available object detection algorithms, we settled on Darknet YOLO v3/v4. With a native C/C++ backend and consistent results across benchmarks, it looked like a solid choice.

To train the model, we had to gather 1000 samples per class: for example, a thousand As, a thousand Qs … and the yellow or red cards. Gathering and labeling this dataset is basically 90% of the work required to build a solid model, so we wrote quite a bit of code to automate the data collection and feed it straight into the system.
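
The collection code itself is not shown here, but the labeling step can be illustrated with a small sketch. Darknet expects one text file per training image, with one line per object: the class id followed by the box center, width and height, all normalized to the image size. The helper name `to_darknet_label` and the pixel-box input format are assumptions made for this example, not part of the original pipeline.

```python
def to_darknet_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one Darknet label line.

    Hypothetical helper: the name and the input format are assumptions for illustration.
    """
    x_min, y_min, x_max, y_max = box
    # Darknet stores the box center and size as fractions of the image dimensions.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Calling it once per detected card and writing the lines to a `.txt` file next to each image would be one way to feed automatically collected samples into training.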

All looked good and easy until we moved to production and actually deployed the model. On a PC with a decent GPU, a prediction with YOLO v4 took about 0.1 seconds. On a CPU-only machine in the cloud, however, a prediction was much slower: between 1 and 3 seconds, depending on the number of cores and their frequency. For a system processing live data from several tables, a lag of a few seconds caused imperfections and errors in processing. At the same time, renting several cloud machines with a strong GPU is rather expensive. We had to solve this in a way that was both affordable and able to process a live feed directly instead of storing data in queues.

And then came the clever solution. Instead of feeding a single high-resolution image to the model, we could crop only the area of interest (the table with the cards) and stitch the crops from multiple tables together into one image. This way we could fit around 64 inference images into a single picture. Afterwards, we could work out which original image each result came from by its position in the stitched picture, and so map the results back to their inputs. This requires only one computer with a GPU, or a single fast CPU, so the solution met all the requirements.
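
The stitching trick can be sketched roughly as follows. This is a minimal illustration and not the production code: it assumes an 8 x 8 grid (to match the "around 64" figure), crops that are already resized to a fixed square tile, and detections reported as center and size in pixels of the stitched image. The names `stitch_tiles` and `map_detection` and the 64-pixel tile size are hypothetical.

```python
import numpy as np

def stitch_tiles(crops, grid=8, tile=64):
    """Place up to grid*grid equally sized crops into one mosaic image (row-major)."""
    if len(crops) > grid * grid:
        raise ValueError("too many crops for the grid")
    mosaic = np.zeros((grid * tile, grid * tile, 3), dtype=np.uint8)
    for i, crop in enumerate(crops):
        if crop.shape != (tile, tile, 3):
            raise ValueError("each crop must already be resized to tile x tile")
        row, col = divmod(i, grid)
        mosaic[row * tile:(row + 1) * tile, col * tile:(col + 1) * tile] = crop
    return mosaic

def map_detection(box, grid=8, tile=64):
    """Map a box (x_center, y_center, w, h) in mosaic pixels back to its source crop.

    Returns the crop index and the same box in that crop's local coordinates.
    The box is assigned to the tile that contains its center.
    """
    x_center, y_center, w, h = box
    col = min(int(x_center // tile), grid - 1)
    row = min(int(y_center // tile), grid - 1)
    index = row * grid + col
    return index, (x_center - col * tile, y_center - row * tile, w, h)
```

The detector then runs once on the mosaic, and each detection is routed back to its table through `map_detection`. Assigning a box by its center is a simplification: a box that straddles a tile border would need extra handling, for example a small margin between tiles.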

We managed to record play data from random players over a three-week period. It turned out that the casino's profit over that time was 1.79%, close to the expected average edge of 1.5%, and our client had simply been really unlucky in some of his play. Still, we learned a very useful trick for computer vision inference.