Canarias Machine Learning

Book Cataloguer

Publishing

In Spain, more than 80,000 new books are published every year. To sell them properly, each book must be catalogued so that bookshops can place it on the right shelf. This machine learning application saves 95% of the time spent on cataloguing and is more accurate than human cataloguers.

Introduction

This was our first machine learning project: an NLP classification problem. The book database holds nearly 1 million books with their reviews, and the model has to catalogue each book by choosing the right category.

Research

The first implementation used a standard word-embedding database to encode the reviews, titles, and the rest of the available bibliographic information. The results were quite good, and the Agapea bookshop chain uses this version to automate book cataloguing. This version was developed in Keras with TensorFlow as the backend.
We then tested PRModel on the same problem. In this case no word embeddings were necessary: each distinct word in each field is treated as a binary parameter, which produces an input with 1,200,921 distinct parameters for 526 distinct outputs (categories). The learning process was much faster, taking only 24 hours, and the results were similar or better.
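The binary encoding described above can be illustrated with a minimal sketch. This is not PRModel code: the function names `build_vocabulary` and `encode`, the field names, and the whitespace tokenisation are all assumptions made for illustration. It only shows the idea that every distinct (field, word) pair becomes one on/off input parameter.

```python
from typing import Dict, List, Tuple

Book = Dict[str, str]                 # hypothetical record: field name -> text
Vocabulary = Dict[Tuple[str, str], int]


def build_vocabulary(books: List[Book]) -> Vocabulary:
    """Assign one index to every distinct (field, word) pair seen in the data."""
    vocab: Vocabulary = {}
    for book in books:
        for field, text in book.items():
            for word in text.lower().split():
                # The same word in two different fields gets two different indices.
                vocab.setdefault((field, word), len(vocab))
    return vocab


def encode(book: Book, vocab: Vocabulary) -> List[int]:
    """Return the sorted indices of the binary parameters that are 'on'.

    Only active indices are stored, since almost all parameters are zero
    for any single book.
    """
    active = set()
    for field, text in book.items():
        for word in text.lower().split():
            index = vocab.get((field, word))
            if index is not None:     # words unseen during training are ignored
                active.add(index)
    return sorted(active)


# Illustrative data only, not taken from the real catalogue.
books = [
    {"title": "Don Quijote", "review": "A classic Spanish novel"},
    {"title": "Python Basics", "review": "A practical programming guide"},
]
vocab = build_vocabulary(books)
features = encode(books[0], vocab)
```

Keeping only the active indices is a design choice of this sketch: with more than a million possible parameters, a dense vector per book would be mostly zeros.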

Final Result

Because no word-embedding transformations are necessary, prediction is fast and accurate, and because training is fast on standard CPUs, the model can be retrained weekly to incorporate updates. As a by-product, the system detects books that were previously catalogued incorrectly. Because our research produced metrics on which mappings are correct, most of the system works automatically. The system saves 65% of the work, because it can give almost 100% confidence on 65% of predictions.
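One way to picture how confident predictions are separated from the rest is a simple routing rule, sketched below. The function `route_prediction`, the label names, and the 0.99 threshold are hypothetical and chosen only for illustration; the source does not state how the real system decides which mappings are secure.

```python
from typing import Dict, Optional, Tuple


def route_prediction(
    probabilities: Dict[str, float],
    current_category: Optional[str] = None,
    threshold: float = 0.99,          # assumed cut-off, not a value from the source
) -> Tuple[str, str]:
    """Decide what to do with one prediction.

    Returns (action, predicted_category), where action is one of:
      "auto_accept"    - confident prediction, no human needed
      "possible_error" - confident prediction that disagrees with the
                         category already stored for the book
      "manual_review"  - prediction not confident enough
    """
    best = max(probabilities, key=probabilities.get)
    confidence = probabilities[best]

    if confidence < threshold:
        return ("manual_review", best)
    if current_category is not None and current_category != best:
        return ("possible_error", best)
    return ("auto_accept", best)


# Illustrative calls with made-up probabilities.
new_book = route_prediction({"fiction": 0.995, "history": 0.005})
unsure = route_prediction({"fiction": 0.6, "history": 0.4})
mismatch = route_prediction(
    {"fiction": 0.995, "history": 0.005}, current_category="history"
)
```

Under this kind of rule, only the predictions sent to manual review require human cataloguing time, and confident disagreements with the existing catalogue surface candidates for correction.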

Mapping errors detected

75%

Secure mappings

65%

Success on secure mappings

99.4%