I used to work on teams that tried to solve customer issues with data science and machine learning. After we asked the customer about the problem they were trying to solve, the second question was inevitable: what data do you have?

Data is to data scientists what ingredients are to a chef: the higher the quality of the data you can obtain, the better the outcome.

Often, getting access to data and then cleaning it up was the most time-consuming aspect of the work. I remember one project with around 4,000 images of skin that needed classifying, which required some very expensive dermatologists to work through them image by image.

At re:Invent last week, a new service called Amazon SageMaker Ground Truth was launched, aimed at speeding up and scaling this process. The service first tries to label images automatically, based on some initial human-created training data. If the automated labelling can’t label the images to a threshold that you define, you then have the option of distributing the work to in-house labelling professionals, outsourcing it to Amazon Mechanical Turk, or outsourcing it to handpicked labelling organisations on AWS Marketplace.
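To make the threshold idea concrete, here is a minimal sketch in Python of how that kind of routing could work. This is not the Ground Truth API; the `Prediction` class, the `route_predictions` function, the image IDs, labels and the 0.9 threshold are all made up for illustration. It just shows the general pattern: keep the automated label when the model is confident enough, and send everything else to a human.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """Hypothetical record of one automated label (not a Ground Truth type)."""
    image_id: str
    label: str
    confidence: float  # model confidence, assumed to be in the range 0..1


def route_predictions(
    predictions: List[Prediction], threshold: float
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted labels and ones needing human review."""
    auto_labelled = []
    needs_human = []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto_labelled.append(prediction)
        else:
            needs_human.append(prediction)
    return auto_labelled, needs_human


# Made-up example data: three images with automated labels.
predictions = [
    Prediction("img-001", "benign", 0.97),
    Prediction("img-002", "malignant", 0.62),
    Prediction("img-003", "benign", 0.91),
]

auto_labelled, needs_human = route_predictions(predictions, threshold=0.9)
print("Auto-labelled:", [p.image_id for p in auto_labelled])
print("Sent to human labellers:", [p.image_id for p in needs_human])
```

In a real project the images in the second list are the ones you would hand to an in-house team, Mechanical Turk or a Marketplace vendor, and where you set the threshold decides how much of the work stays with humans.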

I only learned about Amazon Mechanical Turk when I first joined AWS, but it’s unlike any service I have ever seen from a cloud vendor. Mechanical Turk is a crowdsourcing marketplace that makes it easier for individuals and businesses to outsource their processes and jobs to a distributed workforce who can perform these tasks virtually. These jobs could include anything from simple data validation and research to more subjective tasks like survey participation and content moderation. It’s sort of like a scalable, on-demand workforce, applying the elasticity and agility benefits that we see in cloud computing to a temporary workforce.

Depending upon the privacy and sensitivity of the images or content you are classifying, Mechanical Turk could be a quick route to getting labelling done.

It’s pretty fascinating to think that any AI or ML technology you have ever used, whether that be image search on your iPhone or face recognition on Facebook, has had, to at least some extent, a human manually labelling images and helping the models understand.