Azure vs AWS vs GCP

Handwriting Recognition

Note: Based on communication from Amazon, the AWS service mentioned, Amazon Textract, does not currently support handwriting recognition.

Did you know that we can use Artificial Intelligence (AI) to convert a handwritten document into digital text? Did you know that we can also preserve the location of the text accurately? A handwritten letter, postcard or biography could be converted to text using OCR (optical character recognition). At the same time, we can also find out the location (x and y coordinates) of every single character on the image. Location, along with the width and the height of the character, can help us create a digitized version of that document while preserving the original format.
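As a minimal sketch of how location data can preserve layout, the snippet below groups recognized words into lines using their coordinates. The `Word` type and the `layout_lines` helper are hypothetical names introduced for illustration; each cloud service returns bounding boxes in its own response format, which would need to be mapped into something like this first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A recognized word with its bounding box (hypothetical shape)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def layout_lines(words: List[Word]) -> List[str]:
    """Group words into text lines by vertical position, then order by x."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        center = word.y + word.height / 2
        for line in lines:
            first = line[0]
            # A word joins a line if its vertical center falls inside
            # the vertical extent of that line's first word.
            if first.y <= center <= first.y + first.height:
                line.append(word)
                break
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]


words = [
    Word("world", 60, 12, 50, 20),
    Word("Hello", 0, 10, 50, 20),
    Word("Bye", 0, 50, 30, 20),
]
print(layout_lines(words))
```

The grouping rule here is deliberately simple; slanted or overlapping handwriting would likely need a more tolerant heuristic.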

Imagine a hospital with decades' worth of handwritten patient records; imagine businesses with decades of handwritten invoices; imagine handwritten texts dating back to medieval, if not ancient, times - the possibilities are endless. It's just a matter of using the appropriate AI model for the language of choice.

Our data team is composed of hands-on software architects, data engineers and data scientists who are highly competent in scaling applications and making them high-performance. While we have created models like these that work on-premise, we also wanted to compare the current cloud offerings. In this blog post, we compare Amazon Textract, Google Vision, and Microsoft Azure Vision API on some samples of handwritten text.

Architecture

The application follows a microservices architecture, with a load balancer that receives all requests from the browser. A request goes through the following steps:

  • User uploads a picture using a browser.
  • The request goes to the load balancer.
  • The load balancer routes the request to an application server.
  • The application server handling that request calls the desired micro-service.
  • The micro-service invokes the appropriate Cloud API, whose pre-trained model converts the handwritten image to digitized text, and returns the output to the UI.
Handwriting Recognition - Microservices Architecture

The pre-trained models that convert handwritten images to digitized text are artificial intelligence models running in the cloud. The Cazton team has created similar AI models that can run on-premise, and the architecture we have used here is a multi-cloud architecture. That means it can run on-premise, in a hybrid scenario (partly in a cloud and partly on-premise), or in a multi-cloud scenario. The point to note is that we can run this anywhere without making changes to the code or the deployment model.
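The micro-service step can be sketched roughly as a provider-agnostic dispatcher. This is only an illustration under stated assumptions: `RECOGNIZERS`, `make_stub_recognizer` and `handle_upload` are hypothetical names, and the stubs stand in for the real vendor SDK calls, which are not shown in this post.

```python
from typing import Callable, Dict

# A recognizer takes raw image bytes and returns digitized text.
Recognizer = Callable[[bytes], str]


def make_stub_recognizer(provider: str) -> Recognizer:
    """Build a placeholder recognizer (assumption: a real adapter would
    call the provider's cloud API here instead of echoing a message)."""
    def recognize(image: bytes) -> str:
        return f"[{provider}] {len(image)} bytes received"
    return recognize


# Hypothetical registry: one adapter per cloud provider.
RECOGNIZERS: Dict[str, Recognizer] = {
    name: make_stub_recognizer(name) for name in ("aws", "azure", "google")
}


def handle_upload(image: bytes, provider: str) -> str:
    """Dispatch an uploaded image to the chosen provider's adapter."""
    try:
        recognizer = RECOGNIZERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return recognizer(image)


print(handle_upload(b"abc", "azure"))
```

Keeping every provider behind the same function signature is one way the calling code could stay unchanged when moving between cloud and on-premise models.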

Comparing Images

We will be comparing the following three services on how accurately they convert handwritten documents:

  • Amazon Textract
  • Microsoft Azure Vision
  • Google Vision

Note: We will be counting word mistakes. A word that is not ignored counts as one mistake if the service transcribes one or more of its characters incorrectly.
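The counting rule in the note above can be approximated in code. The sketch below is an assumption about how such a count could be automated, not the procedure used for the results in this post: it aligns the two word sequences with Python's `difflib.SequenceMatcher` and counts every word that does not match. The ignore rules described later are left out for simplicity.

```python
from difflib import SequenceMatcher


def count_word_mistakes(reference: str, transcribed: str) -> int:
    """Count word-level mistakes between a reference and a transcription.

    A word with one or more wrong characters counts as one mistake, as do
    missing and extra words.
    """
    ref_words = reference.split()
    out_words = transcribed.split()
    matcher = SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    mistakes = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For replaced, deleted or inserted runs, count the longer side.
        mistakes += max(i2 - i1, j2 - j1)
    return mistakes


print(count_word_mistakes(
    "Felicitations on your new home",
    "Jelicitations on your new home",
))
```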

For comparison, we take three sample images found on the internet:

  • English handwriting sample, 193 words, 1 page.
  • English handwriting sample, 116 words, 1 page.
  • French handwriting sample, 341 words, 2 pages.

Sample Image #1

Microsoft makes the fewest mistakes on this image, followed by Google and then Amazon. We compare only Microsoft and Google, since their mistake counts were comparable, and leave Amazon out of this analysis.

Side-by-side comparison for Microsoft Azure Vision digitized text (zoomed lower half of actual input image to the micro-service):

Handwriting Recognition - English Text Sample with Microsoft Azure Vision

Here are the results we have chosen to ignore because of errors or quirks introduced by the original writer:

  • se7eN is spelt incorrectly.
  • Signatures have been ignored.
  • The numbers (1 to 7) are within a circle. This appears to have confused all the AI services.
  • The text "twist guesteditorbhp@gmail.com" appears to sit on a single line when read from left to right, so the digitized text renders it the same way.

With the above points ignored, Microsoft Azure Vision made 11 word-mistakes in the above image snippets. Looking at the document overall, it made 12 word-mistakes.

Side-by-side comparison for Google Vision digitized text (zoomed center of actual input image to the micro-service):

Handwriting Recognition - English Text Sample with Google Vision

Here is what we ignored and what we counted for Google:

  • The first three points above apply to Google as well and are ignored.
  • Some words that were struck through, such as "contained" and "as", were not interpreted and are counted as mistakes here.

With the above points taken into account, Google Vision made 18 word-mistakes in the above image snippets. Looking at the document overall, it made 20 word-mistakes.


Sample Image #2

Microsoft makes the fewest mistakes on this image, followed by Google and then Amazon. We compare only Microsoft and Google and leave Amazon out of this analysis.

Side-by-side comparison for Microsoft Azure Vision digitized text (zoomed section of actual input image to the micro-service):

Handwriting Recognition - Cursive English Text Sample with Microsoft Azure Vision

Result:

  • From the above zoomed section, Microsoft Azure Vision made 2 mistakes.
  • Looking at the document overall, it made 2 mistakes.

Side-by-side comparison for Google Vision digitized text (zoomed section of actual input image to the micro-service):

Handwriting Recognition - Cursive English Text Sample with Google Vision

Result:

  • In the above section, Google Vision made 31 mistakes.
  • Looking at the document overall, it made 31 mistakes.

In the above image, Google Vision transcribes the word "Felicitations" as "Jelicitations", getting every character right except the first. The closest dictionary word to the digitized text "Jelicitations" is "Felicitations". A possible solution for such an error would be to combine the character probabilities with natural language processing.

For example, suppose the AI engine gives the letter "J" a probability of 0.9 out of 1 and the letter "F" a probability of 0.8 out of 1. We could then use natural language processing to find the nearest correct word that matches a word in our dictionary. In the above image, that would give the letter "F" priority over "J".
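A minimal sketch of this idea follows. It assumes the recognizer exposes several candidate letters with probabilities for each character position, which the cloud APIs compared here may not provide; `best_dictionary_word` and the tiny dictionary are hypothetical and for illustration only.

```python
from itertools import product
from math import prod
from typing import Dict, List, Set


def best_dictionary_word(char_candidates: List[Dict[str, float]],
                         dictionary: Set[str]) -> str:
    """Pick the most probable reading that is also a dictionary word.

    char_candidates holds, for each character position, a mapping of
    candidate letter -> probability (an assumed input format).
    """
    best_word, best_score = None, 0.0
    for letters in product(*(c.items() for c in char_candidates)):
        word = "".join(ch for ch, _ in letters)
        score = prod(p for _, p in letters)
        if word.lower() in dictionary and score > best_score:
            best_word, best_score = word, score
    if best_word is None:
        # No dictionary match: fall back to the most probable raw reading.
        best_word = "".join(max(c, key=c.get) for c in char_candidates)
    return best_word


# First character is ambiguous; the remaining characters are certain.
candidates = [{"J": 0.9, "F": 0.8}] + [{ch: 1.0} for ch in "elicitations"]
print(best_dictionary_word(candidates, {"felicitations"}))
```

Enumerating every combination is only practical for short words with few ambiguous positions; a real system would more likely use a language model or an edit-distance search over the dictionary.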


Sample Image #3

Google makes the fewest mistakes on this image, followed by Microsoft and then Amazon. We compare only Microsoft and Google and leave Amazon out of this analysis.

Interesting Observations:

  • Google looks at the first page and then the second page of the image for recognition.
  • Microsoft treats both pages as a single page and recognizes sentences from left to right across both pages.

Digitized text from Google Vision compared to actual image:

Handwriting Recognition - French Text Sample with Google Vision

Result:

  • Google Vision made 4 mistakes overall.

Digitized text from Microsoft Azure Vision compared to actual image:

Handwriting Recognition - French Text Sample with Microsoft Azure Vision

Result:

  • Microsoft made 19 mistakes overall.

Out of all the services, Microsoft Azure Vision performed the best on the English samples, while Google Vision performed the best on the French sample. Amazon Textract is clearly a distant third.
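To put the mistake counts on a common scale, they can be turned into approximate word error rates. The counts and sample sizes below are the ones reported in this post; dividing by the full word count of each sample is an assumption, since the ignored words were not subtracted.

```python
from typing import Dict, Tuple

# sample name -> (total words, {service: overall word-mistakes})
SAMPLES: Dict[str, Tuple[int, Dict[str, int]]] = {
    "Sample #1 (English)": (193, {"Microsoft Azure Vision": 12, "Google Vision": 20}),
    "Sample #2 (English)": (116, {"Microsoft Azure Vision": 2, "Google Vision": 31}),
    "Sample #3 (French)": (341, {"Microsoft Azure Vision": 19, "Google Vision": 4}),
}


def error_rate(mistakes: int, total_words: int) -> float:
    """Return the fraction of words transcribed incorrectly."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return mistakes / total_words


def summarize(samples: Dict[str, Tuple[int, Dict[str, int]]]) -> Dict[str, Dict[str, float]]:
    """Return per-sample error rates as percentages rounded to one decimal."""
    return {
        name: {svc: round(error_rate(m, total) * 100, 1) for svc, m in counts.items()}
        for name, (total, counts) in samples.items()
    }


for name, rates in summarize(SAMPLES).items():
    print(name, rates)
```

On these numbers, Microsoft's rate is roughly 6.2% and 1.7% on the two English samples against Google's 10.4% and 26.7%, while on the French sample Google's is about 1.2% against Microsoft's 5.6%.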


Disclaimer: These results are based on the state of the AI models on May 20, 2020, and have been examined using a limited dataset. The results can change over time with newer models or better training of existing models.

Handwriting Recognition - Result

If your company is building applications using Big Data and Artificial Intelligence, or creating web or mobile applications in the cloud, with microservices or on-premise, please check our services to find out why Fortune 500 companies use Cazton for building multi-billion dollar revenue generating applications.

Cazton is composed of technical professionals with expertise gained all over the world and in all fields of the tech industry, and we put this expertise to work for you. We serve all industries, including banking, finance, legal services, life sciences & healthcare, technology, media, and the public sector.

Cazton has expanded into a global company, servicing clients not only across the United States, but in Oslo, Norway; Stockholm, Sweden; London, England; Berlin, Germany; Frankfurt, Germany; Paris, France; Amsterdam, Netherlands; Brussels, Belgium; Rome, Italy; Quebec City, Toronto, Vancouver, Montreal, Ottawa, Calgary, Edmonton, Victoria, and Winnipeg, as well. In the United States, we provide our consulting and training services across various cities like Austin, Dallas, Houston, New York, New Jersey, Irvine, Los Angeles, Denver, Boulder, Charlotte, Atlanta, Orlando, Miami, San Antonio, San Diego and others. Contact us today to learn more about what our experts can do for you.


Credits:
  • Image #1: https://burninghousepressblog.files.wordpress.com/2019/02/3-2019-guest-editors-guidelines-elytron-frass-bhp.jpg?w=1400
  • Image #2: https://fiverr-res.cloudinary.com/images/q_auto,f_auto/gigs/101199593/original/720acb453a66b48c43227c2221db3291d3d4957e/create-a-custom-handwritten-letter.jpg
  • Image #3: https://pictures.abebooks.com/LASCAR/30243072533.jpg
