Azure vs AWS vs GCP

Note: Based on communication from Amazon, the AWS service mentioned, Amazon Textract, does not currently support handwriting recognition.

Did you know that we can use Artificial Intelligence (AI) to convert a handwritten document into digital text? Did you know that we can also preserve the location of the text accurately? A handwritten letter, postcard or biography could be converted to text using OCR (optical character recognition). At the same time, we can also find out the location (x and y coordinates) of every single character on the image. Location, along with the width and the height of the character, can help us create a digitized version of that document while preserving the original format.
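To make the idea of preserving format concrete, here is a minimal sketch of how recognized text plus its location, width and height could be laid out again on a character grid. The `Glyph` type, the `render_layout` helper and the cell sizes are assumptions made for illustration; they are not part of any of the cloud APIs discussed here.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One recognized piece of text and its bounding box, in pixels (hypothetical type)."""
    text: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def render_layout(glyphs, cell_width=10, cell_height=20):
    """Place each glyph on a character grid so relative positions survive."""
    rows = {}
    for g in glyphs:
        row = g.y // cell_height
        col = g.x // cell_width
        line = rows.setdefault(row, [])
        # Pad the line with spaces up to the glyph's column.
        while len(line) < col:
            line.append(" ")
        line[col:col + len(g.text)] = list(g.text)
    if not rows:
        return ""
    # Emit every grid row, including empty ones, to keep vertical spacing.
    return "\n".join("".join(rows.get(r, [])).rstrip() for r in range(max(rows) + 1))
```

A real digitized document would use the exact pixel boxes rather than a coarse grid, but the principle is the same: the coordinates, not the order of recognition, decide where the text lands.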

Imagine a hospital with decades' worth of handwritten patient records; imagine businesses with decades of handwritten invoices; imagine handwritten texts dating back to medieval, if not ancient, times. The possibilities are endless. It's just a matter of using the appropriate AI model in the language of choice.

Our data team comprises hands-on software architects, data engineers and data scientists who are highly competent in scaling applications and making them high-performance. While we have created models like these that work on-premise, we also wanted to compare the current cloud offerings. In this blog post, we compare Amazon Textract, Google Vision, and Microsoft Azure Vision API on some samples of handwritten text.

The application follows a microservices architecture, with a load balancer that takes all requests from the browser. A request goes through the following steps:

User uploads a picture using a browser.
The request goes to the load balancer.
The load balancer routes the request to an application server.
The application server handling that request calls the desired micro-service.
The micro-service invokes the appropriate cloud API, whose pre-trained model converts the handwritten image to digitized text, and returns the output to the UI.
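The last two steps can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the registry, the `register` decorator, `handle_upload` and the `stub` recognizer are hypothetical names, and the stub stands in for the real cloud call so that the example runs without credentials.

```python
from typing import Callable, Dict

# Hypothetical registry: provider name -> function that turns image bytes into text.
Recognizer = Callable[[bytes], str]
RECOGNIZERS: Dict[str, Recognizer] = {}


def register(name: str):
    """Decorator that adds a recognizer to the registry under `name`."""
    def wrap(fn: Recognizer) -> Recognizer:
        RECOGNIZERS[name] = fn
        return fn
    return wrap


@register("stub")
def stub_recognizer(image: bytes) -> str:
    # Stand-in for a real cloud call; a real micro-service would send
    # `image` to the provider's API here and return its transcription.
    return f"<{len(image)} bytes transcribed>"


def handle_upload(image: bytes, provider: str) -> dict:
    """Application-server step: pick the micro-service and return its result."""
    if provider not in RECOGNIZERS:
        return {"status": 400, "error": f"unknown provider: {provider}"}
    return {"status": 200, "provider": provider, "text": RECOGNIZERS[provider](image)}
```

Keeping each provider behind the same function signature is one way to let the application server stay unaware of which cloud is doing the recognition.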

Handwriting Recognition - Microservices Architecture

The pre-trained models that convert handwritten images to digitized text are AI models running in the cloud. The Cazton team has created similar AI models that can run on-premise, and the architecture we have used here is a multi-cloud architecture: it can run on-premise, in a hybrid scenario (partly in a cloud and partly on-premise), and in a multi-cloud scenario. The key point is that we can run this anywhere without changing the code or the deployment model.

We will compare the following three services on how accurately they convert handwritten documents:

Amazon Textract
Microsoft Azure Vision
Google Vision

Note: We count word mistakes. A word that we have not chosen to ignore counts as one mistake if the service transcribes one or more of its characters incorrectly.
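The counting rule can be approximated in code. The sketch below is an assumption about how one might automate it, not the procedure used for this post: it aligns the reference words with the transcribed words using Python's `difflib` and counts every reference word that was changed or dropped, skipping words on an ignore list. Extra words that the service inserts are not counted, and the comparison is case-sensitive.

```python
from difflib import SequenceMatcher


def count_word_mistakes(reference: str, transcribed: str, ignored=frozenset()) -> int:
    """Count reference words that the transcription got wrong or dropped."""
    ref_words = reference.split()
    out_words = transcribed.split()
    # Align the two word sequences; autojunk is off so no word is skipped as "junk".
    matcher = SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    mistakes = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Each affected reference word is one mistake, unless ignored.
            mistakes += sum(1 for w in ref_words[i1:i2] if w not in ignored)
    return mistakes
```

For instance, `count_word_mistakes("many happy Felicitations to you", "many happy Jelicitations to you")` counts a single mistake, because one wrong character is enough to mark the whole word as wrong.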

For comparison, we take three sample images found on the internet:

English handwriting sample, 193 words, 1-page.
English handwriting sample, 116 words, 1-page.
French handwriting sample, 341 words, 2-page.

Sample Image #1

Microsoft makes the fewest mistakes on this image, followed by Google and then Amazon. Because the Microsoft and Google results were comparable, we compare those two and leave Amazon out of this analysis.

Side-by-side comparison for Microsoft Azure Vision digitized text (zoomed lower half of actual input image to the micro-service):

Handwriting Recognition - English Text Sample with Microsoft Azure Vision

Here are certain results we have chosen to ignore due to the errors of the original drafter:

se7eN is spelt incorrectly.
Signatures have been ignored.
The numbers (1 to 7) are within a circle. This has likely confused all the AI services.
The text "twist guesteditorbhp@gmail.com" sits on a single line when read from left to right, so it appears on a single line in the digitized text as well.

With these exclusions, Microsoft Azure Vision made 11 word mistakes in the image snippets above and 12 word mistakes across the whole document.

Side-by-side comparison for Google Vision digitized text (zoomed center of actual input image to the micro-service):

Handwriting Recognition - English Text Sample with Google Vision

Here are certain results we have chosen to ignore:

The first three points listed above for Microsoft apply equally to Google.
Some words that were struck through, like "contained" and "as", were not interpreted and are penalized as mistakes here.

With these exclusions, Google Vision made 18 word mistakes in the image snippets above and 20 word mistakes across the whole document.

Sample Image #2

Side-by-side comparison for Microsoft Azure Vision digitized text (zoomed section of actual input image to the micro-service):

Handwriting Recognition - Cursive English Text Sample with Microsoft Azure Vision

Result:

From the above zoomed section, Microsoft Azure Vision made 2 mistakes.
Looking at the document overall, it made 2 mistakes.

Side-by-side comparison for Google Vision digitized text (zoomed section of actual input image to the micro-service):

Handwriting Recognition - Cursive English Text Sample with Google Vision

Result:

In the above section, Google Vision made 31 mistakes.
Looking at the document overall, it made 31 mistakes.

In the above image, Google Vision gets the word "Jelicitations" almost entirely right: only the first character is wrong. The closest real word to the digitized text "Jelicitations" is "Felicitations". One possible fix for such an error is to combine the per-character probabilities with natural language processing.

For example, suppose the AI engine gives the letter "J" a probability of 0.9 out of 1 and the letter "F" a probability of 0.8 out of 1. We could then use natural language processing to find the nearest correct word that matches a word in our dictionary. In the above image, that would give the letter "F" priority over "J".
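A minimal sketch of that idea, assuming the recognizer exposed a probability for each candidate character (none of the services compared here is claimed to return data in this shape). The function name and input format are hypothetical. It tries every combination of candidate characters and keeps the most probable spelling that is also a dictionary word.

```python
from itertools import product


def best_dictionary_word(char_candidates, dictionary):
    """Return the most probable candidate spelling that is a dictionary word.

    char_candidates: one dict per character position, mapping each candidate
    character to the recognizer's probability for it (hypothetical format).
    dictionary: a set of lowercase words.
    """
    best_word, best_score = None, 0.0
    for combo in product(*(c.items() for c in char_candidates)):
        word = "".join(ch for ch, _ in combo)
        # Score a spelling by multiplying its per-character probabilities.
        score = 1.0
        for _, p in combo:
            score *= p
        if word.lower() in dictionary and score > best_score:
            best_word, best_score = word, score
    return best_word  # None when no candidate spelling is in the dictionary
```

With `{"J": 0.9, "F": 0.8}` for the first position and the remaining letters of "elicitations" fixed, a dictionary containing "felicitations" makes the function return "Felicitations" even though "J" scored higher on its own. Enumerating every combination grows quickly with the number of uncertain characters, so a real system would need to prune candidates.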

Sample Image #3

Google makes the fewest mistakes on this image, followed by Microsoft and then Amazon. Because the Microsoft and Google results were comparable, we compare those two and leave Amazon out of this analysis.

Interesting Observations:

Google reads the first page and then the second page of the image.
Microsoft treats both pages as a single page and recognizes sentences from left to right across both.
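The effect of these two reading orders can be illustrated with a small sort. This is not a description of how either service works internally; it only shows how the same word positions produce different text depending on whether the page is part of the sort key. The tuple layout and the sample words are made up for the example.

```python
def reading_order(words, page_aware):
    """Sort (text, page, x, y) tuples into reading order.

    page_aware=True reads all of page 0 before page 1;
    page_aware=False treats the two-page spread as one wide page.
    x is measured across the whole spread, y from the top.
    """
    if page_aware:
        key = lambda w: (w[1], w[3], w[2])   # page, then line, then column
    else:
        key = lambda w: (w[3], w[2])         # line, then column, ignoring pages
    return [w[0] for w in sorted(words, key=key)]


# Hypothetical two-page spread: two lines on the left page, two on the right.
SPREAD = [
    ("left-1a", 0, 10, 0), ("left-1b", 0, 60, 0), ("left-2", 0, 10, 20),
    ("right-1", 1, 210, 0), ("right-2", 1, 210, 20),
]
```

With `page_aware=True` the left page is read in full before the right page; with `page_aware=False` the first line of the right page is spliced in after the first line of the left page, which mixes the sentences of the two pages.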

Digitized text from Google Vision compared to actual image:

Handwriting Recognition - French Text Sample with Google Vision

Result:

Google Vision made 4 mistakes overall.

Digitized text from Microsoft Azure Vision compared to actual image:

Handwriting Recognition - French Text Sample with Microsoft Azure Vision

Result:

Microsoft made 19 mistakes overall.

Of the three services, Microsoft Azure Vision performed best on English, while Google Vision performed best on French. Amazon Textract was clearly a distant third.
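To put the mistake counts on a common scale, the numbers reported above can be turned into an approximate word accuracy per sample. The sketch below uses only the word totals and mistake counts stated in this post. Treat the percentages as rough, since the totals also include the words we chose to ignore; the helper names are ours.

```python
# (sample, total words, {service: word mistakes}) as reported in this post.
RESULTS = [
    ("Sample #1 (English)", 193, {"Microsoft": 12, "Google": 20}),
    ("Sample #2 (English)", 116, {"Microsoft": 2, "Google": 31}),
    ("Sample #3 (French)", 341, {"Microsoft": 19, "Google": 4}),
]


def word_accuracy(total_words: int, mistakes: int) -> float:
    """Percentage of words transcribed without any character error."""
    return 100.0 * (total_words - mistakes) / total_words


def accuracy_table(results=RESULTS):
    """Return {sample: {service: accuracy rounded to one decimal place}}."""
    return {
        sample: {svc: round(word_accuracy(total, m), 1) for svc, m in mistakes.items()}
        for sample, total, mistakes in results
    }


if __name__ == "__main__":
    for sample, scores in accuracy_table().items():
        print(sample, scores)
```

On these figures Microsoft comes out at roughly 93.8% and 98.3% on the two English samples against 89.6% and 73.3% for Google, while on the French sample Google reaches about 98.8% against 94.4% for Microsoft.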

Disclaimer: These results are based on the state of the AI models on May 20, 2020, and have been examined using a limited dataset. The results can change over time with newer models or better training of existing models.

If your company is building applications using Big Data and Artificial Intelligence, or creating web or mobile applications in the cloud, with microservices or on-premise, please check our services to find out why Fortune 500 companies use Cazton for building multi-billion-dollar revenue-generating applications.

Credits:

Image #1: https://burninghousepressblog.files.wordpress.com/2019/02/3-2019-guest-editors-guidelines-elytron-frass-bhp.webp?w=1400
Image #2: https://fiverr-res.cloudinary.com/images/q_auto,f_auto/gigs/101199593/original/720acb453a66b48c43227c2221db3291d3d4957e/create-a-custom-handwritten-letter.webp
Image #3: https://pictures.abebooks.com/LASCAR/30243072533.webp

