Azure AI Search

  • Azure AI Search is a managed cloud service for searching unstructured data using artificial intelligence and natural language processing, backed by the scale of the Microsoft Azure cloud.
  • It lets you enrich and encrypt data in the indexing pipeline, and it provides Lucene-powered search capabilities in more than 50 languages with 99.9% availability.
  • Top features include full-text search, autocomplete, facets, filters, spell-checking, hit-highlighting, paging and boosting.
  • At Cazton, we help Fortune 500, large and mid-size companies with Azure AI Search development, consulting, recruiting services and hands-on training services.
 

How do you decide which search engine to pick? Do you look at which languages it supports, its compatibility with devices, or its ease of use? How does a search engine work? Can a search engine be more than full-text search? Would you like to get more information and analytics out of your mammoth-sized data? Will the search engine look for relationships in the data? How does your search engine fare against competitors? Wait no further: here is an all-in-one overview of Azure AI Search from Microsoft.

Azure AI Search refers to the use of cognitive skills and AI processing in core search operations. It uses the same natural language stack used in Bing and Office, along with AI services across vision, language, and speech. It lets you add search to your web, mobile, and enterprise applications even if you are not a search engine expert. Searching structured data is comparatively easy, but Azure AI Search can also search through unstructured data and extract meaning from it.

Azure AI Search offers a multitude of great features:

  • Search-as-a-service cloud solution
  • Auto-complete
  • Geo-spatial search
  • Filters
  • Facets (or aggregations)
  • Built-in Artificial Intelligence (AI) for:
    • Optical Character Recognition (OCR)
    • Key-phrase extraction and
    • Named entity extraction to unlock insights

It provides a fully managed cloud search service through a simple REST API or the .NET SDK, and offers capabilities such as scoring, geo-search, faceting, auto-completion, and synonym search. At Cazton, we help clients with debugging index corruption, monitoring service availability, and scaling the indexing process. However, Azure AI Search automates all of that for you. That is a value-add that saves time and enhances productivity.

The Azure AI package can turn raw, unstructured information into searchable content using its capabilities in vision, speech, and language processing. All of this intelligent processing can be enabled within the search configuration with ease and is available around the world in more than 50 languages with 99.9% availability. Additionally, there are ways to filter search results to match specific industry and business requirements. On top of that, data is encrypted throughout the indexing pipeline to protect it from malicious activity. Azure AI Search works with several file formats, including Microsoft Word, PowerPoint, and Excel as well as Adobe PDF, PNG, RTF, JSON, HTML, and XML, and with data sources such as Cosmos DB and Azure Blob Storage.

Continue reading to dig deeper into Azure AI Search. Contact us today to learn more about what our experts can do for you.

How does the search happen behind the scenes?

Azure AI Search uses Lucene for full-text search, which runs through four stages of query execution: query parsing, lexical analysis, document retrieval, and scoring. Understanding these stages helps clarify what happens behind the scenes.

  • Query Parsing: The query terms are separated from the query operators, and a query tree is built and sent to the search engine.
  • Lexical Analysis: Analyzers process the query terms, which typically involves transforming, removing, or expanding them.
  • Document Retrieval: The analyzed terms are looked up in the inverted index (where each term is mapped to the documents containing it) to retrieve the matching documents.
  • Scoring: The retrieved documents are scored so that they can be ranked by relevance.
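
To make the four stages concrete, here is a minimal, self-contained sketch of the same flow on an in-memory toy index. It is not how Azure AI Search or Lucene is implemented: the stop-word list, the `+` "required term" operator, and the simplified TF-IDF score are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "in"}  # assumed stop list for this toy analyzer


def analyze(text):
    """Lexical analysis: lowercase, split on non-alphanumerics, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def parse_query(query):
    """Query parsing: separate terms from the (assumed) '+' required operator."""
    required, optional = [], []
    for token in query.split():
        (required if token.startswith("+") else optional).append(token.lstrip("+"))
    return required, optional


def search(query, docs, index):
    required, optional = parse_query(query)
    req_terms = [t for tok in required for t in analyze(tok)]
    opt_terms = [t for tok in optional for t in analyze(tok)]

    # Document retrieval: look each term up in the inverted index.
    candidates = set()
    for term in req_terms + opt_terms:
        candidates |= set(index.get(term, {}))
    for term in req_terms:  # required terms must be present
        candidates &= set(index.get(term, {}))

    # Scoring: simplified TF-IDF stand-in, summed over matching terms.
    total = len(docs)
    scores = {}
    for doc_id in candidates:
        score = 0.0
        for term in req_terms + opt_terms:
            postings = index.get(term, {})
            if doc_id in postings:
                score += postings[doc_id] * math.log(1 + total / len(postings))
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {1: "The hotel in the city", 2: "Luxury hotel with spa", 3: "City park"}
index = build_index(docs)
print(search("+hotel spa", docs, index))
```

In this example both hotel documents are retrieved, and the one that also mentions "spa" ranks first because it matches more of the query.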

When do I use Azure AI Search?

It can be applied to:

  • Heterogeneous Data: When you have lots of heterogeneous data, Azure AI Search can consolidate it into a single private searchable index. The index can be created either by pushing streams of JSON documents or by using an Azure "indexer" to pull content hosted on the Azure cloud into the index. An "indexer" is basically a crawler that extracts searchable data and metadata from content and populates the index using field-to-field mappings between the index and the content.
  • Raw Content: This includes image files, application files, or any undifferentiated text where you want to apply Azure's cognitive skills during indexing to extract meaning from the data or add structure to it.
  • Search Features: Azure AI Search provides APIs for faceted navigation over your data, filters (including geo-spatial searches), synonym mapping, relevance tuning, and typeahead queries, and can even simplify query construction for your application.
  • Image and Text Documents: Skills can extract entity recognition tags and key phrases from a large document, identify the language of a document, translate it into another language, or run sentiment analysis over scanned documents.
  • Linguistic Documents: Azure's custom and language analyzers can be configured to filter out diacritics (signs written above or below a letter to indicate a difference in pronunciation), for example, élevàtor ôperàtor.
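
As a quick illustration of what "filtering out diacritics" means, the sketch below folds accented letters to their base letters using Python's standard `unicodedata` module. This is only a stand-in for the idea; it is not the analyzer Azure AI Search runs, and the helper name `fold_diacritics` is hypothetical.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so accented letters match their base letters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("élevàtor ôperàtor"))
```

With folding applied at both index time and query time, a user who types the unaccented form can still match documents that contain the accented spelling.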

What features do I get on the data?

  • Free Form Text Search: Full-text search is the primary function of a search service and supports the following syntaxes:
    • Simple Query Syntax: It provides precedence operators, suffix operators, logical operators, and phrase search operators.
    • Lucene Query Syntax: It includes all operations in the simple syntax along with regular expressions, proximity search, term boosting, and fuzzy search.
  • Relevance: Scoring profiles let you model relevance as a function of values in the document. This makes it possible to personalize scoring, for example based on search preferences you have tracked for a customer.
  • Geo-Search: As the name implies, it lets the user explore data based on proximity to a physical location.
  • Filters and Facets: With faceted navigation, you can direct a user to a category-based search or apply user-defined or developer-defined criteria. This lets users drill into the data they care about.
  • Improve user experience:
    • Autocomplete can be used for type-ahead queries in a search bar to suggest options to users.
    • Spelling mistakes occur frequently while typing, so users should not be penalized for typos.
    • Synonym search matches equivalent terms that the user never typed in the search bar.
    • Hit highlighting highlights matching keywords in search results.
    • Paging and throttling of search results.
  • AI skills such as Natural Language Processing (including entity recognition, language detection, key-phrase extraction, text manipulation, sentiment detection, and PII detection) and image processing (OCR, image recognition, facial detection, and image interpretation). The skills provided by Azure AI Search are described below:
Sr No. | Skill | Description
1 | Custom Entity | Looks for a user-defined set of words and phrases and supports fuzzy matching.
2 | Key Phrases | Uses a pretrained model to detect important phrases based on how unusual a term is within the document, term placement in the document, proximity to other terms, and linguistic rules. It is useful when you would like to know the main points in a record (max length 50000 characters).
3 | Language Detection | Detects the language of each document. It is useful when you need to provide the language of the text as input to other skills, such as Sentiment Analysis or Text Split.
4 | Merge | Combines text from multiple fields into a single field. A common use case is to extract a caption with the OCR skill and then merge it with the content field of a document.
5 | Entity Recognition | Identifies people, locations, organizations, emails, datetimes, and URLs.
6 | PII Detection | Extracts personally identifiable information from a given text.
7 | Sentiment Analysis | Scores each record from 0 to 1 as positive or negative, or returns a neutral score if the sentiment could not be identified.
8 | Text Split | Splits the text into sentences or pages of a specific length to enrich or augment content incrementally.
9 | Translate | Translates text into a variety of languages for normalization or localization use cases. It is useful when not all documents are in the same language.
10 | Image Analysis | Identifies the content of an image and generates a text description, generates tags, or identifies celebrities or landmarks.
11 | OCR | Optical Character Recognition supports a maximum width and height of 10000 pixels for English and 4200 pixels for other languages. The OCR API is used for non-English documents; a newer API, 'Read', is used for English documents for the same purpose.
12 | Conditional | Allows filtering, merging, or assigning a default value based on a condition, such as searching only Spanish documents or setting a default for a value that doesn't exist.
13 | Document Extraction | Extracts content from a document.
14 | Shaper | Maps output to a complex form that can then be used as a combination for search. It basically allows you to create a structure, define the names of the members of that structure, and assign values to each member.
15 | Web API | Extends the AI enrichment pipeline by making an HTTP call to a custom web API.
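
On the query side, the features listed before the skills table (query syntax, filters, geo-search, facets, hit highlighting, and paging) typically come together in a single search request. The sketch below only builds a request body as a Python dictionary and makes no network call. The parameter names follow the Search Documents REST request body as we understand it, so verify them against the current API reference; the index fields `category`, `location`, and `description` and the helper `build_search_request` are hypothetical.

```python
import json


def build_search_request(text, *, lucene=False, category=None, near=None,
                         within_km=10, page=1, page_size=10):
    """Assemble a search request body; field names here are hypothetical."""
    filters = []
    if category is not None:
        escaped = category.replace("'", "''")  # OData escapes quotes by doubling
        filters.append(f"category eq '{escaped}'")
    if near is not None:
        lon, lat = near  # geography points are written longitude first
        filters.append(
            f"geo.distance(location, geography'POINT({lon} {lat})') le {within_km}"
        )

    body = {
        "search": text,
        "queryType": "full" if lucene else "simple",  # full = Lucene syntax
        "facets": ["category"],            # faceted navigation
        "highlight": "description",        # hit highlighting on this field
        "top": page_size,                  # paging: page size
        "skip": (page - 1) * page_size,    # paging: offset
        "count": True,
    }
    if filters:
        body["filter"] = " and ".join(filters)
    return body


request_body = build_search_request(
    "hotl~1 AND spa", lucene=True, category="Luxury", near=(-122.12, 47.67), page=3
)
print(json.dumps(request_body, indent=2))
```

The fuzzy term `hotl~1` is an example of the Lucene syntax tolerating a typo, while the filter narrows results to one category within 10 km of a point.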

What is AI enrichment in Azure AI Search?

AI enrichment is a capability of Azure AI Search indexing that can be used to extract text from images, blobs, or any other unstructured data. The enrichment makes the data more searchable in an index or in a knowledge store. The data passes through the following pipeline:

  • Document Cracking Phase: Text-based representations are extracted or created from unstructured data during indexing.
  • Enrichment Phase: The AI skillset (a collection of the skills discussed in the table above) is applied here based on the requirement. The extracted data now enriches the original data in the pipeline. These enrichments can then be safely stored in a knowledge store.
  • Search Index and Query-based Access: At this step in the process, we get access to fully enriched data with full-text search enabled.
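
The three phases above can be sketched as a tiny in-memory pipeline. This is a toy model under stated assumptions, not the service's implementation: the input document shape, the pre-supplied `ocr_text` standing in for real OCR, and the two regex-based "skills" are all hypothetical stand-ins for the built-in skills.

```python
import re
from collections import defaultdict

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def crack_document(raw):
    """Document cracking: produce one text representation of the raw content."""
    parts = [raw.get("text", "")]
    parts += [img["ocr_text"] for img in raw.get("images", [])]  # stand-in for OCR
    return {"id": raw["id"], "content": " ".join(p for p in parts if p)}


def email_skill(doc):
    """Toy entity recognition: pull email addresses out of the content."""
    return {"emails": EMAIL.findall(doc["content"])}


def sentence_split_skill(doc):
    """Toy text split: break the content into sentences."""
    pieces = re.split(r"(?<=[.!?])\s+", doc["content"])
    return {"sentences": [p.strip() for p in pieces if p.strip()]}


def enrich(doc, skillset):
    """Enrichment: run each skill and merge its output into the document."""
    enriched = dict(doc)
    for skill in skillset:
        enriched.update(skill(enriched))
    return enriched


def build_search_index(docs):
    """Search index: map each lowercase term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in re.findall(r"[a-z0-9]+", doc["content"].lower()):
            index[term].add(doc["id"])
    return index


raw = {
    "id": "doc1",
    "text": "Invoice for Contoso.",
    "images": [{"ocr_text": "Contact billing@contoso.com today."}],
}
enriched = enrich(crack_document(raw), [email_skill, sentence_split_skill])
search_index = build_search_index([enriched])
print(enriched["emails"], sorted(search_index["billing"]))
```

Note that the word "billing" appears only in the image text, yet it becomes searchable once cracking and enrichment have run, which is the point of the pipeline.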

A knowledge store collects information about how the data is connected internally and projects it into Table Storage (tables) or Blob Storage (JSON objects, plus images extracted from documents, called files). This representation can then be used to create a data visualization in a tool like Power BI with, say, Power Query. It can generate relationships within and across different projection types. Any tool or process that can connect to Azure Storage, such as Power BI, Azure Storage Explorer, or Azure Data Factory, can consume the contents of this knowledge store. It can also be accessed through a REST API. Pretty cool, right?

Ok, how does it compare to Elasticsearch?

Azure AI Search | Elasticsearch
Free version is limited and modeled for commercial use. | Free and open source standalone software (you need to pay for Elastic Cloud).
Supports about 50 languages. | Supports about 35 languages.
Supported on a large number of devices. | Supported on a smaller number of devices than Azure AI Search.
No in-memory capability. | Memcached and Redis in-memory integration.
All the listed features of Elasticsearch along with cognitive (AI) search capabilities. | Features like full-text search, auto-complete, geo-search, bucket aggregation for faceted navigation, and relevance.

How can Cazton help you with Azure AI Search Consulting?

When developing search functionality, it is essential to have the right team and to understand how to manage it. Expertise, experience, and our company's history of success are crucial in making a project successful. Project delays not only reduce a company's competitive edge but can also result in massive layoffs. We, at Cazton, work with you to ensure you are successful both as an individual, by rising higher in your career, and as a company, by staying innovative and ahead of the competition.

Our experts can advise you on best practices and implementation strategies for Azure AI Search. We can help you improve your search experience by implementing features including autocomplete, spell-checking, hit-highlighting, paging, and throttling, as well as AI skills such as Natural Language Processing (including entity recognition, language detection, key-phrase extraction, text manipulation, sentiment detection, and PII detection). We have expertise in implementing multi-dimensional search algorithms. We also provide on-demand Azure AI Search training. Contact us today to learn more about what our experts can do for you.

We help you make the right decision to achieve your business goals. We have the expertise to understand your requirements and tackle your data problems. Learn more about database, cloud, big data, artificial intelligence and other consulting and training services.