ElasticSearch-Part2-Populate and Search

In the last post we discussed how we would configure the indices for ChanderYoga. This post will describe the process of actually creating the indices, populating the indices, and creating ElasticSearch requests that fulfill the feature requirements.

Creating an index in ElasticSearch is a simple process. If the feature requirements are satisfied by the default configuration, then you simply add a document and the index will be automatically created.

For ChanderYoga I will be creating an index per resource. This will allow me to change the analysis and settings for each index individually and reindex the data if necessary without affecting the other resources.

In the downloadable code base, the indices are created and populated in Application_Start() if they don't already exist.

First, let's create the blog index using the configuration we created in Part 1. For the sake of brevity I have removed the body of the analysis and the mappings in the example below.

POST http://localhost:9200/blog

    {
      "settings": {
        "index": {
          "number_of_shards": 1,
          "number_of_replicas": 0
        },
        "analysis": { ... }
      },
      "mappings": { ... }
    }
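As a rough illustration of how the request above could be issued from C#, here is a minimal sketch. The class and method names are hypothetical and are not part of the downloadable code base. It uses System.Text.Json instead of Json.NET so that it has no package dependencies, and it leaves out the analysis and mappings sections from Part 1.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of the downloadable code base.
public static class CreateIndexSketch
{
    // Builds (but does not send) an index-creation request.
    // Assumes baseUrl is the cluster root, e.g. "http://localhost:9200".
    public static HttpRequestMessage BuildCreateIndexRequest(string baseUrl, string indexName)
    {
        // Only the shard and replica settings are included here; the
        // analysis and mappings from Part 1 would be added alongside them.
        var body = new
        {
            settings = new
            {
                index = new { number_of_shards = 1, number_of_replicas = 0 }
            }
        };

        string json = JsonSerializer.Serialize(body);

        return new HttpRequestMessage(HttpMethod.Post, new Uri(new Uri(baseUrl), indexName))
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}
```

The resulting message can be passed to HttpClient.SendAsync. Building the request separately from sending it keeps the payload easy to inspect before it reaches the cluster.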

The next step will be populating the indices using the bulk API. The bulk API accepts multiple types of actions; the most useful for the initial population of indices is the "index" action. The body of the request has specific format requirements. The first line declares the type of action, the index, the type, and the document id (optional, except in update/delete actions). The second line (not required for a delete action) is the actual contents of the action. Each line must end in a newline character.

    { "index": { "_index": "asanas", "_type": "asana", "_id": "1" }}
    { "field1": "value1", "field2": "value2" }

I have constructed some C# classes to handle creating a bulk api request. Below are the "BulkRequest", "BulkActionBase", and "IndexBulkAction" classes.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Newtonsoft.Json;

    // IElasticSearchUriProvider is not shown in this post.
    public class BulkRequest
    {
        private readonly string _actionsJson;

        public readonly IElasticSearchUriProvider UriProvider;
        public readonly IEnumerable<BulkActionBase> Actions;

        private BulkRequest(IElasticSearchUriProvider uriProvider)
        {
            if (uriProvider == null)
            {
                throw new ArgumentNullException("uriProvider");
            }

            UriProvider = uriProvider;
        }

        public BulkRequest(IElasticSearchUriProvider uriProvider, IEnumerable<BulkActionBase> actions)
            : this(uriProvider)
        {
            if (actions == null || !actions.Any())
            {
                throw new ArgumentNullException("actions");
            }

            Actions = actions;
        }

        /// <summary>
        /// Only use this constructor if you are sure that you have the correctly formatted request in string form already.
        /// </summary>
        /// <param name="uriProvider"></param>
        /// <param name="actionsJson"></param>
        public BulkRequest(IElasticSearchUriProvider uriProvider, string actionsJson)
            : this(uriProvider)
        {
            if (string.IsNullOrWhiteSpace(actionsJson))
            {
                throw new ArgumentNullException("actionsJson");
            }

            // Trim and add a newline character at the end of the string just in case the user forgot.
            _actionsJson = actionsJson.Trim() + "\r\n";
        }

        public override string ToString()
        {
            // A null _actionsJson yields an empty builder, so the actions are serialized instead.
            StringBuilder builder = new StringBuilder(_actionsJson);
            if (builder.Length == 0)
            {
                foreach (BulkActionBase action in Actions)
                {
                    builder.AppendLine(action.ToString());
                }
            }

            return builder.ToString();
        }
    }

    public abstract class BulkActionBase
    {
        protected abstract string Action { get; }

        public readonly string Index;
        public readonly string Type;
        public readonly string DocumentId;

        protected BulkActionBase(string index, string type)
        {
            if (string.IsNullOrWhiteSpace(index))
            {
                throw new ArgumentNullException("index");
            }

            if (string.IsNullOrWhiteSpace(type))
            {
                throw new ArgumentNullException("type");
            }

            Index = index;
            Type = type;
        }

        protected BulkActionBase(string index, string type, string documentId)
            : this(index, type)
        {
            if (string.IsNullOrWhiteSpace(documentId))
            {
                throw new ArgumentNullException("documentId");
            }

            DocumentId = documentId;
        }

        // Produces the action line, e.g. {"index":{"_index":"asanas","_type":"asana","_id":"1"}}
        public override string ToString()
        {
            string result = null;
            if (string.IsNullOrWhiteSpace(DocumentId))
            {
                result = string.Format("{{\"{0}\":{{\"_index\":\"{1}\",\"_type\":\"{2}\"}}}}", Action, Index, Type);
            }
            else
            {
                result = string.Format("{{\"{0}\":{{\"_index\":\"{1}\",\"_type\":\"{2}\",\"_id\":\"{3}\"}}}}", Action, Index, Type, DocumentId);
            }

            return result;
        }
    }

    public class IndexBulkAction<T> : BulkActionBase where T : class
    {
        protected override string Action { get { return "index"; } }

        public readonly T Document;

        public IndexBulkAction(string index, string type, T document)
            : base(index, type)
        {
            // Check the constructor argument, not the still-unassigned Document field.
            if (document == null)
            {
                throw new ArgumentNullException("document");
            }

            Document = document;
        }

        public IndexBulkAction(string index, string type, string documentId, T document)
            : base(index, type, documentId)
        {
            if (document == null)
            {
                throw new ArgumentNullException("document");
            }

            Document = document;
        }

        // Action line followed by the serialized document; BulkRequest appends the trailing newline.
        public override string ToString()
        {
            StringBuilder builder = new StringBuilder();
            builder.AppendLine(base.ToString());
            builder.Append(JsonConvert.SerializeObject(Document));

            return builder.ToString();
        }
    }

The override of ToString() in each of these classes is an easy way to create the necessary JSON for a bulk API request.

After constructing the body of the request we are ready to perform a POST request against the cluster, which will look something like this:

POST http://localhost:9200/_bulk

    { "index": { "_index": "asanas", "_type": "asana", "_id": "1" }}
    { "field1": "value1", "field2": "value2" }
    { "index": { "_index": "asanas", "_type": "asana", "_id": "2" }}
    { "field1": "value1", "field2": "value2" }
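To make the POST step concrete, here is a minimal sketch of wrapping an already formatted bulk body (for example, the output of BulkRequest.ToString()) in an HTTP request. The class and method names are hypothetical and not part of the downloadable code base. The "application/x-ndjson" content type is what newer ElasticSearch versions expect for bulk requests; treat it as an assumption for the version used in this series.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the downloadable code base.
public static class BulkPostSketch
{
    // Wraps a newline-delimited bulk body in a POST to {baseUrl}/_bulk.
    // Assumes baseUrl is the cluster root, e.g. "http://localhost:9200".
    public static HttpRequestMessage BuildBulkRequest(string baseUrl, string bulkBody)
    {
        if (string.IsNullOrWhiteSpace(bulkBody))
        {
            throw new ArgumentException("The bulk body must not be empty.", nameof(bulkBody));
        }

        // Every line, including the last one, must end in a newline character.
        if (!bulkBody.EndsWith("\n", StringComparison.Ordinal))
        {
            bulkBody += "\n";
        }

        return new HttpRequestMessage(HttpMethod.Post, new Uri(new Uri(baseUrl), "_bulk"))
        {
            Content = new StringContent(bulkBody, Encoding.UTF8, "application/x-ndjson")
        };
    }

    // Sends the request and returns the raw JSON response from the cluster.
    public static async Task<string> SendAsync(HttpClient client, HttpRequestMessage request)
    {
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A successful HTTP status does not guarantee that every action succeeded. The bulk response carries an "errors" flag and a per-item result list, so the returned JSON is worth inspecting before assuming the index is fully populated.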

It is finally time to craft search requests to find our data. Let's review our requirements to make sure we satisfy them with the search requests.

Feature: A single text box through which a user can search over asanas, mudras, pranayamas, and blog posts.

All data visible to a user should be searchable.
The user should be able to see the type of resource for each search result.
Fields like Name and Title should be more important to the search results than fields like Tags or Categories, and the least important fields should be large text fields.

To simplify the process of creating and maintaining search requests, as well as handling the response from the server, I will be making a search request per resource.

First, I will tackle the most complex resource, the Asana resource.

Model Property	Json Field	Analyzed Search Field
Id	id	N/A
Author	author	author
SanskritName	sanskrit_name	name
EnglishName	english_name	name
Description	description	text
Sequence	sequence	text
Benefits	benefits	text
Breathing	breathing	text
Chakra	chakra	text
Categories	categories	tags
AnatomyFocuses	anatomy_focus	tags
Contraindications	contras	tags
Therapeutics	therapeutics	tags

Many of the fields use "copy_to" to have their values analyzed and searchable through the name, text, or tags fields. This leaves us with three categories of searchable content in the Asana index, in order of importance: name > tags/author > text. Using the multi_match query we can search against multiple fields at once, each using its own configured analysis, and apply a unique "boost" value per field by adding a "^n" to the field name, like "name^10". Boosting makes any match on that field worth approximately n times more than a match on a field without a specified boost value.

    {
      "from": 0,
      "size": 5,
      "query": {
        "multi_match": {
          "query": "chander",
          "fields": [
            "name^10", "name.starts_with^7", "name.contains_shingle^3",
            "tags^8", "tags.starts_with^6", "tags.contains_shingle^2",
            "text^7", "text.starts_with^4", "text.contains_shingle",
            "author^8", "author.starts_with^5", "author.contains_shingle^2"
          ]
        }
      }
    }

This query searches for full, starts-with, and contains matches, with each kind worth more than the next. So if the user has searched for "chander" and "Chander Dhall" is the author of an asana, then it is very likely that document will receive a high relevancy score in relation to other asanas. If the word "chander" also exists in the name of the asana, that match would be worth even more.
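Since every resource uses the same multi_match shape and differs only in its list of boosted fields, the request body can be generated from a list of field and boost pairs. The following is a minimal sketch of that idea. The class and method names are hypothetical, and it uses System.Text.Json rather than Json.NET so that it has no package dependencies.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper, not part of the downloadable code base.
public static class MultiMatchSketch
{
    // Builds a multi_match search body. A boost of 1 emits the bare field
    // name; any other boost is appended using the "field^n" syntax.
    public static string BuildQuery(
        string searchText,
        IEnumerable<(string Field, int Boost)> boostedFields,
        int offset = 0,
        int pageSize = 5)
    {
        string[] fields = boostedFields
            .Select(f => f.Boost == 1 ? f.Field : f.Field + "^" + f.Boost)
            .ToArray();

        var body = new
        {
            // "from" is a contextual keyword in C#, hence the @ prefix;
            // the serialized property name is still "from".
            @from = offset,
            size = pageSize,
            query = new
            {
                multi_match = new { query = searchText, fields }
            }
        };

        return JsonSerializer.Serialize(body);
    }
}
```

For example, passing ("name", 10), ("name.starts_with", 7), and ("text.contains_shingle", 1) produces the fields "name^10", "name.starts_with^7", and "text.contains_shingle". Keeping the boosts in one list per resource makes it easier to tune relevancy later without touching the JSON by hand.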

Let's apply this to the other resources:

Mudras:

Model Property	Json Field	Analyzed Search Field
Id	id	N/A
Author	author	author
Name	name	name
Description	description	text
Sequence	sequence	text
Benefits	benefits	text

    {
      "from": 0,
      "size": 5,
      "query": {
        "multi_match": {
          "query": "chander",
          "fields": [
            "name^10", "name.starts_with^7", "name.contains_shingle^3",
            "text^7", "text.starts_with^4", "text.contains_shingle",
            "author^8", "author.starts_with^5", "author.contains_shingle^2"
          ]
        }
      }
    }

Pranayamas:

Model Property	Json Field	Analyzed Search Field
Id	id	N/A
Author	author	author
Name	name	name
Description	description	text
Sequence	sequence	text
Benefits	benefits	text

    {
      "from": 0,
      "size": 5,
      "query": {
        "multi_match": {
          "query": "chander",
          "fields": [
            "name^10", "name.starts_with^7", "name.contains_shingle^3",
            "text^7", "text.starts_with^4", "text.contains_shingle",
            "author^8", "author.starts_with^5", "author.contains_shingle^2"
          ]
        }
      }
    }

Blog Posts:

Model Property	Json Field	Analyzed Search Field
Id	id	N/A
Author	author	author
Title	title	title
Text	text	text
Tags	tags	tags
Benefits	benefits	text

    {
      "from": 0,
      "size": 5,
      "query": {
        "multi_match": {
          "query": "chander",
          "fields": [
            "title^10", "title.starts_with^7", "title.contains_shingle^3",
            "tags^8", "tags.starts_with^6", "tags.contains_shingle^2",
            "text^7", "text.starts_with^4", "text.contains_shingle",
            "author^8", "author.starts_with^5", "author.contains_shingle^2"
          ]
        }
      }
    }

In Part 3 we will cover how to utilize these search requests and handle the ElasticSearch responses in a C# .NET environment.
