Voice RAG
- Using the NoSQL API? Please refer to our related guide on creating an AI Assistant powered by Azure OpenAI services integrated with the Azure Cosmos DB Core (NoSQL) API. Learn how to leverage Azure Cosmos DB's flexible schema, powerful querying capabilities, and global distribution while building intelligent conversational experiences.
- Voice RAG: Real-time intelligent communication between human and AI, utilizing human-like voices for seamless interactions.
- Real-time AI Assistant: Delivers intelligent responses on your own data instantly by using Azure OpenAI services, seamlessly integrated with Azure Cosmos DB for MongoDB.
- Microsoft and Cazton: We work closely with OpenAI, Azure OpenAI and many other Microsoft teams. Thanks to Microsoft for providing us with very early access to critical technologies. We are fortunate to have been working on GPT-3 since 2020, a couple of years before ChatGPT was launched.
- Top clients: At Cazton, we help Fortune 500, large, mid-size and startup companies with Big Data, AI and ML, GenAI, OpenAI and open source AI models, custom software development, deployment (MLOps), consulting, recruiting services and hands-on training services. Our clients include Microsoft, Broadcom, Thomson Reuters, Bank of America, Macquarie, Dell and more.
Introduction:
We are excited to announce the open-sourcing of our Real-Time AI Assistant, leveraging the Azure OpenAI GPT-4o Realtime API and Azure Cosmos DB for rapid data queries and intelligent, real-time interactions. Building on Microsoft’s original framework, we’ve adapted the solution to use Azure Cosmos DB for MongoDB instead of Azure AI Search for optimized data management and retrieval.
Our solution integrates human-like AI voice responses, allowing users to interact naturally using voice commands. This release includes both the source code and a video demo, showcasing how the assistant provides real-time answers to spoken queries with AI-generated speech.
Video Demo: AI-Driven Voice Interaction
This embedded video showcases the AI assistant in action, featuring:
- Voice-based conversations between a user and the AI assistant.
- AI-generated responses that sound remarkably human.
- Real-time query execution with results retrieved from Cosmos DB.
How It Works
Our solution features multiple key components, all working together to provide efficient, real-time query capabilities. Let’s explore the major files that power the system.
The Foundation of the AI System (app.py)
app.py is the application's entry point: it loads the environment configuration and wires together the real-time communication layer (rtmt.py) and the document retrieval tools (ragtools.py) described below. The assistant then uses the retrieved knowledge to answer questions in real time, making it well suited to dynamic business applications.
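As an illustration only, a minimal entry point along these lines might look like the sketch below; the RTMiddleTier and attach_rag_tools helper names are illustrative assumptions and may not match the actual files.

import os
from aiohttp import web
from dotenv import load_dotenv

# These imports/names are hypothetical and may differ from the real app.py.
from rtmt import RTMiddleTier
from ragtools import attach_rag_tools

load_dotenv()

app = web.Application()

# Real-time middle tier that bridges the browser and the GPT-4o Realtime deployment.
rtmt = RTMiddleTier(
    endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
)

# Register the Cosmos DB-backed retrieval tools so the model can search your documents.
attach_rag_tools(rtmt)

# Expose the realtime WebSocket endpoint, then serve on the default port.
rtmt.attach_to_app(app, "/realtime")
web.run_app(app, host="localhost", port=8765)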
Advanced Document Retrieval and Processing (ragtools.py)
- Chunking and Overlap: Large documents are broken into manageable 1,000-character chunks with 150-character overlaps to preserve context across sections.
- HNSW Vector Search: Uses HNSW (Hierarchical Navigable Small World) indexing for nearest-neighbor searches. This ensures quick and accurate search performance across large datasets.
- Azure OpenAI Embeddings: Vector-based similarity search is conducted by comparing a query to the document vectors, finding the most relevant documents. This allows for highly accurate retrieval based on semantic meaning rather than keywords alone. Documents are retrieved with metadata to help contextualize the results, which are formatted for AI-driven responses.
Tip: If you prefer not to use HNSW, update the search type in ragtools.py to CosmosDBVectorSearchType.VECTOR_IVF for broader Azure Cosmos DB compatibility. A sketch of this retrieval flow follows below.
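To make the flow concrete, here is a minimal sketch of chunking, embedding, and HNSW indexing using the LangChain AzureCosmosDBVectorSearch integration. The document loader, embedding deployment name, and API versions are illustrative assumptions and may differ from the actual ragtools.py.

import os
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType,
)
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split large documents into 1,000-character chunks with 150-character overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(DirectoryLoader("./data/").load())

# Embedding deployment name and API version are assumptions for this sketch.
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002",
    openai_api_version="2024-02-01",
)

# Store the embedded chunks (with their metadata) in Cosmos DB for MongoDB.
vectorstore = AzureCosmosDBVectorSearch.from_connection_string(
    connection_string=os.environ["MONGO_CONNECTION_STRING"],
    namespace=f"{os.environ['MONGO_DB_NAME']}.{os.environ['MONGO_COLLECTION_NAME']}",
    embedding=embeddings,
)
vectorstore.add_documents(chunks)

# Create an HNSW index; swap kind to VECTOR_IVF if HNSW is not available.
vectorstore.create_index(
    dimensions=1536,
    similarity=CosmosDBSimilarityType.COS,
    kind=CosmosDBVectorSearchType.VECTOR_HNSW,
)

# At query time, return the chunks semantically closest to the question.
results = vectorstore.similarity_search("What does the warranty cover?", k=4)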
The Backbone of Real-Time Communication (rtmt.py)
- WebSocket Management: Uses aiohttp to maintain persistent connections between the client and server, supporting live queries.
- Tool-Based Interaction: Defines tools (e.g., search, reporting) that can be dynamically invoked during conversations to provide specific insights.
- Authentication & Token Management: Integrates Azure token management, ensuring secure communication and continuous session management.
- Real-Time Responses: Messages are processed dynamically, forwarding results to the client or triggering tool-based interactions as needed (a sketch of this relay pattern follows below).
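As a rough sketch of the relay pattern (not the actual rtmt.py; the realtime URL path and API version are assumptions), the core idea with aiohttp looks like this:

import asyncio
import os
import aiohttp
from aiohttp import web

async def _pump(source, sink):
    # Forward text messages from one socket to the other; in rtmt.py this is
    # also where tool calls (e.g. "search") would be intercepted and executed.
    async for msg in source:
        if msg.type == aiohttp.WSMsgType.TEXT:
            await sink.send_str(msg.data)

async def realtime_handler(request: web.Request) -> web.WebSocketResponse:
    # Accept the browser's WebSocket connection.
    client_ws = web.WebSocketResponse()
    await client_ws.prepare(request)

    # Connect to the GPT-4o Realtime deployment (API version is an assumption).
    url = (
        f"{os.environ['AZURE_OPENAI_ENDPOINT']}/openai/realtime"
        f"?api-version=2024-10-01-preview"
        f"&deployment={os.environ['AZURE_OPENAI_DEPLOYMENT']}"
    )
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url, headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]}) as server_ws:
            # Relay messages concurrently in both directions (simplified; the real
            # file also handles token refresh, errors, and session cleanup).
            await asyncio.gather(_pump(client_ws, server_ws), _pump(server_ws, client_ws))
    return client_ws

app = web.Application()
app.router.add_get("/realtime", realtime_handler)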
Tutorial: Running the Application
Prerequisites: Before starting, ensure you have access to the following:
- Azure OpenAI Access: Deployed GPT-4o and embedding models on Azure.
- Azure Cosmos DB for MongoDB: Used to store document embeddings.
Step 1: Set up the Environment:
Create a .env file in the backend folder and configure the following variables:
AZURE_OPENAI_ENDPOINT=wss://<your-instance-name>.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o-realtime-preview
AZURE_OPENAI_API_KEY=<your-api-key>
MONGO_CONNECTION_STRING=<your-mongo-connection-string>
MONGO_DB_NAME=<your-database-name>
MONGO_COLLECTION_NAME=<your-collection-name>
Step 2: Install Dependencies:
Make sure Node.js and npm are installed (a quick version check is shown below):
- You can download Node.js from https://nodejs.org/en/download/package-manager
- Windows users: Ensure PowerShell is available for running scripts.
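To verify the installation, run the following from a terminal:
node --version
npm --version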
Step 3: Run the Application:
Use the provided startup script to launch the app:
Windows: .\scripts\start.ps1
Linux/Mac: ./scripts/start.sh
Place any relevant documents in the ./data/ directory for processing. On the first run, the system will index documents and store the embeddings in Cosmos DB.
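A small sketch of how this first-run check could work with pymongo, assuming the collection names from the .env file; the actual indexing logic lives in ragtools.py (see the retrieval sketch above):

import os
from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_CONNECTION_STRING"])
collection = client[os.environ["MONGO_DB_NAME"]][os.environ["MONGO_COLLECTION_NAME"]]

# On the first run the collection is empty, so the documents in ./data/ get
# chunked, embedded, and inserted; later runs reuse the stored embeddings.
needs_indexing = collection.count_documents({}) == 0
print("Indexing required" if needs_indexing else "Reusing stored embeddings")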
Step 4: Access and Interact with the App:
After launching, the app will be available at http://localhost:8765. From here:
- Speak directly to the AI using the voice interface.
- Query documents and receive real-time responses.
Conclusion:
Our open-source AI assistant offers a powerful blend of Azure OpenAI services and Azure Cosmos DB, providing fast, intelligent responses through real-time voice interactions. With features like document chunking, HNSW vector search, and tool-based interactions, this solution is highly scalable and adaptable to various business needs.
We look forward to your feedback and contributions! Check out the repository, try the demo, and let us know how you’ve customized the solution for your use case.
Visit our GitHub: https://github.com/cazton/CaztonVoiceRag