AI Voice Assistant

Voice RAG with Azure Cosmos DB and Azure OpenAI

  • Using MongoDB? Please refer to our companion article on building an AI Assistant using Azure OpenAI services with Azure Cosmos DB for MongoDB, offering similar intelligent response capabilities with MongoDB's familiar document model and query language.
  • Real-time Voice Interaction: We can help you create voice assistants based on your data that speak exactly like humans, fully customized and integrated with all your existing systems. 
  • Vector Search Enhancement: Utilization of both DiskANN and QuantizedFlat embedding types for optimized vector search performance, enabling efficient handling of large-scale similarity searches while managing memory constraints.
  • Automated Knowledge Base Management: Integration of comprehensive document handling system, automatically processing and chunking documents while maintaining detailed metadata for context-rich responses.
  • Microsoft and Cazton: We work closely with OpenAI, Azure OpenAI and many other Microsoft teams. Thanks to Microsoft for providing us very early access to critical technologies. We are fortunate to have been working on GPT-3 since 2020, a couple years before ChatGPT was launched.
  • Top clients: At Cazton, we help Fortune 500, large, mid-size and startup companies with Big Data, AI and ML, GenAI, OpenAI and open source AI models, custom software development, deployment (MLOps), consulting, recruiting services and hands-on training services. Our clients include Microsoft, Google, Broadcom, Thomson Reuters, Bank of America, Macquarie, Dell and more.
 

Imagine an AI assistant crafted specifically to understand your business, adapt to your needs, and learn continuously from your unique data - one you can simply ask questions and receive insightful answers from. This assistant isn't confined to any single industry; it's tailored to serve diverse sectors, from finance and healthcare to technology, adapting seamlessly to the specifics of each field. Built using advanced platforms like Azure Cosmos DB, this AI solution integrates deeply with your organization's data, enabling it to deliver precise, context-rich responses instantly.

Starting the day with a quick question to an AI Voice Assistant - "Can you summarize the latest insights for our project?" - professionals across industries can now imagine a system that delivers instant, customized answers based on their organization's unique data. From providing insights on market trends and patient treatments to summarizing campaign metrics, this AI solution turns routine tasks into quick, conversational exchanges, reducing the need for manual searches and complex data analysis. With our custom AI solutions, which go beyond mere conversation simulation to deeply integrate with your data, your team gains a transformative new way to access information and make fast, informed decisions.

Our Real-Time AI Voice Assistant, harnesses the power of Azure Cosmos DB's cutting-edge NoSQL vector search capabilities to deliver lightning-fast, context-rich responses. With enhanced integration and optimized search performance, this assistant can provide the insights you need exactly when you need them, drawing directly from your data to offer relevant, actionable information in real time. Here are some key features of this solution:

  • Azure Cosmos DB NoSQL Integration: Instead of relying on Azure AI Search, we've migrated to Azure Cosmos DB's NoSQL vector search functionality. This provides a more flexible and scalable data storage solution for our knowledge base.
  • Vector Indexing and Embedding Policies: The code now includes custom indexing and embedding policies for the Azure Cosmos DB container, optimizing the vector search performance based on the data type and distance function.
  • Robust Document Handling: The code now includes a more comprehensive process for handling PDF documents, including chunking the text into manageable segments and attaching metadata to each chunk.
  • Improved Search and Grounding Tools: The search and grounding tools have been refined to provide better-formatted results, making the AI assistant's responses more informative and user-friendly.

At the core of this solution lies Azure Cosmos DB, Microsoft's globally distributed, multi-model NoSQL database service. Azure Cosmos DB's unparalleled flexibility, scalability, and low-latency performance make it an ideal fit for powering the knowledge base of our AI assistant. Unlike traditional relational databases, Azure Cosmos DB is designed to handle large volumes of unstructured data with ease. By adopting a NoSQL approach, we can seamlessly store and retrieve the diverse range of documents, multimedia, and other data that comprise our assistant's comprehensive knowledge base. The standout feature of this solution is the integration of Azure Cosmos DB's advanced vector search capabilities. Vector search, a form of similarity-based query processing, allows our AI assistant to go beyond simple keyword matching and truly understand the semantic meaning of user queries.

Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model NoSQL database service provided by Microsoft. Unlike traditional relational databases, Azure Cosmos DB is designed to handle large volumes of unstructured data with ease. It offers several key benefits that make it an ideal choice for powering the knowledge base of our Real-Time AI Assistant:

  • Flexibility: Azure Cosmos DB supports multiple data models, including document, key-value, wide-column, and graph, allowing us to store and manage a diverse range of data types within the same database.
  • Scalability: Azure Cosmos DB is built to scale seamlessly, both in terms of storage and throughput, ensuring that our AI assistant's knowledge base can grow and accommodate increasing demands without performance degradation.
  • Global Distribution: Azure Cosmos DB's global distribution capabilities enable us to deploy our AI assistant's knowledge base across multiple regions, ensuring low-latency access and high availability for users around the world.
  • Automatic Indexing: Azure Cosmos DB automatically indexes all the data stored within its containers, allowing for lightning-fast query processing and retrieval, which is essential for real-time AI interactions.

DiskANN and QuantizedFlat Embeddings

To harness the full potential of vector search, we've implemented two specialized embedding types within our Azure Cosmos DB vector indexing and embedding policies: DiskANN and QuantizedFlat.

  • DiskANN (Disk-Aware Approximate Nearest Neighbors): is a state-of-the-art vector search algorithm developed by Microsoft Research. DiskANN is designed to efficiently perform large-scale similarity searches, even when the index data exceeds the available RAM. By intelligently managing the data between disk and memory, DiskANN delivers blazing-fast query response times, making it an ideal choice for our real-time AI assistant. The key advantages of DiskANN include:
    • Efficient disk-based storage and retrieval of high-dimensional vector data.
    • Ability to handle index sizes that exceed available RAM.
    • Optimized query processing that leverages both disk and memory resources.
    • Superior performance compared to traditional disk-based nearest neighbor search algorithms
  • QuantizedFlat: on the other hand, is a vector embedding technique that compresses the high-dimensional document representations into a more compact form. This compression allows for more efficient storage and faster query processing, without sacrificing too much accuracy. The QuantizedFlat approach is particularly well-suited for scenarios where memory or storage constraints are a concern. The benefits of QuantizedFlat include:
    • Reduced storage requirements for the vector embeddings.
    • Faster query processing times due to the smaller data footprint.
    • Minimal loss of accuracy compared to the original high-dimensional representations.
    • Improved scalability and performance for knowledge bases with large volumes of data.

By offering both DiskANN and QuantizedFlat as options for vector search and embedding, our AI assistant can adapt its strategy to the specific needs of the deployment environment. This flexibility ensures optimal performance regardless of the hardware constraints or the size of the knowledge base, making the solution more versatile and suitable for a wider range of use cases.

Comprehensive Document Handling and Knowledge Management

Beyond the vector search capabilities, our updated Real-Time AI Assistant features a robust document handling and knowledge management system. We leverage the `PDFPlumberLoader` from the `langchain-community` library to extract text from a wide range of file formats, including PDF, Microsoft Office documents, and more. To ensure our knowledge base is well-organized and easily searchable, we divide the extracted text into manageable chunks, each with its own unique metadata. This metadata, which includes the source filename, chunk index, and other relevant information, allows our AI assistant to provide context-rich responses and seamlessly cite the appropriate sources when presenting information to users.

Improved Search and Grounding Tools

The core of the AI assistant's functionality lies in its ability to search the knowledge base and provide relevant, well-grounded responses to user queries. The `_search_tool` function now includes the full content of the most relevant documents, rather than just their titles. This change allows the AI assistant to provide more detailed and contextual information in its responses, empowering users to better understand the reasoning and sources behind the provided answers. Similarly, the `_report_grounding_tool` has been updated to include the complete content of the cited sources. This enhancement enables users to delve deeper into the knowledge base and validate the information presented by the AI assistant, fostering a more transparent and trust-building interaction.

Unlocking the Future of Real-Time AI Interactions

By seamlessly integrating Azure Cosmos DB's advanced vector search capabilities, our Real-Time AI Assistant delivers a new level of intelligent, responsive, and context-aware communication. Whether addressing complex queries, providing in-depth analysis, or generating personalized insights, this solution empowers users to engage with AI in a more natural and productive manner. As we continue to push the boundaries of what's possible in the realm of human-AI collaboration, we're excited to see how this updated assistant will transform the way organizations and individuals interact with artificial intelligence. We invite you to explore the code, experiment with the capabilities, and share your feedback as we collectively shape the future of real-time AI interactions.

 

Video Demo: AI-Driven Voice Interaction

This embedded video showcases the AI assistant in action, featuring:

  • Voice-based conversations between a user and the AI assistant.
  • AI-generated responses that sound remarkably human.
  • Real-time query execution with results retrieved from Azure Cosmos DB - NoSQL database.
 

How it works?

Front-End:
  • Real-Time Voice Interaction (App.tsx): The core interaction happens in App.tsx, where the user can toggle between starting and stopping voice recording. Key features include:
    • Recording Toggle: Users can press a button to start or stop recording, activating or deactivating real-time voice input.
    • Grounding Files Display: The AI assistant returns document "grounding files" for context, which are displayed dynamically for users to explore.
    • Status Indicator: The recording button changes color to indicate recording status, enhancing the user experience by providing real-time feedback.
    • Integration with Azure Cosmos DB: The backend utilizes Cosmos DB’s NoSQL capabilities for quick vector searches, and the frontend fetches and displays these contextually relevant files.
  • WebSocket Communication (useRealTime Hook): The useRealTime hook manages the WebSocket connection between the front end and backend, ensuring a stable, low-latency pipeline for voice data and response messages.
    • Connection Setup: Depending on configuration, the WebSocket either connects directly to Azure OpenAI’s API or to a custom real-time endpoint.
    • Session Control: A session is started with parameters that define transcription models and Voice Activity Detection (VAD) settings, allowing flexibility in response handling.
    • Real-Time Response Handling: The hook defines callback handlers for various response types, including:
      • response.audio.delta: Updates the UI with audio data in real time, enhancing responsiveness.
      • response.done: Indicates that the AI response has completed, which can trigger the end of playback or display of final results.
      • input_audio_buffer.speech_started: Signals that voice input has been detected and that the system should pause audio playback if ongoing.
  • Grounding Files Management (App.tsx): In conjunction with the real-time responses, the application provides contextually relevant document files, or "grounding files," from the backend.
    • Grounding Files Display: The GroundingFiles component lists the received files, while GroundingFileView displays details of a selected file.
    • File Processing: The backend processes large documents into manageable chunks, attaches metadata, and uses vector embeddings (DiskANN and QuantizedFlat) to retrieve the most relevant chunks. The frontend appends these grounding files in real time, giving users context for the assistant’s responses.
  • Audio Playback (useAudioPlayer Hook and player.ts): The useAudioPlayer hook and Player class handle the playback of audio responses from the backend, enabling real-time audio feedback for users.
    • Initialization and Playback:
      • The init method sets up an AudioContext and an AudioWorkletNode to handle playback.
      • The play method receives base64-encoded audio data, decodes it to PCM format, and sends it to the AudioWorkletNode, playing the audio in real time.
    • Stopping Playback: The stop method halts playback, signaling the end of an audio response or preparing for the next recording session.
 
Backend:
  • Azure Cosmos DB Integration: The code now uses the `AzureCosmosDBNoSqlVectorSearch` class from the `langchain-community` library to interface with the Azure Cosmos DB database. This class provides a seamless integration between the AI assistant's knowledge base and the Azure Cosmos DB NoSQL data store, enabling efficient storage and retrieval of document embeddings.
  • Vector Indexing and Embedding Policies: The code defines custom indexing and embedding policies for the Azure Cosmos DB container, tailored to the specific requirements of the AI assistant's knowledge base. The `get_vector_indexing_policy` and `get_vector_embedding_policy` functions specify the indexing mode, included and excluded paths, and the vector embedding configuration (data type, dimensions, and distance function). These policies ensure that the Azure Cosmos DB container is optimized for fast and accurate vector-based searches, which are crucial for the AI assistant's real-time query processing. The policies can be further customized based on the specific requirements of the knowledge base, such as the data type, embedding dimensions, and distance function used for similarity calculations.
  • Automated Database and Container Management: The `check_and_create_cosmosdb_database_container` function checks for the existence of the Azure Cosmos DB database and container, and creates them if necessary. This automation simplifies the deployment process and ensures that the AI assistant's knowledge base is properly set up in the Azure Cosmos DB environment.
  • Robust Document Handling: The `add_pdf_documents` function processes PDF files from a specified directory, extracting the text and breaking it into manageable chunks. Each chunk is assigned metadata, including a unique title, to aid in the retrieval and contextualization of the information. This document handling process ensures that the AI assistant has access to a comprehensive and well-organized knowledge base, which is essential for providing accurate and relevant responses.
  • Improved Search and Grounding Tools: The `_search_tool` and `_report_grounding_tool` functions have been updated to provide better-formatted results. The search tool now includes the content of the most relevant documents, rather than just their titles, making the AI assistant's responses more informative. The grounding tool now includes the full content of the cited sources, allowing the user to better understand the context and reasoning behind the AI assistant's responses.

The updated Real-Time AI Assistant leverages the power of Azure Cosmos DB's vector search capabilities to deliver more efficient, accurate, and user-friendly interactions. The combination of Azure Cosmos DB's scalable data storage and the AI assistant's natural language processing abilities enables a seamless and intelligent communication experience for users. 

We encourage you to explore the code, try out the AI assistant, and provide feedback to help us further enhance this exciting solution.

Visit our GitHub: https://github.com/cazton/CaztonVoiceRagNoSqlApi