Serverless RAG on AWS

Introduction

As artificial intelligence (AI) advances rapidly, businesses are seeking ways to integrate up-to-date information into large language models (LLMs) so that their AI applications deliver accurate, contextually relevant responses. One startup faced the challenge of building a smart Q&A application that could answer questions from internal data without the burden of managing complex infrastructure. It chose to implement a serverless Retrieval Augmented Generation (RAG) solution on Amazon Web Services (AWS).

Challenge

The goal was to build a Q&A application allowing employees to query information from internal documents, such as HR policies, product guidelines, or internal reports. However, they encountered the following issues:
  • Outdated Information: Traditional LLMs are trained on static data and cannot incorporate the latest information from internal documents.
  • High Operational Costs: Maintaining server infrastructure for storing and processing vector data for RAG was costly and complex for a startup.
  • Complex Integration: Building a complete RAG pipeline, from document ingestion to response generation, required integrating multiple technologies and services.

Solution

The startup adopted the serverless RAG solution described in the AWS Samples repository, using AWS services such as AWS Lambda, Amazon Bedrock, and Amazon S3 together with LanceDB to create a fully serverless RAG pipeline. The process included the following key steps:

1. Document Ingestion and Processing

  • Amazon S3: The company stored internal documents (PDFs, HTML, text) in an S3 bucket. Documents were automatically ingested upon upload through an event-driven mechanism.
  • AWS Lambda: A Lambda function was triggered to process each document, extracting its text content and converting it into vector embeddings using the Amazon Titan Text Embeddings v2 model.
  • LanceDB: The embeddings were stored in LanceDB, a serverless vector database backed by S3, ensuring efficiency and low cost for storage and retrieval.
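Before embedding, long documents are typically split into smaller chunks so each embedding covers a focused passage. The sketch below illustrates one common approach, fixed-size character windows with overlap; the function name, chunk size, and overlap are illustrative assumptions, and the actual splitter used in the sample may differ.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows with overlap, so a
    sentence cut at a chunk boundary still appears whole in a neighbor chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```

Each resulting chunk would then be sent to the Titan embedding model, and the (text, vector) pairs written to the LanceDB table on S3.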

2. Retrieval and Response Generation

  • When a user submitted a query through the application interface, the query was converted into a vector embedding using the Amazon Titan model.
  • LanceDB performed a similarity search to identify the most relevant documents based on the query’s embedding.
  • The relevant documents were combined with the original query to form an augmented prompt, which was then sent to the Anthropic Claude v2 model on Amazon Bedrock to generate accurate and contextually relevant responses.
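The retrieval-and-augmentation steps above can be sketched in plain Python. In the real pipeline LanceDB performs the similarity search internally against its S3-backed index; the brute-force cosine similarity and the prompt template below are simplified assumptions for illustration only.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k documents most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_augmented_prompt(query: str, contexts: list[str]) -> str:
    """Combine retrieved passages with the user's question into one prompt."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )
```

The string returned by `build_augmented_prompt` is what gets sent to the Claude model on Amazon Bedrock for generation.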

3. User Interface

  • The application used a React front end built with Vite, served through Amazon CloudFront for fast delivery. The appconfig.json file contained public configuration such as the Lambda URL, the WebSocket endpoint, and the Amazon Cognito identifiers needed for backend communication.
  • Users could access the application via a provided URL (e.g., https://dxxxxxxxxxxx.cloudfront.net) and log in through Amazon Cognito, which enforced a strict password policy: at least 8 characters, including numbers, uppercase and lowercase letters, and special characters.

Architecture

The architecture ties the services together end to end: Amazon S3 stores the source documents and the LanceDB vector data, AWS Lambda handles ingestion and query processing, Amazon Bedrock provides the Titan embedding and Claude generation models, and Amazon CloudFront with Amazon Cognito serves and secures the front end.

Results

After implementing the serverless RAG solution, the startup achieved significant outcomes:
  • Accurate and Up-to-Date Responses: The application could answer questions about HR policies, such as “How many leave days are employees entitled to annually?”, by retrieving relevant documents from S3 and providing accurate responses based on the latest internal data.
  • Cost Efficiency: With the pay-per-use pricing of Amazon Bedrock and LanceDB, the company only paid for the resources used. Processing a 1MB document cost less than half a cent, significantly reducing expenses compared to maintaining traditional infrastructure.
  • Rapid Deployment: Using AWS CloudFormation, the entire infrastructure was deployed in minutes, allowing the business to focus on application development rather than server management.
  • Scalability: The serverless solution automatically scaled with demand, ensuring consistent performance even as the number of documents or users increased.

Features

  • Chat Playground: Interact with LLMs and inspect the retrieved documents.
  • Chat History Management: Manage your chat history, select which messages are forwarded, or add messages to test and debug your prompts.
  • Serverless Knowledge Base: This sample uses LanceDB with S3 as a vector database, so you pay only for the storage you use and do not have to manage additional infrastructure.
  • Dynamic Prompt Management: Users can override the default system prompt by specifying new prompts in the settings.

Conclusion

By adopting the serverless RAG solution from the AWS Samples repository, the startup successfully built a cost-effective, scalable, and intelligent Q&A application. This solution not only improved the accuracy of AI responses but also alleviated the burden of infrastructure management, enabling the company to focus on innovation and product development. The project demonstrates the power of AWS’s serverless architecture and AI services in empowering startups to build advanced AI applications.