Spaces: agnedil/rag-demo-with-gradio

Andrew committed on Feb 23, 2024

Commit 30eced7

1 Parent(s): e8fc33c

Initial commit

Files changed (6)

.gitignore +1 -0
README.md +42 -13
advanced_rag.py +124 -0
app.py +92 -0
packages.txt +2 -0
requirements.txt +20 -0

.gitignore ADDED

@@ -0,0 +1 @@
+**/.DS_Store

README.md CHANGED

@@ -1,13 +1,42 @@
----
-title: Rag Demo With Gradio
-emoji: 👀
-colorFrom: yellow
-colorTo: green
-sdk: gradio
-sdk_version: 4.19.2
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Advanced RAG System
+This repository contains the code for a Gradio web app that demos a Retrieval-Augmented Generation (RAG) system. The app lets users load multiple documents of their choice into a vector database, submit queries, and receive answers generated by a RAG system that combines an ensemble of BM25 and FAISS retrievers, a Cohere reranker, and a Llama 2 large language model.
+## Features
+#### 1. Dynamic Processing
+- Users can load multiple source documents of their choice into a vector store in real-time.
+- Users can submit queries which are processed in real-time for enhanced retrieval and generation.
+#### 2. PDF Integration
+- The system loads multiple PDF documents into a vector store, so the RAG system can retrieve information from all of the loaded documents.
+#### 3. Advanced RAG System
+Integrates various components, including:
+- **UI**: Allows users to input URLs for documents and then input user queries; displays the LLM response.
+- **Document Loader**: Loads documents from URLs.
+- **Text Splitter**: Chunks loaded documents.
+- **Vector Store**: Embeds text chunks and adds them to a FAISS vector store; embeds user queries.
+- **Retrievers**: Uses an ensemble of BM25 and FAISS retrievers, along with a Cohere reranker, to retrieve relevant document chunks based on user queries.
+- **Language Model**: Utilizes a Llama 2 large language model for generating responses based on the user query and retrieved context.
+#### 4. PDF and Query Error Handling
+- Validates PDF URLs and queries to ensure that they are not empty and that they are valid.
+- Displays error messages for empty queries or issues with the RAG system.
+#### 5. Refresh Mechanism
+- Provides a Reset App button that clears / resets the RAG system.
+## Installation
+To run this application, you need to have Python and Gradio installed. Follow these steps:
+1. Clone this repository to your local machine.
+2. Create and activate a virtual environment of your choice (venv, conda, etc.).
+3. Install dependencies from the requirements.txt file by running `pip install -r requirements.txt`.
+4. Set up the environment variables REPLICATE_API_TOKEN (for a Llama 2 model hosted on replicate.com) and COHERE_API_KEY (for the embeddings and reranking service on cohere.com).
+5. Start the Gradio app by running `python app.py`.
+## Licence
+MIT license
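
Step 4 of the installation notes relies on two environment variables. A minimal sketch of a startup check is shown here; the helper name `missing_env_vars` is hypothetical and not part of this commit, and only the two variable names come from the README. It reports which variables are unset or blank before the app is launched:

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the README; the helper itself is hypothetical.
REQUIRED_VARS = ("REPLICATE_API_TOKEN", "COHERE_API_KEY")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing `environ` explicitly keeps the check easy to exercise without touching the real process environment.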

advanced_rag.py ADDED

@@ -0,0 +1,124 @@

+import os
+os.environ["TOKENIZERS_PARALLELISM"] = "false"
+from typing import List
+from langchain_community.llms import Replicate    # importing these from langchain is deprecated; use langchain_community for several modules here
+from langchain_community.document_loaders import OnlinePDFLoader
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.vectorstores import FAISS
+from langchain_community.embeddings import CohereEmbeddings
+from langchain_community.retrievers import BM25Retriever
+from langchain.retrievers import EnsembleRetriever
+from langchain.retrievers import ContextualCompressionRetriever
+from langchain.retrievers.document_compressors import CohereRerank
+from langchain.prompts import ChatPromptTemplate
+from langchain.schema import StrOutputParser
+from langchain_core.runnables import RunnableParallel, RunnablePassthrough
+class ElevatedRagChain:
+    '''
+    Class ElevatedRagChain integrates various components from the langchain library to build
+    an advanced retrieval-augmented generation (RAG) system designed to process documents
+    by reading in, chunking, embedding, and adding their chunk embeddings to FAISS vector store
+    for efficient retrieval. It uses the embeddings to retrieve relevant document chunks
+    in response to user queries.
+    The chunks are retrieved using an ensemble retriever (BM25 retriever + FAISS retriever)
+    and passed through a Cohere reranker before being used as context
+    for generating answers using a Llama 2 large language model (LLM).
+    '''
+    def __init__(self) -> None:
+        '''
+        Initialize the class with predefined model, embedding function, weights, and top_k value
+        '''
+        self.llama2_70b   = 'meta/llama-2-70b-chat:2d19859030ff705a87c746f7e96eea03aefb71f166725aee39692f1476566d48'
+        self.embed_func   = CohereEmbeddings(model="embed-english-light-v3.0")
+        self.bm25_weight  = 0.6
+        self.faiss_weight = 0.4
+        self.top_k        = 5
+    def add_pdfs_to_vectore_store(
+            self,
+            pdf_links: List,
+            chunk_size: int=1500,
+            ) -> None:
+        '''
+        Processes PDF documents by loading, chunking, embedding, and adding them to a FAISS vector store.
+        Build an advanced RAG system
+        Args:
+            pdf_links (List): list of URLs pointing to the PDF documents to be processed
+            chunk_size (int, optional): size of text chunks to split the documents into, defaults to 1500
+        '''
+        # load pdfs
+        self.raw_data = [ OnlinePDFLoader(doc).load()[0] for doc in pdf_links ]
+        # chunk text
+        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=100)
+        self.split_data    = self.text_splitter.split_documents(self.raw_data)
+        # add chunks to BM25 retriever
+        self.bm25_retriever   = BM25Retriever.from_documents(self.split_data)
+        self.bm25_retriever.k = self.top_k
+        # embed and add chunks to vector store
+        self.vector_store     = FAISS.from_documents(self.split_data, self.embed_func)
+        self.faiss_retriever  = self.vector_store.as_retriever(search_kwargs={"k": self.top_k})
+        print("All PDFs processed and added to vector store.")
+        # build advanced RAG system
+        self.build_elevated_rag_system()
+        print("RAG system is built successfully.")
+    def build_elevated_rag_system(self) -> None:
+        '''
+        Build an advanced RAG system from different components:
+        * BM25 retriever
+        * FAISS vector store retriever
+        * Llama 2 model
+        '''
+        # combine BM25 and FAISS retrievers into an ensemble retriever
+        self.ensemble_retriever = EnsembleRetriever(
+            retrievers=[self.bm25_retriever, self.faiss_retriever],
+            weights=[self.bm25_weight, self.faiss_weight]
+        )
+        # use reranker to improve retrieval quality
+        self.reranker = CohereRerank(top_n=5)
+        self.rerank_retriever = ContextualCompressionRetriever(    # combine ensemble retriever and reranker
+            base_retriever=self.ensemble_retriever,
+            base_compressor=self.reranker,
+        )
+        # define prompt template for the language model
+        RAG_PROMPT_TEMPLATE = """\
+        Use the following context to provide a detailed technical answer to the user's question.
+        Do not use an introduction similar to "Based on the provided documents, ...", just answer the question.
+        If you don't know the answer, please respond with "I don't know".
+        Context:
+        {context}
+        User's question:
+        {question}
+        """
+        self.rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT_TEMPLATE)
+        self.str_output_parser = StrOutputParser()
+        # parallel execution of context retrieval and question passing
+        self.entry_point_and_elevated_retriever = RunnableParallel(
+            {
+                "context" : self.rerank_retriever,
+                "question" : RunnablePassthrough()
+            }
+        )
+        # initialize Llama 2 model with specific parameters
+        self.llm = Replicate(
+            model=self.llama2_70b,
+            model_kwargs={"temperature": 0.5,"top_p": 1, "max_new_tokens":1000}
+        )
+        # chain components to form final elevated RAG system using LangChain Expression Language (LCEL)
+        self.elevated_rag_chain = self.entry_point_and_elevated_retriever | self.rag_prompt | self.llm #| self.str_output_parser

app.py ADDED

@@ -0,0 +1,92 @@

+import gradio as gr
+from advanced_rag import ElevatedRagChain
+rag_chain = ElevatedRagChain()
+def load_pdfs(pdf_links):
+    if not pdf_links:
+        gr.Warning("Please enter non-empty URLs")
+        return "Please enter non-empty URLs"
+    try:
+        pdf_links = [link.strip() for link in pdf_links.split("\n") if link.strip()]  # get individual PDF links, skipping blank lines
+        rag_chain.add_pdfs_to_vectore_store(pdf_links)
+        gr.Info("PDFs loaded successfully into a new vector store. If you had an old one, it was overwritten.")
+        return "PDFs loaded successfully into a new vector store. If you had an old one, it was overwritten."
+    except Exception as e:
+        gr.Warning("Could not load PDFs. Are URLs valid?")
+        print(e)
+        return "Could not load PDFs. Are URLs valid?"
+def submit_query(query):
+    if not query:
+        gr.Warning("Please enter a non-empty query")
+        return "Please enter a non-empty query"
+    if hasattr(rag_chain, 'elevated_rag_chain'):
+        try:
+            response = rag_chain.elevated_rag_chain.invoke(query)
+            return response
+        except Exception as e:
+            gr.Warning("LLM error. Please re-submit your query")
+            print(e)
+            return "LLM error. Please re-submit your query"
+    else:
+        gr.Warning("Please load PDFs before submitting a query")
+        return "Please load PDFs before submitting a query"
+def reset_app():
+    global rag_chain
+    rag_chain = ElevatedRagChain()  # Re-initialize the ElevatedRagChain object
+    gr.Info("App reset successfully. You can now load new PDFs")
+    return "App reset successfully. You can now load new PDFs"
+# custom css for different page elements
+custom_css = """
+/* customize button */
+button {
+    background-color: grey !important;
+    font-family: Arial !important;
+    font-weight: bold !important;
+    color: blue !important;
+}
+/* customize background color; the string is passed as "app = gr.Blocks(css=custom_css)" */
+/* .gradio-container {background-color: #E0F7FA} */
+"""
+# Define the Gradio app using Blocks for a flexible layout
+app = gr.Blocks(css=custom_css)    # theme=gr.themes.Base(), Soft(), Default(), Glass(), Monochrome(): https://www.gradio.app/guides/theming-guide
+with app:
+    gr.Markdown('''# Query your own data
+## Llama 2 RAG
+- Type in one or more URLs for PDF files, one per line, and click on Load PDF. Wait until the RAG system is built.
+- Type your query and click on Submit. Once the LLM sends back a response, it will be displayed in the Response field.
+- The system "remembers" the source documents, but has no memory of past user queries.
+- Click on Reset App to clear / reset the RAG system
+    ''')
+    with gr.Row():
+        with gr.Column():
+            pdf_input = gr.Textbox(label="Enter your PDF URLs (one per line)", placeholder="Enter one URL per line", lines=4)
+            load_button = gr.Button("Load PDF")
+        with gr.Column():
+            query_input = gr.Textbox(label="Enter your query here", placeholder="Type your query", lines=4)
+            submit_button = gr.Button("Submit")
+    response_output = gr.Textbox(label="Response", placeholder="Response will appear here", lines=4)
+    reset_button = gr.Button("Reset App")
+    load_button.click(load_pdfs, inputs=pdf_input, outputs=response_output)
+    submit_button.click(submit_query, inputs=query_input, outputs=response_output)
+    reset_button.click(reset_app, inputs=None, outputs=response_output)
+# Run the app
+app.launch()
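
The README lists URL validation as a feature, while `load_pdfs` above only checks that the textbox is not empty. As a hedged sketch of what a syntactic check could look like, the hypothetical helper `parse_pdf_links` (not part of this commit) splits the textbox content into lines and separates well-formed http(s) URLs from everything else. It does not verify that a URL is reachable or that it points to a PDF.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def parse_pdf_links(raw_text: str) -> Tuple[List[str], List[str]]:
    """Split textbox input into (valid_urls, rejected_lines)."""
    valid: List[str] = []
    rejected: List[str] = []
    for line in raw_text.splitlines():
        link = line.strip()
        if not link:
            continue  # skip blank lines between URLs
        parsed = urlparse(link)
        # Accept only absolute http(s) URLs that include a host name
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(link)
        else:
            rejected.append(link)
    return valid, rejected


# Hypothetical input, as it might be typed into the PDF URL textbox
urls, bad = parse_pdf_links("https://example.com/a.pdf\n\n  not a url \nftp://x/y.pdf")
print(urls, bad)
```

Returning the rejected lines separately would let the UI name the offending entries in a warning instead of failing on the whole batch.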

packages.txt ADDED

@@ -0,0 +1,2 @@
+libgl1
+poppler-utils

requirements.txt ADDED

@@ -0,0 +1,20 @@

+langchain==0.1.6
+langchain-community==0.0.19
+langchain_core==0.1.22
+langchain-openai==0.0.5
+faiss-cpu==1.7.3
+huggingface-hub==0.20.1
+google-generativeai==0.3.2
+cohere==4.46
+openai==1.11.1
+opencv-python==4.9.0.80
+pdf2image==1.17.0
+pdfminer-six==20221105
+pikepdf==8.12.0
+pypdf==4.0.1
+rank-bm25==0.2.2
+replicate==0.23.1
+tiktoken==0.5.2
+unstructured==0.12.3
+unstructured-pytesseract==0.3.12
+unstructured-inference==0.7.23