In the world of artificial intelligence, the integration of advanced language models with external knowledge sources is critical to creating powerful and intelligent applications. In this article we will walk through a tutorial on how to develop a Retrieval-Augmented Generation (RAG) system using LlamaIndex, OpenAI and Qdrant.
Prerequisites
We have already discussed the concepts behind LlamaIndex and the installation of Qdrant in previous articles. We will therefore start this tutorial assuming that the concepts behind the framework are familiar and that Qdrant is already installed locally (necessary for the second part of the tutorial).
LlamaIndex is distributed in two versions, one for Python and one for TypeScript (Node.js). In this tutorial we will use the Python version, so it is important to have the language interpreter installed before starting. If it is not already present, you can download it from the following link: https://www.python.org/downloads/
Installation
LlamaIndex can be installed via the Python package manager by running the command in the console:
pip install llama-index
Furthermore, since we will use OpenAI as the LLM, it is necessary to set the environment variable containing the relevant API key:
export OPENAI_API_KEY=XXXXX
The key can be generated from the OpenAI console.
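If you prefer to configure the key from code rather than from the shell, a minimal alternative (the value below is only a placeholder) is to set the variable directly in the script before LlamaIndex is used:
import os

# Set the OpenAI API key programmatically; replace the placeholder with your own key.
# This must run before any LlamaIndex/OpenAI call is made.
os.environ["OPENAI_API_KEY"] = "sk-..."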
LlamaIndex tutorial with OpenAI
Once the framework installation and OpenAI configuration are complete, we can start developing a first elementary use case.
As an example we will try to create a RAG Agent that is able to answer the user’s questions based on the information contained in a PDF document. More specifically, we will use a PDF containing the export of our previous article on the RAG architecture (which you can obtain using your browser’s save/export functionality).
First we will create a folder on the filesystem that contains:
- A “docs” folder with the PDF file inside
- A “test.py” file that will contain the Python code of our program.
The code of our first program will be as follows:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
docs = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
print('Enter your question:')
q = input()
response = query_engine.query(q)
print(response)
Let’s now walk through it to explain how the framework works:
- First of all, a document reader of the SimpleDirectoryReader type is initialized, which reads all the documents present in the “docs” folder. The documents are then loaded into memory and normalized.
- A VectorStoreIndex is then initialized, which transforms the documents loaded in the previous step into vectors (embeddings). These vectors are kept in memory together with a set of metadata that can then be used to perform searches.
- The index is then exposed as a query engine, which is the heart of the RAG agent.
The subsequent instructions do nothing more than receive a question via user input and pass it to the query engine to obtain an answer, which is then printed as output.
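Before moving on, it is worth noting that the query engine accepts a few optional tuning parameters. As a small variation of the example above (the value 3 and the hard-coded question are purely illustrative), the number of chunks retrieved for each question can be increased through the similarity_top_k parameter:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

docs = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(docs)

# similarity_top_k controls how many document chunks are retrieved for each question.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What is a RAG architecture?"))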
As you can see, with very few instructions we have created a program capable of providing contextualized answers based on one or more documents provided as input. This simple example allows us to fully appreciate the power of LlamaIndex.
To test the program simply run it in the console using the command:
python3 test.py
Of course, if you are using a version of Python other than 3 you will need to modify the command accordingly.
If you ask the agent “what is a RAG architecture?”, it should be able to provide a satisfactory answer.
Persisting embeddings locally
The code in the previous example has a fairly obvious problem: at each execution it recalculates the embeddings of all the documents passed as input. Fortunately, LlamaIndex allows you to persist the embeddings in a vectorstore to avoid having to recalculate them every time. By doing this you can save a lot of time and tokens.
To persist the default in-memory vectorstore to disk, we will create a “test2.py” program as follows:
import os.path
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
VECTORSTORE_DIR = "data"
if not os.path.exists(VECTORSTORE_DIR):
    documents = SimpleDirectoryReader("docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=VECTORSTORE_DIR)
else:
    storage_context = StorageContext.from_defaults(persist_dir=VECTORSTORE_DIR)
    index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
print('Enter your question:')
q = input()
response = query_engine.query(q)
print(response)
By executing it we will notice that a new “data” folder is created in our working directory, containing the JSON files that make up the generated index.
The main change we made to the code is the “index.storage_context.persist(…)” call, which saves the index to a local folder. The program then recalculates the embeddings only if this folder is not already present on the filesystem.
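If new documents arrive after the first indexing, one possible approach, sketched here under the assumption that they are placed in a separate “new_docs” folder (a hypothetical name), is to reload the persisted index, insert the new documents and persist it again:
from llama_index.core import SimpleDirectoryReader, StorageContext, load_index_from_storage

# Reload the index persisted in the "data" folder by the previous run.
storage_context = StorageContext.from_defaults(persist_dir="data")
index = load_index_from_storage(storage_context)

# "new_docs" is a hypothetical folder containing only the documents added later.
for doc in SimpleDirectoryReader("new_docs").load_data():
    index.insert(doc)

# Persist the updated index back to the same folder.
index.storage_context.persist(persist_dir="data")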
Integration with vector databases
The approach of saving data locally can be convenient and allows the rapid development of RAG applications, but as the amount of data grows it becomes necessary to adopt a more structured solution. Luckily, LlamaIndex supports integration with numerous vectorstores.
In this tutorial we will use Qdrant which we have already seen on other occasions. In order to use it, you need to install the appropriate Python package that acts as a connector:
pip install llama-index-vector-stores-qdrant
To save the vectors inside the vectorstore, simply tell the VectorStoreIndex that it must use the QdrantVectorStore to persist the documents:
import qdrant_client
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore
client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333
)
vector_store = QdrantVectorStore(client=client, collection_name="llamaindex_collection")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
Once again the code is very simple and intuitive. We had to define a QdrantClient indicating the server host and the connection port (6333 by default). We have also defined a “llamaindex_collection” collection in which to save the embeddings generated by the program.
By running this program and accessing the Qdrant console (http://localhost:6333/dashboard) you will be able to verify the loading of the index.
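Besides the dashboard, the same check can also be done programmatically through the Qdrant client; the snippet below is just a quick sketch against the default local instance:
import qdrant_client

client = qdrant_client.QdrantClient(host="localhost", port=6333)

# List the collections known to the server and count the points (vectors)
# stored in the collection created by our program.
print(client.get_collections())
print(client.count(collection_name="llamaindex_collection"))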
Once the documents have been saved in the vectorstore you can use them as in this last example:
import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333
)
vector_store = QdrantVectorStore(client=client, collection_name="llamaindex_collection")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()
print('Enter your question:')
q = input()
response = query_engine.query(q)
print(response)
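As a final note, the response returned by the query engine also exposes the chunks retrieved from the vectorstore. Continuing from the script above, they can be inspected roughly like this (the 100-character truncation is only for readability):
# Show which chunks were retrieved to build the answer, together with their similarity scores.
for source in response.source_nodes:
    print(source.score, source.node.get_content()[:100])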
At the end of this tutorial we can say that we have put into practice everything needed to implement a RAG Agent. The simplicity of the code and the speed with which we developed it are truly remarkable, and confirm that LlamaIndex is a very powerful and versatile framework.