RAGatouille
RAGatouille makes it as simple as can be to use ColBERT! ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds. See the ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction paper for details.
There are multiple ways that we can use RAGatouille.
Setup
The integration lives in the ragatouille package.
```bash
pip install -U ragatouille
```
```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
```
```output
[Jan 10, 10:53:28] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
```
```output
/Users/harrisonchase/.pyenv/versions/3.10.1/envs/langchain/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
  warnings.warn(
```
Retriever
We can use RAGatouille as a retriever. For more information on this, see the RAGatouille Retriever documentation.
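As a rough sketch, assuming the RAG model loaded above, we can build a small index with RAGatouille's index method and expose it through as_langchain_retriever. The example documents and index name below are placeholders.

```python
# Sketch: index a tiny placeholder collection, then query it as a LangChain retriever.
RAG.index(
    collection=[
        "ColBERT is a late-interaction retrieval model built on top of BERT.",
        "RAGatouille provides a simple interface for training and using ColBERT models.",
    ],
    index_name="demo-index",  # placeholder index name
)

retriever = RAG.as_langchain_retriever(k=3)  # return the top 3 passages
docs = retriever.invoke("What is ColBERT?")
print(docs[0].page_content)
```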
Document Compressor
We can also use RAGatouille off-the-shelf as a reranker. This lets us use ColBERT to rerank results retrieved by any generic retriever. The benefit is that we can do this on top of any existing index, so we don't need to create a new index. We can do this by using the document compressor abstraction in LangChain.
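As a sketch, assuming an existing LangChain retriever named base_retriever (for example, one backed by a vector store), the ColBERT reranker can be plugged in via LangChain's ContextualCompressionRetriever, using RAGatouille's as_langchain_document_compressor helper:

```python
from langchain.retrievers import ContextualCompressionRetriever

# Sketch: `base_retriever` is assumed to be any existing LangChain retriever,
# e.g. one returned by a vector store's .as_retriever().
compression_retriever = ContextualCompressionRetriever(
    base_compressor=RAG.as_langchain_document_compressor(),  # ColBERT reranker
    base_retriever=base_retriever,
)

# Documents come back reordered by ColBERT relevance scores.
reranked_docs = compression_retriever.invoke("What is ColBERT?")
```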