1. Indexing
Indexing prepares the knowledge base for efficient retrieval by processing and storing documents in a structured format.
Load
- Purpose: Ingest raw data from various sources and normalize it for further processing.
- Common Data Sources:
  - Structured: databases, APIs
  - Semi-structured: JSON, XML, CSV
  - Unstructured: PDFs, DOCX, HTML, scraped web pages
- Best Practices:
  - Extract metadata (e.g., author, timestamps) to enhance retrieval.
  - Normalize text (remove special characters, fix encoding issues).
  - Detect language and preprocess accordingly (stemming, lemmatization).
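The normalization and metadata practices above can be sketched with the standard library alone; the function names, regex, and metadata fields here are illustrative choices, not any particular library's API:

```python
import re
import unicodedata

def normalize_text(raw: str) -> str:
    """Normalize raw document text: canonicalize unicode, drop stray characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)           # fix encoding variants (e.g., full-width chars)
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", " ", text)    # strip special characters outside a whitelist
    text = re.sub(r"\s+", " ", text).strip()            # collapse runs of whitespace
    return text

def load_document(raw: str, source: str) -> dict:
    """Wrap normalized text with metadata so later retrieval can cite its origin."""
    return {
        "text": normalize_text(raw),
        "metadata": {"source": source, "length": len(raw)},
    }

doc = load_document("Café  menu  2024!", "menu.txt")
```

A real loader would add fields like author and timestamp when the source provides them; the point is that metadata travels with the text from ingestion onward.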
Split
- Purpose: Break documents into manageable chunks for improved retrieval accuracy.
- Splitting Strategies:
  - Fixed-size chunking (e.g., 512 tokens per chunk).
  - Overlapping chunks (ensures context continuity).
  - Semantic chunking (splitting based on topic shifts or sentence boundaries).
- Best Practices:
  - Ensure chunks are not too large (to avoid irrelevant retrieval) or too small (to retain context).
  - Use metadata tags to track chunk origin and relationships.
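Fixed-size chunking with overlap is simple enough to show in full; this is a minimal sketch that works on a pre-tokenized list and records each chunk's start offset as origin metadata (the parameter defaults mirror the 512-token example above):

```python
def chunk_text(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size chunks; consecutive chunks share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + chunk_size]
        chunks.append({"tokens": chunk, "start": start})  # start offset tracks chunk origin
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the document
    return chunks

chunks = chunk_text(list(range(1200)), chunk_size=512, overlap=64)
```

With 1200 tokens this yields three chunks, and the last 64 tokens of each chunk reappear at the start of the next, which is what preserves context across chunk boundaries.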
Store
- Purpose: Convert text into vector embeddings and store them in a vector database for efficient similarity search.
- Key Components:
  - Embeddings Model: OpenAI, Cohere, Sentence-BERT, etc.
  - Vector Store: FAISS, Pinecone, Weaviate, ChromaDB, Qdrant.
- Best Practices:
  - Use dense embeddings (transformer-based models) for semantic search.
  - Enable hybrid search (combine vector-based and keyword-based search).
  - Perform periodic re-indexing to keep data up to date.
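The store-and-search loop can be illustrated without any of the vector databases listed above; this toy in-memory store (a stand-in for FAISS/Chroma/etc., with made-up 2-D vectors in place of real embeddings) shows the core operation they all provide, nearest-neighbor search by cosine similarity:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class InMemoryVectorStore:
    """Toy vector store: keeps (id, vector) pairs, searches by brute-force cosine scan."""
    def __init__(self):
        self._items = []

    def add(self, doc_id, vector):
        self._items.append((doc_id, vector))

    def search(self, query, k=3):
        scored = [(cosine(query, vec), doc_id) for doc_id, vec in self._items]
        scored.sort(reverse=True)               # highest similarity first
        return [doc_id for _, doc_id in scored[:k]]

store = InMemoryVectorStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
store.add("c", [0.7, 0.7])
top = store.search([1.0, 0.1], k=2)  # query vector points mostly along "a"
```

Production stores replace the brute-force scan with approximate nearest-neighbor indexes (e.g., HNSW or IVF), but the interface is essentially this one.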
2. Retrieval and Generation
This stage retrieves relevant information and generates context-aware responses.
Retrieve
- Purpose: Fetch the most relevant documents from the indexed knowledge base.
- Retrieval Strategies:
  - Dense Retrieval (Vector Search)
    - Uses embeddings and similarity search (e.g., cosine similarity, dot product).
    - Works well for semantic understanding of queries.
  - Sparse Retrieval (Keyword Search)
    - Uses traditional NLP-based methods (e.g., BM25, TF-IDF).
    - Effective for precise keyword-based matching.
  - Hybrid Retrieval (Dense + Sparse)
    - Combines keyword-based and vector search for more robust results.
    - Example: re-rank BM25 results using embeddings.
  - Cross-encoder Re-ranking
    - Uses an additional transformer model to re-rank retrieved documents.
    - Ensures higher relevance before passing results to the LLM.
- Best Practices:
  - Use query expansion (synonyms, paraphrasing) to improve recall.
  - Implement multi-stage retrieval (fast initial retrieval → fine-grained re-ranking).
  - Optimize for latency vs. accuracy (re-ranking adds computation time).
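Hybrid retrieval can be sketched as a weighted blend of a sparse and a dense score. Everything here is a simplification: `keyword_score` is a crude term-overlap stand-in for BM25, and the dense scores are assumed to come from an embedding model rather than computed:

```python
def keyword_score(query, doc):
    """Sparse signal: fraction of query terms present in the document (crude BM25 stand-in)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def hybrid_rank(query, docs, dense_scores, alpha=0.5):
    """Blend dense and sparse scores per document; alpha weights the dense component."""
    ranked = []
    for doc_id, text in docs.items():
        score = alpha * dense_scores[doc_id] + (1 - alpha) * keyword_score(query, text)
        ranked.append((score, doc_id))
    ranked.sort(reverse=True)                   # best blended score first
    return [doc_id for _, doc_id in ranked]

docs = {
    "d1": "vector databases enable similarity search",
    "d2": "keyword search uses inverted indexes",
}
dense = {"d1": 0.9, "d2": 0.2}  # hypothetical cosine scores from an embedding model
order = hybrid_rank("similarity search", docs, dense, alpha=0.5)
```

In practice the blend is often done with reciprocal rank fusion rather than raw score averaging, since sparse and dense scores live on different scales.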
Understanding Distance Metrics in Vector Search
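A minimal comparison of the three metrics most vector stores offer, using made-up 2-D vectors: cosine similarity depends only on direction, while dot product and Euclidean distance are sensitive to vector magnitude:

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (L2) distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_sim(a, b):
    """Cosine similarity: angle only, magnitude-invariant."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

q = [1.0, 1.0]
v_short = [1.0, 1.0]    # same direction, same magnitude as q
v_long = [10.0, 10.0]   # same direction, 10x the magnitude
```

This is why cosine similarity (or dot product over normalized embeddings, which is equivalent) is the usual default for semantic search: it compares what a vector points at, not how long it is.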
Generate
- Purpose: Generate a context-aware response by combining retrieved information with the user query.
- Steps:
  - Construct a prompt that integrates the retrieved documents.
  - Pass the prompt to the LLM for response generation.
  - Apply post-processing (hallucination detection, formatting).
- Best Practices:
  - Use prompt engineering (e.g., retrieval-augmented templates).
  - Implement response validation (checking for factual consistency).
  - Provide citations (linking generated content back to sources).
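The prompt-construction step above can be sketched as a simple template; the wording of the instructions and the `[n]` citation convention are illustrative choices, and the actual LLM call is omitted:

```python
def build_rag_prompt(query, retrieved):
    """Assemble a retrieval-augmented prompt with numbered sources the model can cite."""
    context = "\n".join(
        f"[{i}] {doc['text']} (source: {doc['source']})"
        for i, doc in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

retrieved = [
    {"text": "RAG combines retrieval with generation.", "source": "notes.md"},
]
prompt = build_rag_prompt("What is RAG?", retrieved)
```

Numbering the context passages is what makes citation and post-hoc validation possible: a generated `[1]` can be checked against source 1, and answers citing nothing can be flagged as potential hallucinations.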
Commonly used Embedding Models
| Model | Organization | Dimensions | Context Length | Link |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8191 | OpenAI Documentation |
| text-embedding-3-small | OpenAI | 1536 | 8191 | OpenAI Documentation |
| text-embedding-ada-002 | OpenAI | 1536 | 8191 | OpenAI Documentation |
| sentence-transformers | Hugging Face | Varies | Varies | GitHub |
| BERT | Google | 768/1024 | 512 | GitHub |
| E5 | Microsoft | 1024 | 512 | GitHub |
| GTE | Alibaba | 768 | 512 | GitHub |
| BGE | BAAI | 768/1024 | 512 | GitHub |
| INSTRUCTOR | Hugging Face | 768 | 512 | GitHub |
| SGPT | Hugging Face | 768/1024 | 512 | GitHub |
| MPNet | Microsoft | 768 | 512 | Hugging Face |
| SBERT | UKP Lab | 768 | 512 | GitHub |
| SimCSE | Princeton NLP | 768 | 512 | GitHub |
| CLIP | OpenAI | 512 | 77 | GitHub |
| Claude Embeddings | Anthropic | 1024/12288 | 8K/200K | Anthropic Documentation |
| Cohere Embeddings | Cohere | 1024/4096 | 512/8192 | Cohere Documentation |