
Retrieval Augmented Generation (RAG) Pipeline

1. Indexing
Indexing prepares the knowledge base for efficient retrieval by processing and storing documents in a structured format.

Load
Purpose: Ingest raw data from various sources and normalize it for further processing.
Common Data Sources:
- Structured: databases, APIs
- Semi-structured: JSON, XML, CSV
- Unstructured: PDFs, DOCX, HTML, scraped web pages
Best Practices:
- Extract metadata (e.g., author, timestamps) to enhance retrieval.
- Normalize text (remove special characters, fix encoding issues).
- Detect language and preprocess accordingly (stemming, lemmatization).

Split
Purpose: Break documents into manageable chunks for improved retrieval accuracy.
Splitting Strategies:
- Fixed-size chunking (e.g., 512 tokens per chunk).
- Overlapping chunks (ensures context continuity).
- Semantic chunking (splitting based on topic shifts or sentence boundaries).
Best Practices:
- Ensure chunks are not too large (to avoid irrelevant retrieval) or too small (to retain context).
- Use metadata tags to track chunk origin and relationships.

Document chunking strategies and best practices ...
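To make the splitting step concrete, here is a minimal Python sketch of fixed-size chunking with overlap. The chunk_text helper, the whitespace "tokenization", and the 512/64 sizes are illustrative assumptions, not the pipeline's actual implementation; a real pipeline would use the embedding model's tokenizer and carry each chunk's metadata along.

```python
# Minimal sketch of fixed-size chunking with overlap (illustrative only).
# Whitespace splitting stands in for a real tokenizer.

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into chunks of roughly `chunk_size` tokens, overlapping by `overlap`."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    tokens = text.split()  # stand-in for a real tokenizer
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Example with a stand-in document of 1200 "tokens".
sample = " ".join(f"token{i}" for i in range(1200))
print([len(c.split()) for c in chunk_text(sample)])  # -> [512, 512, 304]
```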

May 10, 2023 · 3 min · Da Zhang

How to Choose a Vector Database?

Introduction Vector databases (or vector stores) are specialized database systems designed to store, manage, and efficiently query high-dimensional vector embeddings. Unlike traditional databases that excel at exact matching, vector databases are optimized for similarity search: finding items that are conceptually similar rather than identical. These databases have become a critical component in the machine learning ecosystem, particularly for applications like semantic search, recommendation systems, and Retrieval Augmented Generation (RAG) pipelines with large language models. ...
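To illustrate what similarity search means in practice, below is a minimal brute-force cosine-similarity sketch in NumPy. The top_k helper and the random stand-in embeddings are assumptions for illustration; production vector databases rely on approximate indexes (e.g., HNSW or IVF) rather than scanning every stored vector.

```python
import numpy as np

# Brute-force cosine similarity search: conceptually what a vector database does,
# though real systems use approximate indexes to scale to millions of vectors.

def top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k vectors in `index` most similar to `query`."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q                    # cosine similarity against every stored vector
    return np.argsort(-scores)[:k]    # highest-scoring items first

# Example with random stand-in embeddings (a real pipeline would use an embedding model).
rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(1000, 384))   # 1000 stored document embeddings
query_vector = rng.normal(size=384)
print(top_k(query_vector, doc_vectors))
```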

April 12, 2023 · 3 min · Da Zhang

Essential LLM Concepts

AGI: Artificial General Intelligence, the point at which AI matches or exceeds the intelligence of humans.
Generative AI: AI systems that create new content rather than just analyzing existing data.
Foundation Models: Large pre-trained models that serve as the base for various applications.
Architecture: Structural design of the model. Most modern LLMs use Transformer architectures with attention mechanisms.
Attention Mechanisms: Components allowing models to weigh the importance of different words when generating text.
Tokens: Basic units LLMs process; can be words, parts of words, or characters.
Tokenization: The process of breaking text into tokens.
Parameters: Learnable weights in the neural network that determine model capabilities. More parameters (measured in billions) generally mean more knowledge and abilities.
Context Window: Maximum amount of text (measured in tokens) an LLM can consider at once.
.safetensors: Secure file format for storing model weights that prevents arbitrary code execution during loading.
Completion/Response: Text generated by the LLM in response to a prompt.
Temperature: Setting that controls randomness in responses; higher values produce more creative outputs.
Prompt: Input text given to an LLM to elicit a response.
Prompt Engineering: Skill of crafting effective prompts to get desired results from LLMs.
Few-shot Learning: Providing examples within a prompt to guide the model toward specific response formats.
Instruction Tuning: Training models to follow specific instructions rather than just predicting next words.
Hallucination: When the model generates false, misleading, or non-factual information that sounds plausible but is incorrect.
Embeddings: Vector representations of words/text that capture semantic meaning and relationships.
RAG (Retrieval-Augmented Generation): Enhancing LLM responses by retrieving relevant information from external sources.
Training: The process of teaching an AI model by feeding it data and adjusting its parameters.
Inference: The process of generating text from the model (as opposed to training).
Fine-tuning: Process of adapting pre-trained models to specific tasks using additional training data.
RLHF (Reinforcement Learning from Human Feedback): Training technique to align LLMs with human preferences and improve safety.
Epoch: One full pass of the training process through the data set. E.g., a model trained for five epochs has seen the same data set five times.
float16: Half precision, 16-bit floating point.
float32: Full precision, 32-bit floating point.
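As a concrete illustration of the Temperature entry above, the following toy sketch (with made-up logits for four candidate tokens) shows how dividing logits by the temperature before the softmax sharpens or flattens the sampling distribution. The softmax_with_temperature helper is a hypothetical name for illustration, not a specific library API.

```python
import numpy as np

# Temperature scaling: logits are divided by the temperature before the softmax,
# so low temperatures concentrate probability on the most likely token and
# high temperatures spread it out (more "creative" sampling).

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])  # made-up scores for four candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t).round(3))
```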

February 3, 2023 · 2 min · Da Zhang

What are LLMs?

Introduction OpenAI released an early demo of ChatGPT on November 30, 2022, bringing sophisticated artificial intelligence into the hands of everyday users. Releases like this are powered by Large Language Models (LLMs). What exactly are LLMs, and why should we care? What are LLMs? Large Language Models (LLMs) are a type of artificial intelligence designed to understand, generate, and manipulate human language. LLMs are trained on massive amounts of text data: books, websites, and other written content. They've learned to recognize patterns in language and can generate new text that is not only contextually relevant but also surprisingly coherent and creative. ...

January 26, 2023 · 3 min · Da Zhang

A Better 1 Cup V60 Technique

Recipe
- 15g ground coffee
- 250g soft, filtered water, freshly boiled (for lighter roasts)
- Grind: medium-fine

- 0m00s: Pour 50g of water to bloom
- 0m10s - 0m15s: Gently swirl
- 0m00s - 0m45s: Bloom
- 0m45s - 1m00s: Pour up to 100g total (40% total weight)
- 1m00s - 1m10s: Pause
- 1m10s - 1m20s: Pour up to 150g total (60% total weight)
- 1m20s - 1m30s: Pause
- 1m30s - 1m40s: Pour up to 200g total (80% total weight)
...

December 3, 2022 · 1 min · Da Zhang