
Google Colab is a free, browser-based coding environment and a real asset for those, like me, who have scarce GPU resources. I use Colab and its Jupyter Notebook platform to run AI pipelines. You'll find my Colab Notebook cells, written in Python, on GitHub, plus some test documents.

Features

While this script is similar to the one I wrote about in Gemini RAG Pipeline (Basic), I improved the original pipeline by adding these features:

  • Structure-aware loading that parses DITA maps and carries document hierarchy (e.g., "Manual > Chapter 1 > Topic Name") into chunk metadata.
  • A FAISS vector index persisted to Google Drive, so indexing only has to run once.
  • An interactive query interface with adjustable Top K, similarity threshold, temperature, and max token settings.

Prerequisites

You need a Google account with access to Google Colab, a Gemini API key, and source documents (DITA, Markdown, HTML) stored in Google Drive.

Note: The GitHub repo has sample documents that you can upload to your Google Drive.

Getting Started

Follow these steps to set up and run your notebook.

Update: As of November 13, 2025, Google provides a VS Code extension that allows you to run Colab from within a local instance of VS Code. As VS Code is my IDE of choice, I took the extension for a test spin with high hopes. I found it to be buggy, so I'll still use the browser-based Colab platform for now, but I expect I'll be running my Colab notebook pipelines from VS Code when the extension matures.

1. Create Your Google Colab Notebook

Go to Google Colab, then create a notebook via the File menu.

2. Set Up API Key

The code requires a Gemini API Key to communicate with the model. Store this key in Colab's Secrets tool.

  1. Create your key in Google AI Studio.
  2. Copy your key for use later.
  3. In your Colab notebook, look for the Secrets tab in the left sidebar.
  4. Click the + icon to add a new secret.
  5. Set the Name to: GEMINI_API_KEY.
  6. Set the Value to the key you copied.
  7. Ensure the "Notebook access" toggle is ON for this secret.
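The secret's name matters because the code looks it up by that exact string. Here's a minimal sketch of how a notebook can read it with Colab's userdata API (not the exact code in Cell 1B):

# Read the Gemini API key from Colab's Secrets tool.
from google.colab import userdata

GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')  # must match the secret's Name exactly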

3. Prepare Source Folders and Documents

  1. Create this directory structure in your Google Drive:

    • My Drive/gemini-source-index/rag_docs_structured

      This is where you will upload DITA, Markdown, and HTML files.

    • My Drive/gemini-source-index/rag_index_gemini_faiss

      This is where the FAISS vector index will be saved.

    Important: Update the paths in Cell 2 if you use different folder names:

    DOCS_DIR = '/content/drive/MyDrive/[your custom directory path]'
    FAISS_INDEX_PATH = '/content/drive/MyDrive/[your custom directory path]'

4. Upload Source Documents

Upload your source files (.dita, .md, and .html) to My Drive/gemini-source-index/rag_docs_structured in Google Drive.

5. Run Cells

Run each cell in your notebook sequentially. Alternatively, run the entire Python script as a single cell. (I prefer to run each functional block of code separately for troubleshooting purposes.)

Important: Prior to running the script, uncomment lines 5 and 6 of Cell 1A if your notebook does not already have these Python libraries installed.

# !pip install -q faiss-cpu sentence-transformers google-genai numpy llama-index lxml ipywidgets
# print("All dependencies installed successfully. Please proceed to Cell 1B.")

Execute cells in order:

  1. Cell 1A: Installs dependencies.
  2. Cell 1B: Imports libraries and validates the API key.
  3. Cell 2: Mounts Google Drive and configures paths.
  4. Cell 3: Performs structure-aware document loading and chunking with LlamaIndex.
  5. Cell 4: Generates embeddings and builds FAISS index.
  6. Cell 5: Launches interactive query interface.

6. Enter and Submit Your Query

Type your question into the Query Input field of Cell 5's interface, adjust the retrieval settings if needed, and submit it.

Note: The LLM will only return responses based on the source documents you provide, not on any external sources. This ensures that you control the "source of truth."

Cell Descriptions

Cell 1A

Purpose: Installs all required Python packages before importing them.

What it does:

  • Runs pip to install faiss-cpu, sentence-transformers, google-genai, numpy, llama-index, lxml, and ipywidgets.

Important: Prior to running the script, uncomment lines 5 and 6 of Cell 1A if your notebook does not already have these Python libraries installed.


Cell 1B

Purpose: Imports all installed packages and validates your Gemini API key.

What it does:

  • Imports the installed libraries (FAISS, sentence-transformers, LlamaIndex, the google-genai client, and ipywidgets).
  • Reads GEMINI_API_KEY from Colab's Secrets tool and initializes the Gemini client.
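
As a rough sketch of what that setup might look like (not the exact code in Cell 1B):

from google.colab import userdata
from google import genai

# Retrieve the key stored in Colab Secrets; this raises an error if the secret
# is missing or Notebook access is off, which is the point of the validation.
GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')

# Create the Gemini client used for generation in Cell 5.
client = genai.Client(api_key=GEMINI_API_KEY)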


Cell 2

Purpose: Connects to Google Drive and creates source and indexing directories or verifies that these directories exist.

What it does:

  • Mounts Google Drive at /content/drive.
  • Defines DOCS_DIR and FAISS_INDEX_PATH.
  • Creates the source and index directories if they don't already exist, or verifies that they do.
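
A minimal sketch of this kind of setup (the paths shown are the defaults from step 3; Cell 2 may differ in detail):

import os
from google.colab import drive

# Mount Google Drive so the notebook can read source files and save the index.
drive.mount('/content/drive')

DOCS_DIR = '/content/drive/MyDrive/gemini-source-index/rag_docs_structured'
FAISS_INDEX_PATH = '/content/drive/MyDrive/gemini-source-index/rag_index_gemini_faiss'

# Create the directories if they don't already exist.
os.makedirs(DOCS_DIR, exist_ok=True)
os.makedirs(FAISS_INDEX_PATH, exist_ok=True)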


Cell 3

Purpose: Loads your documents, parses DITA map structure, and splits content into chunks.

What it does:

  1. DITA Map Parsing:

    • Recursively parses .ditamap files to extract document hierarchy.
    • Builds a path map (e.g., "Manual > Chapter 1 > Topic Name").
    • Preserves structural context for better retrieval.
  2. Document Loading:

    • Scans for .md, .dita, and .html files.
    • Recursively searches subdirectories.
    • Loads full document content.
  3. Metadata Enhancement:

    • Adds DITA map paths to document metadata.
    • Includes file paths and filenames for citation.
  4. Chunking:

    • Splits documents into 128-token chunks with 20-token overlap.
    • Creates TextNode objects with preserved metadata.
    • Generates final chunk list for embedding.
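
To make the chunking step concrete, here's a minimal sketch using LlamaIndex's TokenTextSplitter. The sample document text and metadata are illustrative, and Cell 3 may use a different splitter; the point is the 128-token chunks with 20-token overlap and the metadata carried onto each node:

from llama_index.core import Document
from llama_index.core.node_parser import TokenTextSplitter

# Example document; in Cell 3 these come from the files in DOCS_DIR,
# with DITA map paths and filenames already added to metadata.
docs = [
    Document(
        text="The Model T engine develops 20 horsepower...",
        metadata={"file_name": "engine.dita", "ditamap_path": "Manual > Engine > Specifications"},
    ),
]

# Split into 128-token chunks with a 20-token overlap; metadata is preserved on each node.
splitter = TokenTextSplitter(chunk_size=128, chunk_overlap=20)
nodes = splitter.get_nodes_from_documents(docs)  # returns TextNode objects

print(f"Created {len(nodes)} chunks; first chunk metadata: {nodes[0].metadata}")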

Cell 4

Purpose: Converts text chunks into vector embeddings and builds a searchable FAISS index.

What it does:

  1. Embedding Model Loading:

    • Loads multi-qa-distilbert-cos-v1 transformer model.
    • Optimized for question-answering tasks.
  2. Vector Generation:

    • Embeds all chunk texts into 768-dimensional vectors.
    • Uses float32 format for FAISS compatibility.
  3. FAISS Index Creation:

    • Builds an L2 (Euclidean distance) index.
    • Adds all vectors to the index.
  4. Persistence:

    • Saves FAISS index (my_faiss_index.bin) to Google Drive.
    • Saves chunk metadata (chunk_data.pkl) using pickle.
    • Preserves the link between vectors and original text.

Note: This cell only needs to run once. After the index is saved, you can skip to Cell 5 in future sessions.
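
As a rough sketch of the embedding and indexing work this cell performs (it assumes nodes holds the TextNode chunks from Cell 3 and FAISS_INDEX_PATH from Cell 2; the pickle structure is my assumption, not necessarily what the notebook saves):

import pickle
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# Load the QA-tuned embedding model.
model = SentenceTransformer("sentence-transformers/multi-qa-distilbert-cos-v1")

# Embed each chunk's text as a float32 vector (768 dimensions for this model).
texts = [node.get_content() for node in nodes]
vectors = model.encode(texts).astype(np.float32)

# Build an L2 (Euclidean distance) FAISS index and add all vectors.
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# Persist the index and the chunk data so Cell 5 can reload them later.
faiss.write_index(index, f"{FAISS_INDEX_PATH}/my_faiss_index.bin")
with open(f"{FAISS_INDEX_PATH}/chunk_data.pkl", "wb") as f:
    pickle.dump([{"text": t, "metadata": n.metadata} for t, n in zip(texts, nodes)], f)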


Cell 5

Purpose: Provides an interactive interface for querying documents.

What it does:

  1. Component Loading:

    • Loads the saved FAISS index from Google Drive.
    • Loads chunk metadata from pickle file.
    • Initializes the embedding model and Gemini client.
  2. Retrieval Function (retrieve_context):

    • Embeds user queries into vectors.
    • Searches FAISS index for similar chunks.
    • Filters results by similarity threshold.
    • Formats context with metadata for LLM.
  3. Interactive UI:

    • Query Input: Text area for queries.
    • Top K Chunks: Controls number of retrieved chunks (1-20).
    • Similarity Threshold: Filters for relevance (0-50, lower = stricter).
    • Temperature: Controls response creativity (0-1, lower = focused, higher = creative).
    • Max Tokens: Limits response length (64-4096).
  4. RAG Process:

    • Retrieves relevant context from your documents.
    • Constructs a grounded prompt with citations.
    • Sends to Gemini API for generation.
    • Displays response with source provenance.
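
Here's a condensed sketch of the retrieve-then-generate flow. It assumes the index, embedding model, and Gemini client from the earlier sketches, plus chunks, the list loaded from chunk_data.pkl; the helper name and prompt wording are illustrative, not the exact Cell 5 code:

import numpy as np
from google.genai import types

def retrieve_context(query, top_k=5, threshold=10.0):
    # Embed the query and search the FAISS index for the nearest chunks.
    query_vec = model.encode([query]).astype(np.float32)
    distances, ids = index.search(query_vec, top_k)
    # Keep only chunks whose L2 distance is under the threshold (lower = closer).
    hits = [chunks[i] for d, i in zip(distances[0], ids[0]) if d <= threshold]
    return "\n\n".join(f"[{h['metadata']['file_name']}] {h['text']}" for h in hits)

question = "Which car was named after a breed of horse?"
context = retrieve_context(question)

# Ground the prompt in the retrieved context so answers cite your documents only.
prompt = (
    "Answer using only the context below and cite the source file names.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
    config=types.GenerateContentConfig(temperature=0.2, max_output_tokens=1024),
)
print(response.text)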

Configuration Options

Document Processing (Cell 3)

DITA_SUBFOLDER = 'Model_T_DITA'  # Change to your DITA folder name
chunk_size=128                     # Tokens per chunk
chunk_overlap=20                   # Overlap between chunks
target_extensions = ['.md', '.dita', '.html']  # Supported file types

To Do: The source directory is hardcoded, so the value of DITA_SUBFOLDER needs to be generated dynamically.

Embedding Model (Cell 4)

model_name = "sentence-transformers/multi-qa-distilbert-cos-v1"

Gemini Model (Cell 5)

GEMINI_MODEL = "gemini-2.5-flash"

Usage Examples

Example Query 1: Factual Question

Query: "Which car was named after a breed of horse?"
Top K: 5
Threshold: 10.0
Temperature: 0.2

Example Query 2: Complex Analysis

Query: "Compare the engine specifications across different Model T variants"
Top K: 10
Threshold: 15.0
Temperature: 0.4

Troubleshooting

"No module named 'faiss'" Error

"GEMINI_API_KEY not found" Error

"Could not load FAISS index" Error

"No chunks met the similarity threshold" Warning

Empty Response from Gemini

Main Components

  • FAISS: vector similarity search and on-disk index persistence.
  • sentence-transformers: the multi-qa-distilbert-cos-v1 embedding model.
  • LlamaIndex: structure-aware document loading and chunking.
  • google-genai: the Gemini client used for response generation.
  • lxml: DITA map parsing.
  • ipywidgets: the interactive query interface.


Thank you, Keith Schengili-Roberts, for providing the Model T Manual transformation to DITA.

