Moving from Pinecone to SQLite for Static LLM Applications

When building LLM-powered search applications, vector databases like Pinecone are often the go-to solution for storing and querying embeddings. However, for static datasets that are written once and never need real-time updates, paying for a hosted vector database service can be overkill.

In our case, we had a document search application that processes specification documents, generates embeddings using OpenAI’s text-embedding-3-large model, and allows semantic search through the content. Initially, we used Pinecone’s serverless offering (as seen in our original upserter script):

# Excerpt from the original Pinecone-based upserter. PINECONE_API_KEY and
# OPENAI_API_KEY are loaded elsewhere in the script.
from pinecone import Pinecone
from openai import OpenAI

if __name__ == "__main__":
    # Get paths and index name from UI
    folder_path, move_to_folder, log_file_path, PINECONE_INDEX_NAME = setup_folders()

    # Initialize clients
    pinecone_client = Pinecone(api_key=PINECONE_API_KEY)
    client = OpenAI(api_key=OPENAI_API_KEY)

The Solution: SQLite for Vector Storage

We decided to switch to SQLite for storing our vectors. SQLite offers several advantages for our use case:

  1. Zero hosting costs
  2. Simple deployment (single file database)
  3. No network latency
  4. Built-in data persistence
  5. Familiar SQL interface
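
On the last point: because the whole store is an ordinary SQLite file, any standard tooling can inspect it. A minimal sketch in Python (the vectors.db path is a placeholder for wherever the database ends up):

import sqlite3

# "vectors.db" is a placeholder path for the database built later in this post
conn = sqlite3.connect("vectors.db")

# Standard SQLite introspection: list every table and its CREATE statement
for name, sql in conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(f"-- {name}\n{sql}\n")

conn.close()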

Database Schema

Our SQLite implementation uses a simple but effective schema:

CREATE TABLE vectors (
    id TEXT PRIMARY KEY,
    vector BLOB NOT NULL,
    text TEXT,
    page TEXT,
    enh TEXT
);

Each row’s id is a unique chunk identifier, page records the source page label, and enh carries the enhancement flag that shows up in the search output below. The vector column stores the embedding as binary data, serialized with Python’s struct module:

import struct

import numpy as np

def serialize_vector(vector):
    # Pack the embedding as a flat run of native 32-bit floats
    return struct.pack(f"{len(vector)}f", *vector)

def deserialize_vector(blob):
    # Reverse the packing: 4 bytes per float32 component
    return np.array(struct.unpack(f"{len(blob) // 4}f", blob))
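
As a quick sanity check, a round trip through these two helpers should reproduce the original vector up to float32 precision, since each component is packed into 4 bytes:

# Round-trip check for the serialize/deserialize helpers above
original = [0.1, -0.25, 3.5]
blob = serialize_vector(original)
restored = deserialize_vector(blob)

assert len(blob) == 4 * len(original)   # 4 bytes per float32 component
assert np.allclose(restored, original)  # equal up to float32 rounding

For reference, text-embedding-3-large returns 3072-dimensional vectors by default, so each stored BLOB comes to 3072 × 4 ≈ 12 KB.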

Vector Search Implementation

The similarity search functionality remains unchanged – we still use cosine similarity to find the most relevant documents. Here’s how we implemented it with SQLite:

import sqlite3

# Relies on select_db_file(), get_embedding(), deserialize_vector() and
# cosine_similarity(), all defined elsewhere in the script (shown below).
def main():
    # Get database path
    db_path = select_db_file()
    if not db_path:
        print("No database selected. Exiting...")
        return

    # Connect to the database
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get prompt embedding
    prompt = "ceiling grid supports"
    prompt_embedding = get_embedding(prompt)

    # Fetch all vectors from database
    cursor.execute("SELECT id, vector, text, page, enh FROM vectors")
    results = cursor.fetchall()
    conn.close()

    # Calculate similarities and store results
    similarities = []
    for row in results:
        vector_id, vector_blob, text, page, enh = row
        vector = deserialize_vector(vector_blob)
        similarity = cosine_similarity(prompt_embedding, vector)
        similarities.append((similarity, vector_id, text, page, enh))

    # Sort by similarity, highest first; key on the score alone so ties
    # never fall through to comparing the remaining tuple fields
    similarities.sort(key=lambda item: item[0], reverse=True)

    # Print top 5 results
    print(f"\nTop 5 results for prompt: '{prompt}'\n")
    for i, (similarity, vector_id, text, page, enh) in enumerate(similarities[:5], 1):
        print(f"\n{i}. Similarity: {similarity:.4f}")
        print(f"Page: {page}")
        print(f"Enhancement: {enh}")
        print(f"Text preview: {text[:200]}...")  # truncate long chunks for display
        print("-" * 80)

This code:

  1. Prompts for the database file and sets the query text
  2. Generates a query embedding using OpenAI’s API
  3. Compares it against every vector in the database
  4. Returns the top 5 most similar results
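
For a few thousand chunks this per-row loop is fast enough, but it becomes the bottleneck as the table grows. One option is to stack every stored embedding into a single NumPy matrix and score the whole collection in one shot. A minimal sketch of that idea, reusing deserialize_vector and assuming all stored vectors share one dimension:

import sqlite3

import numpy as np

def top_k_vectorized(db_path, query_embedding, k=5):
    """Brute-force cosine similarity over all rows, vectorized with NumPy."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT id, vector, text, page, enh FROM vectors").fetchall()
    conn.close()

    # Stack the stored embeddings into an (n_rows, dim) matrix
    matrix = np.vstack([deserialize_vector(row[1]) for row in rows])
    query = np.asarray(query_embedding, dtype=np.float64)

    # Cosine similarity is the dot product of L2-normalized vectors
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    query /= np.linalg.norm(query)
    scores = matrix @ query

    # Indices of the k highest scores, best first
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), rows[i][0], rows[i][2], rows[i][3], rows[i][4]) for i in top]

The database file itself stays exactly the same; only the query-side code changes.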

Results

Looking at the search results from our test query “ceiling grid supports”:

1. Similarity: 0.5149
Page: Comb_E_Specifications_page_26
Enhancement: 0
--------------------------------------------------------------------------------

2. Similarity: 0.5008
Page: Comb_E_Specifications_page_76
Enhancement: 0
--------------------------------------------------------------------------------

Cost Comparison

Pinecone Serverless

  • Base cost: ~$0.08/hour
  • Monthly cost: ~$60 (≈ $0.08 × 24 h × 30 days, assuming constant uptime)
  • Additional costs for storage and operations

SQLite

  • One-time development cost
  • Zero ongoing infrastructure costs
  • Free storage and queries
  • Minimal maintenance required

Implementation Notes

The migration process involved creating two main scripts:

1. Vector Upserter: Handles the initial data ingestion

import sqlite3
import struct

def init_db(db_path):
    db = sqlite3.connect(db_path)
    db.execute("""
    CREATE TABLE IF NOT EXISTS vectors (
        id TEXT PRIMARY KEY,
        vector BLOB NOT NULL,
        text TEXT,
        page TEXT,
        enh TEXT
    );
    """)
    return db

def serialize_vector(vector):
    # Same packing scheme as above: native 32-bit floats, 4 bytes each
    return struct.pack(f"{len(vector)}f", *vector)

def insert_vector(db, vector_id, vector, text, page, enh):
    serialized_vector = serialize_vector(vector)
    db.execute(
        "INSERT OR REPLACE INTO vectors (id, vector, text, page, enh) VALUES (?, ?, ?, ?, ?)",
        (vector_id, serialized_vector, text, page, enh)
    )
    db.commit()
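
Tying the upserter together, the ingestion loop ends up looking something like the sketch below. The chunks list and its fields are placeholders for whatever your document-processing step actually produces, and specifications.db is a hypothetical path:

# Hypothetical ingestion loop; `chunks` stands in for the output of the
# preceding PDF/text extraction step.
db = init_db("specifications.db")

chunks = [
    {"id": "spec_p26_c1", "text": "Ceiling grid supports shall ...", "page": "26", "enh": "0"},
]

for chunk in chunks:
    embedding = get_embedding(chunk["text"])  # defined in the retrieval script below
    insert_vector(db, chunk["id"], embedding, chunk["text"], chunk["page"], chunk["enh"])

db.close()

Since insert_vector commits after every row, large ingestions can be sped up by committing once per batch instead of once per insert.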

2. Vector Retrieval: Manages similarity searches

import struct

import numpy as np
from openai import OpenAI

def deserialize_vector(blob):
    # 4 bytes per float32 component
    return np.array(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_embedding(text, model="text-embedding-3-large"):
    client = OpenAI(api_key=OPENAI_API_KEY)
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding
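
One caveat with get_embedding as written: it constructs a new OpenAI client on every call. For anything beyond one-off queries it is cleaner to create the client once and reuse it, for example:

from openai import OpenAI

# Create the client once at module level instead of on every call
client = OpenAI(api_key=OPENAI_API_KEY)

def get_embedding(text, model="text-embedding-3-large"):
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding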

Conclusion

For static datasets that don’t require real-time updates, SQLite provides a cost-effective alternative to hosted vector databases. While it may not offer the same scalability and real-time capabilities as Pinecone, it’s perfectly suitable for many LLM applications, especially those with predetermined document sets.

The switch from Pinecone to SQLite eliminated roughly $60 per month in hosting costs while maintaining the same search functionality and, at our dataset size, the same practical performance. This approach might not suit every application, but for static document search implementations, it’s worth considering SQLite as a viable alternative to hosted vector databases.

Remember to evaluate your specific needs – if you require real-time updates, multi-user concurrent access, or need to handle massive scale, a dedicated vector database might still be the better choice.