Moving from Pinecone to SQLite for Static LLM Applications
When building LLM-powered search applications, vector databases like Pinecone are often the go-to solution for storing and querying embeddings. However, for static datasets that are only written once and don’t require real-time updates, paying for a hosted vector database service might be overkill.
In our case, we had a document search application that processes specification documents, generates embeddings using OpenAI’s text-embedding-3-large model, and allows semantic search through the content. Initially, we used Pinecone’s serverless offering (as seen in our original upserter script):
if __name__ == "__main__":
    # Get paths and index name from UI
    folder_path, move_to_folder, log_file_path, PINECONE_INDEX_NAME = setup_folders()

    # Initialize clients - simplified initialization
    pinecone_client = Pinecone(api_key=PINECONE_API_KEY)
    client = OpenAI(api_key=OPENAI_API_KEY)
The Solution: SQLite for Vector Storage
We decided to switch to SQLite for storing our vectors. SQLite offers several advantages for our use case:
- Zero hosting costs
- Simple deployment (single file database)
- No network latency
- Built-in data persistence
- Familiar SQL interface
Database Schema
Our SQLite implementation uses a simple but effective schema:
CREATE TABLE vectors (
    id TEXT PRIMARY KEY,
    vector BLOB NOT NULL,
    text TEXT,
    page TEXT,
    enh TEXT
);
The vector column stores each embedding as binary data, serialized with Python's struct module:
def serialize_vector(vector):
    return struct.pack("%sf" % len(vector), *vector)

def deserialize_vector(blob):
    return np.array(struct.unpack('f' * (len(blob) // 4), blob))
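A quick round trip through these helpers shows what ends up in the BLOB column. This is a minimal sketch, not part of the original scripts: it assumes struct and numpy are imported and uses a random stand-in for a real embedding.

# Round-trip check using serialize_vector / deserialize_vector defined above
import struct

import numpy as np

original = np.random.rand(3072)    # text-embedding-3-large returns 3072 dimensions
blob = serialize_vector(original)  # packed as float32
restored = deserialize_vector(blob)

print(len(blob))                                    # 3072 * 4 = 12288 bytes per vector
print(np.allclose(original, restored, atol=1e-6))   # True; float32 packing is slightly lossy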
Vector Search Implementation
The similarity search functionality remains unchanged – we still use cosine similarity to find the most relevant documents. Here’s how we implemented it with SQLite:
def main():
    # Get database path
    db_path = select_db_file()
    if not db_path:
        print("No database selected. Exiting...")
        return

    # Connect to the database
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get prompt embedding
    prompt = "ceiling grid supports"
    prompt_embedding = get_embedding(prompt)

    # Fetch all vectors from database
    cursor.execute("SELECT id, vector, text, page, enh FROM vectors")
    results = cursor.fetchall()

    # Calculate similarities and store results
    similarities = []
    for row in results:
        vector_id, vector_blob, text, page, enh = row
        vector = deserialize_vector(vector_blob)
        similarity = cosine_similarity(prompt_embedding, vector)
        similarities.append((similarity, vector_id, text, page, enh))

    # Sort by similarity (highest first)
    similarities.sort(reverse=True)

    # Print top 5 results
    print(f"\nTop 5 results for prompt: '{prompt}'\n")
    for i, (similarity, vector_id, text, page, enh) in enumerate(similarities[:5], 1):
        print(f"\n{i}. Similarity: {similarity:.4f}")
        print(f"Page: {page}")
        print(f"Enhancement: {enh}")
        print(f"Text preview: {text}...")
        print("-" * 80)
This code:
- Loads the query text
- Generates an embedding using OpenAI’s API
- Compares it against all vectors in the database
- Returns the top 5 most similar results
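If you want to call the search from other code rather than a script entry point, the same loop can be wrapped in a small helper. The sketch below is ours, not the original script: the search_vectors name and top_k parameter are assumptions, and it relies on the get_embedding, deserialize_vector, and cosine_similarity helpers shown elsewhere in this post.

import sqlite3

def search_vectors(db_path, prompt, top_k=5):
    """Return the top_k most similar rows for a prompt (sketch, not the original script)."""
    prompt_embedding = get_embedding(prompt)

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT id, vector, text, page, enh FROM vectors").fetchall()
    finally:
        conn.close()

    scored = []
    for vector_id, vector_blob, text, page, enh in rows:
        similarity = cosine_similarity(prompt_embedding, deserialize_vector(vector_blob))
        scored.append((similarity, vector_id, text, page, enh))

    # Sort on the score alone so ties never fall back to comparing the other tuple fields
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]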
Results
Looking at the search results from our test query “ceiling grid supports”:
1. Similarity: 0.5149
Page: Comb_E_Specifications_page_26
Enhancement: 0
--------------------------------------------------------------------------------
2. Similarity: 0.5008
Page: Comb_E_Specifications_page_76
Enhancement: 0
Cost Comparison
Pinecone Serverless
- Base cost: ~$0.08/hour
- Monthly cost: ~$60 (assuming constant uptime)
- Additional costs for storage and operations
SQLite
- One-time development cost
- Zero ongoing infrastructure costs
- Free storage and queries
- Minimal maintenance required
Implementation Notes
The migration process involved creating two main scripts:
1. Vector Upserter: Handles the initial data ingestion
def init_db(db_path):
    db = sqlite3.connect(db_path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS vectors (
            id TEXT PRIMARY KEY,
            vector BLOB NOT NULL,
            text TEXT,
            page TEXT,
            enh TEXT
        );
    """)
    return db

def serialize_vector(vector):
    return struct.pack("%sf" % len(vector), *vector)

def insert_vector(db, vector_id, vector, text, page, enh):
    serialized_vector = serialize_vector(vector)
    db.execute(
        "INSERT OR REPLACE INTO vectors (id, vector, text, page, enh) VALUES (?, ?, ?, ?, ?)",
        (vector_id, serialized_vector, text, page, enh)
    )
    db.commit()
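Putting these together, ingestion is just a loop over pre-chunked documents: embed each chunk, then insert it. The sketch below is illustrative rather than the original upserter: the vectors.db filename, the chunks list and its field names are assumptions, and get_embedding is the helper shown in the retrieval section.

db = init_db("vectors.db")

# Hypothetical pre-chunked documents: each chunk carries an id, its text, and page metadata
chunks = [
    {"id": "spec_page_26_chunk_0",
     "text": "Ceiling grid supports shall ...",
     "page": "Comb_E_Specifications_page_26",
     "enh": "0"},
]

for chunk in chunks:
    embedding = get_embedding(chunk["text"])  # OpenAI text-embedding-3-large, as above
    insert_vector(db, chunk["id"], embedding, chunk["text"], chunk["page"], chunk["enh"])

db.close()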
2. Vector Retrieval: Manages similarity searches
def deserialize_vector(blob):
    return np.array(struct.unpack('f' * (len(blob) // 4), blob))

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_embedding(text, model="text-embedding-3-large"):
    client = OpenAI(api_key=OPENAI_API_KEY)
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding
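For completeness, the snippets above assume the following imports and an OPENAI_API_KEY defined elsewhere; reading it from the environment, as shown here, is our assumption about how the key is supplied.

import os
import sqlite3
import struct

import numpy as np
from openai import OpenAI

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # or however your project loads secrets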
Conclusion
For static datasets that don’t require real-time updates, SQLite provides a cost-effective alternative to hosted vector databases. While it may not offer the same scalability and real-time capabilities as Pinecone, it’s perfectly suitable for many LLM applications, especially those with predetermined document sets.
The switch from Pinecone to SQLite resulted in cost savings while maintaining the same search functionality and performance for our use case. This approach might not suit every application, but for static document search implementations, it’s worth considering SQLite as a viable alternative to hosted vector databases.
Remember to evaluate your specific needs – if you require real-time updates, multi-user concurrent access, or need to handle massive scale, a dedicated vector database might still be the better choice.