## Prerequisites

- Basic understanding of programming concepts
- Python 3.8+ installed
- VS Code or your preferred IDE

## What you'll learn

- Understand the fundamentals of full-text search
- Apply full-text search in real projects
- Debug common issues
- Write clean, Pythonic code
## Introduction

Welcome to this tutorial on full-text search with PostgreSQL and MySQL in Python! Have you ever wondered how Google finds exactly what you're looking for among billions of web pages? Or how e-commerce sites instantly find products matching your search terms? The magic is in full-text search!

In this guide, we'll explore how to implement powerful search functionality in your Python applications using PostgreSQL and MySQL. Whether you're building a blog, an online store, or a knowledge base, understanding full-text search is essential for creating applications that help people find what they need instantly.

By the end of this tutorial, you'll feel confident implementing sophisticated search features in your own projects. Let's dive in!
## Understanding Full-Text Search

### What is Full-Text Search?

Full-text search is like having a super-smart librarian who not only finds books by exact title but also understands what you're looking for even from a vague description. Think of it as the difference between Ctrl+F (which finds exact matches) and Google Search (which understands context and meaning).

In database terms, full-text search lets you search through large amounts of text efficiently, finding relevant results even when the search terms don't match exactly. This means you can:

- Search for words in any order ("python tutorial" finds "tutorial for python")
- Match word variations through stemming (searching "running" finds "run" and "runs"); with extensions such as `pg_trgm` you can even catch typos like "programing"
- Rank results by relevance, so the best matches appear first
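To see why word order doesn't matter, here is a deliberately naive sketch in plain Python. It ignores stemming, typo handling, and indexing, so it only illustrates the order-independence idea, not how PostgreSQL or MySQL actually match documents:

```python
def naive_match(query: str, document: str) -> bool:
    """Toy order-independent matching: every query token must appear
    somewhere in the document; positions are ignored."""
    tokens = lambda text: set(text.lower().split())
    return tokens(query) <= tokens(document)

print(naive_match("python tutorial", "a tutorial for python beginners"))  # True
print(naive_match("python tutorial", "a guide for java developers"))      # False
```

Real engines build an inverted index over these tokens, which is what makes the lookup fast even over millions of documents.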
### Why Use Full-Text Search?

Here's why developers love full-text search:

- **Natural language queries**: users can search the way they think
- **Speed**: specialized indexes are optimized for searching large text datasets
- **Intelligent ranking**: the most relevant results appear first
- **Language support**: handle multiple languages and stemming

Real-world example: imagine a recipe website. With full-text search, a query for "chocolate dessert" matches "chocolate desserts" or "chocolate cake" thanks to stemming, and with a synonym dictionary configured it can even surface "desserts with cocoa".
## Basic Syntax and Usage

### PostgreSQL Full-Text Search

Let's start with PostgreSQL's powerful full-text search capabilities:
```python
import psycopg2
from psycopg2.extras import RealDictCursor

def setup_postgresql_fts():
    """Set up the PostgreSQL connection and create a sample table."""
    conn = psycopg2.connect(
        host="localhost",
        database="search_demo",
        user="your_user",
        password="your_password"
    )
    with conn.cursor() as cur:
        # Create an articles table with a dedicated full-text search column
        cur.execute("""
            CREATE TABLE IF NOT EXISTS articles (
                id SERIAL PRIMARY KEY,
                title TEXT NOT NULL,
                content TEXT NOT NULL,
                author TEXT,
                search_vector TSVECTOR,  -- special column for FTS
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            );

            -- Create a GIN index for fast searching
            CREATE INDEX IF NOT EXISTS idx_search_vector
            ON articles USING GIN(search_vector);
        """)

        # Create a trigger that keeps the search vector up to date automatically
        cur.execute("""
            CREATE OR REPLACE FUNCTION update_search_vector()
            RETURNS TRIGGER AS $$
            BEGIN
                NEW.search_vector :=
                    setweight(to_tsvector('english', COALESCE(NEW.title, '')), 'A') ||
                    setweight(to_tsvector('english', COALESCE(NEW.content, '')), 'B');
                RETURN NEW;
            END;
            $$ LANGUAGE plpgsql;

            DROP TRIGGER IF EXISTS update_search_vector_trigger ON articles;
            CREATE TRIGGER update_search_vector_trigger
            BEFORE INSERT OR UPDATE ON articles
            FOR EACH ROW EXECUTE FUNCTION update_search_vector();
        """)
    conn.commit()
    print("PostgreSQL FTS setup complete!")
    return conn

def add_sample_articles(conn):
    """Add sample articles for testing."""
    articles = [
        ("Python Web Development", "Learn how to build web applications with Python using Flask and Django", "Alice"),
        ("Machine Learning Basics", "Introduction to machine learning with Python and scikit-learn", "Bob"),
        ("Data Analysis with Pandas", "Master data analysis using Python's powerful Pandas library", "Charlie"),
        ("Building REST APIs", "Create robust REST APIs with Python and FastAPI framework", "Alice"),
        ("Python for Beginners", "Start your Python programming journey with this comprehensive guide", "Diana")
    ]
    with conn.cursor() as cur:
        for title, content, author in articles:
            cur.execute("""
                INSERT INTO articles (title, content, author)
                VALUES (%s, %s, %s)
            """, (title, content, author))
    conn.commit()
    print(f"Added {len(articles)} articles!")

def search_postgresql(conn, query):
    """Search articles using PostgreSQL full-text search."""
    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        # Ranked full-text search with highlighted snippets
        cur.execute("""
            SELECT
                id,
                title,
                content,
                author,
                ts_rank(search_vector, plainto_tsquery('english', %s)) AS rank,
                ts_headline('english', title || ' ' || content,
                            plainto_tsquery('english', %s),
                            'StartSel=<mark>, StopSel=</mark>') AS highlight
            FROM articles
            WHERE search_vector @@ plainto_tsquery('english', %s)
            ORDER BY rank DESC
            LIMIT 10;
        """, (query, query, query))
        return cur.fetchall()

# Try it out!
conn = setup_postgresql_fts()
add_sample_articles(conn)

print("\nSearching for 'python web':")
results = search_postgresql(conn, "python web")
for result in results:
    print(f"{result['title']} (Rank: {result['rank']:.3f})")
    print(f"  by {result['author']}")
    print(f"  Preview: {result['highlight'][:100]}...")
    print()
```
**Explanation:** PostgreSQL uses `tsvector` to store preprocessed, searchable text and `tsquery` to represent search queries. The `ts_rank` function lets you sort results by relevance.
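To get a feel for what `plainto_tsquery` does to your input, here is a rough Python emulation of its normalization step (lowercasing, dropping punctuation, ANDing the remaining terms). This is a simplification: the real function also stems words and removes stop words using the chosen text search configuration:

```python
import re

def emulate_plainto_tsquery(text: str) -> str:
    """Rough emulation of plainto_tsquery's normalization:
    lowercase, strip punctuation, AND the remaining terms together.
    (The real function additionally stems and removes stop words.)"""
    terms = re.findall(r"\w+", text.lower())
    return " & ".join(f"'{t}'" for t in terms)

print(emulate_plainto_tsquery("Python, Web!"))  # 'python' & 'web'
```

Running `SELECT plainto_tsquery('english', 'Python, Web!');` in psql shows the same shape of output, which is why word order and punctuation in user input don't matter.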
### MySQL Full-Text Search

Now let's see MySQL's full-text search in action:
```python
import mysql.connector
from mysql.connector import Error

def setup_mysql_fts():
    """Create the MySQL connection and set up an FTS table."""
    try:
        conn = mysql.connector.connect(
            host='localhost',
            database='search_demo',
            user='your_user',
            password='your_password'
        )
        cursor = conn.cursor()
        # Create a products table with a FULLTEXT index
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS products (
                id INT AUTO_INCREMENT PRIMARY KEY,
                name VARCHAR(255) NOT NULL,
                description TEXT,
                category VARCHAR(100),
                price DECIMAL(10, 2),
                tags TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                FULLTEXT(name, description, tags)  -- full-text index
            ) ENGINE=InnoDB;
        """)
        conn.commit()
        print("MySQL FTS setup complete!")
        return conn
    except Error as e:
        print(f"Error: {e}")
        return None

def add_sample_products(conn):
    """Add sample products for searching."""
    products = [
        ("Gaming Laptop", "High-performance laptop for gaming and content creation", "Electronics", 1299.99, "gaming,laptop,computer,performance"),
        ("Wireless Mouse", "Ergonomic wireless mouse with precision tracking", "Electronics", 49.99, "mouse,wireless,computer,accessories"),
        ("Python Programming Book", "Complete guide to Python programming for beginners", "Books", 39.99, "python,programming,coding,book,education"),
        ("Coffee Maker", "Automatic coffee maker with programmable features", "Kitchen", 89.99, "coffee,kitchen,appliance,morning"),
        ("Gaming Chair", "Comfortable gaming chair with lumbar support", "Furniture", 299.99, "gaming,chair,furniture,comfort")
    ]
    cursor = conn.cursor()
    for name, desc, category, price, tags in products:
        cursor.execute("""
            INSERT INTO products (name, description, category, price, tags)
            VALUES (%s, %s, %s, %s, %s)
        """, (name, desc, category, price, tags))
    conn.commit()
    print(f"Added {len(products)} products!")

def search_mysql(conn, query, mode='NATURAL'):
    """Search products using MySQL full-text search."""
    cursor = conn.cursor(dictionary=True)
    if mode == 'NATURAL':
        # Natural language mode (the default)
        cursor.execute("""
            SELECT *,
                   MATCH(name, description, tags) AGAINST(%s) AS relevance
            FROM products
            WHERE MATCH(name, description, tags) AGAINST(%s IN NATURAL LANGUAGE MODE)
            ORDER BY relevance DESC
        """, (query, query))
    elif mode == 'BOOLEAN':
        # Boolean mode for advanced queries
        cursor.execute("""
            SELECT *,
                   MATCH(name, description, tags) AGAINST(%s IN BOOLEAN MODE) AS relevance
            FROM products
            WHERE MATCH(name, description, tags) AGAINST(%s IN BOOLEAN MODE)
            ORDER BY relevance DESC
        """, (query, query))
    return cursor.fetchall()

# Test the MySQL search!
mysql_conn = setup_mysql_fts()
add_sample_products(mysql_conn)

print("\nNatural language search for 'gaming':")
results = search_mysql(mysql_conn, "gaming")
for product in results:
    print(f"{product['name']} - ${product['price']}")
    print(f"  Relevance: {product['relevance']:.3f}")
    print(f"  {product['description']}")
    print()

print("\nBoolean search for '+gaming -chair':")
results = search_mysql(mysql_conn, "+gaming -chair", mode='BOOLEAN')
for product in results:
    print(f"{product['name']} - ${product['price']}")
```
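Boolean mode supports a small operator language: `+` marks a term as required, `-` excludes it, `*` is a trailing wildcard, and double quotes group an exact phrase. Rather than asking users to type operators, you can assemble the query string yourself; here is a small hypothetical helper (the function name and signature are ours, not part of any library):

```python
def build_boolean_query(required=(), excluded=(), phrases=()):
    """Build a MySQL BOOLEAN MODE query string:
    '+' marks required terms, '-' excludes terms,
    double quotes group required phrases."""
    parts = [f"+{term}" for term in required]
    parts += [f"-{term}" for term in excluded]
    parts += [f'+"{phrase}"' for phrase in phrases]
    return " ".join(parts)

print(build_boolean_query(required=["gaming"], excluded=["chair"]))  # +gaming -chair
```

The result can be passed straight to `search_mysql(conn, ..., mode='BOOLEAN')`.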
## Practical Examples

### Example 1: E-Commerce Product Search

Let's build a real-world product search engine:
```python
import re

import psycopg2
import psycopg2.extras

class ProductSearchEngine:
    """Advanced product search with PostgreSQL."""

    def __init__(self, db_config):
        self.conn = psycopg2.connect(**db_config)
        self.setup_database()

    def setup_database(self):
        """Create the products table with weighted FTS."""
        with self.conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS products (
                    id SERIAL PRIMARY KEY,
                    sku VARCHAR(50) UNIQUE,
                    name TEXT NOT NULL,
                    description TEXT,
                    brand VARCHAR(100),
                    category VARCHAR(100),
                    price DECIMAL(10, 2),
                    rating DECIMAL(3, 2),
                    in_stock BOOLEAN DEFAULT true,
                    search_vector TSVECTOR,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                );

                -- Weighted search vector: name and brand rank highest
                CREATE OR REPLACE FUNCTION product_search_vector(
                    name TEXT, description TEXT, brand TEXT, category TEXT
                ) RETURNS TSVECTOR AS $$
                BEGIN
                    RETURN
                        setweight(to_tsvector('english', COALESCE(name, '')), 'A') ||
                        setweight(to_tsvector('english', COALESCE(brand, '')), 'A') ||
                        setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
                        setweight(to_tsvector('english', COALESCE(description, '')), 'C');
                END;
                $$ LANGUAGE plpgsql;

                -- Auto-update trigger
                CREATE OR REPLACE FUNCTION update_product_search_vector()
                RETURNS TRIGGER AS $$
                BEGIN
                    NEW.search_vector := product_search_vector(
                        NEW.name, NEW.description, NEW.brand, NEW.category
                    );
                    RETURN NEW;
                END;
                $$ LANGUAGE plpgsql;

                DROP TRIGGER IF EXISTS product_search_vector_trigger ON products;
                CREATE TRIGGER product_search_vector_trigger
                BEFORE INSERT OR UPDATE ON products
                FOR EACH ROW EXECUTE FUNCTION update_product_search_vector();

                -- GIN index for fast searches
                CREATE INDEX IF NOT EXISTS idx_products_search
                ON products USING GIN(search_vector);

                -- Additional indexes for filtering
                CREATE INDEX IF NOT EXISTS idx_products_category ON products(category);
                CREATE INDEX IF NOT EXISTS idx_products_price ON products(price);
                CREATE INDEX IF NOT EXISTS idx_products_rating ON products(rating);
            """)
        self.conn.commit()
        print("Product search engine ready!")

    def add_product(self, product_data):
        """Add (or upsert) a product."""
        with self.conn.cursor() as cur:
            cur.execute("""
                INSERT INTO products (sku, name, description, brand, category, price, rating)
                VALUES (%(sku)s, %(name)s, %(description)s, %(brand)s,
                        %(category)s, %(price)s, %(rating)s)
                ON CONFLICT (sku) DO UPDATE SET
                    name = EXCLUDED.name,
                    description = EXCLUDED.description,
                    price = EXCLUDED.price,
                    rating = EXCLUDED.rating
            """, product_data)
        self.conn.commit()

    def search(self, query, filters=None):
        """Advanced product search with optional filters."""
        query = self._prepare_query(query)

        # Build dynamic SQL with filters
        sql = """
            WITH search_results AS (
                SELECT
                    p.*,
                    ts_rank_cd(search_vector, query) AS rank,
                    ts_headline('english',
                        name || ' ' || COALESCE(description, ''),
                        query,
                        'MaxWords=50, MinWords=20, StartSel=<b>, StopSel=</b>'
                    ) AS highlight
                FROM products p,
                     plainto_tsquery('english', %(query)s) query
                WHERE search_vector @@ query
        """
        params = {'query': query}

        # Apply filters
        if filters:
            if 'category' in filters:
                sql += " AND category = %(category)s"
                params['category'] = filters['category']
            if 'min_price' in filters:
                sql += " AND price >= %(min_price)s"
                params['min_price'] = filters['min_price']
            if 'max_price' in filters:
                sql += " AND price <= %(max_price)s"
                params['max_price'] = filters['max_price']
            if 'min_rating' in filters:
                sql += " AND rating >= %(min_rating)s"
                params['min_rating'] = filters['min_rating']
            if 'in_stock' in filters:
                sql += " AND in_stock = %(in_stock)s"
                params['in_stock'] = filters['in_stock']

        sql += """
            )
            SELECT * FROM search_results
            ORDER BY rank DESC, rating DESC
            LIMIT 20
        """
        with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
            cur.execute(sql, params)
            results = cur.fetchall()
        return self._format_results(results)

    def _prepare_query(self, query):
        """Clean the search query: strip special characters and extra spaces."""
        query = re.sub(r'[^\w\s-]', ' ', query)
        return ' '.join(query.split())

    def _format_results(self, results):
        """Format search results for display."""
        formatted = []
        for r in results:
            formatted.append({
                'id': r['id'],
                'sku': r['sku'],
                'name': r['name'],
                'brand': r['brand'],
                'category': r['category'],
                'price': float(r['price']),
                'rating': float(r['rating']) if r['rating'] else 0,
                'in_stock': r['in_stock'],
                'relevance': float(r['rank']),
                'preview': r['highlight'],
                'emoji': self._get_category_emoji(r['category'])
            })
        return formatted

    def _get_category_emoji(self, category):
        """Pick a display emoji for a category."""
        emojis = {
            'Electronics': '📱',
            'Books': '📚',
            'Clothing': '👕',
            'Home': '🏠',
            'Sports': '⚽',
            'Toys': '🎮',
            'Food': '🍕',
            'Beauty': '💄'
        }
        return emojis.get(category, '📦')

    def suggest_queries(self, partial_query):
        """Provide simple search suggestions from product names and brands."""
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT DISTINCT
                    unnest(string_to_array(name || ' ' || brand, ' ')) AS term
                FROM products
                WHERE name ILIKE %s OR brand ILIKE %s
                LIMIT 10
            """, (f'%{partial_query}%', f'%{partial_query}%'))
            return [row[0] for row in cur.fetchall()]

# Let's test our search engine!
search_engine = ProductSearchEngine({
    'host': 'localhost',
    'database': 'search_demo',
    'user': 'your_user',
    'password': 'your_password'
})

# Add sample products
products = [
    {
        'sku': 'LAPTOP001',
        'name': 'UltraBook Pro 15',
        'description': 'Powerful laptop with Intel i7 processor, 16GB RAM, perfect for programming and design',
        'brand': 'TechCorp',
        'category': 'Electronics',
        'price': 1299.99,
        'rating': 4.5
    },
    {
        'sku': 'BOOK001',
        'name': 'Python Deep Dive',
        'description': 'Comprehensive guide to advanced Python programming techniques and best practices',
        'brand': 'CodePress',
        'category': 'Books',
        'price': 49.99,
        'rating': 4.8
    },
    {
        'sku': 'LAPTOP002',
        'name': 'Gaming Beast X1',
        'description': 'Ultimate gaming laptop with RTX graphics and 144Hz display',
        'brand': 'GameTech',
        'category': 'Electronics',
        'price': 1899.99,
        'rating': 4.7
    }
]
for product in products:
    search_engine.add_product(product)

# Search examples
print("Search for 'laptop programming':")
results = search_engine.search('laptop programming')
for result in results:
    print(f"{result['emoji']} {result['name']} by {result['brand']}")
    print(f"  ${result['price']} | rating {result['rating']}")
    print(f"  {result['preview']}")
    print(f"  Relevance: {result['relevance']:.3f}\n")

# Search with filters
print("\nSearch 'laptop' with price filter:")
results = search_engine.search('laptop', filters={
    'min_price': 1000,
    'max_price': 1500,
    'min_rating': 4.0
})
for result in results:
    print(f"{result['emoji']} {result['name']} - ${result['price']}")
```
**Try it yourself:** add spell-checking using the similarity functions from PostgreSQL's `pg_trgm` extension!
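As a starting point for that exercise: `pg_trgm` compares sets of three-character substrings ("trigrams"). Here is a simplified pure-Python version of that similarity measure, just to show the idea — PostgreSQL's actual implementation pads and tokenizes strings somewhat differently, so treat this as an approximation:

```python
def trigrams(text: str) -> set:
    """Trigram set of a padded, lowercased string (simplified pg_trgm-style)."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets,
    in the spirit of pg_trgm's similarity() function."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

print(similarity("programming", "programing"))  # high score: likely a typo
```

In the database itself you would instead run `CREATE EXTENSION pg_trgm;` and use `similarity(name, %s)` (or the `%` operator) to suggest corrections for queries that returned zero results.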
### Example 2: Blog Search with Highlighting

Let's create a blog search engine with smart highlighting:
```python
import html
import re
from datetime import datetime

import mysql.connector

class BlogSearchEngine:
    """Blog search engine built on MySQL full-text search."""

    def __init__(self, db_config):
        self.conn = mysql.connector.connect(**db_config)
        self.setup_database()

    def setup_database(self):
        """Create the blog tables with FULLTEXT indexes."""
        cursor = self.conn.cursor()
        # Blog posts table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS blog_posts (
                id INT AUTO_INCREMENT PRIMARY KEY,
                title VARCHAR(255) NOT NULL,
                slug VARCHAR(255) UNIQUE,
                content LONGTEXT,
                excerpt TEXT,
                author VARCHAR(100),
                status ENUM('draft', 'published') DEFAULT 'draft',
                view_count INT DEFAULT 0,
                published_at TIMESTAMP NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
                FULLTEXT idx_title (title),
                FULLTEXT idx_content (content),
                FULLTEXT idx_full (title, content, excerpt)
            ) ENGINE=InnoDB;
        """)
        # Tags table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS tags (
                id INT AUTO_INCREMENT PRIMARY KEY,
                name VARCHAR(50) UNIQUE,
                slug VARCHAR(50) UNIQUE
            );
        """)
        # Post-tag relationship
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS post_tags (
                post_id INT,
                tag_id INT,
                PRIMARY KEY (post_id, tag_id),
                FOREIGN KEY (post_id) REFERENCES blog_posts(id) ON DELETE CASCADE,
                FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
            );
        """)
        self.conn.commit()
        print("Blog search engine ready!")

    def add_post(self, post_data, tags=None):
        """Add a blog post with optional tags."""
        cursor = self.conn.cursor()
        # Insert the post
        cursor.execute("""
            INSERT INTO blog_posts (title, slug, content, excerpt, author, status, published_at)
            VALUES (%(title)s, %(slug)s, %(content)s, %(excerpt)s,
                    %(author)s, %(status)s, %(published_at)s)
        """, post_data)
        post_id = cursor.lastrowid

        # Add tags
        if tags:
            for tag in tags:
                # Get or create the tag
                cursor.execute("""
                    INSERT INTO tags (name, slug)
                    VALUES (%s, %s)
                    ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)
                """, (tag, tag.lower().replace(' ', '-')))
                tag_id = cursor.lastrowid
                # Link the tag to the post
                cursor.execute("""
                    INSERT INTO post_tags (post_id, tag_id) VALUES (%s, %s)
                """, (post_id, tag_id))
        self.conn.commit()
        return post_id

    def search(self, query, search_mode='all', limit=10):
        """Search blog posts with different strategies."""
        cursor = self.conn.cursor(dictionary=True)

        if search_mode == 'title':
            # Search only in titles
            sql = """
                SELECT p.*,
                       MATCH(title) AGAINST(%s IN NATURAL LANGUAGE MODE) AS relevance,
                       GROUP_CONCAT(t.name SEPARATOR ', ') AS tags
                FROM blog_posts p
                LEFT JOIN post_tags pt ON p.id = pt.post_id
                LEFT JOIN tags t ON pt.tag_id = t.id
                WHERE MATCH(title) AGAINST(%s IN NATURAL LANGUAGE MODE)
                  AND status = 'published'
                GROUP BY p.id
                ORDER BY relevance DESC, published_at DESC
                LIMIT %s
            """
            params = (query, query, limit)
        elif search_mode == 'phrase':
            # Exact-phrase search via BOOLEAN MODE
            sql = """
                SELECT p.*,
                       MATCH(title, content, excerpt) AGAINST(%s IN BOOLEAN MODE) AS relevance,
                       GROUP_CONCAT(t.name SEPARATOR ', ') AS tags
                FROM blog_posts p
                LEFT JOIN post_tags pt ON p.id = pt.post_id
                LEFT JOIN tags t ON pt.tag_id = t.id
                WHERE MATCH(title, content, excerpt) AGAINST(%s IN BOOLEAN MODE)
                  AND status = 'published'
                GROUP BY p.id
                ORDER BY relevance DESC, published_at DESC
                LIMIT %s
            """
            # Wrap in quotes for phrase search
            phrase_query = f'"{query}"'
            params = (phrase_query, phrase_query, limit)
        else:  # 'all' - the default: search in all fields
            sql = """
                SELECT p.*,
                       MATCH(title, content, excerpt) AGAINST(%s IN NATURAL LANGUAGE MODE) AS relevance,
                       GROUP_CONCAT(t.name SEPARATOR ', ') AS tags
                FROM blog_posts p
                LEFT JOIN post_tags pt ON p.id = pt.post_id
                LEFT JOIN tags t ON pt.tag_id = t.id
                WHERE MATCH(title, content, excerpt) AGAINST(%s IN NATURAL LANGUAGE MODE)
                  AND status = 'published'
                GROUP BY p.id
                ORDER BY relevance DESC, view_count DESC, published_at DESC
                LIMIT %s
            """
            params = (query, query, limit)

        cursor.execute(sql, params)
        results = cursor.fetchall()

        # Add highlighting
        for result in results:
            result['highlighted_content'] = self._highlight_text(
                result['content'], query
            )
            result['emoji'] = self._get_post_emoji(result)
        return results

    def _highlight_text(self, text, query):
        """Highlight search terms in a text preview."""
        if not text:
            return ""
        # Simple highlighting; in production, use a more sophisticated approach
        highlighted = html.escape(text[:300])  # first 300 chars
        for word in query.split():
            # Case-insensitive replacement
            pattern = re.compile(re.escape(word), re.IGNORECASE)
            highlighted = pattern.sub(f'<mark>{word}</mark>', highlighted)
        return highlighted + "..."

    def _get_post_emoji(self, post):
        """Pick a display emoji based on the post title."""
        title_lower = post['title'].lower()
        if 'python' in title_lower:
            return '🐍'
        elif 'javascript' in title_lower:
            return '✨'
        elif 'tutorial' in title_lower:
            return '📚'
        elif 'tips' in title_lower:
            return '💡'
        else:
            return '📝'

    def get_related_posts(self, post_id, limit=5):
        """Find related posts using FTS on the current post's title."""
        cursor = self.conn.cursor(dictionary=True)
        # Get the current post
        cursor.execute("""
            SELECT title, content FROM blog_posts WHERE id = %s
        """, (post_id,))
        current_post = cursor.fetchone()
        if not current_post:
            return []
        # Use the title as the search query for related posts
        cursor.execute("""
            SELECT p.id, p.title, p.slug, p.excerpt,
                   MATCH(title, content) AGAINST(%s) AS relevance
            FROM blog_posts p
            WHERE p.id != %s
              AND MATCH(title, content) AGAINST(%s IN NATURAL LANGUAGE MODE)
              AND status = 'published'
            ORDER BY relevance DESC
            LIMIT %s
        """, (current_post['title'], post_id, current_post['title'], limit))
        return cursor.fetchall()

    def trending_searches(self, days=7):
        """Get trending posts as a stand-in for trending search terms."""
        # In production, you'd track searches in a separate table;
        # this is a simplified example based on view counts.
        cursor = self.conn.cursor(dictionary=True)
        cursor.execute("""
            SELECT title, view_count
            FROM blog_posts
            WHERE published_at >= DATE_SUB(NOW(), INTERVAL %s DAY)
              AND status = 'published'
            ORDER BY view_count DESC
            LIMIT 10
        """, (days,))
        return cursor.fetchall()

# Test the blog search engine!
blog_search = BlogSearchEngine({
    'host': 'localhost',
    'database': 'search_demo',
    'user': 'your_user',
    'password': 'your_password'
})

# Add sample blog posts
posts = [
    {
        'title': 'Getting Started with Python Web Development',
        'slug': 'python-web-development-intro',
        'content': 'Python is an excellent choice for web development. With frameworks like Django and Flask, you can build powerful web applications quickly and efficiently. This tutorial will guide you through the basics of Python web development.',
        'excerpt': 'Learn the fundamentals of Python web development with Django and Flask',
        'author': 'Alice Developer',
        'status': 'published',
        'published_at': datetime.now()
    },
    {
        'title': 'Machine Learning for Beginners: A Python Tutorial',
        'slug': 'ml-beginners-python',
        'content': 'Machine learning is transforming the world, and Python is the go-to language for ML. This comprehensive tutorial covers the basics of machine learning using Python, scikit-learn, and TensorFlow.',
        'excerpt': 'Start your machine learning journey with Python',
        'author': 'Bob Data Scientist',
        'status': 'published',
        'published_at': datetime.now()
    }
]

# Add posts with tags
blog_search.add_post(posts[0], tags=['Python', 'Web Development', 'Tutorial'])
blog_search.add_post(posts[1], tags=['Python', 'Machine Learning', 'Tutorial'])

# Search examples
print("Search for 'python':")
results = blog_search.search('python')
for post in results:
    print(f"{post['emoji']} {post['title']}")
    print(f"  by {post['author']} | tags: {post['tags']}")
    print(f"  Relevance: {post['relevance']:.3f}")
    print(f"  {post['highlighted_content']}\n")

# Title-only search
print("\nTitle search for 'machine learning':")
results = blog_search.search('machine learning', search_mode='title')
for post in results:
    print(f"{post['emoji']} {post['title']}")

# Phrase search
print("\nExact phrase search for 'Python web':")
results = blog_search.search('Python web', search_mode='phrase')
for post in results:
    print(f"{post['emoji']} {post['title']}")
```
## Advanced Concepts

### Advanced Topic 1: Multi-Language Search

When you're ready to support multiple languages:
```python
from typing import Dict, List

import psycopg2
import psycopg2.extras

class MultilingualSearch:
    """Multi-language full-text search system."""

    def __init__(self, db_config):
        self.conn = psycopg2.connect(**db_config)
        self.setup_database()
        # Map languages to PostgreSQL text search configurations
        self.languages = {
            'english': 'english',
            'spanish': 'spanish',
            'french': 'french',
            'german': 'german',
            'italian': 'italian',
            'portuguese': 'portuguese',
            'russian': 'russian',
            'chinese': 'simple'  # no built-in stemmer; fall back to 'simple'
        }

    def setup_database(self):
        """Create the multilingual content table."""
        with self.conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS multilingual_content (
                    id SERIAL PRIMARY KEY,
                    content_id VARCHAR(50),  -- same content in different languages
                    language VARCHAR(20),
                    title TEXT,
                    content TEXT,
                    search_vector TSVECTOR,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    UNIQUE(content_id, language)
                );

                -- Partial, language-specific indexes
                CREATE INDEX IF NOT EXISTS idx_search_english
                ON multilingual_content USING GIN(search_vector)
                WHERE language = 'english';

                CREATE INDEX IF NOT EXISTS idx_search_spanish
                ON multilingual_content USING GIN(search_vector)
                WHERE language = 'spanish';

                -- Update the search vector with the matching language config
                CREATE OR REPLACE FUNCTION update_multilingual_search_vector()
                RETURNS TRIGGER AS $$
                DECLARE
                    config_name TEXT;
                BEGIN
                    config_name := CASE NEW.language
                        WHEN 'english' THEN 'english'
                        WHEN 'spanish' THEN 'spanish'
                        WHEN 'french' THEN 'french'
                        WHEN 'german' THEN 'german'
                        ELSE 'simple'
                    END;

                    NEW.search_vector :=
                        setweight(to_tsvector(config_name::regconfig, COALESCE(NEW.title, '')), 'A') ||
                        setweight(to_tsvector(config_name::regconfig, COALESCE(NEW.content, '')), 'B');
                    RETURN NEW;
                END;
                $$ LANGUAGE plpgsql;

                DROP TRIGGER IF EXISTS multilingual_search_trigger ON multilingual_content;
                CREATE TRIGGER multilingual_search_trigger
                BEFORE INSERT OR UPDATE ON multilingual_content
                FOR EACH ROW EXECUTE FUNCTION update_multilingual_search_vector();
            """)
        self.conn.commit()
        print("Multilingual search ready!")

    def add_content(self, content_id: str, translations: Dict[str, Dict]):
        """Add content in multiple languages."""
        with self.conn.cursor() as cur:
            for language, data in translations.items():
                cur.execute("""
                    INSERT INTO multilingual_content (content_id, language, title, content)
                    VALUES (%s, %s, %s, %s)
                    ON CONFLICT (content_id, language) DO UPDATE SET
                        title = EXCLUDED.title,
                        content = EXCLUDED.content
                """, (content_id, language, data['title'], data['content']))
        self.conn.commit()

    def search(self, query: str, language: str = 'english',
               search_all_languages: bool = False) -> List[Dict]:
        """Search with language support."""
        with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
            if search_all_languages:
                # Search across all languages with the 'simple' config
                cur.execute("""
                    SELECT
                        id, content_id, language, title, content,
                        ts_rank(search_vector,
                                plainto_tsquery('simple', %s)) AS rank
                    FROM multilingual_content
                    WHERE search_vector @@ plainto_tsquery('simple', %s)
                    ORDER BY rank DESC, language = %s DESC
                    LIMIT 20
                """, (query, query, language))
            else:
                # Language-specific search
                config = self.languages.get(language, 'simple')
                cur.execute("""
                    SELECT
                        id, content_id, language, title, content,
                        ts_rank(search_vector,
                                plainto_tsquery(%s, %s)) AS rank,
                        ts_headline(%s, title || ' ' || content,
                                    plainto_tsquery(%s, %s),
                                    'StartSel=<em>, StopSel=</em>') AS highlight
                    FROM multilingual_content
                    WHERE language = %s
                      AND search_vector @@ plainto_tsquery(%s, %s)
                    ORDER BY rank DESC
                    LIMIT 20
                """, (config, query, config, config, query,
                      language, config, query))
            return cur.fetchall()

    def detect_language(self, text: str) -> str:
        """Naive language detection based on common function words."""
        # In production, use a proper language detection library
        language_indicators = {
            'english': ['the', 'and', 'is', 'in', 'to'],
            'spanish': ['el', 'la', 'de', 'en', 'y'],
            'french': ['le', 'de', 'et', 'la', 'les'],
            'german': ['der', 'die', 'und', 'in', 'das']
        }
        text_lower = text.lower()
        scores = {}
        for lang, words in language_indicators.items():
            scores[lang] = sum(1 for word in words if f' {word} ' in f' {text_lower} ')
        return max(scores, key=scores.get) if scores else 'english'

# Test multilingual search
ml_search = MultilingualSearch({
    'host': 'localhost',
    'database': 'search_demo',
    'user': 'your_user',
    'password': 'your_password'
})

# Add multilingual content
ml_search.add_content('article_001', {
    'english': {
        'title': 'Introduction to Python Programming',
        'content': 'Python is a versatile programming language perfect for beginners and experts alike.'
    },
    'spanish': {
        'title': 'Introducción a la Programación en Python',
        'content': 'Python es un lenguaje de programación versátil perfecto para principiantes y expertos.'
    },
    'french': {
        'title': 'Introduction à la Programmation Python',
        'content': 'Python est un langage de programmation polyvalent parfait pour les débutants et les experts.'
    }
})

# Search in different languages
print("English search for 'programming':")
results = ml_search.search('programming', 'english')
for r in results:
    print(f"  {r['title']} ({r['language']})")

print("\nSpanish search for 'programación':")
results = ml_search.search('programación', 'spanish')
for r in results:
    print(f"  {r['title']} ({r['language']})")

print("\nCross-language search for 'Python':")
results = ml_search.search('Python', search_all_languages=True)
for r in results:
    print(f"  {r['title']} ({r['language']})")
```
### Advanced Topic 2: Search Analytics and Optimization

Track and optimize your search performance:
from datetime import datetime, timedelta
import json
class SearchAnalytics:
"""๐ Advanced search analytics and optimization"""
def __init__(self, db_config):
self.conn = psycopg2.connect(**db_config)
self.setup_analytics_tables()
def setup_analytics_tables(self):
"""๐ Create analytics tracking tables"""
with self.conn.cursor() as cur:
# ๐ Search queries log
cur.execute("""
CREATE TABLE IF NOT EXISTS search_queries (
id SERIAL PRIMARY KEY,
query TEXT NOT NULL,
normalized_query TEXT,
results_count INT DEFAULT 0,
execution_time_ms INT,
user_id VARCHAR(100),
session_id VARCHAR(100),
clicked_result_id INT,
search_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_search_timestamp
ON search_queries(search_timestamp);
CREATE INDEX idx_normalized_query
ON search_queries(normalized_query);
""")
# ๐ Query performance metrics
cur.execute("""
CREATE TABLE IF NOT EXISTS query_metrics (
query TEXT PRIMARY KEY,
search_count INT DEFAULT 0,
avg_results INT DEFAULT 0,
avg_execution_ms INT DEFAULT 0,
click_through_rate DECIMAL(5,2) DEFAULT 0,
last_searched TIMESTAMP,
is_popular BOOLEAN DEFAULT FALSE
);
""")
# ๐ฏ Search suggestions
cur.execute("""
CREATE TABLE IF NOT EXISTS search_suggestions (
id SERIAL PRIMARY KEY,
query TEXT NOT NULL,
suggestion TEXT NOT NULL,
score DECIMAL(5,2) DEFAULT 0,
suggestion_type VARCHAR(50), -- 'typo', 'synonym', 'related'
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(query, suggestion)
);
""")
self.conn.commit()
def log_search(self, query: str, results_count: int,
execution_time_ms: int, user_id: str = None,
session_id: str = None):
"""๐ Log search query for analytics"""
normalized = self._normalize_query(query)
with self.conn.cursor() as cur:
# Log the search
cur.execute("""
INSERT INTO search_queries
(query, normalized_query, results_count, execution_time_ms,
user_id, session_id)
VALUES (%s, %s, %s, %s, %s, %s)
RETURNING id
""", (query, normalized, results_count, execution_time_ms,
user_id, session_id))
search_id = cur.fetchone()[0]
# Update metrics
self._update_query_metrics(normalized, results_count, execution_time_ms)
self.conn.commit()
return search_id
def log_click(self, search_id: int, result_id: int):
"""๐ Log when user clicks a search result"""
with self.conn.cursor() as cur:
cur.execute("""
UPDATE search_queries
SET clicked_result_id = %s
WHERE id = %s
""", (result_id, search_id))
self.conn.commit()
def _normalize_query(self, query: str) -> str:
"""๐งน Normalize query for grouping"""
# Convert to lowercase, remove extra spaces
normalized = ' '.join(query.lower().split())
# Remove common words (in production, use proper stop words)
stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at'}
words = [w for w in normalized.split() if w not in stop_words]
return ' '.join(sorted(words)) # Sort for consistency
def _update_query_metrics(self, query: str, results: int, time_ms: int):
"""๐ Update query performance metrics"""
with self.conn.cursor() as cur:
cur.execute("""
INSERT INTO query_metrics
(query, search_count, avg_results, avg_execution_ms, last_searched)
VALUES (%s, 1, %s, %s, CURRENT_TIMESTAMP)
ON CONFLICT (query) DO UPDATE SET
search_count = query_metrics.search_count + 1,
avg_results = (query_metrics.avg_results * query_metrics.search_count + %s)
/ (query_metrics.search_count + 1),
avg_execution_ms = (query_metrics.avg_execution_ms * query_metrics.search_count + %s)
/ (query_metrics.search_count + 1),
last_searched = CURRENT_TIMESTAMP,
is_popular = (query_metrics.search_count + 1) >= 10
""", (query, results, time_ms, results, time_ms))
def get_popular_searches(self, days: int = 7, limit: int = 20) -> List[Dict]:
"""๐ฅ Get popular searches"""
with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
cur.execute("""
SELECT
normalized_query as query,
COUNT(*) as search_count,
AVG(results_count) as avg_results,
COUNT(DISTINCT user_id) as unique_users
FROM search_queries
WHERE search_timestamp >= CURRENT_TIMESTAMP - %s * INTERVAL '1 day'
GROUP BY normalized_query
ORDER BY search_count DESC
LIMIT %s
""", (days, limit))
return cur.fetchall()
def get_zero_result_queries(self, days: int = 7) -> List[Dict]:
"""๐ Get queries with no results"""
with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
cur.execute("""
SELECT
query,
COUNT(*) as search_count,
MAX(search_timestamp) as last_searched
FROM search_queries
WHERE results_count = 0
AND search_timestamp >= CURRENT_TIMESTAMP - %s * INTERVAL '1 day'
GROUP BY query
ORDER BY search_count DESC
""", (days,))
return cur.fetchall()
def calculate_click_through_rate(self, days: int = 30):
"""๐ Calculate CTR for queries"""
with self.conn.cursor() as cur:
cur.execute("""
UPDATE query_metrics qm
SET click_through_rate = (
SELECT
COALESCE(
COUNT(CASE WHEN clicked_result_id IS NOT NULL THEN 1 END) * 100.0 /
NULLIF(COUNT(*), 0),
0
)
FROM search_queries sq
WHERE sq.normalized_query = qm.query
AND sq.search_timestamp >= CURRENT_TIMESTAMP - %s * INTERVAL '1 day'
)
""", (days,))
self.conn.commit()
def generate_suggestions(self):
"""๐ก Generate search suggestions based on analytics"""
with self.conn.cursor() as cur:
# Find related queries (queries often searched together)
cur.execute("""
WITH user_query_pairs AS (
SELECT
q1.normalized_query as query1,
q2.normalized_query as query2,
COUNT(*) as co_occurrence
FROM search_queries q1
JOIN search_queries q2
ON q1.user_id = q2.user_id
AND q1.session_id = q2.session_id
AND q1.id != q2.id
AND ABS(EXTRACT(EPOCH FROM (q2.search_timestamp - q1.search_timestamp))) < 300
WHERE q1.search_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
GROUP BY q1.normalized_query, q2.normalized_query
HAVING COUNT(*) >= 5
)
INSERT INTO search_suggestions (query, suggestion, score, suggestion_type)
SELECT
query1,
query2,
co_occurrence::DECIMAL / 100,
'related'
FROM user_query_pairs
ON CONFLICT (query, suggestion) DO UPDATE
SET score = EXCLUDED.score;
""")
self.conn.commit()
def get_search_insights(self) -> Dict:
"""๐ Get comprehensive search insights"""
insights = {}
# Popular searches
insights['popular_searches'] = self.get_popular_searches(7, 10)
# Zero results
insights['zero_results'] = self.get_zero_result_queries(7)
# Performance metrics
with self.conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
cur.execute("""
SELECT
AVG(execution_time_ms) as avg_search_time,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_time,
AVG(results_count) as avg_results,
COUNT(DISTINCT user_id) as unique_users,
COUNT(*) as total_searches
FROM search_queries
WHERE search_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
""")
insights['performance'] = cur.fetchone()
# Click-through rates
cur.execute("""
SELECT
AVG(click_through_rate) as avg_ctr,
COUNT(*) as queries_tracked
FROM query_metrics
WHERE search_count >= 10
""")
insights['engagement'] = cur.fetchone()
return insights
# ๐ฎ Test search analytics
analytics = SearchAnalytics({
'host': 'localhost',
'database': 'search_demo',
'user': 'your_user',
'password': 'your_password'
})
# ๐ Simulate search activity
import random
import time
queries = [
"python tutorial", "machine learning", "web development",
"data science", "python flask", "django tutorial",
"numpy pandas", "python async", "database optimization"
]
print("🚀 Simulating search activity...")
for _ in range(50):
query = random.choice(queries)
results = random.randint(0, 100)
exec_time = random.randint(10, 200)
user_id = f"user_{random.randint(1, 10)}"
session_id = f"session_{random.randint(1, 20)}"
search_id = analytics.log_search(query, results, exec_time, user_id, session_id)
# Simulate clicks (30% CTR)
if random.random() < 0.3:
analytics.log_click(search_id, random.randint(1, 10))
# ๐ Calculate metrics
analytics.calculate_click_through_rate()
analytics.generate_suggestions()
# ๐ Get insights
insights = analytics.get_search_insights()
print("\n📊 SEARCH INSIGHTS DASHBOARD")
print("=" * 50)
print("\n🔥 Popular Searches:")
for search in insights['popular_searches'][:5]:
print(f" '{search['query']}' - {search['search_count']} searches")
print("\n😞 Queries with No Results:")
for search in insights['zero_results'][:3]:
print(f" '{search['query']}' - {search['search_count']} attempts")
print(f"\n⚡ Performance Metrics:")
perf = insights['performance']
print(f" Average Search Time: {perf['avg_search_time']:.1f}ms")
print(f" 95th Percentile: {perf['p95_time']:.1f}ms")
print(f" Total Searches: {perf['total_searches']}")
print(f"\n👆 Engagement:")
eng = insights['engagement']
print(f" Average CTR: {eng['avg_ctr']:.1f}%")
⚠️ Common Pitfalls and Solutions
😱 Pitfall 1: Slow Search Performance
# ❌ Wrong way - no indexes!
cursor.execute("""
SELECT * FROM articles
WHERE content LIKE %s
""", (f'%{search_term}%',)) # 💥 Full table scan - this will be very slow!
# ✅ Correct way - use full-text indexes!
# PostgreSQL: create the index once, then query it
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_articles_fts
ON articles USING GIN(to_tsvector('english', content))
""")
cursor.execute("""
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
""", (search_term,))
# MySQL: add the index once, then query it
cursor.execute("ALTER TABLE articles ADD FULLTEXT(content)")
cursor.execute("""
SELECT * FROM articles
WHERE MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE)
""", (search_term,))
🤯 Pitfall 2: Ignoring Language and Stemming
# ❌ Dangerous - exact match only!
def bad_search(query):
# 💥 Won't find "running" when searching "run"!
return db.execute("SELECT * FROM posts WHERE content = %s", (query,))
# ✅ Safe - proper language processing!
def good_search(query):
# PostgreSQL with language support
return db.execute("""
SELECT *,
ts_headline('english', content, query) as highlight
FROM posts,
plainto_tsquery('english', %s) query
WHERE to_tsvector('english', content) @@ query
""", (query,))
# 🌍 Even better - multi-language support!
def multilingual_search(query, language='english'):
config = {
'english': 'english',
'spanish': 'spanish',
'french': 'french'
}.get(language, 'simple')
return db.execute("""
SELECT * FROM posts
WHERE to_tsvector(%s, content) @@ plainto_tsquery(%s, %s)
""", (config, config, query))
🛠️ Best Practices
- 🎯 Choose the Right Search Type: Use natural language mode for general search, boolean mode for advanced queries
- 📚 Index Strategically: Create indexes on searchable columns, but don't over-index - every extra index slows down writes
- 🌍 Support Multiple Languages: Consider your user base and implement proper language support
- ⚡ Optimize Performance: Use connection pooling and query caching
- 📊 Monitor and Improve: Track search analytics to improve relevance
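The caching advice above can be sketched without touching a database: a small TTL cache keyed by the normalized query serves repeated searches from memory instead of re-running the SQL. This is an illustrative sketch, not part of the tutorial's classes; the `run_search` callable and the 60-second TTL are assumptions you would tune for your workload.

```python
import time
from typing import Any, Callable, Dict, Tuple

class SearchCache:
    """Tiny TTL cache for search results, keyed by normalized query."""
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def _normalize(self, query: str) -> str:
        # Lowercase and collapse whitespace so "Python  Tutorial"
        # and "python tutorial" share one cache entry
        return ' '.join(query.lower().split())

    def get_or_search(self, query: str, run_search: Callable[[str], Any]) -> Any:
        key = self._normalize(query)
        entry = self._store.get(key)
        if entry is not None:
            stored_at, results = entry
            if time.monotonic() - stored_at < self.ttl:
                return results  # cache hit - no database round trip
        results = run_search(query)
        self._store[key] = (time.monotonic(), results)
        return results

# Usage: wrap any search function (here a stand-in instead of a real query)
cache = SearchCache(ttl_seconds=60)
calls = []
def fake_search(q):
    calls.append(q)
    return [f"result for {q}"]

cache.get_or_search("Python  Tutorial", fake_search)
cache.get_or_search("python tutorial", fake_search)
print(len(calls))  # → 1: the second call is served from cache
```

In production you would put this in front of `SearchAnalytics` or your query layer, and keep the TTL short so freshly indexed documents still show up quickly.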
🧪 Hands-On Exercise
🎯 Challenge: Build a Smart Recipe Search Engine
Create a recipe search engine with these features:
📋 Requirements:
- ✅ Search by ingredients, recipe name, or cooking method
- 🏷️ Filter by dietary restrictions (vegan, gluten-free, etc.)
- ⏱️ Filter by cooking time and difficulty
- 📈 Rank by popularity and ratings
- 🎨 Highlight matching terms in results
🚀 Bonus Points:
- Add fuzzy search for typos
- Implement "More like this" functionality
- Create search suggestions as users type
- Add multi-language recipe support
💡 Solution
📖 Click to see solution
import psycopg2
from psycopg2.extras import RealDictCursor
from datetime import datetime
from typing import List, Dict, Optional
class RecipeSearchEngine:
"""๐ณ Smart recipe search engine with full-text search"""
def __init__(self, db_config):
self.conn = psycopg2.connect(**db_config)
self.setup_database()
def setup_database(self):
"""๐ Create recipe database with FTS"""
with self.conn.cursor() as cur:
# ๐ฝ๏ธ Main recipes table
cur.execute("""
CREATE TABLE IF NOT EXISTS recipes (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
ingredients TEXT[],
instructions TEXT,
prep_time INT, -- minutes
cook_time INT, -- minutes
difficulty VARCHAR(20) CHECK (difficulty IN ('easy', 'medium', 'hard')),
servings INT,
calories INT,
rating DECIMAL(3,2) DEFAULT 0,
review_count INT DEFAULT 0,
created_by VARCHAR(100),
search_vector TSVECTOR,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- ๐ท๏ธ Dietary tags
CREATE TABLE IF NOT EXISTS dietary_tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE
);
-- ๐ Recipe-tag relationship
CREATE TABLE IF NOT EXISTS recipe_tags (
recipe_id INT REFERENCES recipes(id) ON DELETE CASCADE,
tag_id INT REFERENCES dietary_tags(id) ON DELETE CASCADE,
PRIMARY KEY (recipe_id, tag_id)
);
-- ๐ Create comprehensive search function
CREATE OR REPLACE FUNCTION recipe_search_vector(
name TEXT,
description TEXT,
ingredients TEXT[],
instructions TEXT
) RETURNS TSVECTOR AS $$
BEGIN
RETURN
setweight(to_tsvector('english', COALESCE(name, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(description, '')), 'B') ||
setweight(to_tsvector('english', COALESCE(array_to_string(ingredients, ' '), '')), 'A') ||
setweight(to_tsvector('english', COALESCE(instructions, '')), 'C');
END;
$$ LANGUAGE plpgsql;
-- ๐ Auto-update trigger
CREATE OR REPLACE FUNCTION update_recipe_search_vector()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector := recipe_search_vector(
NEW.name, NEW.description, NEW.ingredients, NEW.instructions
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS recipe_search_trigger ON recipes;
CREATE TRIGGER recipe_search_trigger
BEFORE INSERT OR UPDATE ON recipes
FOR EACH ROW EXECUTE FUNCTION update_recipe_search_vector();
-- ๐ Indexes for performance
CREATE INDEX IF NOT EXISTS idx_recipe_search ON recipes USING GIN(search_vector);
CREATE INDEX IF NOT EXISTS idx_recipe_difficulty ON recipes(difficulty);
CREATE INDEX IF NOT EXISTS idx_recipe_times ON recipes(prep_time, cook_time);
CREATE INDEX IF NOT EXISTS idx_recipe_rating ON recipes(rating DESC);
-- ๐ Search history for suggestions
CREATE TABLE IF NOT EXISTS search_history (
id SERIAL PRIMARY KEY,
query TEXT NOT NULL,
result_count INT DEFAULT 0,
clicked_recipe_id INT,
search_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
""")
self.conn.commit()
print("✅ Recipe search engine ready!")
def add_recipe(self, recipe_data: Dict, dietary_tags: List[str] = None):
"""โ Add a recipe with tags"""
with self.conn.cursor() as cur:
# Insert recipe
cur.execute("""
INSERT INTO recipes (
name, description, ingredients, instructions,
prep_time, cook_time, difficulty, servings, calories, created_by
) VALUES (
%(name)s, %(description)s, %(ingredients)s, %(instructions)s,
%(prep_time)s, %(cook_time)s, %(difficulty)s, %(servings)s,
%(calories)s, %(created_by)s
) RETURNING id
""", recipe_data)
recipe_id = cur.fetchone()[0]
# Add dietary tags
if dietary_tags:
for tag in dietary_tags:
# Get or create tag
cur.execute("""
INSERT INTO dietary_tags (name) VALUES (%s)
ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
RETURNING id
""", (tag,))
tag_id = cur.fetchone()[0]
# Link to recipe
cur.execute("""
INSERT INTO recipe_tags (recipe_id, tag_id) VALUES (%s, %s)
""", (recipe_id, tag_id))
self.conn.commit()
return recipe_id
def search(self, query: str, filters: Dict = None) -> List[Dict]:
"""๐ Advanced recipe search"""
with self.conn.cursor(cursor_factory=RealDictCursor) as cur:
# Base query with FTS
sql = """
WITH search_results AS (
SELECT
r.*,
ts_rank_cd(search_vector, query) AS relevance,
ts_headline('english',
name || ' - ' || COALESCE(description, ''),
query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=30'
) AS highlight,
array_agg(DISTINCT dt.name) FILTER (WHERE dt.name IS NOT NULL) AS dietary_tags,
prep_time + cook_time AS total_time
FROM recipes r
LEFT JOIN recipe_tags rt ON r.id = rt.recipe_id
LEFT JOIN dietary_tags dt ON rt.tag_id = dt.id,
plainto_tsquery('english', %(query)s) query
WHERE search_vector @@ query
"""
params = {'query': query}
# Apply filters
if filters:
if 'max_time' in filters:
sql += " AND (prep_time + cook_time) <= %(max_time)s"
params['max_time'] = filters['max_time']
if 'difficulty' in filters:
sql += " AND difficulty = %(difficulty)s"
params['difficulty'] = filters['difficulty']
if 'dietary_tags' in filters and filters['dietary_tags']:
# Must have ALL specified tags
sql += """
AND r.id IN (
SELECT recipe_id
FROM recipe_tags rt
JOIN dietary_tags dt ON rt.tag_id = dt.id
WHERE dt.name = ANY(%(dietary_tags)s)
GROUP BY recipe_id
HAVING COUNT(DISTINCT dt.name) = %(tag_count)s
)
"""
params['dietary_tags'] = filters['dietary_tags']
params['tag_count'] = len(filters['dietary_tags'])
if 'min_rating' in filters:
sql += " AND rating >= %(min_rating)s"
params['min_rating'] = filters['min_rating']
sql += """
GROUP BY r.id, query.query
)
SELECT * FROM search_results
ORDER BY relevance DESC, rating DESC, review_count DESC
LIMIT 20
"""
cur.execute(sql, params)
results = cur.fetchall()
# Log search
self._log_search(query, len(results))
return self._format_results(results)
def _format_results(self, results: List[Dict]) -> List[Dict]:
"""๐จ Format search results"""
formatted = []
for r in results:
formatted.append({
'id': r['id'],
'name': r['name'],
'description': r['description'],
'highlight': r['highlight'],
'ingredients': r['ingredients'],
'prep_time': r['prep_time'],
'cook_time': r['cook_time'],
'total_time': r['total_time'],
'difficulty': r['difficulty'],
'difficulty_emoji': self._get_difficulty_emoji(r['difficulty']),
'servings': r['servings'],
'calories': r['calories'],
'rating': float(r['rating']) if r['rating'] else 0,
'review_count': r['review_count'],
'dietary_tags': r['dietary_tags'] or [],
'relevance': float(r['relevance'])
})
return formatted
def _get_difficulty_emoji(self, difficulty: str) -> str:
"""🎨 Get emoji for difficulty level"""
return {
'easy': '🟢',
'medium': '🟡',
'hard': '🔴'
}.get(difficulty, '⚪')
def _log_search(self, query: str, result_count: int):
"""๐ Log search for analytics"""
with self.conn.cursor() as cur:
cur.execute("""
INSERT INTO search_history (query, result_count)
VALUES (%s, %s)
""", (query, result_count))
self.conn.commit()
def get_suggestions(self, partial_query: str) -> List[str]:
"""๐ก Get search suggestions"""
with self.conn.cursor() as cur:
# Get suggestions from successful past searches
cur.execute("""
SELECT DISTINCT query
FROM search_history
WHERE query ILIKE %s
AND result_count > 0
ORDER BY search_timestamp DESC
LIMIT 5
""", (f'{partial_query}%',))
history_suggestions = [row[0] for row in cur.fetchall()]
# Get suggestions from ingredient names
cur.execute("""
SELECT DISTINCT unnest(ingredients) AS ingredient
FROM recipes
WHERE array_to_string(ingredients, ' ') ILIKE %s
LIMIT 5
""", (f'%{partial_query}%',))
ingredient_suggestions = [row[0] for row in cur.fetchall()]
# Combine and deduplicate
all_suggestions = list(dict.fromkeys(
history_suggestions + ingredient_suggestions
))
return all_suggestions[:10]
def find_similar_recipes(self, recipe_id: int, limit: int = 5) -> List[Dict]:
"""🔗 Find similar recipes"""
with self.conn.cursor(cursor_factory=RealDictCursor) as cur:
# ts_rank() expects a tsquery, not a second tsvector, so build a
# query from the source recipe's name and ingredients instead
cur.execute("""
SELECT name, ingredients, difficulty,
prep_time + cook_time AS total_time
FROM recipes
WHERE id = %s
""", (recipe_id,))
recipe = cur.fetchone()
if not recipe:
return []
query_text = f"{recipe['name']} {' '.join(recipe['ingredients'] or [])}"
# Rank other recipes against that query; alias columns so
# _format_results finds relevance, highlight, and total_time
cur.execute("""
SELECT
r.*,
ts_rank(r.search_vector, plainto_tsquery('english', %s)) AS relevance,
r.name AS highlight,
r.prep_time + r.cook_time AS total_time,
array_agg(DISTINCT dt.name) FILTER (WHERE dt.name IS NOT NULL) AS dietary_tags
FROM recipes r
LEFT JOIN recipe_tags rt ON r.id = rt.recipe_id
LEFT JOIN dietary_tags dt ON rt.tag_id = dt.id
WHERE r.id != %s
AND r.difficulty = %s
AND ABS((r.prep_time + r.cook_time) - %s) <= 30
GROUP BY r.id
ORDER BY relevance DESC
LIMIT %s
""", (query_text, recipe_id,
recipe['difficulty'], recipe['total_time'], limit))
return self._format_results(cur.fetchall())
# ๐ฎ Test the recipe search engine!
recipe_search = RecipeSearchEngine({
'host': 'localhost',
'database': 'search_demo',
'user': 'your_user',
'password': 'your_password'
})
# ๐ณ Add sample recipes
recipes = [
{
'name': 'Spicy Python Pasta',
'description': 'A delicious pasta dish with a kick, perfect for Python developers',
'ingredients': ['pasta', 'tomatoes', 'garlic', 'chili flakes', 'olive oil', 'basil'],
'instructions': 'Boil pasta. Sauté garlic in olive oil. Add tomatoes and chili. Mix with pasta. Garnish with basil.',
'prep_time': 10,
'cook_time': 20,
'difficulty': 'easy',
'servings': 4,
'calories': 350,
'created_by': 'Chef Alice'
},
{
'name': 'Machine Learning Smoothie',
'description': 'A brain-boosting smoothie packed with nutrients for data scientists',
'ingredients': ['blueberries', 'spinach', 'banana', 'almond milk', 'protein powder', 'chia seeds'],
'instructions': 'Blend all ingredients until smooth. Add ice if desired.',
'prep_time': 5,
'cook_time': 0,
'difficulty': 'easy',
'servings': 1,
'calories': 250,
'created_by': 'Chef Bob'
},
{
'name': 'Async Avocado Toast',
'description': 'Quick and healthy breakfast that processes while you code',
'ingredients': ['bread', 'avocado', 'lime', 'salt', 'pepper', 'eggs', 'cherry tomatoes'],
'instructions': 'Toast bread. Mash avocado with lime, salt, and pepper. Poach eggs. Assemble and enjoy!',
'prep_time': 5,
'cook_time': 10,
'difficulty': 'easy',
'servings': 2,
'calories': 300,
'created_by': 'Chef Charlie'
}
]
# Add recipes with tags
recipe_search.add_recipe(recipes[0], ['vegetarian', 'spicy'])
recipe_search.add_recipe(recipes[1], ['vegan', 'gluten-free', 'healthy'])
recipe_search.add_recipe(recipes[2], ['vegetarian', 'healthy', 'breakfast'])
# 🔍 Test searches
print("🔍 Search for 'pasta':")
results = recipe_search.search('pasta')
for recipe in results:
print(f"🍽️ {recipe['name']} {recipe['difficulty_emoji']}")
print(f" ⏱️ {recipe['total_time']} min | ⭐ {recipe['rating']}")
print(f" 📝 {recipe['highlight']}")
print(f" 🏷️ {', '.join(recipe['dietary_tags'])}\n")
# 🎯 Search with filters
print("\n🎯 Quick vegan recipes:")
results = recipe_search.search('quick', filters={
'max_time': 15,
'dietary_tags': ['vegan']
})
for recipe in results:
print(f"🍽️ {recipe['name']} - {recipe['total_time']} minutes")
# 💡 Get suggestions
print("\n💡 Suggestions for 'av':")
suggestions = recipe_search.get_suggestions('av')
for suggestion in suggestions:
print(f" - {suggestion}")
# 🔗 Find similar recipes
print("\n🔗 Recipes similar to #1:")
similar = recipe_search.find_similar_recipes(1)
for recipe in similar:
print(f" 🍽️ {recipe['name']}")
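The bonus requirement "fuzzy search for typos" is typically handled in PostgreSQL with the pg_trgm extension (`CREATE EXTENSION pg_trgm;`, then filter with `similarity(name, %s) > 0.3` and a trigram GIN index). As a database-free sketch of the same idea, here is a small "did you mean" helper based on edit distance; the vocabulary list and the threshold of 2 edits are illustrative assumptions, not part of the solution above.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def did_you_mean(query: str, vocabulary: list, max_distance: int = 2):
    """Suggest the closest known term within max_distance edits."""
    best = None
    for term in vocabulary:
        d = edit_distance(query.lower(), term.lower())
        if d <= max_distance and (best is None or d < best[1]):
            best = (term, d)
    return best[0] if best else None

# Vocabulary could come from recipe names/ingredients in the database
known_terms = ["pasta", "avocado", "smoothie", "blueberries"]
print(did_you_mean("passta", known_terms))    # → pasta
print(did_you_mean("avocadoo", known_terms))  # → avocado
```

For anything beyond a toy vocabulary, prefer pg_trgm: it runs the matching inside the database and can use an index, while this pure-Python fallback scans every term.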
🏆 Key Takeaways
You've learned so much about full-text search! Here's what you can now do:
- ✅ Implement full-text search in PostgreSQL and MySQL with confidence 💪
- ✅ Optimize search performance with proper indexing and configuration 🛡️
- ✅ Build sophisticated search features like highlighting and ranking 🎯
- ✅ Handle multiple languages and search modes like a pro 🌍
- ✅ Create amazing search experiences for your users! 🚀
Remember: Great search functionality can make or break a user's experience with your application! 🤝
🤝 Next Steps
Congratulations! 🎉 You've mastered full-text search with PostgreSQL and MySQL!
Here's what to do next:
- 💻 Implement full-text search in your current project
- 🏗️ Build a search analytics dashboard
- 📚 Explore advanced features like fuzzy matching and faceted search
- 🌟 Share your search implementation stories!
Remember: The best search engine is one that helps users find what they need quickly and accurately. Keep experimenting, keep measuring, and most importantly, keep your users happy! 🚀
Happy searching! 🔍🚀✨