Prerequisites
- Basic understanding of programming concepts 📝
- Python installation (3.8+) 🐍
- VS Code or preferred IDE 💻
What you'll learn
- Understand graph database fundamentals 🎯
- Apply Neo4j in real projects 🏗️
- Debug common issues 🐛
- Write clean, Pythonic code ✨
🎯 Introduction
Welcome to the fascinating world of Neo4j and graph databases! Ever wondered how social networks like Facebook or LinkedIn manage to show you "friends of friends" or "people you may know" so quickly? The answer lies in treating the data as a graph!
In this tutorial, we'll explore Neo4j, one of the most widely used graph databases, which makes working with connected data feel like magic. Whether you're building a social network 👥, recommendation engine 🎯, or knowledge graph 🧠, Neo4j will change how you think about data relationships.
By the end of this tutorial, you'll be creating and querying graph databases like a pro! Let's embark on this exciting journey! 🚀
📚 Understanding Neo4j and Graph Databases
🤔 What is a Graph Database?
A graph database is like a mind map on steroids! 🧠 Think of it as a digital version of those connection boards you see in detective movies 🕵️, with strings connecting photos, documents, and clues.
In technical terms, a graph database stores data as:
- Nodes 🔵: The entities (like people, products, or places)
- Relationships ➡️: The connections between nodes
- Properties 🏷️: Attributes of both nodes and relationships
This means you can:
- ✨ Model real-world relationships naturally
- 🚀 Query connected data quickly, because traversals follow stored relationships instead of computing joins
- 🛡️ Maintain data integrity, since every relationship connects two existing nodes
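To make the three building blocks concrete, here is a minimal pure-Python sketch of a property graph. It is only an illustration of the data model, not how Neo4j stores data internally; the `Node` and `Relationship` classes and the `neighbours` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with labels and free-form properties."""
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes."""
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice", "age": 30})
bob = Node(2, {"Person"}, {"name": "Bob", "age": 28})
friendship = Relationship(alice.node_id, bob.node_id, "FRIENDS_WITH", {"since": 2020})

nodes = {n.node_id: n for n in (alice, bob)}


def neighbours(node_id, relationships):
    """Follow outgoing relationships from node_id and return the target nodes."""
    return [nodes[r.end] for r in relationships if r.start == node_id]


print([n.properties["name"] for n in neighbours(1, [friendship])])  # ['Bob']
```

Notice that both the nodes and the relationship carry their own properties, which is the part that a plain adjacency list would miss.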
💡 Why Use Neo4j?
Here's why developers love Neo4j:
- Intuitive Data Modeling 🎨: Your data model looks like a whiteboard sketch
- Cypher Query Language 💻: A declarative, SQL-inspired syntax designed for graph patterns
- Fast Traversals ⚡: Following a relationship is a local operation, so multi-hop queries stay fast as the data grows
- ACID Compliance 🔒: Transactions are atomic, consistent, isolated, and durable
Real-world example: Imagine building a movie recommendation system 🎬. With Neo4j, finding "movies liked by people who also liked The Matrix" is a single pattern-matching query, not a complex join nightmare!
🔧 Basic Syntax and Usage
📝 Installing and Connecting to Neo4j
Let's start by getting Neo4j up and running:
# First, install the Neo4j Python driver:
#   pip install neo4j
import os

from neo4j import GraphDatabase


# Connection details (use environment variables in production!)
class Neo4jConnection:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def test_connection(self):
        with self.driver.session() as session:
            result = session.run("RETURN 'Hello, Neo4j!' AS message")
            return result.single()["message"]


# Connect to Neo4j
# NEO4J_PASSWORD is an example variable name; the fallback is a placeholder
neo4j_conn = Neo4jConnection(
    uri="bolt://localhost:7687",  # Default Bolt port
    user="neo4j",
    password=os.environ.get("NEO4J_PASSWORD", "your-password"),  # Change this!
)

# Test the connection
print(neo4j_conn.test_connection())
🎯 Creating Your First Graph
Here's how to create nodes and relationships:
# Creating nodes and relationships inside one write transaction
# Note: CREATE always adds new data, so running this twice duplicates the graph
def create_social_network(tx):
    # Create Person nodes
    tx.run("""
        CREATE (alice:Person {name: 'Alice', age: 30, hobby: 'Photography 📸'})
        CREATE (bob:Person {name: 'Bob', age: 28, hobby: 'Gaming 🎮'})
        CREATE (charlie:Person {name: 'Charlie', age: 32, hobby: 'Cooking 🍳'})
        CREATE (diana:Person {name: 'Diana', age: 29, hobby: 'Reading 📚'})
    """)

    # Create friendships
    tx.run("""
        MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
        CREATE (a)-[:FRIENDS_WITH {since: 2020}]->(b)
    """)
    tx.run("""
        MATCH (a:Person {name: 'Alice'}), (c:Person {name: 'Charlie'})
        CREATE (a)-[:FRIENDS_WITH {since: 2019}]->(c)
    """)
    tx.run("""
        MATCH (b:Person {name: 'Bob'}), (d:Person {name: 'Diana'})
        CREATE (b)-[:FRIENDS_WITH {since: 2021}]->(d)
    """)


# Execute the creation
with neo4j_conn.driver.session() as session:
    session.execute_write(create_social_network)
    print("Social network created!")
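The three near-identical friendship statements above can also be collapsed into one parameterized query. The sketch below is one possible way to do that with Cypher's `UNWIND`; the names `FRIENDSHIPS`, `friendship_rows`, and `create_friendships` are made up for this example, and `tx` is assumed to be the transaction object that `session.execute_write` passes in.

```python
# Hypothetical batch of (source, target, since) tuples, same data as above
FRIENDSHIPS = [
    ("Alice", "Bob", 2020),
    ("Alice", "Charlie", 2019),
    ("Bob", "Diana", 2021),
]

# One statement handles every row in the $rows parameter
CREATE_FRIENDSHIPS = """
UNWIND $rows AS row
MATCH (a:Person {name: row.source}), (b:Person {name: row.target})
CREATE (a)-[:FRIENDS_WITH {since: row.since}]->(b)
"""


def friendship_rows(pairs):
    """Turn tuples into the list of maps that UNWIND iterates over."""
    return [{"source": s, "target": t, "since": y} for s, t, y in pairs]


def create_friendships(tx, pairs=FRIENDSHIPS):
    # tx is assumed to be a driver transaction (or anything with a run method)
    tx.run(CREATE_FRIENDSHIPS, rows=friendship_rows(pairs))
```

With a live connection this would be called as `session.execute_write(create_friendships)`. Adding a friendship then means adding a tuple, not another query.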
💡 Practical Examples
🛒 Example 1: E-commerce Recommendation Engine
Let's build a product recommendation system:
# E-commerce graph database
class EcommerceGraph:
    def __init__(self, driver):
        self.driver = driver

    def create_shopping_data(self):
        with self.driver.session() as session:
            # Create products
            session.run("""
                CREATE (laptop:Product {name: 'Gaming Laptop', price: 1299, emoji: '💻'})
                CREATE (mouse:Product {name: 'Gaming Mouse', price: 79, emoji: '🖱️'})
                CREATE (keyboard:Product {name: 'Mechanical Keyboard', price: 149, emoji: '⌨️'})
                CREATE (headset:Product {name: 'Gaming Headset', price: 99, emoji: '🎧'})
                CREATE (monitor:Product {name: '4K Monitor', price: 599, emoji: '🖥️'})
            """)
            # Create customers
            session.run("""
                CREATE (john:Customer {name: 'John', id: 'C001'})
                CREATE (sarah:Customer {name: 'Sarah', id: 'C002'})
                CREATE (mike:Customer {name: 'Mike', id: 'C003'})
            """)
            # Create purchases
            session.run("""
                MATCH (john:Customer {name: 'John'}), (laptop:Product {name: 'Gaming Laptop'})
                CREATE (john)-[:PURCHASED {date: '2024-01-15', rating: 5}]->(laptop)
            """)
            session.run("""
                MATCH (john:Customer {name: 'John'}), (mouse:Product {name: 'Gaming Mouse'})
                CREATE (john)-[:PURCHASED {date: '2024-01-16', rating: 4}]->(mouse)
            """)
            session.run("""
                MATCH (sarah:Customer {name: 'Sarah'}), (laptop:Product {name: 'Gaming Laptop'})
                CREATE (sarah)-[:PURCHASED {date: '2024-02-01', rating: 5}]->(laptop)
            """)
            session.run("""
                MATCH (sarah:Customer {name: 'Sarah'}), (keyboard:Product {name: 'Mechanical Keyboard'})
                CREATE (sarah)-[:PURCHASED {date: '2024-02-02', rating: 5}]->(keyboard)
            """)

    def get_recommendations(self, customer_name):
        with self.driver.session() as session:
            # Find products bought by customers who share a purchase with this one
            result = session.run("""
                MATCH (c:Customer {name: $customer_name})-[:PURCHASED]->(p:Product)
                      <-[:PURCHASED]-(other:Customer)-[:PURCHASED]->(rec:Product)
                WHERE NOT (c)-[:PURCHASED]->(rec)
                RETURN rec.name AS product, rec.emoji AS emoji,
                       COUNT(DISTINCT other) AS popularity
                ORDER BY popularity DESC
                LIMIT 3
            """, customer_name=customer_name)
            recommendations = []
            for record in result:
                recommendations.append({
                    'product': record['product'],
                    'emoji': record['emoji'],
                    'popularity': record['popularity']
                })
            return recommendations


# Let's use it!
ecommerce = EcommerceGraph(neo4j_conn.driver)
ecommerce.create_shopping_data()

# Get recommendations for John
recommendations = ecommerce.get_recommendations("John")
print("\nRecommendations for John:")
for rec in recommendations:
    print(f" {rec['emoji']} {rec['product']} (bought by {rec['popularity']} similar customers)")
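If the recommendation query feels dense, it can help to see the same logic without a database. The sketch below replays it in plain Python over the sample purchases from `create_shopping_data()`; the `purchases` dictionary and the `recommend` function are illustrative names, and the alphabetical tie-break is an assumption of this sketch (the Cypher query leaves ties unordered).

```python
from collections import defaultdict

# Same sample purchases as create_shopping_data()
purchases = {
    "John": {"Gaming Laptop", "Gaming Mouse"},
    "Sarah": {"Gaming Laptop", "Mechanical Keyboard"},
    "Mike": set(),
}


def recommend(customer, purchases, limit=3):
    """Rank products bought by customers who share a purchase with `customer`."""
    owned = purchases[customer]
    supporters = defaultdict(set)  # product -> similar customers who bought it
    for other, items in purchases.items():
        if other == customer or not (owned & items):
            continue  # skip the customer themselves and anyone with no overlap
        for product in items - owned:
            supporters[product].add(other)
    ranked = sorted(supporters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(product, len(who)) for product, who in ranked[:limit]]


print(recommend("John", purchases))  # [('Mechanical Keyboard', 1)]
```

Sarah is the only customer who shares a purchase with John, so her keyboard is the single recommendation, supported by one similar customer.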
🎮 Example 2: Knowledge Graph Explorer
Let's create a knowledge graph for exploring topics:
# Knowledge graph system
class KnowledgeGraph:
    def __init__(self, driver):
        self.driver = driver

    def create_tech_knowledge_graph(self):
        with self.driver.session() as session:
            # Create technology topics
            session.run("""
                CREATE (python:Topic {name: 'Python', emoji: '🐍', level: 'Beginner'})
                CREATE (neo4j:Topic {name: 'Neo4j', emoji: '🔷', level: 'Intermediate'})
                CREATE (ml:Topic {name: 'Machine Learning', emoji: '🤖', level: 'Advanced'})
                CREATE (web:Topic {name: 'Web Development', emoji: '🌐', level: 'Beginner'})
                CREATE (api:Topic {name: 'REST APIs', emoji: '🔌', level: 'Intermediate'})
                CREATE (docker:Topic {name: 'Docker', emoji: '🐳', level: 'Intermediate'})
                CREATE (k8s:Topic {name: 'Kubernetes', emoji: '☸️', level: 'Advanced'})
            """)
            # Create relationships
            session.run("""
                MATCH (python:Topic {name: 'Python'}), (ml:Topic {name: 'Machine Learning'})
                CREATE (python)-[:PREREQUISITE_FOR]->(ml)
            """)
            session.run("""
                MATCH (python:Topic {name: 'Python'}), (web:Topic {name: 'Web Development'})
                CREATE (python)-[:USEFUL_FOR]->(web)
            """)
            session.run("""
                MATCH (web:Topic {name: 'Web Development'}), (api:Topic {name: 'REST APIs'})
                CREATE (web)-[:LEADS_TO]->(api)
            """)
            session.run("""
                MATCH (docker:Topic {name: 'Docker'}), (k8s:Topic {name: 'Kubernetes'})
                CREATE (docker)-[:PREREQUISITE_FOR]->(k8s)
            """)

    def find_learning_path(self, start_topic, end_topic):
        with self.driver.session() as session:
            # Find the shortest learning path, following relationships in their stored direction
            result = session.run("""
                MATCH path = shortestPath(
                    (source:Topic {name: $start})-[*]->(target:Topic {name: $end})
                )
                RETURN [node IN nodes(path) | node.name + ' ' + node.emoji] AS learning_path
            """, start=start_topic, end=end_topic)
            record = result.single()
            if record:
                return record['learning_path']
            return None

    def get_related_topics(self, topic_name):
        with self.driver.session() as session:
            # Find all directly related topics, in either direction
            result = session.run("""
                MATCH (t:Topic {name: $topic})-[r]-(related:Topic)
                RETURN type(r) AS relationship,
                       related.name AS topic,
                       related.emoji AS emoji,
                       related.level AS level
            """, topic=topic_name)
            related = []
            for record in result:
                related.append({
                    'relationship': record['relationship'],
                    'topic': record['topic'],
                    'emoji': record['emoji'],
                    'level': record['level']
                })
            return related


# Create and explore the knowledge graph
knowledge = KnowledgeGraph(neo4j_conn.driver)
knowledge.create_tech_knowledge_graph()

# Find learning path
path = knowledge.find_learning_path("Python", "Machine Learning")
if path:
    print("\nLearning path from Python to Machine Learning:")
    print(" → ".join(path))

# Explore related topics
related = knowledge.get_related_topics("Python")
print("\nTopics related to Python:")
for rel in related:
    print(f" {rel['emoji']} {rel['topic']} ({rel['level']}) - {rel['relationship']}")
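Curious what a shortest-path search does conceptually? A breadth-first search over the same directed edges gives the same kind of answer. This is a plain-Python sketch for intuition only, not Neo4j's actual implementation; the `edges` dictionary and `shortest_path` function are names invented for this example.

```python
from collections import deque

# Directed edges mirroring create_tech_knowledge_graph()
edges = {
    "Python": ["Machine Learning", "Web Development"],
    "Web Development": ["REST APIs"],
    "Docker": ["Kubernetes"],
}


def shortest_path(start, goal, edges):
    """Breadth-first search: the first path that reaches goal has the fewest hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no directed path exists


print(shortest_path("Python", "REST APIs", edges))
# ['Python', 'Web Development', 'REST APIs']
print(shortest_path("Python", "Kubernetes", edges))  # None
```

The second call returns `None` because nothing links the Python topics to the Docker ones, which is the same situation in which `find_learning_path` returns `None`.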
🚀 Advanced Concepts
🧙‍♂️ Advanced Cypher Queries
When you're ready to level up, try these advanced patterns:
# Advanced query patterns
class AdvancedQueries:
    def __init__(self, driver):
        self.driver = driver

    def pattern_matching_with_collections(self):
        with self.driver.session() as session:
            # Complex pattern matching
            result = session.run("""
                // Find people who share multiple interests
                // (each pair appears twice, once in each order)
                MATCH (p1:Person)-[:LIKES]->(interest:Interest)<-[:LIKES]-(p2:Person)
                WHERE p1 <> p2
                WITH p1, p2, COLLECT(interest.name) AS shared_interests
                WHERE SIZE(shared_interests) >= 2
                RETURN p1.name AS person1, p2.name AS person2,
                       shared_interests, SIZE(shared_interests) AS count
                ORDER BY count DESC
            """)
            return list(result)

    def graph_algorithms(self):
        with self.driver.session() as session:
            # Simplified, follower-count-based influence score
            # (a rough stand-in for illustration, not a real PageRank)
            session.run("""
                // Reset influence scores
                MATCH (p:Person)
                SET p.influence = 0.0
            """)
            # Each iteration adds 0.1 per follower to the influencer's score
            for _ in range(5):  # 5 iterations
                session.run("""
                    MATCH (follower:Person)-[:FOLLOWS]->(influencer:Person)
                    WITH influencer, COUNT(follower) AS follower_count
                    SET influencer.influence = influencer.influence + follower_count * 0.1
                """)

    def temporal_queries(self):
        with self.driver.session() as session:
            # Time-based analysis
            result = session.run("""
                // Find trending topics in the last 7 days
                MATCH (u:User)-[p:POSTED]->(post:Post)-[:ABOUT]->(topic:Topic)
                WHERE p.timestamp > datetime() - duration('P7D')
                RETURN topic.name AS topic, topic.emoji AS emoji,
                       COUNT(DISTINCT post) AS post_count,
                       COUNT(DISTINCT u) AS user_count
                ORDER BY post_count DESC
                LIMIT 5
            """)
            return list(result)
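It is worth checking what the iterative influence loop in `graph_algorithms` actually computes. The sketch below reproduces it in plain Python on a made-up `follows` list (hypothetical data, not part of the tutorial's graph): because the follower counts never change between iterations, the final score is simply proportional to the number of followers.

```python
# Hypothetical (follower, influencer) pairs for illustration
follows = [("bob", "alice"), ("charlie", "alice"), ("alice", "diana")]


def influence_scores(follows, iterations=5, weight=0.1):
    """Mirror the loop above: add weight * follower_count on every iteration."""
    people = {person for pair in follows for person in pair}
    scores = {person: 0.0 for person in people}
    for _ in range(iterations):
        counts = {}
        for _follower, influencer in follows:
            counts[influencer] = counts.get(influencer, 0) + 1
        for influencer, follower_count in counts.items():
            scores[influencer] += follower_count * weight
    return scores


scores = influence_scores(follows)
for person in sorted(scores):
    print(person, round(scores[person], 2))
```

Alice, with two followers, ends up with twice Diana's score, and people nobody follows stay at zero. A real PageRank would also weight each follower by their own score, which this loop does not do.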
🏗️ Graph Data Modeling Best Practices
For enterprise-grade graph applications:
# Advanced data modeling
class GraphDataModeling:
    def __init__(self, driver):
        self.driver = driver

    def create_versioned_nodes(self):
        with self.driver.session() as session:
            # Versioned data pattern
            session.run("""
                // Create versioned product catalog
                CREATE (p:Product {id: 'PROD001'})
                CREATE (v1:ProductVersion {
                    version: 1,
                    name: 'Laptop Pro',
                    price: 999,
                    valid_from: datetime('2024-01-01'),
                    valid_to: datetime('2024-06-30')
                })
                CREATE (v2:ProductVersion {
                    version: 2,
                    name: 'Laptop Pro',
                    price: 1099,
                    valid_from: datetime('2024-07-01'),
                    valid_to: datetime('9999-12-31')
                })
                CREATE (p)-[:HAS_VERSION]->(v1)
                CREATE (p)-[:HAS_VERSION]->(v2)
                CREATE (v1)-[:NEXT_VERSION]->(v2)
            """)

    def create_meta_graph(self):
        with self.driver.session() as session:
            # Meta-graph pattern for dynamic schemas
            session.run("""
                // Define entity types and their relationships
                CREATE (person:EntityType {name: 'Person', emoji: '👤'})
                CREATE (company:EntityType {name: 'Company', emoji: '🏢'})
                CREATE (project:EntityType {name: 'Project', emoji: '📋'})
                CREATE (works_for:RelationshipType {name: 'WORKS_FOR'})
                CREATE (manages:RelationshipType {name: 'MANAGES'})
                CREATE (assigned_to:RelationshipType {name: 'ASSIGNED_TO'})
                // Define allowed relationships
                CREATE (person)-[:CAN_HAVE]->(works_for)-[:WITH]->(company)
                CREATE (person)-[:CAN_HAVE]->(manages)-[:WITH]->(project)
                CREATE (person)-[:CAN_HAVE]->(assigned_to)-[:WITH]->(project)
            """)
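The point of the versioned product pattern is being able to ask "which version was valid at this moment?". Here is a small plain-Python sketch of that lookup, using the same validity windows as `create_versioned_nodes`; the `versions` list and `version_at` helper are illustrative names, not part of any Neo4j API.

```python
from datetime import datetime

# Same validity windows as the ProductVersion nodes above
versions = [
    {"version": 1, "price": 999,
     "valid_from": datetime(2024, 1, 1), "valid_to": datetime(2024, 6, 30)},
    {"version": 2, "price": 1099,
     "valid_from": datetime(2024, 7, 1), "valid_to": datetime(9999, 12, 31)},
]


def version_at(versions, moment):
    """Return the version whose validity window contains `moment`, if any."""
    for candidate in versions:
        if candidate["valid_from"] <= moment <= candidate["valid_to"]:
            return candidate
    return None


print(version_at(versions, datetime(2024, 3, 15))["price"])   # 999
print(version_at(versions, datetime(2025, 1, 1))["price"])    # 1099
print(version_at(versions, datetime(2024, 6, 30, 12, 0)))     # None
```

The last call shows a subtlety of closed date ranges: a timestamp at noon on 30 June falls after the first window ends (midnight) and before the second begins. One common way around this is to store half-open intervals, where each `valid_to` equals the next version's `valid_from`.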
⚠️ Common Pitfalls and Solutions
😱 Pitfall 1: The Cartesian Product Trap
# ❌ Wrong way - creates a cartesian product!
def bad_query():
    with neo4j_conn.driver.session() as session:
        result = session.run("""
            MATCH (a:Person), (b:Person)
            WHERE a.age > 25 AND b.age > 25
            RETURN a, b
        """)
        # This returns n² rows: every matching person paired with every matching person!
        return list(result)


# ✅ Correct way - use pattern matching!
def good_query():
    with neo4j_conn.driver.session() as session:
        result = session.run("""
            MATCH (a:Person)-[:FRIENDS_WITH]-(b:Person)
            WHERE a.age > 25 AND b.age > 25
            RETURN a, b
        """)
        # ✅ Only returns connected people!
        return list(result)
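To see how quickly the difference adds up, here is a plain-Python count of the rows each query shape would produce for the four-person social network created earlier. It is only a model of the matching semantics (the `ages` and `friendships` structures are rebuilt by hand for this sketch), but the numbers follow directly from the data.

```python
from itertools import product

# The four people and three friendships from the social network example
ages = {"Alice": 30, "Bob": 28, "Charlie": 32, "Diana": 29}
friendships = [("Alice", "Bob"), ("Alice", "Charlie"), ("Bob", "Diana")]

over_25 = [name for name, age in ages.items() if age > 25]

# Disconnected patterns: every a is paired with every b, including a itself
cartesian = list(product(over_25, repeat=2))

# Undirected pattern: each friendship is matched once from each end
connected = [(a, b) for a, b in friendships if a in over_25 and b in over_25]
connected += [(b, a) for a, b in connected]

print(len(cartesian), len(connected))  # 16 6
```

With 4 people that is 16 rows versus 6. With 10,000 people the cartesian version would produce 100 million rows, while the pattern version still grows only with the number of friendships.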
🤯 Pitfall 2: Missing Indexes
# ❌ Slow queries without indexes
def slow_lookups():
    with neo4j_conn.driver.session() as session:
        # Without an index on Person.email, this scans every Person node
        result = session.run("""
            MATCH (p:Person {email: '[email protected]'})
            RETURN p
        """)
        return list(result)


# ✅ Create indexes for better performance!
def create_indexes():
    with neo4j_conn.driver.session() as session:
        # Create indexes for fast lookups
        session.run("CREATE INDEX person_email IF NOT EXISTS FOR (p:Person) ON (p.email)")
        session.run("CREATE INDEX product_name IF NOT EXISTS FOR (p:Product) ON (p.name)")
        print("Indexes created!")
🛠️ Best Practices
- 🎯 Model for Queries: Design your graph based on how you'll query it
- 📝 Use Meaningful Relationships: FRIENDS_WITH, not just RELATED_TO
- 🛡️ Add Constraints: Ensure data integrity with uniqueness constraints
- 🎨 Keep It Simple: Start simple, evolve as needed
- ✨ Use Parameters: Always parameterize queries to prevent injection
# Best practices in action
class Neo4jBestPractices:
    def __init__(self, driver):
        self.driver = driver

    def setup_constraints(self):
        with self.driver.session() as session:
            # Ensure data integrity
            session.run("CREATE CONSTRAINT person_id IF NOT EXISTS FOR (p:Person) REQUIRE p.id IS UNIQUE")
            session.run("CREATE CONSTRAINT product_sku IF NOT EXISTS FOR (p:Product) REQUIRE p.sku IS UNIQUE")

    def parameterized_query(self, name, min_age):
        with self.driver.session() as session:
            # ✅ Always use parameters!
            result = session.run("""
                MATCH (p:Person)
                WHERE p.name = $name AND p.age >= $min_age
                RETURN p
            """, name=name, min_age=min_age)
            return list(result)
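Why insist on parameters? The sketch below contrasts the two approaches without touching a database. `unsafe_query` and `safe_query` are hypothetical helpers written for this illustration, and the "malicious" input is a made-up example of what a user could type into a search box.

```python
def unsafe_query(name):
    # Anti-pattern: user input becomes part of the Cypher text itself
    return f"MATCH (p:Person) WHERE p.name = '{name}' RETURN p"


SAFE_QUERY = "MATCH (p:Person) WHERE p.name = $name RETURN p"


def safe_query(name):
    # The query text never changes; the value travels separately as a parameter
    return SAFE_QUERY, {"name": name}


malicious = "x' OR true RETURN p //"

print(unsafe_query(malicious))
print(safe_query(malicious))
```

In the first output the input has rewritten the query so that the `WHERE` clause matches every person and the rest of the line is commented out. In the second, the query text is untouched and the odd-looking string is just a name that matches nobody.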
🧪 Hands-On Exercise
🎯 Challenge: Build a Movie Recommendation System
Create a movie recommendation system with the following features:
📋 Requirements:
- ✅ Movies with title, genre, year, and rating
- 🎬 Directors and actors connected to movies
- 👤 Users who rate and watch movies
- 🎯 Recommendation algorithm based on similar users
- 🎨 Each movie needs an emoji based on genre!
🚀 Bonus Points:
- Add genre-based filtering
- Implement collaborative filtering
- Create a "Six Degrees of Kevin Bacon" finder
💡 Solution
# Movie recommendation system
class MovieRecommendationSystem:
    def __init__(self, driver):
        self.driver = driver
        self.genre_emojis = {
            'Action': '💥', 'Comedy': '😂', 'Drama': '🎭',
            'Horror': '👻', 'Sci-Fi': '🚀', 'Romance': '💕'
        }

    def create_movie_database(self):
        with self.driver.session() as session:
            # Create movies
            session.run("""
                CREATE (matrix:Movie {
                    title: 'The Matrix',
                    year: 1999,
                    genre: 'Sci-Fi',
                    emoji: '🚀',
                    rating: 8.7
                })
                CREATE (inception:Movie {
                    title: 'Inception',
                    year: 2010,
                    genre: 'Sci-Fi',
                    emoji: '🚀',
                    rating: 8.8
                })
                CREATE (godfather:Movie {
                    title: 'The Godfather',
                    year: 1972,
                    genre: 'Drama',
                    emoji: '🎭',
                    rating: 9.2
                })
            """)
            # Create people
            session.run("""
                CREATE (keanu:Actor {name: 'Keanu Reeves', emoji: '🎭'})
                CREATE (leo:Actor {name: 'Leonardo DiCaprio', emoji: '🎭'})
                CREATE (nolan:Director {name: 'Christopher Nolan', emoji: '🎬'})
                CREATE (wachowski:Director {name: 'The Wachowskis', emoji: '🎬'})
            """)
            # Create users
            session.run("""
                CREATE (alice:User {name: 'Alice', id: 'U001'})
                CREATE (bob:User {name: 'Bob', id: 'U002'})
                CREATE (charlie:User {name: 'Charlie', id: 'U003'})
            """)
            # Connect movies to people
            session.run("""
                MATCH (matrix:Movie {title: 'The Matrix'}),
                      (keanu:Actor {name: 'Keanu Reeves'}),
                      (wachowski:Director {name: 'The Wachowskis'})
                CREATE (keanu)-[:ACTED_IN {role: 'Neo'}]->(matrix)
                CREATE (wachowski)-[:DIRECTED]->(matrix)
            """)
            # Create ratings
            session.run("""
                MATCH (alice:User {name: 'Alice'}), (matrix:Movie {title: 'The Matrix'})
                CREATE (alice)-[:RATED {score: 5, timestamp: datetime()}]->(matrix)
            """)

    def get_recommendations(self, user_id):
        with self.driver.session() as session:
            # Collaborative filtering
            result = session.run("""
                // Find similar users based on shared movie ratings
                MATCH (u:User {id: $user_id})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(other:User)
                WHERE abs(r1.score - r2.score) <= 1
                // Keep u in scope so the EXISTS check below refers to our user
                WITH u, other, COUNT(DISTINCT m) AS shared_movies
                WHERE shared_movies >= 2
                // Find movies they liked that our user hasn't rated
                MATCH (other)-[r:RATED]->(rec:Movie)
                WHERE r.score >= 4 AND NOT EXISTS {
                    MATCH (u)-[:RATED]->(rec)
                }
                RETURN rec.title AS title, rec.emoji AS emoji,
                       rec.genre AS genre, rec.rating AS rating,
                       AVG(r.score) AS avg_user_score,
                       COUNT(DISTINCT other) AS recommended_by
                ORDER BY recommended_by DESC, avg_user_score DESC
                LIMIT 5
            """, user_id=user_id)
            return list(result)

    def find_bacon_number(self, actor_name):
        with self.driver.session() as session:
            # Six Degrees of Kevin Bacon!
            result = session.run("""
                MATCH path = shortestPath(
                    (actor:Actor {name: $actor_name})-[:ACTED_IN*]-(bacon:Actor {name: 'Kevin Bacon'})
                )
                RETURN length(path)/2 AS bacon_number,
                       [node IN nodes(path) WHERE node:Movie | node.title] AS movies
            """, actor_name=actor_name)
            record = result.single()
            if record:
                return {
                    'bacon_number': record['bacon_number'],
                    'movies': record['movies']
                }
            return None


# Test the system!
movie_system = MovieRecommendationSystem(neo4j_conn.driver)
movie_system.create_movie_database()

# Get recommendations
# (the sample data holds a single rating, so add more ratings to see results)
recommendations = movie_system.get_recommendations("U001")
print("\n🎬 Movie Recommendations:")
for movie in recommendations:
    print(f" {movie['emoji']} {movie['title']} ({movie['genre']}) - ⭐ {movie['rating']}")
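Since the sample database contains only one rating, it can be easier to explore the collaborative-filtering idea behind `get_recommendations` on a small in-memory data set first. The sketch below is a plain-Python approximation of that logic; the `ratings` dictionary is hypothetical data invented for this example, and `recommend_movies` is not part of the solution class.

```python
# Hypothetical ratings (user id -> {movie title: score}), for illustration only
ratings = {
    "U001": {"The Matrix": 5, "Inception": 4},
    "U002": {"The Matrix": 5, "Inception": 5, "The Godfather": 5},
    "U003": {"The Matrix": 2, "The Godfather": 4},
}


def recommend_movies(user_id, ratings, min_shared=2, min_score=4):
    """Suggest well-rated movies from users who scored shared movies similarly."""
    mine = ratings[user_id]
    suggestions = {}  # movie -> scores given by similar users
    for other, theirs in ratings.items():
        if other == user_id:
            continue
        # Shared movies where the two scores differ by at most one point
        shared = [m for m in mine if m in theirs and abs(mine[m] - theirs[m]) <= 1]
        if len(shared) < min_shared:
            continue
        for movie, score in theirs.items():
            if movie not in mine and score >= min_score:
                suggestions.setdefault(movie, []).append(score)
    ranked = sorted(
        suggestions.items(),
        key=lambda kv: (-len(kv[1]), -sum(kv[1]) / len(kv[1])),
    )
    return [(movie, len(scores)) for movie, scores in ranked]


print(recommend_movies("U001", ratings))  # [('The Godfather', 1)]
```

U002 agrees with U001 on two movies, so U002's five-star rating for The Godfather becomes a recommendation. U003 rated The Matrix very differently and is ignored.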
🎓 Key Takeaways
You've learned so much! Here's what you can now do:
- ✅ Create graph databases with nodes and relationships 💪
- ✅ Write Cypher queries to find patterns and connections 🛡️
- ✅ Build recommendation systems using graph traversals 🎯
- ✅ Model complex relationships naturally and efficiently
- ✅ Speed up graph queries with indexes and connected patterns 🚀
Remember: Neo4j transforms connected data from a challenge into an opportunity! It's your secret weapon for building intelligent, relationship-aware applications.
🤝 Next Steps
Congratulations! 🎉 You've covered the core of Neo4j and graph databases!
Here's what to do next:
- 💻 Practice with the movie recommendation exercise above
- 🏗️ Build a social network or knowledge graph using Neo4j
- 📚 Explore Neo4j's graph algorithms library for advanced analytics
- 🌟 Move on to our next tutorial on MongoDB for document databases!
Remember: Every data relationship tells a story. With Neo4j, you're now equipped to discover and leverage those stories in your applications. Keep exploring, keep connecting, and most importantly, have fun with graphs! 🚀
Happy graphing! 🎉🚀✨