Part 398 of 541

📘 Recommendation Systems: Collaborative Filtering

Learn collaborative filtering for recommendation systems in Python, with practical examples, best practices, and real-world applications 🚀

🚀 Intermediate
25 min read

Prerequisites

  • Basic understanding of programming concepts 📝
  • Python installation (3.8+) 🐍
  • NumPy, pandas, scikit-learn, and SciPy installed (the examples import them)
  • VS Code or preferred IDE 💻

What you'll learn

  • Understand the fundamentals of collaborative filtering 🎯
  • Apply collaborative filtering in real projects 🏗️
  • Debug common issues 🐛
  • Write clean, Pythonic code ✨

🎯 Introduction

Welcome to the fascinating world of recommendation systems! 🎉 Have you ever wondered how Netflix knows what movie you'll love next? Or how Amazon suggests products that seem perfect for you? That's the magic of collaborative filtering!

In this tutorial, we'll build our own recommendation system from scratch. You'll learn how to analyze user preferences and predict what they'll enjoy next. By the end, you'll be creating recommendations like the big tech companies! Let's dive in! 🏊‍♂️

📚 Understanding Collaborative Filtering

🤔 What is Collaborative Filtering?

Collaborative filtering is like asking your friends for movie recommendations! 🎬 Think of it as finding people who like the same things you do, then discovering what else they enjoy that you haven't tried yet.

In practice, collaborative filtering analyzes patterns in user behavior (ratings, plays, purchases) to make predictions. This means you can:

  • ✨ Predict user ratings for items they haven't seen
  • 🚀 Find similar users or items automatically
  • 🛡️ Build personalized experiences at scale

💡 Why Use Collaborative Filtering?

Here's why developers love collaborative filtering:

  1. No Domain Knowledge Required 🔒: Needs only user-item interactions, not item metadata
  2. Discovers Hidden Patterns 💻: Finds connections humans might miss
  3. Improves Over Time 📖: Predictions tend to get better as more ratings arrive
  4. Battle-Tested 🔧: Used in recommenders at Netflix, Amazon, Spotify, and more!

Real-world example: Imagine building a music app 🎵. With collaborative filtering, you can recommend songs based on what similar listeners enjoy - no music theory degree required!

🔧 Basic Syntax and Usage

📝 Simple Example

Let's start with a friendly example using NumPy, pandas, and scikit-learn:

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# 👋 Hello, Collaborative Filtering!
print("Welcome to Recommendation Systems! 🎉")

# 🎨 Creating a simple ratings matrix
ratings_dict = {
    'Alice': {'Movie A': 5, 'Movie B': 3, 'Movie C': 4, 'Movie D': 4},
    'Bob': {'Movie A': 3, 'Movie B': 1, 'Movie C': 2, 'Movie D': 3, 'Movie E': 3},
    'Charlie': {'Movie A': 4, 'Movie B': 3, 'Movie C': 4, 'Movie D': 3, 'Movie E': 5},
    'Diana': {'Movie A': 3, 'Movie B': 3, 'Movie C': 1, 'Movie D': 5, 'Movie E': 4},
    'Eve': {'Movie A': 1, 'Movie B': 5, 'Movie C': 5, 'Movie D': 2}
}

# ๐Ÿ—๏ธ Convert to DataFrame
ratings = pd.DataFrame(ratings_dict).T
ratings = ratings.fillna(0)  # ๐ŸŽฏ Fill missing ratings with 0
print("User Ratings Matrix:")
print(ratings)

💡 Explanation: We create a ratings matrix where rows are users and columns are items. Missing ratings (NaN) are filled with 0 for now, so throughout this tutorial a 0 means "not rated", not a rating of zero!
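
One consequence of the zero fill is worth seeing in numbers. The sketch below is a minimal illustration (the `alice` Series is rebuilt here just for this example) of how a filled-in 0 drags down a user's average compared with skipping the missing rating:

```python
import numpy as np
import pandas as pd

# Alice's row from the ratings matrix; she has not rated Movie E
alice = pd.Series(
    {'Movie A': 5, 'Movie B': 3, 'Movie C': 4, 'Movie D': 4, 'Movie E': np.nan}
)

# pandas skips NaN by default, so only the four real ratings count
mean_ignoring_missing = alice.mean()

# After fillna(0) the missing rating is treated like a real rating of 0
mean_with_zero_fill = alice.fillna(0).mean()

print(mean_ignoring_missing)  # 4.0
print(mean_with_zero_fill)    # 3.2
```

The same effect applies to similarity scores, which is why later sections only count ratings greater than 0 as real ratings.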

🎯 Common Patterns

Here are patterns you'll use daily:

# ๐Ÿ—๏ธ Pattern 1: Calculate User Similarity
def calculate_user_similarity(ratings_matrix):
    # ๐ŸŽจ Using cosine similarity - like finding angle between vectors!
    user_similarity = cosine_similarity(ratings_matrix)
    user_similarity_df = pd.DataFrame(
        user_similarity,
        index=ratings_matrix.index,
        columns=ratings_matrix.index
    )
    return user_similarity_df

# ๐ŸŽจ Pattern 2: Find Similar Users
def find_similar_users(user_similarity_df, user, n=3):
    # ๐Ÿ”„ Sort by similarity and exclude the user themselves
    similar_users = user_similarity_df[user].sort_values(ascending=False)[1:n+1]
    print(f"Users most similar to {user}: ๐Ÿค")
    for similar_user, score in similar_users.items():
        print(f"  {similar_user}: {score:.3f} similarity")
    return similar_users

# ๐Ÿ”„ Pattern 3: Make Predictions
def predict_rating(ratings, user_similarity, user, item):
    # ๐Ÿš€ Weighted average based on similarity
    similar_users = user_similarity[user].drop(user)
    
    numerator = 0
    denominator = 0
    
    for other_user, similarity in similar_users.items():
        if ratings.loc[other_user, item] > 0:  # 👀 Only consider actual ratings
            numerator += similarity * ratings.loc[other_user, item]
            denominator += abs(similarity)
    
    if denominator == 0:
        return 0
    
    return numerator / denominator
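
To make the two formulas behind these patterns concrete, here is a small NumPy-only sketch. The helper names `cosine` and `weighted_prediction` are illustrative and not part of the tutorial's code, and the neighbor similarities in the second call are made-up numbers:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D rating vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def weighted_prediction(similarities, neighbor_ratings):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    sims = np.asarray(similarities, dtype=float)
    rated = np.asarray(neighbor_ratings, dtype=float)
    return float((sims * rated).sum() / np.abs(sims).sum())

# Zero-filled rows for Alice and Bob (Movie A to Movie E) from the matrix above
alice = [5, 3, 4, 4, 0]
bob = [3, 1, 2, 3, 3]
print(f"Alice vs Bob: {cosine(alice, bob):.3f}")

# Two hypothetical neighbors with similarities 0.9 and 0.3 rated an item 5 and 3.
# The prediction is (0.9*5 + 0.3*3) / (0.9 + 0.3), pulled toward the closer neighbor.
print(f"Predicted rating: {weighted_prediction([0.9, 0.3], [5, 3]):.2f}")
```

Dividing by the sum of the absolute similarities keeps the prediction on the same scale as the ratings.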

💡 Practical Examples

🎬 Example 1: Movie Recommendation System

Let's build something real:

# ๐Ÿ›๏ธ Create a movie recommendation system
class MovieRecommender:
    def __init__(self):
        self.ratings = None
        self.user_similarity = None
        self.item_similarity = None
        print("๐ŸŽฌ Movie Recommender initialized!")
    
    # โž• Load ratings data
    def load_ratings(self, ratings_data):
        self.ratings = ratings_data
        print(f"โœจ Loaded ratings for {len(self.ratings)} users!")
        
    # ๐Ÿ’ฐ Calculate similarities
    def calculate_similarities(self):
        # ๐ŸŽฏ User-based similarity
        self.user_similarity = cosine_similarity(self.ratings)
        self.user_similarity = pd.DataFrame(
            self.user_similarity,
            index=self.ratings.index,
            columns=self.ratings.index
        )
        
        # 🎨 Item-based similarity
        self.item_similarity = cosine_similarity(self.ratings.T)
        self.item_similarity = pd.DataFrame(
            self.item_similarity,
            index=self.ratings.columns,
            columns=self.ratings.columns
        )
        print("🚀 Similarities calculated!")

    # 📋 Get recommendations
    def recommend_movies(self, user, n_recommendations=5):
        print(f"\n🎯 Generating recommendations for {user}...")
        
        # Find unrated movies
        user_ratings = self.ratings.loc[user]
        unrated_movies = user_ratings[user_ratings == 0].index
        
        # Predict ratings for unrated movies
        predictions = {}
        for movie in unrated_movies:
            pred = self._predict_rating(user, movie)
            if pred > 0:
                predictions[movie] = pred
        
        # Sort and return top N
        recommendations = sorted(predictions.items(), 
                               key=lambda x: x[1], 
                               reverse=True)[:n_recommendations]
        
        print(f"🎬 Top {n_recommendations} recommendations:")
        for movie, score in recommendations:
            print(f"  ⭐ {movie}: {score:.2f} predicted rating")

        return recommendations

    def _predict_rating(self, user, movie):
        # 🎨 Weighted average of similar users' ratings
        similar_users = self.user_similarity[user].drop(user)
        
        numerator = 0
        denominator = 0
        
        for other_user, similarity in similar_users.items():
            rating = self.ratings.loc[other_user, movie]
            if rating > 0:
                numerator += similarity * rating
                denominator += abs(similarity)
        
        if denominator == 0:
            return 0
        
        return numerator / denominator

# 🎮 Let's use it!
recommender = MovieRecommender()
recommender.load_ratings(ratings)
recommender.calculate_similarities()

# Get recommendations for Alice
recommender.recommend_movies('Alice')

🎯 Try it yourself: Add a method to find similar movies based on item similarity!
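
If you get stuck, here is one possible starting point, written as a standalone function rather than a method. It is only a sketch: `similar_items` is a hypothetical name, and it assumes a user-by-item DataFrame where 0 means "not rated", like the one used above.

```python
import numpy as np
import pandas as pd

def similar_items(ratings: pd.DataFrame, item: str, n: int = 3) -> pd.Series:
    """Rank the other items by cosine similarity of their rating columns."""
    values = ratings.to_numpy(dtype=float)

    # Normalize each column to unit length; guard against all-zero columns
    norms = np.linalg.norm(values, axis=0)
    norms[norms == 0] = 1.0
    unit = values / norms

    # Dot product of every column with the target column = cosine similarity
    target = unit[:, ratings.columns.get_loc(item)]
    sims = pd.Series(unit.T @ target, index=ratings.columns)

    return sims.drop(item).sort_values(ascending=False).head(n)

# Tiny demo: X and Y are rated identically, Z by a different user only
demo = pd.DataFrame(
    {'X': [5, 0, 3], 'Y': [5, 0, 3], 'Z': [0, 4, 0]},
    index=['u1', 'u2', 'u3'],
)
print(similar_items(demo, 'X', n=2))
```

Inside `MovieRecommender` you could instead read the precomputed `self.item_similarity` column for the movie, drop the movie itself, and sort.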

🎵 Example 2: Music Playlist Generator

Let's make it fun with music recommendations:

# ๐Ÿ† Music recommendation with more features
class MusicRecommender:
    def __init__(self):
        self.user_songs = {}
        self.song_features = {}
        self.collaborative_scores = {}
        print("๐ŸŽต Music Recommender ready to rock!")
    
    # ๐ŸŽฎ Add user listening history
    def add_user_history(self, user, songs_with_ratings):
        self.user_songs[user] = songs_with_ratings
        print(f"๐ŸŽง Added {user}'s music taste!")
        
    # ๐ŸŽฏ Add song features (genre, tempo, mood)
    def add_song_features(self, song, features):
        self.song_features[song] = features
        
    # ๐ŸŽŠ Generate playlist
    def generate_playlist(self, user, playlist_size=10, mood=None):
        print(f"\n๐ŸŽต Creating {mood or 'personalized'} playlist for {user}...")
        
        # Get all songs
        all_songs = set()
        for user_songs in self.user_songs.values():
            all_songs.update(user_songs.keys())
        
        # Filter out already listened songs
        listened = set(self.user_songs.get(user, {}).keys())
        candidates = all_songs - listened
        
        # Score each candidate
        scores = {}
        for song in candidates:
            score = self._calculate_song_score(user, song, mood)
            if score > 0:
                scores[song] = score
        
        # Create playlist
        playlist = sorted(scores.items(), 
                         key=lambda x: x[1], 
                         reverse=True)[:playlist_size]
        
        print(f"🎶 Your {mood or 'custom'} playlist:")
        for i, (song, score) in enumerate(playlist, 1):
            emoji = self._get_mood_emoji(song)
            print(f"  {i}. {emoji} {song} (score: {score:.2f})")
        
        return [song for song, _ in playlist]
    
    def _calculate_song_score(self, user, song, mood):
        # 🚀 Combine collaborative filtering with content features
        collab_score = self._get_collaborative_score(user, song)

        if mood and song in self.song_features:
            mood_match = self.song_features[song].get('mood', '') == mood
            if mood_match:
                collab_score *= 1.5  # 🎯 Boost mood matches

        return collab_score

    def _get_collaborative_score(self, user, song):
        # 💫 Similarity-weighted average rating from users who rated this song
        weighted_sum = 0
        similarity_sum = 0

        for other_user, songs in self.user_songs.items():
            if other_user != user and song in songs:
                # Check similarity based on common songs
                similarity = self._calculate_user_similarity(user, other_user)
                if similarity > 0:
                    weighted_sum += similarity * songs[song]
                    similarity_sum += similarity

        # Divide by the total weight (not the neighbor count) so the score
        # stays on the same scale as the ratings
        return weighted_sum / similarity_sum if similarity_sum > 0 else 0

    def _calculate_user_similarity(self, user1, user2):
        # 🎨 Simple Jaccard similarity
        songs1 = set(self.user_songs.get(user1, {}).keys())
        songs2 = set(self.user_songs.get(user2, {}).keys())
        
        if not songs1 or not songs2:
            return 0
        
        intersection = len(songs1 & songs2)
        union = len(songs1 | songs2)
        
        return intersection / union if union > 0 else 0
    
    def _get_mood_emoji(self, song):
        # 🎭 Fun emoji based on song features
        if song in self.song_features:
            mood = self.song_features[song].get('mood', '')
            return {
                'happy': '😊',
                'energetic': '🚀',
                'chill': '😎',
                'romantic': '💕',
                'sad': '😢'
            }.get(mood, '🎵')
        return '🎵'

# 🎮 Test our music recommender!
music_rec = MusicRecommender()

# Add some users and their ratings
music_rec.add_user_history('Alex', {
    'Song A': 5, 'Song B': 4, 'Song C': 3
})
music_rec.add_user_history('Sam', {
    'Song A': 4, 'Song D': 5, 'Song E': 4
})
music_rec.add_user_history('Jordan', {
    'Song B': 5, 'Song D': 4, 'Song F': 5
})

# Add song features
songs_moods = {
    'Song A': {'mood': 'happy', 'genre': 'pop'},
    'Song B': {'mood': 'energetic', 'genre': 'rock'},
    'Song C': {'mood': 'chill', 'genre': 'jazz'},
    'Song D': {'mood': 'happy', 'genre': 'pop'},
    'Song E': {'mood': 'romantic', 'genre': 'ballad'},
    'Song F': {'mood': 'energetic', 'genre': 'electronic'}
}

for song, features in songs_moods.items():
    music_rec.add_song_features(song, features)

# Generate playlists
music_rec.generate_playlist('Alex', playlist_size=3)
music_rec.generate_playlist('Alex', playlist_size=3, mood='energetic')

🚀 Advanced Concepts

🧙‍♂️ Matrix Factorization with SVD

When you're ready to level up, try this advanced pattern:

from scipy.sparse.linalg import svds
import numpy as np

# 🎯 Advanced: Matrix Factorization
class SVDRecommender:
    def __init__(self, n_factors=10):
        self.n_factors = n_factors
        self.ratings_matrix = None
        self.user_features = None
        self.item_features = None
        self.user_means = None
        print(f"✨ SVD Recommender with {n_factors} latent factors!")

    # 🪄 Fit the model
    def fit(self, ratings_matrix):
        print("🚀 Factorizing the matrix...")

        # Keep the matrix so recommend() can tell which items are already rated
        self.ratings_matrix = ratings_matrix

        # Center the ratings (note: unrated 0s are included in each user's mean)
        self.user_means = ratings_matrix.mean(axis=1)
        ratings_centered = ratings_matrix.sub(self.user_means, axis=0)

        # Perform SVD; svds needs k strictly below the smaller matrix dimension
        k = min(self.n_factors, min(ratings_centered.shape) - 1)
        U, sigma, Vt = svds(ratings_centered.values.astype(float), k=k)

        # Store factorized matrices
        self.user_features = U
        self.item_features = np.diag(sigma) @ Vt

        # Create mappings
        self.user_to_idx = {user: idx for idx, user in enumerate(ratings_matrix.index)}
        self.item_to_idx = {item: idx for idx, item in enumerate(ratings_matrix.columns)}
        self.idx_to_user = {idx: user for user, idx in self.user_to_idx.items()}
        self.idx_to_item = {idx: item for item, idx in self.item_to_idx.items()}

        print("✅ Model trained successfully!")

    # 💫 Predict rating
    def predict(self, user, item):
        if user not in self.user_to_idx or item not in self.item_to_idx:
            # Fall back to the user's mean, or a neutral 3.0 for unknown users
            if self.user_means is not None and user in self.user_means.index:
                return self.user_means[user]
            return 3.0

        user_idx = self.user_to_idx[user]
        item_idx = self.item_to_idx[item]

        # Reconstruct rating
        prediction = (self.user_features[user_idx] @
                     self.item_features[:, item_idx] +
                     self.user_means[user])

        # Clip to valid range
        return np.clip(prediction, 1, 5)

    # 🌟 Get top-N recommendations
    def recommend(self, user, n=5, exclude_rated=True):
        if user not in self.user_to_idx:
            print(f"⚠️ User {user} not found!")
            return []

        user_idx = self.user_to_idx[user]

        # Predict all items
        predictions = (self.user_features[user_idx] @
                      self.item_features +
                      self.user_means[user])

        # Create recommendation list
        recs = []
        for item_idx, score in enumerate(predictions):
            item = self.idx_to_item[item_idx]
            recs.append((item, score))

        # Sort by score
        recs.sort(key=lambda x: x[1], reverse=True)

        print(f"\n🎯 SVD Recommendations for {user}:")
        count = 0
        recommendations = []
        for item, score in recs:
            if count >= n:
                break
            # Use the matrix stored by fit(), not a global variable
            if not exclude_rated or self.ratings_matrix.loc[user, item] == 0:
                print(f"  ⭐ {item}: {score:.2f}")
                recommendations.append((item, score))
                count += 1

        return recommendations

# 🎮 Test SVD recommender
svd_rec = SVDRecommender(n_factors=2)
svd_rec.fit(ratings)
svd_rec.recommend('Alice', n=3)
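
The idea behind keeping only a few latent factors can be checked on a toy matrix. The sketch below uses the dense `np.linalg.svd` instead of `svds` to stay dependency-light, and the helper name `low_rank_approximation` is an assumption for this example. A matrix that truly has one underlying "taste" factor is rebuilt exactly from a single factor:

```python
import numpy as np

def low_rank_approximation(matrix, k):
    """Rebuild a matrix from its k largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A rank-1 "ratings" matrix: every row is a multiple of the same item profile
user_scale = np.array([1.0, 2.0, 3.0])
item_profile = np.array([1.0, 0.5, 2.0])
R = np.outer(user_scale, item_profile)

approx = low_rank_approximation(R, k=1)
print(np.allclose(approx, R))  # True
```

Real rating matrices are not exactly low rank, so the truncated reconstruction is an approximation, and its entries for unrated cells are what the recommender reads off as predictions.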

๐Ÿ—๏ธ Hybrid Recommendation Systems

For the brave developers - combine multiple approaches:

# 🚀 Hybrid recommender combining multiple techniques
class HybridRecommender:
    def __init__(self):
        self.collaborative_weight = 0.7
        self.content_weight = 0.3
        print("🎨 Hybrid Recommender combining the best of all worlds!")

    def recommend(self, user, n=5):
        # 💫 Combine collaborative and content-based scores
        print(f"\n🌟 Generating hybrid recommendations for {user}...")
        
        # This is where you'd combine:
        # - Collaborative filtering scores
        # - Content-based similarity
        # - Popularity metrics
        # - User demographics
        # - Contextual information (time, location, device)
        
        print("🎯 Recommendations would combine:")
        print(f"  - {self.collaborative_weight:.0%} collaborative filtering")
        print(f"  - {self.content_weight:.0%} content similarity")
        print("  - Boosted by trending items")
        print("  - Personalized by user context")
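
The class above is only a placeholder, so here is a minimal sketch of what the blending step could look like. The function name `blend_scores` and the input dictionaries are hypothetical, and the sketch assumes both score sources are already on a comparable scale:

```python
def blend_scores(collab_scores, content_scores,
                 collab_weight=0.7, content_weight=0.3):
    """Weighted sum of two per-item score dictionaries.

    Items missing from one source contribute 0 from that source.
    """
    items = set(collab_scores) | set(content_scores)
    return {
        item: collab_weight * collab_scores.get(item, 0.0)
        + content_weight * content_scores.get(item, 0.0)
        for item in items
    }

# Made-up scores for three items
collab = {'Movie A': 4.0, 'Movie B': 2.0}
content = {'Movie A': 1.0, 'Movie C': 5.0}

blended = blend_scores(collab, content)
for item, score in sorted(blended.items(), key=lambda x: x[1], reverse=True):
    print(f"{item}: {score:.2f}")
```

Treating a missing score as 0 penalizes items that only one source knows about; a real system might rescale or fall back to a default instead.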

โš ๏ธ Common Pitfalls and Solutions

๐Ÿ˜ฑ Pitfall 1: The Cold Start Problem

# โŒ Wrong way - fails for new users!
def recommend_for_new_user(ratings_matrix, new_user):
    # This will crash - new_user not in matrix!
    similar_users = calculate_similarity(ratings_matrix, new_user)
    return get_recommendations(similar_users)

# โœ… Correct way - handle new users gracefully!
def recommend_for_new_user(ratings_matrix, new_user_ratings=None):
    if not new_user_ratings:
        # ๐ŸŽฏ Return popular items for brand new users
        print("๐Ÿ‘‹ Welcome! Here are our most popular items:")
        popularity = ratings_matrix.mean(axis=0).sort_values(ascending=False)
        return popularity.head(5)
    
    # ๐Ÿš€ Use available ratings to find similar users
    temp_matrix = ratings_matrix.copy()
    temp_matrix.loc['new_user'] = new_user_ratings
    
    # Now we can calculate similarity!
    return get_recommendations(temp_matrix, 'new_user')

🤯 Pitfall 2: Sparse Data Problems

# ❌ Dangerous - sparse matrices cause issues!
def naive_similarity(sparse_matrix):
    # Most values are 0, similarities will be misleading!
    return cosine_similarity(sparse_matrix)

# ✅ Safe - handle sparsity properly!
def smart_similarity(ratings_matrix, min_overlap=3):
    n_users = len(ratings_matrix)
    similarity_matrix = np.zeros((n_users, n_users))
    
    for i in range(n_users):
        for j in range(i+1, n_users):
            # ๐Ÿ›ก๏ธ Only calculate if users have enough overlap
            user_i = ratings_matrix.iloc[i]
            user_j = ratings_matrix.iloc[j]
            
            # Find commonly rated items
            mask = (user_i > 0) & (user_j > 0)
            
            if mask.sum() >= min_overlap:
                # โœ… Calculate similarity only on common items
                similarity = cosine_similarity(
                    [user_i[mask]], 
                    [user_j[mask]]
                )[0, 0]
                similarity_matrix[i, j] = similarity
                similarity_matrix[j, i] = similarity
    
    return similarity_matrix
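
Why insist on a minimum overlap? A quick sketch shows the failure mode. The helper `overlap_cosine` below is illustrative and not part of the tutorial's code: when two users share exactly one rated item, cosine similarity on the overlap is always 1.0, even if their ratings for that item disagree completely.

```python
import numpy as np

def overlap_cosine(u, v):
    """Cosine similarity on co-rated items only, plus the overlap size."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    mask = (u > 0) & (v > 0)  # 0 means "not rated"
    if not mask.any():
        return 0.0, 0
    a, b = u[mask], v[mask]
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sim, int(mask.sum())

# These two users share only the first item, and they disagree on it (5 vs 1)
user_a = [5, 0, 0, 1]
user_b = [1, 3, 0, 0]

similarity, n_common = overlap_cosine(user_a, user_b)
print(f"similarity={similarity:.2f} from {n_common} common item(s)")
```

With a single shared item both vectors are one-dimensional, so they always point the same way. Requiring `min_overlap` common items filters out these meaningless perfect scores.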

๐Ÿ› ๏ธ Best Practices

  1. ๐ŸŽฏ Handle Missing Data: Use appropriate fill strategies (mean, 0, or predict)
  2. ๐Ÿ“ Normalize Ratings: Account for user rating scales (some rate harshly, others generously)
  3. ๐Ÿ›ก๏ธ Set Minimum Thresholds: Require minimum overlap for similarity calculations
  4. ๐ŸŽจ Combine Approaches: Hybrid systems outperform single methods
  5. โœจ Update Incrementally: Donโ€™t recalculate everything for each new rating
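
As an illustration of the "Normalize Ratings" practice, the sketch below mean-centers each user's ratings while leaving missing values as NaN. The tiny array is made up for the example; it is one common way to normalize, not the only one:

```python
import numpy as np

# Rows are users, columns are items, NaN marks a missing rating.
# User 0 rates generously, user 1 rates harshly.
raw = np.array([
    [5.0, 4.0, np.nan],
    [2.0, 1.0, 3.0],
])

# Per-user mean over the ratings that exist (NaN is ignored)
user_means = np.nanmean(raw, axis=1, keepdims=True)

# Centered ratings: positive means "liked more than this user's average"
centered = raw - user_means

print(user_means.ravel())
print(centered)
```

After centering, a 4 from the generous user and a 1 from the harsh user both come out below that user's own average, which is the signal a similarity measure should compare.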

🧪 Hands-On Exercise

🎯 Challenge: Build a Book Recommendation System

Create a complete book recommendation system:

📋 Requirements:

  • ✅ Load ratings from multiple users
  • 🏷️ Implement both user-based and item-based filtering
  • 👤 Handle new users gracefully
  • 📅 Add time decay (recent ratings matter more)
  • 🎨 Create a simple evaluation metric

🚀 Bonus Points:

  • Add genre-based content filtering
  • Implement matrix factorization
  • Create a simple web interface
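
For the time-decay requirement, one simple option is exponential decay by age. The sketch below assumes a per-period factor of 0.95 and a 30-day period; both values and the name `decay_weight` are choices made for this example, so tune them for your data:

```python
def decay_weight(days_old, decay_factor=0.95, period_days=30):
    """Weight for a rating that is `days_old` days old.

    The weight shrinks by `decay_factor` for every `period_days` that pass.
    """
    if days_old < 0:
        raise ValueError("days_old must be non-negative")
    return decay_factor ** (days_old / period_days)

for age in (0, 30, 60, 365):
    print(f"{age:>3} days old -> weight {decay_weight(age):.3f}")
```

You can multiply a rating by its weight, or use the weight only when averaging neighbors, so that the predicted ratings stay on the original 1 to 5 scale.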

💡 Solution

# 🎯 Complete Book Recommendation System!
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

class BookRecommendationSystem:
    def __init__(self):
        self.ratings = pd.DataFrame()
        self.books = {}
        self.user_similarity = None
        self.item_similarity = None
        print("📚 Book Recommendation System initialized!")

    # ➕ Add rating with timestamp
    def add_rating(self, user, book, rating, timestamp=None):
        if timestamp is None:
            timestamp = datetime.now()
        
        new_rating = pd.DataFrame({
            'user': [user],
            'book': [book],
            'rating': [rating],
            'timestamp': [timestamp]
        })
        
        self.ratings = pd.concat([self.ratings, new_rating], ignore_index=True)
        print(f"✅ {user} rated '{book}': {rating}⭐")

    # 🎯 Create rating matrix with time decay
    def create_rating_matrix(self, decay_factor=0.95, days_window=365):
        print("🔄 Creating rating matrix with time decay...")

        # Keep only each user's most recent rating per book
        now = datetime.now()
        latest = (
            self.ratings.sort_values('timestamp')
            .drop_duplicates(subset=['user', 'book'], keep='last')
            .copy()
        )

        # Drop ratings older than the window instead of keeping them undecayed
        days_old = (now - latest['timestamp']).dt.days
        latest = latest[days_old <= days_window].copy()

        # Apply time decay exactly once per rating (monthly decay)
        days_old = (now - latest['timestamp']).dt.days
        latest['rating'] = latest['rating'].astype(float) * decay_factor ** (days_old / 30)

        # Pivot to matrix form
        matrix = latest.pivot(index='user', columns='book', values='rating')

        return matrix.fillna(0)
    
    # 📊 Calculate similarities
    def calculate_similarities(self, min_overlap=2):
        matrix = self.create_rating_matrix()

        # User similarity
        print("👥 Calculating user similarities...")
        self.user_similarity = self._calculate_similarity_matrix(
            matrix, min_overlap
        )

        # Item similarity
        print("📚 Calculating book similarities...")
        self.item_similarity = self._calculate_similarity_matrix(
            matrix.T, min_overlap
        )

        print("✅ Similarities calculated!")
    
    def _calculate_similarity_matrix(self, matrix, min_overlap):
        n = len(matrix)
        similarity = np.zeros((n, n))
        
        for i in range(n):
            for j in range(i+1, n):
                vec_i = matrix.iloc[i].values
                vec_j = matrix.iloc[j].values
                
                # Check overlap
                mask = (vec_i > 0) & (vec_j > 0)
                if mask.sum() >= min_overlap:
                    # Pearson correlation
                    if vec_i[mask].std() > 0 and vec_j[mask].std() > 0:
                        corr = np.corrcoef(vec_i[mask], vec_j[mask])[0, 1]
                        similarity[i, j] = corr
                        similarity[j, i] = corr
        
        return pd.DataFrame(
            similarity,
            index=matrix.index,
            columns=matrix.index
        )
    
    # 🎯 Get recommendations
    def recommend_books(self, user, n=5, method='hybrid'):
        print(f"\n📖 Generating {method} recommendations for {user}...")
        
        matrix = self.create_rating_matrix()
        
        if user not in matrix.index:
            return self._cold_start_recommendations(n)
        
        if method == 'user_based':
            recs = self._user_based_recommendations(user, matrix, n)
        elif method == 'item_based':
            recs = self._item_based_recommendations(user, matrix, n)
        else:  # hybrid
            user_recs = self._user_based_recommendations(user, matrix, n*2)
            item_recs = self._item_based_recommendations(user, matrix, n*2)
            
            # Combine and deduplicate
            all_recs = {}
            for book, score in user_recs + item_recs:
                if book in all_recs:
                    all_recs[book] = (all_recs[book] + score) / 2
                else:
                    all_recs[book] = score
            
            recs = sorted(all_recs.items(), key=lambda x: x[1], reverse=True)[:n]
        
        print(f"📚 Top {n} recommendations:")
        for book, score in recs:
            print(f"  ⭐ {book}: {score:.2f} predicted rating")
        
        return recs
    
    def _user_based_recommendations(self, user, matrix, n):
        if self.user_similarity is None:
            self.calculate_similarities()
        
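        # Get similar users. Drop the user explicitly: the diagonal of this
        # similarity matrix is 0, so slicing off the first sorted entry would
        # discard the best neighbor instead of the user themselves.
        similar_users = self.user_similarity[user].drop(user).sort_values(ascending=False)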
        
        predictions = {}
        user_ratings = matrix.loc[user]
        
        for book in matrix.columns:
            if user_ratings[book] == 0:  # Not rated
                weighted_sum = 0
                similarity_sum = 0
                
                for similar_user, similarity in similar_users.items():
                    if similarity > 0 and matrix.loc[similar_user, book] > 0:
                        weighted_sum += similarity * matrix.loc[similar_user, book]
                        similarity_sum += similarity
                
                if similarity_sum > 0:
                    predictions[book] = weighted_sum / similarity_sum
        
        return sorted(predictions.items(), key=lambda x: x[1], reverse=True)[:n]
    
    def _item_based_recommendations(self, user, matrix, n):
        if self.item_similarity is None:
            self.calculate_similarities()
        
        predictions = {}
        user_ratings = matrix.loc[user]
        rated_books = user_ratings[user_ratings > 0]
        
        for book in matrix.columns:
            if user_ratings[book] == 0:  # Not rated
                weighted_sum = 0
                similarity_sum = 0
                
                for rated_book, rating in rated_books.items():
                    if rated_book in self.item_similarity.index:
                        similarity = self.item_similarity.loc[book, rated_book]
                        if similarity > 0:
                            weighted_sum += similarity * rating
                            similarity_sum += similarity
                
                if similarity_sum > 0:
                    predictions[book] = weighted_sum / similarity_sum
        
        return sorted(predictions.items(), key=lambda x: x[1], reverse=True)[:n]
    
    def _cold_start_recommendations(self, n):
        print("👋 New user detected! Here are our popular books:")
        
        # Return most popular books
        popularity = self.ratings.groupby('book')['rating'].agg(['mean', 'count'])
        popularity['score'] = popularity['mean'] * np.log1p(popularity['count'])
        
        top_books = popularity.nlargest(n, 'score')
        
        recs = []
        for book, row in top_books.iterrows():
            recs.append((book, row['mean']))
        
        return recs
    
    # 📊 Evaluate the system
    def evaluate(self, test_size=0.2):
        print("\n🧪 Evaluating recommendation system...")
        
        # Split data
        train, test = train_test_split(
            self.ratings, 
            test_size=test_size, 
            random_state=42
        )
        
        # Create temporary system with training data
        temp_system = BookRecommendationSystem()
        for _, row in train.iterrows():
            temp_system.add_rating(
                row['user'], 
                row['book'], 
                row['rating'], 
                row['timestamp']
            )
        
        temp_system.calculate_similarities()
        
        # Make predictions on test set
        predictions = []
        actuals = []
        
        matrix = temp_system.create_rating_matrix()
        
        for _, row in test.iterrows():
            user = row['user']
            book = row['book']
            
            if user in matrix.index and book in matrix.columns:
                # Try to predict
                pred_recs = temp_system._user_based_recommendations(
                    user, matrix, len(matrix.columns)
                )
                
                pred_dict = dict(pred_recs)
                if book in pred_dict:
                    predictions.append(pred_dict[book])
                    actuals.append(row['rating'])
        
        if predictions:
            rmse = np.sqrt(mean_squared_error(actuals, predictions))
            print(f"📊 RMSE: {rmse:.3f}")
            print(f"📈 Evaluated on {len(predictions)} test ratings")
        else:
            print("⚠️ Not enough data for evaluation")

# 🎮 Test the complete system!
book_system = BookRecommendationSystem()

# Add sample data
users_books = [
    ('Alice', 'The Great Gatsby', 5, datetime.now() - timedelta(days=10)),
    ('Alice', '1984', 4, datetime.now() - timedelta(days=20)),
    ('Alice', 'Pride and Prejudice', 4, datetime.now() - timedelta(days=30)),
    ('Bob', 'The Great Gatsby', 4, datetime.now() - timedelta(days=5)),
    ('Bob', 'To Kill a Mockingbird', 5, datetime.now() - timedelta(days=15)),
    ('Bob', '1984', 3, datetime.now() - timedelta(days=25)),
    ('Charlie', '1984', 5, datetime.now() - timedelta(days=7)),
    ('Charlie', 'Brave New World', 4, datetime.now() - timedelta(days=14)),
    ('Charlie', 'To Kill a Mockingbird', 4, datetime.now() - timedelta(days=21)),
    ('Diana', 'Pride and Prejudice', 5, datetime.now() - timedelta(days=3)),
    ('Diana', 'Jane Eyre', 4, datetime.now() - timedelta(days=12)),
    ('Diana', 'The Great Gatsby', 3, datetime.now() - timedelta(days=18)),
]

for user, book, rating, timestamp in users_books:
    book_system.add_rating(user, book, rating, timestamp)

# Calculate similarities
book_system.calculate_similarities()

# Get recommendations
book_system.recommend_books('Alice', n=3, method='hybrid')
book_system.recommend_books('Bob', n=3, method='user_based')
book_system.recommend_books('NewUser', n=3)  # Test cold start

# Evaluate the system
book_system.evaluate()
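
The solution reports RMSE, which measures how close predicted ratings are to actual ones. If you care more about the quality of the top of the list, a ranking metric such as precision@k is a common complement. The sketch below is a minimal version; the function name and the sample data are assumptions for the example:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually liked."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ranked recommendations and the held-out books the user rated highly
recommended_books = ['1984', 'Jane Eyre', 'Brave New World', 'Dune']
liked_books = {'1984', 'Brave New World', 'Emma'}

score = precision_at_k(recommended_books, liked_books, k=3)
print(f"precision@3 = {score:.2f}")
```

Dividing by `k` rather than by the number of items returned penalizes lists that come back shorter than requested.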

🎓 Key Takeaways

You've learned so much! Here's what you can now do:

  • ✅ Build recommendation systems with confidence 💪
  • ✅ Implement collaborative filtering from scratch 🛡️
  • ✅ Handle cold start problems like a pro 🎯
  • ✅ Combine multiple approaches for better results 🐛
  • ✅ Create personalized experiences with Python! 🚀

Remember: The best recommendation system is one that understands your users and continuously improves! 🤝

๐Ÿค Next Steps

Congratulations! ๐ŸŽ‰ Youโ€™ve mastered collaborative filtering!

Hereโ€™s what to do next:

  1. ๐Ÿ’ป Build a recommendation system for your favorite domain
  2. ๐Ÿ—๏ธ Experiment with different similarity metrics
  3. ๐Ÿ“š Explore deep learning approaches (neural collaborative filtering)
  4. ๐ŸŒŸ Deploy your system and get real user feedback!
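
As a starting point for experimenting with similarity metrics, here is a small side-by-side sketch of three common choices. The function names are illustrative, and the vectors are assumed to contain only co-rated items:

```python
import numpy as np

def cosine_sim(u, v):
    """Angle-based similarity; sensitive to each user's rating scale."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson_sim(u, v):
    """Correlation of the ratings; removes each user's mean first.

    Returns nan if either vector is constant.
    """
    return float(np.corrcoef(u, v)[0, 1])

def jaccard_sim(items_a, items_b):
    """Overlap of the rated item sets; ignores the rating values."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Ratings two users gave to the same four movies
u = [5, 3, 4, 4]
v = [3, 1, 2, 3]

print(f"cosine:  {cosine_sim(u, v):.3f}")
print(f"pearson: {pearson_sim(u, v):.3f}")
print(f"jaccard: {jaccard_sim(['A', 'B', 'C'], ['B', 'C', 'D']):.3f}")
```

Cosine tends to score users with all-positive ratings as very similar, while Pearson only rewards agreement about which items are above or below each user's own average.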

Keep experimenting, keep learning, and most importantly, keep recommending awesome things! 🚀


Happy coding! 🎉🚀✨