Prerequisites
- Basic understanding of programming concepts
- Python installation (3.8+)
- VS Code or preferred IDE
What you'll learn
- Understand the fundamentals of glob patterns
- Apply glob patterns in real projects
- Debug common issues
- Write clean, Pythonic code
Glob Patterns: File Matching
Welcome to your journey into the wonderful world of glob patterns! Have you ever wanted to find all .txt files in a folder, or maybe all images that start with "vacation"? That's exactly what glob patterns help us do! Think of them as super-powered wildcards that make finding files a breeze. Let's dive in and make file searching fun!
Introduction
Imagine you're looking for a specific book in a huge library. Instead of checking every single book, wouldn't it be amazing if you could say "show me all books that start with 'Python' and end with 'Guide'"? That's exactly what glob patterns do for files on your computer!
In this tutorial, you'll learn:
- What glob patterns are and why they're useful
- How to use wildcards to match multiple files
- Practical ways to organize and find files like a pro
- Best practices that'll save you hours of work
Ready? Let's turn you into a file-matching wizard!
Understanding Glob Patterns
What Are Glob Patterns?
Glob patterns are search patterns for filenames. They use special characters (wildcards) to match multiple files at once. The name "glob" is short for "global": it comes from an early Unix program called glob that expanded wildcard patterns for the shell.
Think of glob patterns as a treasure map where:
- * means "any number of characters, including none" (like a joker card)
- ? means "exactly one character"
- [...] means "any one of these characters"
Here's a quick visual guide:
# Pattern Examples:
# *.txt → matches all .txt files
# report_*.pdf → matches report_1.pdf, report_final.pdf, etc.
# data_?.csv → matches data_1.csv, data_A.csv (single character)
# image_[123].png → matches image_1.png, image_2.png, image_3.png
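If you'd like to check these patterns without touching any real files, one option is the standard library's fnmatch module, which applies the same wildcard rules to plain strings. The sketch below is only an illustration: the filenames in `names` and the helper `matching` are made up for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, chosen only to exercise each wildcard.
names = [
    "notes.txt",
    "report_final.pdf",
    "data_1.csv",
    "data_10.csv",
    "image_2.png",
    "image_4.png",
]

def matching(pattern):
    """Return the sample names that match the given glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]

print(matching("*.txt"))            # * matches any run of characters
print(matching("report_*.pdf"))
print(matching("data_?.csv"))       # ? matches exactly one character
print(matching("image_[123].png"))  # [...] matches one character from the set
```

Notice that data_10.csv is not matched by data_?.csv, because ? stands for exactly one character.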
Why Use Glob Patterns?
- Save Time: Find hundreds of files with one command!
- Stay Organized: Process groups of files easily
- Automate Tasks: Perfect for scripts and automation
- Cross-Platform: Works on Windows, Mac, and Linux!
Basic Syntax and Usage
Let's start with the basics! Python's glob module makes pattern matching super easy:
import glob

# Finding all Python files in the current directory
python_files = glob.glob("*.py")
print(f"Found {len(python_files)} Python files!")

# Finding all image files
image_files = glob.glob("*.jpg") + glob.glob("*.png")
print(f"Found {len(image_files)} images!")
The Wildcard Family
Let's meet our wildcard friends:
# 1. The Star (*) - Matches any characters
all_docs = glob.glob("*.docx")  # All Word documents
reports = glob.glob("report_*.pdf")  # All reports

# 2. The Question Mark (?) - Matches exactly one character
logs = glob.glob("log_?.txt")  # Matches log_1.txt, log_A.txt, etc.

# 3. Character Sets ([...]) - Matches any character in the set
data_files = glob.glob("data_[123].csv")  # data_1.csv, data_2.csv, data_3.csv
vowel_files = glob.glob("file_[aeiou].txt")  # file_a.txt, file_e.txt, etc.

# 4. Character Ranges - Even more powerful!
numbered = glob.glob("file_[0-9].txt")  # file_0.txt through file_9.txt
letters = glob.glob("doc_[a-z].pdf")  # doc_a.pdf through doc_z.pdf
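Two details about sets and ranges are easy to miss: a set always matches exactly one character, and a leading ! inside the brackets negates it. Here is a small sketch that runs in a temporary folder so it doesn't depend on your own files. The filenames and the helper `names` are hypothetical, and glob.escape() is used only to protect the temporary folder's path.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical filenames created only for this demonstration.
    for name in ["file_1.txt", "file_7.txt", "file_10.txt", "file_a.txt"]:
        open(os.path.join(folder, name), "w").close()

    def names(pattern):
        """Return sorted base names matching the pattern inside the temp folder."""
        full_pattern = os.path.join(glob.escape(folder), pattern)
        return sorted(os.path.basename(p) for p in glob.glob(full_pattern))

    single_digit = names("file_[0-9].txt")      # one digit only
    two_digits = names("file_[0-9][0-9].txt")   # repeat the set for two digits
    not_a_digit = names("file_[!0-9].txt")      # ! negates the set

print(single_digit)
print(two_digits)
print(not_a_digit)
```

So file_10.txt needs the set written twice, and [!0-9] picks up file_a.txt.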
Practical Examples
Example 1: Photo Organizer
Let's build a simple photo organizer that finds and sorts your vacation photos!
import glob
import os
from datetime import datetime

def organize_vacation_photos():
    """
    Find and organize vacation photos by year
    """
    # Find all vacation photos
    vacation_photos = glob.glob("vacation_*.jpg")
    if not vacation_photos:
        print("No vacation photos found! Time to travel?")
        return
    print(f"Found {len(vacation_photos)} vacation photos!")
    # Organize by year
    for photo in vacation_photos:
        # Get the file's last-modified time (getmtime is not the creation time)
        timestamp = os.path.getmtime(photo)
        year = datetime.fromtimestamp(timestamp).year
        # Create year folder if it doesn't exist
        year_folder = f"Photos_{year}"
        os.makedirs(year_folder, exist_ok=True)
        # Only report the move here (in real code, you'd actually move it)
        print(f"  Moving {photo} to {year_folder}/")
    print("Photos organized by year!")

# Try it out!
organize_vacation_photos()
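The organizer above only prints where each photo would go. If you want to actually move the files, one possible approach is shutil.move(), sketched below. The function name `move_photos_by_year` and the sample filenames are assumptions made for this illustration, and the demo runs in a throwaway folder so nothing of yours gets moved.

```python
import glob
import os
import shutil
import tempfile
from datetime import datetime

def move_photos_by_year(folder):
    """Move vacation_*.jpg files in `folder` into Photos_<year> subfolders.

    Returns the list of new paths. The year comes from each file's
    last-modified time, as in the organizer example.
    """
    moved = []
    pattern = os.path.join(glob.escape(folder), "vacation_*.jpg")
    for photo in sorted(glob.glob(pattern)):
        year = datetime.fromtimestamp(os.path.getmtime(photo)).year
        year_folder = os.path.join(folder, f"Photos_{year}")
        os.makedirs(year_folder, exist_ok=True)
        # shutil.move returns the destination path of the moved file
        moved.append(shutil.move(photo, year_folder))
    return moved

# Demonstration in a temporary directory with hypothetical filenames.
with tempfile.TemporaryDirectory() as folder:
    for name in ["vacation_beach.jpg", "vacation_city.jpg", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    new_paths = move_photos_by_year(folder)
    moved_names = [os.path.basename(p) for p in new_paths]
    left_behind = sorted(
        n for n in os.listdir(folder) if os.path.isfile(os.path.join(folder, n))
    )

print(moved_names)
print(left_behind)
```

Only the files matching vacation_*.jpg are moved; notes.txt stays where it was.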
Example 2: Log File Analyzer
Here's a practical example for analyzing log files:
import glob

def analyze_error_logs():
    """
    Find and analyze error logs
    """
    # Find all error logs from January 2024 (any two-digit day)
    error_logs = glob.glob("error_2024_01_[0-9][0-9].log")
    total_errors = 0
    critical_errors = 0
    print("Analyzing error logs...")
    for log_file in error_logs:
        with open(log_file, 'r') as f:
            content = f.read()
        errors = content.count("ERROR")
        criticals = content.count("CRITICAL")
        total_errors += errors
        critical_errors += criticals
        print(f"  {log_file}: {errors} errors ({criticals} critical)")
    print("\nSummary:")
    print(f"  Total files analyzed: {len(error_logs)}")
    print(f"  Total errors: {total_errors}")
    print(f"  Critical errors: {critical_errors}")
    if critical_errors > 0:
        print("  Action needed for critical errors!")

# Example usage
analyze_error_logs()
Example 3: Music Library Scanner
Let's create a fun music library scanner:
import glob
import os

def scan_music_library():
    """
    Scan and categorize your music collection
    """
    # Define music patterns
    patterns = {
        "MP3s": "*.mp3",
        "FLACs": "*.flac",
        "Playlists": "*.m3u",
        "Album Art": "cover*.jpg"
    }
    print("Scanning your music library...\n")
    total_size = 0  # running total in MB
    for category, pattern in patterns.items():
        # "**/" with recursive=True also covers the current directory
        files = glob.glob(f"**/{pattern}", recursive=True)
        if files:
            # Calculate total size
            size = sum(os.path.getsize(f) for f in files)
            size_mb = size / (1024 * 1024)
            total_size += size_mb
            print(f"{category}:")
            print(f"  Files: {len(files)}")
            print(f"  Size: {size_mb:.1f} MB")
            print(f"  Example: {os.path.basename(files[0])}")
            print()
    print(f"Total library size: {total_size:.1f} MB")
    print("Rock on!")

# Try it!
scan_music_library()
Advanced Concepts
Recursive Searching with **
Want to search in all subdirectories? Use the double star together with recursive=True!
import glob

# Find ALL Python files, even in subdirectories
all_python_files = glob.glob("**/*.py", recursive=True)
print(f"Found {len(all_python_files)} Python files in total!")

# Find all README files anywhere
readmes = glob.glob("**/README.md", recursive=True)
for readme in readmes:
    print(f"Found README at: {readme}")
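A common surprise is that ** only searches subdirectories when recursive=True is passed. Without it, ** behaves like a single *. The sketch below shows the difference in a temporary folder. The directory layout (top.py, pkg/sub/deep.py) is hypothetical and exists only for this demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one script at the top, one nested two levels down.
    os.makedirs(os.path.join(root, "pkg", "sub"))
    for rel in ["top.py", os.path.join("pkg", "sub", "deep.py")]:
        open(os.path.join(root, rel), "w").close()

    pattern = os.path.join(glob.escape(root), "**", "*.py")

    # With recursive=True, ** matches zero or more directory levels.
    with_flag = sorted(
        os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True)
    )
    # Without it, ** acts like *, so only files exactly one level down match.
    without_flag = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))

print(with_flag)
print(without_flag)
```

Here the recursive search finds both scripts, while the non-recursive one finds none, because no .py file sits exactly one folder below the root.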
Using pathlib for Modern Python
Python's pathlib module offers a more modern approach:
from pathlib import Path

# Find all .txt files
txt_files = Path(".").glob("*.txt")
for file in txt_files:
    print(f"{file.name} ({file.stat().st_size} bytes)")

# Recursive search with pathlib
all_configs = Path(".").rglob("config.json")
for config in all_configs:
    print(f"Config found: {config}")
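It can help to see how glob() and rglob() relate: rglob(pattern) behaves like glob() with "**/" put in front of the pattern, and a plain glob("*.txt") looks only at the top level. A minimal sketch, again in a temporary folder with a made-up layout:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical project layout created only for this demonstration.
    (root / "app" / "settings").mkdir(parents=True)
    (root / "config.json").write_text("{}")
    (root / "app" / "settings" / "config.json").write_text("{}")
    (root / "app" / "notes.txt").write_text("hello")

    # Both calls return generators, so we collect them into sorted lists.
    via_rglob = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("config.json")
    )
    via_glob = sorted(
        p.relative_to(root).as_posix() for p in root.glob("**/config.json")
    )
    # A non-recursive glob does not descend into app/.
    top_level_txt = [p.name for p in root.glob("*.txt")]

print(via_rglob)
print(via_glob)
print(top_level_txt)
```

Both recursive calls find the same two config files, and the top-level search for *.txt comes back empty because notes.txt lives inside app/.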
Combining Multiple Patterns
import glob
from itertools import chain

def find_all_images():
    """
    Find all image files using multiple patterns
    """
    patterns = ["*.jpg", "*.jpeg", "*.png", "*.gif", "*.bmp"]
    # Method 1: Using chain
    all_images = list(chain.from_iterable(
        glob.glob(pattern) for pattern in patterns
    ))
    # Method 2: Using list comprehension (same result, shown for comparison)
    all_images_v2 = [f for pattern in patterns for f in glob.glob(pattern)]
    return all_images

# Find them all!
images = find_all_images()
print(f"Found {len(images)} images!")
Common Pitfalls and Solutions
Pitfall 1: Case Sensitivity
# Wrong: This might miss files on case-sensitive systems
files = glob.glob("*.PDF")  # Won't find .pdf files on Linux/Mac

# Right: Handle both cases
import glob

def find_pdfs_any_case():
    """Find PDFs with the most common spellings of the extension"""
    patterns = ["*.pdf", "*.PDF", "*.Pdf"]
    all_pdfs = []
    for pattern in patterns:
        all_pdfs.extend(glob.glob(pattern))
    return list(set(all_pdfs))  # Remove duplicates

# Even better with pathlib: this catches every mix of upper and lower case
from pathlib import Path
pdfs = [p for p in Path(".").iterdir() if p.suffix.lower() == ".pdf"]
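Another way to get case-insensitive matching while staying with glob is to spell out both cases of every letter using character sets. The helper below, `case_insensitive`, is a hypothetical name invented for this sketch, and it makes a simplifying assumption: the pattern you pass in has no [...] sets of its own.

```python
import glob
import os
import tempfile

def case_insensitive(pattern):
    """Turn each letter into a two-character set, e.g. 'pdf' -> '[pP][dD][fF]'.

    Simplification: assumes the pattern contains no [...] sets already.
    """
    return "".join(
        f"[{ch.lower()}{ch.upper()}]" if ch.isalpha() else ch
        for ch in pattern
    )

any_case_pdf = case_insensitive("*.pdf")
print(any_case_pdf)

# Demonstration in a temporary folder with hypothetical filenames.
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.pdf", "b.PDF", "c.pDf", "d.txt"]:
        open(os.path.join(folder, name), "w").close()
    full_pattern = os.path.join(glob.escape(folder), any_case_pdf)
    found = sorted(os.path.basename(p) for p in glob.glob(full_pattern))

print(found)
```

Unlike the fixed list of three patterns, this also catches unusual spellings such as c.pDf.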
Pitfall 2: Special Characters in Filenames
# Wrong: The brackets are read as a character set
files = glob.glob("report[1].txt")  # Matches report1.txt, not report[1].txt!

# Right: Escape special characters
import glob

# Method 1: Wrap the opening bracket in its own character set
# (Python's glob does not treat a backslash as an escape character)
files = glob.glob("report[[]1].txt")

# Method 2: Use glob.escape() for safety
filename = "report[1].txt"
pattern = glob.escape(filename)  # "report[[]1].txt"
files = glob.glob(pattern)
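To see the difference for yourself, here is a short sketch that creates both report[1].txt and report1.txt in a temporary folder and compares the unescaped pattern with the one produced by glob.escape(). The filenames are just sample data for the demonstration.

```python
import glob
import os
import tempfile

escaped_pattern = glob.escape("report[1].txt")
print(escaped_pattern)

with tempfile.TemporaryDirectory() as folder:
    for name in ["report[1].txt", "report1.txt"]:
        open(os.path.join(folder, name), "w").close()
    base = glob.escape(folder)  # protect the folder path as well

    # Unescaped: [1] is a character set that matches the single character "1".
    unescaped = [
        os.path.basename(p)
        for p in glob.glob(os.path.join(base, "report[1].txt"))
    ]
    # Escaped: the brackets are matched literally.
    escaped = [
        os.path.basename(p)
        for p in glob.glob(os.path.join(base, escaped_pattern))
    ]

print(unescaped)
print(escaped)
```

The unescaped pattern quietly matches the wrong file, which is why glob.escape() is worth the extra line whenever a filename comes from user input or another program.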
Pitfall 3: Memory with Large Directories
# Wrong: Loading everything into memory
all_files = glob.glob("**/*", recursive=True)  # Could be millions!

# Right: Use generators for large searches
from pathlib import Path

def process_files_efficiently():
    """Process files one at a time"""
    for file in Path(".").rglob("*"):
        if file.is_file():
            # Process each file individually
            print(f"Processing: {file}")
            # Do something with the file
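If you prefer to stay with the glob module, glob.iglob() returns an iterator instead of a list, so you can stop early without collecting every match first. The sketch below takes just the first five results with itertools.islice. The item_NN.dat filenames are hypothetical, and how much iglob buffers per directory is an implementation detail, so treat this as a way to avoid one giant result list, not as a guarantee about memory use.

```python
import glob
import os
import tempfile
from itertools import islice

with tempfile.TemporaryDirectory() as folder:
    # Create 20 hypothetical data files for the demonstration.
    for i in range(20):
        open(os.path.join(folder, f"item_{i:02d}.dat"), "w").close()

    matches = glob.iglob(os.path.join(glob.escape(folder), "*.dat"))
    is_list = isinstance(matches, list)   # iglob gives an iterator, not a list
    first_five = list(islice(matches, 5)) # pull only five results

print(is_list)
print(len(first_five))
```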
Best Practices
1. Always Validate Patterns
import glob

def safe_glob(pattern):
    """
    Safely execute glob with validation
    """
    try:
        # glob.glob() rarely raises; a pattern with no matches simply returns []
        files = glob.glob(pattern)
        if not files:
            print(f"No files found matching: {pattern}")
        return files
    except Exception as e:
        print(f"Invalid pattern: {e}")
        return []

# Use it safely
results = safe_glob("*.txt")
2. Use Descriptive Patterns
# Avoid: Too generic
files = glob.glob("*.*")

# Better: Be specific
report_files = glob.glob("monthly_report_*.xlsx")
backup_files = glob.glob("backup_*_2024.zip")
3. Combine with Other Tools
import glob
import os
from datetime import datetime

def find_recent_logs(days=7):
    """
    Find log files modified in the last N days
    """
    log_files = glob.glob("*.log")
    recent_files = []
    cutoff_time = datetime.now().timestamp() - (days * 24 * 60 * 60)
    for log in log_files:
        if os.path.getmtime(log) > cutoff_time:
            recent_files.append(log)
    return recent_files

# Find logs from the last week
recent = find_recent_logs(7)
print(f"Found {len(recent)} recent log files")
Hands-On Exercise
Time to practice! Here's a fun challenge for you:
Challenge: Create a "Desktop Cleaner" that organizes files by type!
import glob
import os
from collections import defaultdict

def desktop_cleaner():
    """
    Your mission: Organize desktop files by type!
    Requirements:
    1. Find all files on the desktop
    2. Group them by extension
    3. Create folders for each type
    4. Move files to appropriate folders
    Bonus: Skip system files and folders!
    """
    # Your code here!
    pass

# Try to solve it yourself first!
Solution
import glob
import os
from collections import defaultdict

def desktop_cleaner():
    """
    Organize desktop files by type!
    """
    # Group files by extension
    file_groups = defaultdict(list)
    # Find everything in the current directory (run this from your desktop folder)
    all_items = glob.glob("*")
    for item in all_items:
        if os.path.isfile(item):  # Skip directories
            # Get file extension
            _, ext = os.path.splitext(item)
            if ext:  # Skip files without extensions
                ext = ext[1:].upper()  # Remove dot and uppercase
                file_groups[ext].append(item)
    # Create folders and organize
    for ext, files in file_groups.items():
        # Create folder name
        folder_name = f"{ext}_Files"
        # Create folder if it doesn't exist
        os.makedirs(folder_name, exist_ok=True)
        print(f"Creating {folder_name} for {len(files)} files")
        # Only report the moves here (in practice, you'd use shutil.move)
        for file in files:
            print(f"  Moving {file} to {folder_name}/")
    print("\nDesktop organized! Your desk is clean!")

# Run the cleaner!
desktop_cleaner()
Bonus Challenge!
Create a function that finds duplicate files based on patterns:
def find_duplicates():
    """
    Find files that might be duplicates
    Example: report.txt, report(1).txt, report(2).txt
    """
    # Hint: Use glob to find patterns like "file(*).ext"
    # Your code here!
    pass
Key Takeaways
Congratulations! You're now a glob pattern master! Here's what you've learned:
- Glob Patterns are powerful tools for finding files
- Wildcards (*, ?, [...]) make pattern matching flexible
- Recursive searching with ** explores all subdirectories
- pathlib offers a modern, object-oriented approach
- Best practices ensure your code is efficient and safe
Remember:
- Be specific with your patterns
- Validate patterns before using them
- Consider memory usage with large directories
- Combine glob with other Python tools for power!
Next Steps
You're doing amazing! Here's what to explore next:
- Try the os.walk() function for more complex file operations
- Learn about fnmatch for pattern matching on strings
- Explore shutil for moving and copying matched files
- Build a file backup system using glob patterns!
Keep practicing, and remember: every expert was once a beginner! You've got this!
Happy file matching!
P.S. Did you try the desktop cleaner challenge? Share your solution and let's learn together! The Python community is here to help!