Prerequisites
- Basic understanding of programming concepts
- Python installation (3.8+)
- VS Code or preferred IDE
What you'll learn
- Understand the fundamentals of ASTs
- Apply ASTs in real projects
- Debug common issues
- Write clean, Pythonic code
Introduction
Welcome to this tutorial on Abstract Syntax Trees (ASTs) for code analysis! In this guide, we'll explore how Python represents code as tree structures and how you can use the standard-library ast module to analyze, transform, and understand code programmatically.
ASTs sit underneath much of Python's tooling. Whether you're building code linters, static analyzers, or code transformation tools, understanding ASTs is essential for writing powerful development tools.
By the end of this tutorial, you'll feel confident using ASTs in your own projects! Let's dive in!
Understanding Abstract Syntax Trees
What is an AST?
An Abstract Syntax Tree is like a family tree for your code. Think of it as a hierarchical representation that shows how different parts of your code relate to each other, just like how a family tree shows relationships between people!
In Python terms, an AST is a tree representation of the syntactic structure of source code. This means you can:
- Analyze code structure without executing it
- Transform code programmatically
- Build powerful development tools
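To make the tree idea concrete, here is a minimal sketch that parses one assignment and inspects the nodes by hand. The statement `total = price * 2` is an illustrative example, not taken from a real project.

```python
import ast

# Parse a single assignment; the source is only parsed, never executed.
tree = ast.parse("total = price * 2")

assign = tree.body[0]                   # the only statement in the module
print(type(assign).__name__)            # Assign
print(type(assign.value).__name__)      # BinOp
print(type(assign.value.op).__name__)   # Mult
print(assign.value.left.id)             # price

# ast.dump gives a textual view of any subtree (exact format varies by version)
print(ast.dump(assign.value))
```

Note that `price` is never defined anywhere: parsing only checks syntax, which is why the analysis is safe to run on untrusted or incomplete code.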
Why Use ASTs?
Here's why developers love ASTs:
- Static Analysis: Analyze code without running it
- Code Transformation: Modify code programmatically
- Pattern Detection: Find specific code patterns
- Tool Building: Create linters, formatters, and analyzers
Real-world example: Imagine building a code quality checker. With ASTs, you can detect unused variables, find complex functions, or enforce coding standards, all without executing the code!
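As a small taste of pattern detection, the sketch below looks for direct calls to `eval` or `exec`. The helper name `find_calls` and the sample source are assumptions made for illustration; it only catches calls written as a bare name, not aliased or attribute-based calls.

```python
import ast

source = """
user_input = input()
value = eval(user_input)
print(value)
"""

def find_calls(source_code, names):
    """Return (name, line) pairs for direct calls to any function in `names`."""
    hits = []
    for node in ast.walk(ast.parse(source_code)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in names):
            hits.append((node.func.id, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])

print(find_calls(source, {"eval", "exec"}))  # [('eval', 3)]
```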
Basic Syntax and Usage
Simple Example
Let's start with a friendly example:
import ast

# Hello, AST!
code = """
def greet(name):
    return f"Hello, {name}!"
"""

# Parse the code into an AST
tree = ast.parse(code)

# Let's see what we got
print(f"Root node type: {type(tree).__name__}")
print(f"Number of items in body: {len(tree.body)}")

# First item is our function
func_def = tree.body[0]
print(f"Function name: {func_def.name}")
print(f"Number of arguments: {len(func_def.args.args)}")
Explanation: Notice how we can inspect the code structure without running it! The AST gives us access to function names, arguments, and more.
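Building on that, here is a minimal sketch that collects the parameter names of every top-level function. The helper `list_functions` and the sample functions are hypothetical; only positional parameters (`node.args.args`) are covered.

```python
import ast

source = """
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

def add(a, b):
    return a + b
"""

def list_functions(source_code):
    """Map each top-level function name to its positional parameter names."""
    tree = ast.parse(source_code)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

print(list_functions(source))
# {'greet': ['name', 'greeting'], 'add': ['a', 'b']}
```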
Common AST Nodes
Here are the AST nodes you'll use daily:
import ast

# Pattern 1: Exploring different node types
code = """
x = 42          # Assignment
y = x + 10      # Binary operation
if y > 50:      # Conditional
    print("Big number!")
"""
tree = ast.parse(code)

# Pattern 2: Walking through the AST
for node in ast.walk(tree):
    print(f"{type(node).__name__} node found!")

# Pattern 3: Using NodeVisitor
class SimpleVisitor(ast.NodeVisitor):
    def visit_Assign(self, node):
        # Called for each assignment
        for target in node.targets:
            # Targets can also be tuples or attributes, which have no .id
            if isinstance(target, ast.Name):
                print(f"Found assignment to: {target.id}")
        self.generic_visit(node)

    def visit_If(self, node):
        # Called for each if statement
        print("Found an if statement!")
        self.generic_visit(node)

visitor = SimpleVisitor()
visitor.visit(tree)
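One practical difference between the two approaches: ast.walk yields nodes in no specified order and without context, while a visitor controls the traversal and can keep state such as nesting depth. The sketch below, with a hypothetical `DepthPrinter` class, records how deeply each statement is nested.

```python
import ast

class DepthPrinter(ast.NodeVisitor):
    """Record each statement's type together with its nesting depth."""
    def __init__(self):
        self.depth = 0
        self.seen = []

    def generic_visit(self, node):
        if isinstance(node, ast.stmt):
            self.seen.append((self.depth, type(node).__name__))
            self.depth += 1                 # children are one level deeper
            super().generic_visit(node)
            self.depth -= 1
        else:
            super().generic_visit(node)

source = """
x = 42
if x > 10:
    print("big")
"""
printer = DepthPrinter()
printer.visit(ast.parse(source))
for depth, name in printer.seen:
    print("  " * depth + name)
```

The `print` call shows up as an `Expr` statement indented under `If`, which is information a flat ast.walk loop does not give you directly.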
Practical Examples
Example 1: Variable Usage Analyzer
Let's build something real:
import ast
from collections import defaultdict

# Analyze variable usage in code
class VariableAnalyzer(ast.NodeVisitor):
    def __init__(self):
        self.definitions = defaultdict(list)  # Where variables are defined
        self.usages = defaultdict(list)       # Where variables are used
        self.current_line = 0

    def visit_Assign(self, node):
        # Track variable definitions
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.definitions[target.id].append(node.lineno)
                print(f"Variable '{target.id}' defined at line {node.lineno}")
        self.generic_visit(node)

    def visit_Name(self, node):
        # Track variable usage (when loading)
        if isinstance(node.ctx, ast.Load):
            self.usages[node.id].append(node.lineno)
            print(f"Variable '{node.id}' used at line {node.lineno}")
        self.generic_visit(node)

    def get_unused_variables(self):
        # Find variables that are defined but never used
        unused = []
        for var, lines in self.definitions.items():
            if var not in self.usages:
                unused.append((var, lines))
        return unused

# Let's use it!
code = """
x = 10  # Used variable
y = 20  # Used variable
z = 30  # Unused variable!
result = x + y
print(f"Result: {result}")
"""
tree = ast.parse(code)
analyzer = VariableAnalyzer()
analyzer.visit(tree)

print("\nUnused variables:")
for var, lines in analyzer.get_unused_variables():
    print(f"  Variable '{var}' defined at lines {lines} but never used!")
Try it yourself: Extend this to detect variables used before definition!
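If you want a starting point for that extension, here is one possible sketch. It is deliberately simplified: the function name `used_before_definition` is made up for this example, only top-level statements are processed in source order, and scopes, branches, loops, and names bound by `def`, `class`, or `import` are not modelled.

```python
import ast
import builtins

def used_before_definition(source_code):
    """Report (name, line) for names read before any top-level assignment."""
    defined = set(dir(builtins))            # print, len, etc. always exist
    problems = []
    for stmt in ast.parse(source_code).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads are checked first: in `x = x + 1` the load happens before the store.
        for name in names:
            if isinstance(name.ctx, ast.Load) and name.id not in defined:
                problems.append((name.id, name.lineno))
        for name in names:
            if isinstance(name.ctx, ast.Store):
                defined.add(name.id)
    return problems

source = """
total = price * 2
price = 10
print(total)
"""
print(used_before_definition(source))  # [('price', 2)]
```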
Example 2: Function Complexity Checker
Let's make it fun:
import ast

# Check function complexity
class ComplexityChecker(ast.NodeVisitor):
    def __init__(self):
        self.functions = {}  # Store function complexities
        self.current_function = None
        self.complexity = 0

    def visit_FunctionDef(self, node):
        # Start analyzing a function
        self.current_function = node.name
        self.complexity = 1  # Base complexity
        print(f"\nAnalyzing function: {node.name}")

        # Visit function body
        self.generic_visit(node)

        # Store results
        self.functions[node.name] = {
            'complexity': self.complexity,
            'lines': node.end_lineno - node.lineno + 1,
            'emoji': self.get_complexity_emoji(self.complexity)
        }
        self.current_function = None

    def visit_If(self, node):
        # Each if adds complexity
        if self.current_function:
            self.complexity += 1
            print("  Found if statement (+1 complexity)")
        self.generic_visit(node)

    def visit_For(self, node):
        # Each loop adds complexity
        if self.current_function:
            self.complexity += 1
            print("  Found for loop (+1 complexity)")
        self.generic_visit(node)

    def visit_While(self, node):
        # While loops too!
        if self.current_function:
            self.complexity += 1
            print("  Found while loop (+1 complexity)")
        self.generic_visit(node)

    def get_complexity_emoji(self, complexity):
        # Fun complexity indicators!
        if complexity <= 3:
            return "✅ Simple"
        elif complexity <= 7:
            return "⚡ Moderate"
        elif complexity <= 10:
            return "⚠️ Complex"
        else:
            return "🚨 Very Complex!"

    def print_report(self):
        # Generate complexity report
        print("\nComplexity Report:")
        print("-" * 40)
        for func, info in self.functions.items():
            print(f"Function: {func}")
            print(f"  Complexity: {info['complexity']} {info['emoji']}")
            print(f"  Lines: {info['lines']}")
            print()

# Test it out!
code = """
def simple_function(x):
    return x * 2

def moderate_function(items):
    total = 0
    for item in items:
        if item > 0:
            total += item
    return total

def complex_function(data):
    result = []
    for row in data:
        if row['status'] == 'active':
            for item in row['items']:
                if item['price'] > 100:
                    if item['category'] == 'electronics':
                        result.append(item)
    return result
"""
tree = ast.parse(code)
checker = ComplexityChecker()
checker.visit(tree)
checker.print_report()
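One limitation worth knowing: the checker keeps a single running counter, so a function defined inside another function resets it and the outer function's count is lost. A possible way around this, sketched below with a hypothetical `NestedAwareChecker`, is to keep one counter per enclosing function on a stack.

```python
import ast

class NestedAwareChecker(ast.NodeVisitor):
    """Count branch points per function, keeping nested functions separate."""
    def __init__(self):
        self.results = {}
        self._stack = []    # one running count per enclosing function

    def visit_FunctionDef(self, node):
        self._stack.append(1)               # base complexity
        self.generic_visit(node)
        self.results[node.name] = self._stack.pop()

    def _branch(self, node):
        if self._stack:                     # ignore branches outside functions
            self._stack[-1] += 1
        self.generic_visit(node)

    # if, for, and while each add one branch point
    visit_If = visit_For = visit_While = _branch

source = """
def outer(items):
    def inner(x):
        if x:
            return 1
        return 0
    for item in items:
        if inner(item):
            pass
"""
checker = NestedAwareChecker()
checker.visit(ast.parse(source))
print(checker.results)  # {'inner': 2, 'outer': 3}
```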
Advanced Concepts
Advanced Topic 1: AST Transformation
When you're ready to level up, try transforming code:
import ast
import astor  # pip install astor

# Transform print statements to include emojis!
class EmojiTransformer(ast.NodeTransformer):
    def visit_Call(self, node):
        # Transform print calls
        if (isinstance(node.func, ast.Name) and
                node.func.id == 'print' and
                node.args):
            # Add emoji to the first argument
            first_arg = node.args[0]
            # Only string literals are rewritten; numbers and other constants are left alone
            if (isinstance(first_arg, ast.Constant) and
                    isinstance(first_arg.value, str)):
                new_value = f"{first_arg.value} 🎉"
                node.args[0] = ast.copy_location(
                    ast.Constant(value=new_value), first_arg)
        return self.generic_visit(node)

# Using the transformer
code = """
print("Hello")
print("World")
x = 42
print("The answer is", x)
"""
tree = ast.parse(code)
transformer = EmojiTransformer()
new_tree = transformer.visit(tree)

# Convert back to code
new_code = astor.to_source(new_tree)
print("Transformed code:")
print(new_code)
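A transformed tree does not have to go back to source text; it can also be compiled and run directly. The sketch below uses a hypothetical `DoubleNumbers` transformer to show the usual steps: copy or fix source locations on new nodes, then call compile. On Python 3.9 or newer, ast.unparse is a standard-library alternative to astor for getting source back.

```python
import ast

class DoubleNumbers(ast.NodeTransformer):
    """Hypothetical transformer: replace every integer literal n with n * 2."""
    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node

tree = ast.parse("result = 20 + 1")
tree = DoubleNumbers().visit(tree)
ast.fix_missing_locations(tree)     # fill in lineno/col_offset on any new nodes

namespace = {}
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
print(namespace["result"])  # 42

# ast.unparse only exists on Python 3.9+, so guard the call
if hasattr(ast, "unparse"):
    print(ast.unparse(tree))
```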
Advanced Topic 2: Custom Code Linter
For the brave developers:
import ast

# Build a custom linter
class CustomLinter(ast.NodeVisitor):
    def __init__(self):
        self.issues = []  # Collect issues

    def visit_FunctionDef(self, node):
        # Check function naming (a leading capital suggests CamelCase)
        if node.name[0].isupper():
            self.issues.append({
                'line': node.lineno,
                'type': 'naming',
                'message': f"Function '{node.name}' should use snake_case",
                'emoji': '⚠️'
            })

        # Check function length (inclusive of the def line)
        func_length = node.end_lineno - node.lineno + 1
        if func_length > 20:
            self.issues.append({
                'line': node.lineno,
                'type': 'complexity',
                'message': f"Function '{node.name}' is {func_length} lines long (max: 20)",
                'emoji': '🚨'
            })
        self.generic_visit(node)

    def visit_Name(self, node):
        # Check for bad variable names
        bad_names = ['temp', 'data', 'obj', 'var', 'foo', 'bar']
        if isinstance(node.ctx, ast.Store) and node.id in bad_names:
            self.issues.append({
                'line': node.lineno,
                'type': 'naming',
                'message': f"Variable '{node.id}' has a non-descriptive name",
                'emoji': '💡'
            })
        self.generic_visit(node)

    def print_report(self):
        if not self.issues:
            print("No issues found! Your code is clean!")
        else:
            print(f"Found {len(self.issues)} issues:\n")
            for issue in sorted(self.issues, key=lambda x: x['line']):
                print(f"{issue['emoji']} Line {issue['line']}: {issue['message']}")

# Try it on a small, made-up sample
code = """
def ProcessItems(items):
    temp = []
    for item in items:
        temp.append(item)
    return temp
"""
linter = CustomLinter()
linter.visit(ast.parse(code))
linter.print_report()
Common Pitfalls and Solutions
Pitfall 1: Forgetting to Visit Child Nodes
# Wrong way - incomplete visitor
class BadVisitor(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        print(f"Function: {node.name}")
        # Forgot to call generic_visit!
        # Won't visit nested nodes!

# Correct way - always call generic_visit
class GoodVisitor(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        print(f"Function: {node.name}")
        self.generic_visit(node)  # Visit child nodes!
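The effect is easy to see with a nested function. In this sketch, the class names `ShallowVisitor` and `DeepVisitor` are illustrative; the only difference between them is the generic_visit call.

```python
import ast

source = """
def outer():
    def inner():
        pass
"""

class ShallowVisitor(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)        # no generic_visit: children skipped

class DeepVisitor(ShallowVisitor):
    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        self.generic_visit(node)            # descend into the function body

tree = ast.parse(source)
shallow, deep = ShallowVisitor(), DeepVisitor()
shallow.visit(tree)
deep.visit(tree)
print(shallow.names)  # ['outer']
print(deep.names)     # ['outer', 'inner']
```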
Pitfall 2: Modifying AST While Visiting
# Dangerous - modifying during visit
class DangerousVisitor(ast.NodeVisitor):
    def visit_Name(self, node):
        node.id = "modified"  # Don't modify during visit!
        self.generic_visit(node)

# Safe - use NodeTransformer for modifications
class SafeTransformer(ast.NodeTransformer):
    def visit_Name(self, node):
        # Return a new node for modifications, keeping the original position
        if node.id == "old_name":
            new_node = ast.Name(id="new_name", ctx=node.ctx)
            return ast.copy_location(new_node, node)
        return node
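With NodeTransformer, what a visit method returns decides what happens to the node: returning the node keeps it, returning a new node replaces it, and returning None removes it. Here is a minimal sketch of the removal case, using a made-up `StripAsserts` transformer.

```python
import ast

class StripAsserts(ast.NodeTransformer):
    def visit_Assert(self, node):
        return None     # None removes the statement from its parent's body

source = """
x = 1
assert x == 1
y = x + 1
"""
tree = StripAsserts().visit(ast.parse(source))
print([type(stmt).__name__ for stmt in tree.body])  # ['Assign', 'Assign']
```

Be careful when removing statements: if a block ends up with an empty body, the tree no longer corresponds to valid Python.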
Best Practices
- Use NodeVisitor for Analysis: Read-only operations
- Use NodeTransformer for Modifications: When changing the AST
- Always Call generic_visit(): Don't skip child nodes unless you deliberately want to ignore a subtree
- Handle All Relevant Nodes: Cover your use cases (for example, async def produces AsyncFunctionDef, not FunctionDef)
- Test with Complex Code: Edge cases matter!
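As an example of covering related node types, `async def` functions are skipped by a visitor that only defines visit_FunctionDef. One simple approach, sketched here with a hypothetical `FunctionCollector`, is to alias the handler.

```python
import ast

class FunctionCollector(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        self.generic_visit(node)

    # `async def` produces AsyncFunctionDef nodes, so reuse the same handler
    visit_AsyncFunctionDef = visit_FunctionDef

source = """
def sync_task():
    pass

async def async_task():
    pass
"""
collector = FunctionCollector()
collector.visit(ast.parse(source))
print(collector.names)  # ['sync_task', 'async_task']
```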
Hands-On Exercise
Challenge: Build a Docstring Checker
Create an AST-based tool that checks Python docstrings:
Requirements:
- Check if functions have docstrings
- Validate docstring format (Google/NumPy style)
- Check parameter documentation
- Find missing return value docs
- Each issue needs a severity emoji!
Bonus Points:
- Add support for class docstrings
- Check for proper type hints
- Generate a markdown report
Solution
import ast

# Our docstring checker!
class DocstringChecker(ast.NodeVisitor):
    def __init__(self):
        self.issues = []
        self.stats = {
            'total_functions': 0,
            'documented': 0,
            'missing_docs': 0
        }

    def visit_FunctionDef(self, node):
        self.stats['total_functions'] += 1

        # Check for docstring
        docstring = ast.get_docstring(node)
        if not docstring:
            self.issues.append({
                'line': node.lineno,
                'function': node.name,
                'issue': 'Missing docstring',
                'severity': '🚨',
                'type': 'missing'
            })
            self.stats['missing_docs'] += 1
        else:
            self.stats['documented'] += 1
            # Check docstring quality
            self.check_docstring_quality(node, docstring)
        self.generic_visit(node)

    def check_docstring_quality(self, node, docstring):
        # Check for parameter documentation (Google-style "name:" entries)
        params = [arg.arg for arg in node.args.args]
        for param in params:
            if param != 'self' and f"{param}:" not in docstring:
                self.issues.append({
                    'line': node.lineno,
                    'function': node.name,
                    'issue': f"Parameter '{param}' not documented",
                    'severity': '⚠️',
                    'type': 'incomplete'
                })

        # Check for return documentation
        has_return = any(isinstance(n, ast.Return) and n.value
                         for n in ast.walk(node))
        if has_return and "Returns:" not in docstring:
            self.issues.append({
                'line': node.lineno,
                'function': node.name,
                'issue': "Missing return value documentation",
                'severity': '💡',
                'type': 'incomplete'
            })

    def print_report(self):
        # Print statistics
        print("Docstring Report")
        print("=" * 40)
        print(f"Total functions: {self.stats['total_functions']}")
        print(f"Documented: {self.stats['documented']}")
        print(f"Missing docs: {self.stats['missing_docs']}")
        if self.stats['total_functions'] > 0:
            coverage = (self.stats['documented'] /
                        self.stats['total_functions'] * 100)
            print(f"Coverage: {coverage:.1f}% {self.get_coverage_emoji(coverage)}")

        # Print issues
        if self.issues:
            print(f"\nFound {len(self.issues)} issues:\n")
            for issue in sorted(self.issues, key=lambda x: x['line']):
                print(f"{issue['severity']} Line {issue['line']} - "
                      f"{issue['function']}(): {issue['issue']}")
        else:
            print("\nAll functions are properly documented!")

    def get_coverage_emoji(self, coverage):
        if coverage >= 90:
            return "🎉"
        elif coverage >= 70:
            return "✅"
        elif coverage >= 50:
            return "⚡"
        else:
            return "🚨"

# Test it out!
code = '''
def well_documented(x, y):
    """
    Add two numbers together.

    Args:
        x: First number
        y: Second number

    Returns:
        The sum of x and y
    """
    return x + y

def missing_params(a, b, c):
    """This function adds numbers."""
    return a + b + c

def no_docstring(data):
    return len(data)
'''
tree = ast.parse(code)
checker = DocstringChecker()
checker.visit(tree)
checker.print_report()
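For the type-hint bonus item, annotations are available directly on the AST: each parameter has an `annotation` attribute and each function has `returns`, both None when no hint is written. The sketch below is one possible starting point; the helper `missing_type_hints` is hypothetical and only looks at positional parameters.

```python
import ast

def missing_type_hints(source_code):
    """Return {function name: [unannotated items]} for functions lacking hints."""
    report = {}
    for node in ast.walk(ast.parse(source_code)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            missing = [a.arg for a in node.args.args
                       if a.annotation is None and a.arg not in ("self", "cls")]
            if node.returns is None:
                missing.append("return")
            if missing:
                report[node.name] = missing
    return report

source = """
def typed(x: int, y: int) -> int:
    return x + y

def untyped(a, b):
    return a + b
"""
print(missing_type_hints(source))  # {'untyped': ['a', 'b', 'return']}
```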
Key Takeaways
You've learned so much! Here's what you can now do:
- Parse Python code into ASTs with confidence
- Analyze code structure without executing it
- Build custom code analysis tools like a pro
- Transform code programmatically using AST manipulation
- Create your own linters and checkers with Python!
Remember: ASTs are powerful tools that let you understand and manipulate code at a structural level. They're the foundation of many development tools!
Next Steps
Congratulations! You've worked through Abstract Syntax Trees for code analysis!
Here's what to do next:
- Practice with the exercises above
- Build a custom linter for your project's coding standards
- Explore the ast module documentation for more node types
- Create a code refactoring tool using AST transformations!
Remember: Many of the Python tools you use (linters, formatters, IDEs) work on syntax trees under the hood. Now you can build your own! Keep coding, keep learning, and most importantly, have fun!
Happy coding!