🐍 How Python Works¶

Understanding the Python interpreter, bytecode, and the GIL

What You'll Learn¶

How Python executes your code
What bytecode and the PVM are
The Global Interpreter Lock (GIL)
How .pyc files work
Performance implications

Python is an Interpreted Language¶

Unlike C or C++, which are compiled ahead of time to machine code, Python is usually described as an interpreted language.

But it's not as simple as "line by line":

Your Code (.py)
      ↓
[Parser] → AST (Abstract Syntax Tree)
      ↓
[Compiler] → Bytecode (.pyc)
      ↓
[PVM - Python Virtual Machine] → Execution

Two stages:

1. Compilation: Source code → Bytecode
2. Interpretation: Bytecode → Execution
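The two stages can be observed separately with the built-ins compile() and exec(). This is only a minimal sketch: the source string and the "<example>" filename are made up for illustration.

```python
# Stage 1: compilation. compile() turns source text into a code object
# that carries the bytecode. "<example>" is a placeholder filename.
source = "result = 2 + 3"
code_obj = compile(source, "<example>", "exec")
print(type(code_obj).__name__)          # code
print(type(code_obj.co_code).__name__)  # bytes

# Stage 2: interpretation. exec() runs the code object in a namespace.
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])              # 5
```

Nothing is executed during the compile() call; the assignment only happens once exec() runs the code object.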

Step 1: Parsing¶

Python first parses your code into an Abstract Syntax Tree (AST):

# Your code
def greet(name):
    return f"Hello, {name}!"

# Becomes a tree structure:
# Module
#   └── FunctionDef (name='greet')
#         ├── arguments (arg='name')
#         └── Return
#               └── JoinedStr (f-string)

Parsing is where syntax errors are detected, so a file with a syntax error fails before any of its code runs.
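The tree can be inspected with the standard ast module. The sketch below parses the greet function from above and then feeds the parser a deliberately broken snippet (made up for illustration) to show that the error is raised at parse time.

```python
import ast

source = '''
def greet(name):
    return f"Hello, {name}!"
'''

tree = ast.parse(source)
func = tree.body[0]
ret = func.body[0]

print(type(tree).__name__)                           # Module
print(type(func).__name__, func.name)                # FunctionDef greet
print([a.arg for a in func.args.args])               # ['name']
print(type(ret).__name__, type(ret.value).__name__)  # Return JoinedStr

# A syntax error is reported while parsing, before anything executes.
try:
    ast.parse("def broken(:\n    pass")
    parse_failed = False
except SyntaxError as exc:
    parse_failed = True
    print("SyntaxError:", exc.msg)
```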

Step 2: Compilation to Bytecode¶

Bytecode is a platform-independent intermediate representation:

# Python source
def add(a, b):
    return a + b

# Bytecode (simplified; exact opcodes vary by CPython version)
# LOAD_FAST     0 (a)    # Push 'a' onto stack
# LOAD_FAST     1 (b)    # Push 'b' onto stack
# BINARY_ADD             # Pop two values, add, push result
# RETURN_VALUE           # Return top of stack

Think of bytecode as "assembly language for the Python Virtual Machine."
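In CPython, the compiled bytecode lives on the function's code object. The following sketch peeks at a few of its attributes; the raw bytes differ between Python versions, so no particular byte values should be expected.

```python
def add(a, b):
    return a + b

code = add.__code__

print(code.co_varnames)  # ('a', 'b')
print(code.co_argcount)  # 2

# The raw bytecode is just a bytes object; its content is version-specific.
raw = code.co_code
print(len(raw), raw.hex())
```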

Step 3: The Python Virtual Machine (PVM)¶

The PVM is a stack-based virtual machine:

Execution Stack:
┌─────────────┐
│             │  ← Top (operations work here)
├─────────────┤
│             │
├─────────────┤
│             │
└─────────────┘

The PVM executes bytecode instructions one by one, manipulating the stack.
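To make the stack model concrete, here is a toy stack machine. It is an illustration only, not how CPython is implemented: the run function and the (opname, arg) tuple format are invented for this sketch, and only four instructions are supported.

```python
def run(program, local_vars):
    """Execute a list of (opname, arg) pairs on a toy stack machine."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])   # push a local variable
        elif opname == "LOAD_CONST":
            stack.append(arg)               # push a constant
        elif opname == "BINARY_ADD":
            right = stack.pop()             # pop two values...
            left = stack.pop()
            stack.append(left + right)      # ...and push their sum
        elif opname == "RETURN_VALUE":
            return stack.pop()              # return top of stack
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

# The same four steps as the bytecode for add(a, b).
program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program, {"a": 2, "b": 3})
print(result)  # 5
```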

Viewing Bytecode¶

Use the dis module to see bytecode:

import dis

def add(a, b):
    return a + b

dis.dis(add)

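Output (from CPython 3.9; newer versions print different opcodes, for example BINARY_OP instead of BINARY_ADD since 3.11):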

  4           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

Column meanings:

- Source line number
- Byte offset of the instruction
- Opcode name
- Opcode argument (if the instruction takes one)
- Human-readable interpretation of the argument, in parentheses
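The same columns are available programmatically through dis.get_instructions, which yields one Instruction object per bytecode instruction. This sketch collects a few fields; the exact opcode names printed depend on the interpreter version.

```python
import dis

def add(a, b):
    return a + b

rows = []
for instr in dis.get_instructions(add):
    # offset, opname, arg and argrepr correspond to the printed columns.
    rows.append((instr.offset, instr.opname, instr.arg, instr.argrepr))
    print(instr.offset, instr.opname, instr.arg, instr.argrepr)

opnames = [row[1] for row in rows]
```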

Common Bytecode Instructions¶

Instruction	Description
`LOAD_FAST`	Load local variable onto stack
`LOAD_GLOBAL`	Load global variable onto stack
`LOAD_CONST`	Load constant onto stack
`STORE_FAST`	Store value to local variable
`BINARY_ADD`	Add top two stack values
`BINARY_MULTIPLY`	Multiply top two stack values
`COMPARE_OP`	Compare top two values
`POP_JUMP_IF_FALSE`	Conditional jump
`CALL_FUNCTION`	Call a function
`RETURN_VALUE`	Return top of stack
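The instruction set is an implementation detail that changes between CPython releases, so some names in this table may not exist in the interpreter you are running. A quick way to check is to look each name up in dis.opmap:

```python
import dis

table_names = [
    "LOAD_FAST", "LOAD_GLOBAL", "LOAD_CONST", "STORE_FAST",
    "BINARY_ADD", "BINARY_MULTIPLY", "COMPARE_OP",
    "POP_JUMP_IF_FALSE", "CALL_FUNCTION", "RETURN_VALUE",
]

# dis.opmap maps opcode names to numbers for the running interpreter.
availability = {name: name in dis.opmap for name in table_names}

for name, present in availability.items():
    status = "present" if present else "not in this version"
    print(f"{name:20} {status}")
```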

.pyc Files and __pycache__¶

When you import a module, Python:

1. Compiles it to bytecode
2. Saves it in __pycache__/ as a .pyc file
3. Reuses it next time (if the source hasn't changed)

my_project/
├── my_module.py
└── __pycache__/
    └── my_module.cpython-39.pyc

Why? Loading cached bytecode skips the parse and compile steps, so imports start faster. It does not make the code itself run any faster.
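The caching step can be reproduced by hand with py_compile, and importlib.util.cache_from_source reports where the import system would look for the cached file. The sketch below works in a temporary directory, and my_module.py is a throwaway file created just for the demonstration.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = pathlib.Path(tmp) / "my_module.py"
    source_path.write_text("VALUE = 42\n")

    # Where the import system would look for cached bytecode.
    expected_pyc = importlib.util.cache_from_source(str(source_path))

    # Compile explicitly; a normal import does this automatically.
    written_pyc = py_compile.compile(str(source_path))

    pyc_exists = pathlib.Path(written_pyc).exists()
    pyc_name = pathlib.Path(written_pyc).name

print(pyc_name)  # e.g. my_module.cpython-XY.pyc, depending on the interpreter
```

The tag in the middle of the filename identifies the interpreter and version, which is why several Python versions can share one __pycache__ directory.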

The Global Interpreter Lock (GIL)¶

In CPython, the GIL is a mutex that protects access to Python objects.

What the GIL Does¶

Only one thread can execute Python bytecode at a time
Prevents race conditions on Python objects
Simplifies memory management
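One visible trace of the GIL in CPython is the switch interval: a thread holding the GIL is asked to release it after roughly this long so that other threads get a turn. The value can be read with sys.getswitchinterval (other implementations may report something different or not use it at all).

```python
import sys

# Time in seconds after which a running thread is asked to give up the GIL.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} s")
```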

Why Have the GIL?¶

Reference counting safety: Prevents race conditions when incrementing/decrementing reference counts
Simpler C extensions: Single-threaded access makes writing C extensions easier
Single-threaded performance: One global lock avoids the overhead of many fine-grained locks, which keeps single-threaded programs fast

The Problem¶

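import threading

def cpu_bound_task():
    """This won't run faster with threads due to the GIL."""
    count = 0
    for i in range(10000000):
        count += 1

# Both threads run, but only one executes bytecode at any moment,
# so they take turns instead of running in parallel.
t1 = threading.Thread(target=cpu_bound_task)
t2 = threading.Thread(target=cpu_bound_task)
t1.start()
t2.start()
t1.join()  # wait for both threads to finish
t2.join()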

CPU-bound pure-Python code doesn't get faster with threads in CPython!
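A rough way to see this for yourself is to time the same work done sequentially and in two threads. This is a sketch, not a benchmark: the loop size is arbitrary, timings vary by machine, and on a free-threaded build the threaded run may well be faster.

```python
import threading
import time

def cpu_bound_task(n=2_000_000):
    """Pure-Python busy loop; holds the GIL while it runs."""
    count = 0
    for _ in range(n):
        count += 1
    return count

# Run the task twice, one after the other.
start = time.perf_counter()
cpu_bound_task()
cpu_bound_task()
sequential = time.perf_counter() - start

# Run the task twice in separate threads.
threads = [threading.Thread(target=cpu_bound_task) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.3f}s  threaded: {threaded:.3f}s")
```

On a standard CPython build the two numbers are typically close, because the threads spend their time waiting for each other to release the GIL.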

Workarounds¶

Approach	Use Case
multiprocessing	CPU-bound tasks (true parallelism)
asyncio	I/O-bound tasks (concurrency)
C extensions	Release GIL (NumPy does this)
Alternative Python	Jython, IronPython (no GIL)
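As an illustration of the asyncio row, the sketch below runs three simulated I/O waits concurrently on a single thread. fake_io is a made-up stand-in for a real network or disk call, and asyncio.sleep plays the role of the wait.

```python
import asyncio
import time

async def fake_io(name, delay):
    """Stand-in for an I/O wait; awaiting sleep yields control to other tasks."""
    await asyncio.sleep(delay)
    return name

async def main():
    # All three waits overlap, so the total is close to the longest delay.
    return await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")
```

The three 0.2-second waits finish in roughly 0.2 seconds in total rather than 0.6, because the event loop switches tasks while each one is waiting. This helps with waiting, not with CPU-bound work.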

CPython vs Other Implementations¶

Implementation	Description	GIL?
CPython	The standard Python (written in C)	Yes (3.13+ also offers an experimental free-threaded build)
Jython	Python on the JVM	No
IronPython	Python on .NET	No
PyPy	Python with JIT compiler	Yes (but faster)
MicroPython	For microcontrollers	Yes
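To find out which implementation is running your code, the standard library offers platform.python_implementation and sys.implementation. The GIL check at the end relies on sys._is_gil_enabled, which only exists on CPython 3.13 and later, so the sketch treats it as optional.

```python
import platform
import sys

implementation = platform.python_implementation()  # e.g. "CPython" or "PyPy"
print(implementation, platform.python_version())
print(sys.implementation.name)

# sys._is_gil_enabled() is only available on CPython 3.13+.
gil_check = getattr(sys, "_is_gil_enabled", None)
if gil_check is not None:
    gil_status = gil_check()
else:
    gil_status = "unknown (no runtime check available)"
print("GIL enabled:", gil_status)
```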

Performance Considerations¶

Compiled vs Interpreted¶

Language	Type	Speed
C/C++	Compiled to machine code	Fastest
Java	Compiled to bytecode + JIT	Fast
Python	Interpreted bytecode	Slower

Why is Python slower?

- Bytecode interpretation overhead
- Dynamic typing (type checking at runtime)
- Object model overhead
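The dynamic typing cost is easier to see with an example. The same add function, compiled once, has to work for any operand types, so the interpreter looks up what "+" means every time it runs. The Meters class is invented for this sketch.

```python
def add(a, b):
    return a + b

# One function, one piece of bytecode, many meanings of "+".
print(add(1, 2))          # 3
print(add("py", "thon"))  # python
print(add([1], [2]))      # [1, 2]

class Meters:
    """Hypothetical user-defined type that supplies its own addition."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Meters(self.value + other.value)

total = add(Meters(1.5), Meters(2.0))
print(total.value)        # 3.5
```

A C compiler knows the operand types up front and can emit a single machine instruction for an integer addition; Python has to decide at runtime, on every call.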

When Python is Fast Enough¶

I/O-bound tasks (web requests, file operations)
Glue code (coordinating other systems)
Rapid development
Data science (NumPy uses optimized C)

Optimization Strategies¶

Use built-in functions (written in C)
Use appropriate data structures
Use libraries like NumPy/Pandas (C extensions)
Profile before optimizing
Consider Cython or Numba for hot paths
Use multiprocessing for CPU-bound work
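As a small illustration of the first and fourth points, the sketch below uses timeit to compare a hand-written loop with the built-in sum. manual_sum and the data size are made up for the example, and the timings will differ from machine to machine.

```python
import timeit

data = list(range(100_000))

def manual_sum(values):
    """Add the values one by one in Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total

# Measure first, then decide whether the difference matters.
loop_time = timeit.timeit(lambda: manual_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"manual loop:  {loop_time:.4f}s")
print(f"built-in sum: {builtin_time:.4f}s")
```

On CPython the built-in is usually several times faster because its loop runs in C, but measure in your own environment before relying on that.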

How Import Works¶

import mymodule

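What Python does:

1. Check sys.modules (cache) - already loaded?
2. Find the file (.py, .pyc, .so, etc.)
3. Load/compile the bytecode
4. Store the new module object in sys.modules
5. Execute the module code
6. Bind to local name

Why check the cache first? It makes sure a module's code runs only once and speeds up repeated imports. Because the module is registered in sys.modules before its code runs, a circular import gets the partially initialized module back instead of recursing forever.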
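The cache is easy to observe from Python. This sketch uses the standard json module as an arbitrary example and importlib.util.find_spec to show where the finder located it.

```python
import importlib.util
import sys

import json                      # first import: found, loaded, cached
first = sys.modules["json"]      # the cached module object

import json as json_again        # second import: served from sys.modules
same_object = json_again is first
print(same_object)               # True

# The spec records what the finder discovered about the module.
spec = importlib.util.find_spec("json")
print(spec.name, spec.origin)
```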

Common Mistakes ⚠️¶

1. Relying on bytecode¶

# Don't manually edit .pyc files!
# They can be regenerated and deleted anytime

2. Ignoring the GIL¶

# Don't expect thread speedup for CPU-bound work
import threading

# Use multiprocessing instead:
from multiprocessing import Pool

3. Assuming Python is always slow¶

# Python + NumPy can be competitive with naive C for array operations
import numpy as np

# This is fast (implemented in C)
arr = np.random.rand(1000000)
result = np.sum(arr)

Try It Out! 🚀¶

Run examples.py to see Python internals in action:

python examples.py

Then try exercises.py to practice:

python exercises.py

Key Takeaways¶

Python code is compiled to bytecode then interpreted by the PVM
Bytecode is platform-independent intermediate code
Use the dis module to view bytecode
.pyc files cache bytecode for faster loading
The GIL allows only one thread to execute Python bytecode at a time (in CPython)
For CPU-bound parallelism, use multiprocessing, not threading
Python's "slowness" is often acceptable and worth the productivity gain
