So far you have learned to store data in variables and act on it with functions.
Variables hold information (strings, numbers, lists, dictionaries).
Functions perform actions — taking inputs, doing work, returning outputs.
This works beautifully for short scripts. But research code grows. The moment you are tracking many things — each of which has both data (a subject’s age, mass, species) and behaviour (schedule a recording, validate an entry) — keeping the data in loose dictionaries and the logic in separate floating functions starts to crack.
In this notebook we meet the tool built for exactly this problem: the Class. We will feel the pain of the dictionary-and-function approach first, then refactor it into clean, reliable classes, and finally meet decorators — the @ symbols that let us protect and supercharge our classes. Throughout, we build one running example drawn straight from this course: a small system for managing a behavioural study.
How to Use This Notebook Well
Run every code cell yourself, top to bottom — later cells depend on classes defined earlier.
When you meet a new keyword (__init__, self, @property), pause and re-read the explanation before moving on.
Mini Challenges and Checkpoints are for you to attempt; they are where the ideas stick.
Notice the recurring rhythm: we state a problem, write the messy version, then refactor it properly. Learning to recognise that “this is getting fragile — time for a class” instinct is the real goal.
Section 1: The Problem — Dictionaries & Floating Functions
Our running example: a Behavioural Study Data Management System
By Week 3 of this course you will be collecting data — videos and audio of your subjects. Behind every study is a quiet bookkeeping problem: who is being studied, by whom, and which recordings exist for each subject. Let us try to build that system with only the tools you already have — dictionaries and functions — and watch where it strains.
We need to track Researchers, the Subjects they study, and the Recording Sessions they schedule.
High-level workflow
A researcher registers in the system.
The system creates a Researcher profile (name, email, lab phone).
The researcher registers one or more subjects (the animals or people being studied).
The researcher schedules video or audio recording sessions for a subject.
The system stores and manages these relationships.
Notice the data is already more complex than a few strings: researchers need unique IDs, they own lists of subjects (nested data), and they take actions (scheduling) that must be validated against allowed recording types.
1.1. Setup and Logic
First we define the system’s “rules”: a list of valid recording modalities, and the functions that act on a researcher’s data. Notice these functions are floating — they belong to no particular researcher; they just operate on whatever dictionary you hand them.
from uuid import uuid4  # generates a Universally Unique Identifier (a long random string)

# GLOBAL CONFIGURATION: the recording modalities our system supports.
# This list lives OUTSIDE any researcher; every function must "look it up" to validate.
MODALITY_TYPES = ["video", "audio"]


# A floating function to schedule recordings.
# It belongs to no researcher -- it is a "contractor" that operates on a dictionary
# passed in. Forget an argument, or mistype a key, and it breaks.
def schedule_recording(researcher_dict, subject_name, modality, dates):
    # Step A: validation against the global list
    if modality not in MODALITY_TYPES:
        print(f"Error: '{modality}' is not a valid modality.")
        return
    # Step B: update the data -- append one session string per date
    for date in dates:
        entry = f"{date} | {subject_name} | {modality}"
        researcher_dict["sessions"].append(entry)
        print(f"Success: scheduled {entry} for {researcher_dict['name']}")


# Another floating function, separate from the first, to update a phone number.
def update_phone(researcher_dict, new_number):
    researcher_dict["phone"] = new_number
    print(f"Updated {researcher_dict['name']}'s phone to {new_number}.")
1.2. Creating Our First Researcher (The Data)
Now we hand-build a researcher as a dictionary. Here we are a careful “manual data-entry clerk”: a single typo in a key ("session" instead of "sessions") will make our functions fail later. We must also remember to call uuid4() ourselves and to initialise empty lists by hand.
researcher_01 = {
    "id": str(uuid4()),  # forget this line and there is no ID!
    "name": "Asha",
    "email": "asha@university.edu",
    "phone": "555-0100",
    # "subjects" is nested data: a list of dictionaries, one per studied individual.
    "subjects": [
        {"name": "Rex", "species": "Greyhound", "age": 3.0, "body_mass_kg": 27.0}
    ],
    # "sessions" MUST be initialised as an empty list, or schedule_recording() will crash.
    "sessions": [],
}
print("--- Researcher Profile Created ---")
print(f"Researcher: {researcher_01['name']}")
print(f"ID assigned: {researcher_01['id']}")
print(f"Initial sessions: {researcher_01['sessions']}")
1.3. Executing an Action (Connecting Logic to Data)
The logic (schedule_recording) and the data (researcher_01) are separate entities. To make them work together, we must “inject” the dictionary into the function as an argument. The function then reaches inside, finds the "sessions" list, and modifies it.
print(f"--- Scheduling for {researcher_01['name']} ---")
# We must "plug" the dictionary INTO the function so it knows whose sessions to update.
schedule_recording(researcher_01, "Rex", "video", ["2026-03-01"])
schedule_recording(researcher_01, "Rex", "audio", ["2026-03-02", "2026-03-03"])
print("-" * 30)
print(f"{researcher_01['name']} now has {len(researcher_01['sessions'])} sessions.")
# A second floating function, called the same awkward way.
update_phone(researcher_01, "555-9999")
print(f"Updated phone: {researcher_01['phone']}")
1.4. Adding a New Researcher (The Scaling Problem)
What happens when a second researcher joins the lab? We must repeat the entire manual dictionary — and keep its structure identical. With 100 researchers we copy-paste this block 100 times. And if we ever rename "subjects" to "participants", we must hunt down and fix every single dictionary by hand.
researcher_02 = {
    "id": str(uuid4()),
    "name": "Diego",  # must stay "name" -- not "Name" or "full_name"
    "email": "diego@university.edu",
    "phone": None,  # Diego gave no phone; we keep the key with None
    "subjects": [
        {"name": "Finch-03", "species": "Zebra finch", "age": 1.0, "body_mass_kg": 0.013}
    ],
    "sessions": [],
}
print(f"--- Scheduling for {researcher_02['name']} ---")
schedule_recording(researcher_02, "Finch-03", "audio", ["2026-04-10", "2026-04-11"])
print("\nResearcher 02 sessions:", researcher_02["sessions"])
1.5. Why This Approach Breaks Down
Our prototype works, but it is fragile. The cracks are structural:
No schema enforcement. Nothing forces every researcher to have the same keys. If researcher_01 has "phone" but researcher_02 does not, code downstream hits a KeyError.
Manual redundancy. Rename one key and you must edit every dictionary and every function that touches it, an open invitation to bugs.
Data inconsistency. Nothing stops "subjects" in one record and "subject" in another. Retrieval logic becomes fragile guesswork.
Decoupled logic and state. The function has no intrinsic connection to the data; it only acts on whatever you explicitly pass in.
High cognitive load. To write even one line, you must keep the entire nested structure (every key, every type) in your head.
And consuming the data is even harder than building it. Suppose you want “the first subject’s name and the most recent session.” You must reach into the "subjects" list, check it is not empty, grab element [0], read its "name", then reach into "sessions" and grab element [-1] — checking at every step that the key and index exist so the program does not crash.
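To make that retrieval chore concrete, here is a minimal sketch of the defensive code it requires. The helper name first_subject_and_last_session and the trimmed-down record are illustrative assumptions, not part of the notebook's system; the record is only shaped like researcher_01 from Section 1.2.

```python
# Hypothetical record, shaped like researcher_01 from Section 1.2.
researcher = {
    "name": "Asha",
    "subjects": [{"name": "Rex", "species": "Greyhound"}],
    "sessions": ["2026-03-01 | Rex | video", "2026-03-02 | Rex | audio"],
}


def first_subject_and_last_session(record):
    """Return (first subject name, most recent session); None for anything missing."""
    subjects = record.get("subjects") or []   # the key may be absent, or the list empty
    sessions = record.get("sessions") or []
    first_name = subjects[0].get("name") if subjects else None
    last_session = sessions[-1] if sessions else None
    return first_name, last_session


print(first_subject_and_last_session(researcher))
print(first_subject_and_last_session({"name": "Diego"}))  # record with neither key
```

Every .get() and every "if subjects" check exists only because the dictionary promises nothing about its own shape.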
There is a better way. We need something that bundles the data and its behaviour into one reliable unit. That is a Class.
Section 2: The Solution — Classes & Object-Oriented Design
2.1. Concept: From Functions to Objects
A Class is a formal blueprint for a kind of thing. It guarantees every object built from it starts with the right attributes, automates repetitive setup (like ID generation), and keeps the relevant logic attached as methods.
This buys us two things immediately:
Schema consistency: every Researcher is guaranteed to have a name, an email, a subject list.
Syntactic clarity: we read data with dot notation (researcher.name) instead of error-prone string keys (researcher["name"]).
Table 1 — Structural shift
| Concept | Functional approach | Object-Oriented approach |
|---|---|---|
| Data storage | loose variables / dictionaries | Attributes (variables attached to an object) |
| Logic / action | floating functions | Methods (functions attached to an object) |
| Organisation | decoupled blocks | Encapsulation (one structured unit) |
Table 2 — Core terminology
| Idea | Python term | Meaning |
|---|---|---|
| Blueprint | class | the definition of a type (the concept of a “Researcher”) |
| Instance | object | a specific realisation (the researcher “Asha”) |
| State | attribute | a variable inside an object (asha.email) |
| Behaviour | method | a function belonging to an object (asha.update_phone()) |
Table 3 — Modelling our study system
| Entity | What its class manages |
|---|---|
| Researcher | identity, contact info, and the subjects + sessions they own |
| Subject | the studied individual’s record (name, species, age, body mass) and its owner link |
| RecordingSession | a single capture: date, modality, derived storage size, status |
| Modality | the recording-type rules (video / audio) and storage estimates |
2.2. Class Design Best Practices
Before writing a class, two conventions and one design question.
Naming (PEP 8):
Classes use PascalCase (every word capitalised, no underscores): Researcher, RecordingSession.
Attributes, methods, variables use snake_case (lowercase with underscores): body_mass_kg, update_phone().
Where should the logic live? A useful rule of thumb, in the spirit of the Single Responsibility Principle, is that a method belongs in the class whose data it primarily touches. update_phone() changes a researcher’s contact info, so it lives in Researcher. Categorising a subject by body mass touches the subject’s data, so get_size_category() lives in Subject.
Which class do we build first? We answer with dependencies. A Researcher can exist before registering any subject (their subject list simply starts empty). But a Subject cannot exist as an orphan — it needs an owner to be meaningful. Because Subject depends on Researcher, we define Researcher first.
2.3. Implementing the Researcher Class
Below is the standard anatomy of a Python class. Three improvements over the dictionary approach are worth naming as you read:
Automated setup: the researcher_id is generated automatically; no one can forget it.
Schema enforcement: creating a Researcher requires a name and email, so every instance is consistent.
Encapsulated logic: schedule_recording is now a method that validates against the object’s own subject list before changing its own session history.
from uuid import uuid4

MODALITY_TYPES = ["video", "audio"]


class Researcher:
    """A person running a behavioural study. Owns subjects and recording sessions."""

    def __init__(self, name: str, email: str,
                 subject_names: list | None = None, phone: str | None = None):
        """The constructor: sets up the object's initial state."""
        # DYNAMIC ATTRIBUTES (provided at creation)
        self.name = name
        self.email = email
        self.phone = phone
        # DATA STRUCTURES (initialised automatically so they always exist)
        self.subjects = subject_names if subject_names is not None else []
        self.sessions = []
        # AUTOMATED METADATA (generated, never forgotten)
        self.researcher_id = str(uuid4())

    def update_phone(self, new_number: str):
        """Update this researcher's phone number."""
        self.phone = new_number
        print(f"[Update] {self.name}'s phone is now {self.phone}")

    def schedule_recording(self, subject_name: str, modality: str, date_list: list):
        """Record one or more sessions, validating against THIS researcher's subjects."""
        print(f"--- Scheduling {modality} for {subject_name} ---")
        # VALIDATION: does this researcher actually study that subject?
        if subject_name not in self.subjects:
            print(f"Error: {self.name} has no registered subject '{subject_name}'. Register it first.")
            return
        if modality not in MODALITY_TYPES:
            print(f"Error: '{modality}' is not a valid modality.")
            return
        # PROCESSING: append one structured record per date
        for date in date_list:
            entry = f"{date} | {subject_name} | {modality}"
            self.sessions.append(entry)
            print(f"Confirmed for {date}")
2.4. Concept: Deconstructing the Class Anatomy
Several new pieces appeared above. Here is what each one does.
The module import (uuid). from uuid import uuid4 pulls a ready-made function from Python’s standard library to generate unique ID strings — so we never hand-assign IDs.
The constructor __init__. A special method Python runs automatically the moment you create an object. The double underscores make it a “dunder” (double-underscore) method, reserved by the language. Its job is to set up the object’s starting state.
The instance reference self. Think of the class as a blank form and self as the specific copy currently being filled in. When you write asha = Researcher("Asha", ...), Python passes that new object in as self, so self.name = name writes “Asha” into Asha’s memory, not anyone else’s.
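One way to see what self is doing is to pass it by hand. The sketch below uses a stripped-down stand-in class, MiniResearcher (a hypothetical name chosen so it does not overwrite the notebook's Researcher), and calls the same method in both spellings.

```python
class MiniResearcher:
    """Hypothetical stand-in for the notebook's Researcher, reduced to two attributes."""

    def __init__(self, name, phone=None):
        self.name = name
        self.phone = phone

    def update_phone(self, new_number):
        self.phone = new_number


asha = MiniResearcher("Asha")
diego = MiniResearcher("Diego")

asha.update_phone("555-0100")                    # usual spelling: Python supplies self
MiniResearcher.update_phone(diego, "555-0200")   # same call with self passed explicitly

print(asha.phone, diego.phone)
```

Each call changes only the object that was passed in as self, which is why Asha's and Diego's phone numbers never collide.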
Type hints & defaults. In (name: str, email: str, phone: str | None = None):
: str is a type hint, documentation that name should be a string.
str | None means the value may be a string or None.
= None is a default: omit phone and Python fills in None instead of crashing.
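It is worth stressing that a type hint is documentation only: Python does not check it when the code runs. The small sketch below illustrates this with a hypothetical helper, contact_line; it uses Optional[str], the older spelling of str | None, so that it also runs on Python versions before 3.10.

```python
from typing import Optional  # Optional[str] means the same as str | None


def contact_line(name: str, phone: Optional[str] = None) -> str:
    """Hypothetical helper: format a one-line contact summary."""
    return f"{name} (phone: {phone})"


print(contact_line("Asha"))              # phone omitted, so the default None is used
print(contact_line("Asha", "555-0100"))
print(contact_line(12345))               # hint says str, but an int is accepted anyway
```

Because nothing stops the last call, real validation has to be written explicitly, which is what Section 3.3 does with a setter.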
Parameters vs. Attributes (a critical distinction)
These often share a name inside __init__ but play different roles:
Parameters are the temporary inputs in the function signature (def __init__(self, name, ...)). They carry data in, then vanish when __init__ finishes.
Attributes are the permanent storage attached with self (self.name). They live for the whole life of the object and are read later with dot notation.
The single line that bridges them is self.name = name, which means: “take the value in the temporary parameter name, and store it permanently in this object’s attribute self.name.”
Core rule: parameters are temporary inputs to a function; attributes are the persistent state of an object.
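A short sketch of that rule in action. MiniSubject and its mass_g parameter are hypothetical names invented for this illustration: one parameter is copied into an attribute, the other is only used to compute one and is then gone.

```python
class MiniSubject:
    """Hypothetical class showing which inputs survive after __init__ returns."""

    def __init__(self, name, mass_g):
        self.name = name                    # parameter stored as an attribute
        self.body_mass_kg = mass_g / 1000   # attribute derived from a parameter;
                                            # mass_g itself is never stored


finch = MiniSubject("Finch-03", 13)

print(finch.name)
print(finch.body_mass_kg)
print(hasattr(finch, "mass_g"))   # False: the parameter vanished with __init__
```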
2.5. Concept: Instantiation and Dot Notation
Creating an object from a class is called instantiation. Instead of hand-building a dictionary, we call the class by name and pass the initialisation arguments; Python runs __init__ and hands back a new object.
A note on how we pass arguments. Python allows positional arguments (Researcher("Asha", "asha@university.edu")), which rely on order. Many developers prefer keyword arguments (Researcher(name="Asha", email="asha@university.edu")) because each value is labelled: you cannot silently swap two strings, and a misspelled keyword raises a TypeError immediately.
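The difference is easy to demonstrate. The sketch below uses a two-field stand-in class, DemoResearcher (a hypothetical name, not the notebook's Researcher), and shows a swapped positional call passing unnoticed while a misspelled keyword fails loudly.

```python
class DemoResearcher:
    """Hypothetical two-field class used only to compare calling styles."""

    def __init__(self, name, email):
        self.name = name
        self.email = email


# Positional: the order is wrong, yet Python raises no error.
swapped = DemoResearcher("asha@university.edu", "Asha")

# Keyword: the order no longer matters because each value is labelled.
clear = DemoResearcher(email="asha@university.edu", name="Asha")

print(swapped.name)
print(clear.name)

try:
    DemoResearcher(name="Asha", emial="asha@university.edu")  # misspelled keyword
except TypeError as err:
    print("TypeError:", err)
```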
Key Characteristics:
Read attributes with dot notation: researcher.name, not researcher["name"].
Each object has its own isolated state: Asha’s name and Diego’s name live in separate memory.
The class now behaves like a custom data type, as predictable as int or list.
# INSTANTIATION using keyword arguments (clear and safe)
researcher_01 = Researcher(name="Asha", email="asha@university.edu", phone="555-0100")
researcher_01.subjects = ["Rex", "Bolt"] # register subject names for now
researcher_02 = Researcher(name="Diego", email="diego@university.edu")
researcher_02.subjects = ["Finch-03"]
# Reading state via dot notation
print(f"Name : {researcher_01.name}")
print(f"Email: {researcher_01.email}")
print(f"ID : {researcher_01.researcher_id}")
2.6. Calling Methods on an Object
Now we trigger the methods with dot notation. The object manages its own state and enforces its own validation — no floating functions, no data injection.
System note: if you re-run the instantiation cell, the researcher_id changes every time, because uuid4() mints a fresh string on each new object.
print(f"--- {researcher_01.name}: initial phone = {researcher_01.phone} ---")
researcher_01.update_phone("555-9999")
print("\n--- Scheduling sessions ---")
# VALID: "Rex" is in researcher_01's own subject list
researcher_01.schedule_recording("Rex", "video", ["2026-03-01", "2026-03-02"])
# BLOCKED: "Finch-03" belongs to researcher_02; this object's validation catches it
researcher_01.schedule_recording("Finch-03", "audio", ["2026-04-10"])
print(f"\n--- {researcher_01.name}'s session history ---")
for record in researcher_01.sessions:
    print(" -", record)
Mini Challenge 2.6. Create a third researcher with your own name and email, register two subjects, schedule one video session for each, then print how many sessions they now hold.
2.7. Concept: A Dynamic Lifecycle (adding a Subject later)
Real data is rarely static. A researcher might sign up today and register a subject next week. Because we defaulted subject_names to None (and converted it to an empty list inside __init__), a researcher can safely exist with no subjects, then gain them over time.
Now we also build the Subject class itself — the studied individual. It carries its measurements and a get_size_category() method that bins it by body mass, the kind of derived label you will compute constantly in this course.
from uuid import uuid4


class Subject:
    """A studied individual (animal or person) linked to a researcher by owner_id."""

    def __init__(self, name: str, species: str, age: float,
                 body_mass_kg: float, owner_id: str):
        self.name = name
        self.species = species
        self.age = age
        self.body_mass_kg = body_mass_kg
        self.owner_id = owner_id  # foreign key linking to a Researcher
        self.subject_id = str(uuid4())

    def get_size_category(self):
        """Bin the subject by body mass -- a simple derived label."""
        if self.body_mass_kg < 5:
            return "Small"
        elif self.body_mass_kg < 30:
            return "Medium"
        else:
            return "Large"

# --- Dynamic lifecycle demo ---
researcher_03 = Researcher(name="Mei", email="mei@university.edu")
print(f"Created {researcher_03.name}; subjects = {researcher_03.subjects}")
# Attempt a booking with no subjects yet -> validation blocks it
researcher_03.schedule_recording("Sparrow-12", "video", ["2026-07-01"])
# Register the subject NAME, then re-attempt -> now it works
researcher_03.subjects.append("Sparrow-12")
print(f"\n{researcher_03.name} registered a subject: {researcher_03.subjects}")
researcher_03.schedule_recording("Sparrow-12", "video", ["2026-07-01"])
# Create the formal Subject profile, linked by the researcher's ID
sparrow = Subject(name="Sparrow-12", species="House sparrow", age=1.0,
body_mass_kg=0.03, owner_id=researcher_03.researcher_id)
print(f"\nSubject profile: {sparrow.name} ({sparrow.species}), "
f"size = {sparrow.get_size_category()}, owner = {sparrow.owner_id[:8]}...")
2.8. Concept: The RecordingSession Class & Object Composition
So far a “session” was just a text string like "2026-03-01 | Rex | video". A string cannot estimate its own storage size or carry a status. Let us promote it to a real RecordingSession object.
Notice the class derives its storage_gb automatically from the modality — video sessions are far larger than audio. This is encapsulated business logic: the object computes its own properties, so they are always consistent.
Then we will see object composition: instead of storing a flat string in a researcher’s sessions list, we store the entire RecordingSession object — and gain access to all its attributes through chained dot notation (researcher.sessions[0].storage_gb).
from uuid import uuid4

# SYSTEM CONSTANTS (PEP 8: constants in ALL_CAPS) -- estimated storage per session.
VIDEO_GB_PER_SESSION = 12.0
AUDIO_GB_PER_SESSION = 0.5


class RecordingSession:
    """One capture, connecting a Researcher, a Subject, a modality, and a date."""

    def __init__(self, researcher_id: str, subject_id: str, modality: str, date: str):
        self.researcher_id = researcher_id
        self.subject_id = subject_id
        self.modality = modality.lower()  # standardise input
        self.date = date
        # DERIVED STATE: storage estimate is computed from the modality.
        if self.modality == "video":
            self.storage_gb = VIDEO_GB_PER_SESSION
        elif self.modality == "audio":
            self.storage_gb = AUDIO_GB_PER_SESSION
        else:
            self.storage_gb = 0.0
            print(f"[Warning] Unrecognised modality: {self.modality}")
        self.session_id = str(uuid4())
        self.status = "Scheduled"  # default state at creation
Now we compose the objects together — a researcher, a subject, and a session — and inspect them through dot notation.
# Create the core entities
researcher_01 = Researcher(name="Asha", email="asha@university.edu", phone="555-0100")
rex = Subject(name="Rex", species="Greyhound", age=3.0,
body_mass_kg=27.0, owner_id=researcher_01.researcher_id)
researcher_01.subjects.append(rex.name) # register the name for validation
# Create a formal RecordingSession object linking researcher + subject
session_01 = RecordingSession(researcher_id=researcher_01.researcher_id,
subject_id=rex.subject_id,
modality="video", date="2026-03-05")
# OBJECT COMPOSITION: store the OBJECT itself, not a string
researcher_01.sessions.append(session_01)
print(f"--- System status for {researcher_01.name} ---")
print(f"Registered subjects: {len(researcher_01.subjects)} | Sessions: {len(researcher_01.sessions)}")
print("\n--- Inspecting the stored session object ---")
first = researcher_01.sessions[0] # this is a RecordingSession object
print(f"Date : {first.date}")
print(f"Modality: {first.modality.capitalize()}")
print(f"Storage : {first.storage_gb:.1f} GB")
print(f"Status : {first.status}")
print(f"Ref ID : {first.session_id[:8]}...")
print(f"\nSubject size category: {rex.get_size_category()}")
2.9. Concept: Automating Composition (passing whole objects)
Until now we linked entities with ID strings (owner_id). OOP allows something more powerful: passing the whole object as a reference. This grants instant access to the owner’s attributes and methods, and lets us automate the bookkeeping.
Three upgrades in the refactor below:
Two-way linking. When we create a Subject, we pass the actual Researcher object. Inside its constructor the subject appends itself to the owner’s subject list: automatic registration.
A factory method. Researcher.schedule_recording now manufactures RecordingSession objects internally and stores them, instead of appending strings.
System logging. Each __init__ prints a line, creating an automatic audit trail.
from uuid import uuid4

VIDEO_GB_PER_SESSION = 12.0
AUDIO_GB_PER_SESSION = 0.5


class Researcher:
    """Central hub: owns Subject objects and manufactures RecordingSession objects."""

    def __init__(self, name: str, email: str, phone: str | None = None):
        self.name = name
        self.email = email
        self.phone = phone
        self.subjects = []  # now stores actual Subject objects
        self.sessions = []  # now stores actual RecordingSession objects
        self.researcher_id = str(uuid4())
        print(f"[System] Created Researcher: {self.name} | {self.email} | {self.phone}")

    def update_phone(self, new_number: str):
        self.phone = new_number
        print(f"[Update] {self.name}'s phone -> {self.phone}")

    def schedule_recording(self, subject_name: str, modality: str, date_list: list):
        """Factory method: builds RecordingSession objects internally."""
        print(f"\n--- Request: {modality.capitalize()} for {subject_name} ---")
        # VALIDATION: search this researcher's list of Subject objects by name
        target = None
        for s in self.subjects:
            if s.name == subject_name:
                target = s
                break
        if target is None:
            print(f"Error: {self.name} has no registered subject '{subject_name}'.")
            return
        # COMPOSITION: manufacture and store a RecordingSession per date
        for date in date_list:
            self.sessions.append(
                RecordingSession(researcher=self, subject=target,
                                 modality=modality, date=date))


class Subject:
    """A studied individual that registers ITSELF to its owner on creation."""

    def __init__(self, name: str, species: str, body_mass_kg: float,
                 age: float, owner: Researcher):
        self.name = name
        self.species = species
        self.body_mass_kg = body_mass_kg
        self.age = age
        self.owner = owner  # store the whole Researcher object
        self.subject_id = str(uuid4())
        # AUTOMATED REGISTRATION: add myself to my owner's subject list
        self.owner.subjects.append(self)
        print(f"[System] Created Subject: {self.name} ({self.species}) "
              f"-> owner {self.owner.name}")

    def get_size_category(self):
        if self.body_mass_kg < 5:
            return "Small"
        elif self.body_mass_kg < 30:
            return "Medium"
        return "Large"


class RecordingSession:
    """Transactional record connecting a Researcher, a Subject, and a modality."""

    def __init__(self, researcher: Researcher, subject: Subject, modality: str, date: str):
        self.researcher = researcher
        self.subject = subject
        self.modality = modality.lower()
        self.date = date
        self.storage_gb = (VIDEO_GB_PER_SESSION if self.modality == "video"
                           else AUDIO_GB_PER_SESSION if self.modality == "audio" else 0.0)
        self.session_id = str(uuid4())
        self.status = "Scheduled"
        print(f"  [Session] {self.modality.capitalize()} on {self.date} "
              f"| {self.subject.name} | {self.storage_gb:.1f} GB | {self.status}")

# --- SYSTEM TEST: watch the automated logging tell the story ---
# 1. Create the researcher
asha = Researcher("Asha", "asha@university.edu", "555-0100")
# 2. Create the subject -> it auto-registers to Asha
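# Keyword arguments: this Subject takes body_mass_kg BEFORE age, unlike the Section 2.7 version.
rex = Subject(name="Rex", species="Greyhound", body_mass_kg=27.0, age=3.0, owner=asha)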
# 3. Schedule recordings -> the factory builds RecordingSession objects
asha.schedule_recording("Rex", "video", ["2026-03-10", "2026-03-11"])
# 4. Verify the composed state
print(f"\nAsha now owns {len(asha.subjects)} subject(s) and {len(asha.sessions)} session(s).")
total_gb = sum(s.storage_gb for s in asha.sessions)
print(f"Estimated storage for this study so far: {total_gb:.1f} GB")
Checkpoint 2. You have now built three linked classes and seen a subject register itself and a researcher manufacture sessions. In one sentence each, explain: what self refers to, why we store whole objects instead of ID strings, and what “encapsulation” buys us. If any is fuzzy, revisit that subsection.
Section 3: Decorators — Protecting and Supercharging Classes
3.1. Concept: What Is a Decorator?
As a system matures, we often need to modify or control a function’s behaviour without rewriting its internals. A decorator does exactly that: it “wraps” a function or method with extra behaviour. Think of it as a gatekeeper that intercepts a call, runs some logic before or after, then lets the original run.
You spot a decorator by the @ symbol sitting directly above a function, method, or class definition.
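To take some of the mystery out of the @ symbol, here is a minimal hand-written decorator. The names log_call and storage_for are hypothetical, made up for this sketch; functools.wraps is the standard-library helper that keeps the wrapped function's name and docstring intact.

```python
from functools import wraps


def log_call(func):
    """Hypothetical decorator: print a line before and after the wrapped function runs."""

    @wraps(func)  # preserve func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print(f"[log] calling {func.__name__}")
        result = func(*args, **kwargs)   # let the original function run
        print(f"[log] {func.__name__} finished")
        return result

    return wrapper


@log_call
def storage_for(n_sessions, gb_per_session=12.0):
    """Estimate total storage in GB."""
    return n_sessions * gb_per_session


total = storage_for(3)
print(total)
```

Writing @log_call above the definition is shorthand for storage_for = log_call(storage_for): the name now points at the wrapper, which runs the extra logic around the original.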
3.2. The Decorator Ecosystem
Anyone can write a decorator, so there is no fixed list of them. They fall into three groups by where they come from:
| Source | Availability | Examples |
|---|---|---|
| Built-in | no import needed | @property, @classmethod, @staticmethod |
| Standard library | import, no install | @dataclass (from dataclasses), @lru_cache (from functools) |
| Third-party | pip install + import | @app.get(...) (FastAPI), @tool (LangChain) |
A few you will actually meet:
@property: lets a method be read like a plain attribute (researcher.email), giving controlled, read-only access.
@<attr>.setter: the gatekeeper that runs whenever someone assigns to an attribute, so you can validate before saving.
@dataclass: auto-writes the boilerplate (__init__, a readable printout) for data-holding classes.
@lru_cache: remembers expensive results so repeated calls with the same inputs return instantly (handy for heavy computations on your data).
@tool (AI frameworks): exposes a Python function to a large language model so an AI agent can call it. Increasingly relevant as you integrate LLMs into research workflows.
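Of the decorators in this list, @lru_cache is the quickest to try. In the sketch below, slow_square is a hypothetical stand-in for an expensive computation, and the calls list exists only to record how often the function body really runs.

```python
from functools import lru_cache

calls = []  # records each time the function body actually executes


@lru_cache(maxsize=None)
def slow_square(n):
    """Hypothetical stand-in for an expensive computation."""
    calls.append(n)
    return n * n


slow_square(12)   # computed
slow_square(12)   # same input: answered from the cache, body not run
slow_square(7)    # new input: computed

print(calls)
print(slow_square.cache_info())
```

The arguments must be hashable (numbers, strings, tuples), because the cache uses them as dictionary keys.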
3.3. Concept: @property and @setter for Data Integrity
Right now nothing stops a typo from corrupting our data — researcher.email = 12345 would silently succeed. We can prevent this with encapsulation: store the real value in a protected attribute (by convention, a leading underscore: _email) and route every read and write through a @property getter and an @email.setter gatekeeper. The user still writes researcher.email = "..." as if it were an ordinary attribute, but our validation runs first.
from uuid import uuid4


class Researcher:
    def __init__(self, name: str, email: str, phone: str | None = None):
        self.name = name
        self.email = email  # this immediately triggers the setter below!
        self.phone = phone
        self.subjects = []
        self.sessions = []
        self.researcher_id = str(uuid4())
        print(f"[System] Created Researcher: {self.name}")

    # THE GETTER: lets us read 'researcher.email' (no parentheses)
    @property
    def email(self):
        return self._email

    # THE SETTER (gatekeeper): runs on every assignment to 'self.email'
    @email.setter
    def email(self, value: str):
        print(f"--- [Validation] attempting to set email = {value} ---")
        if not isinstance(value, str):
            print("    Rejected: email must be a string.")
            return
        if "@" not in value or "." not in value:
            print(f"    Rejected: '{value}' is not a valid email format.")
            return
        self._email = value  # passed validation -> save to protected attribute
        print("    Accepted: email validated and saved.")

# --- TESTING THE GATEKEEPER ---
r = Researcher("Alex", "alex@university.edu") # A. valid email at creation
print("\n[Test 1] assign an invalid string (no '@'):")
r.email = "alex_at_university.edu" # B. rejected
print("\n[Test 2] assign a number:")
r.email = 123456 # C. rejected
print(f"\n[Final state] {r.name}'s email is still: {r.email}") # protected
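One caveat with the print-and-return style above: if the very first email passed to the constructor is invalid, _email is never assigned, so a later read of the email attribute raises AttributeError. A common alternative is to raise an exception from the setter. The sketch below is one possible variant under that assumption; the class name StrictResearcher is hypothetical and the format check is the same simple one used above.

```python
class StrictResearcher:
    """Hypothetical variant whose setter raises instead of printing."""

    def __init__(self, name, email):
        self.name = name
        self.email = email  # goes through the setter, so a bad value stops creation

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        if not isinstance(value, str) or "@" not in value or "." not in value:
            raise ValueError(f"invalid email: {value!r}")
        self._email = value


r = StrictResearcher("Alex", "alex@university.edu")

try:
    r.email = 123456          # rejected: the old value is kept
except ValueError as err:
    print("Rejected:", err)

print(r.email)
```

The trade-off is that callers must now handle the exception, but an object can no longer exist in a half-built state.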
3.4. Concept: Cutting Boilerplate with @dataclass
Writing a robust class involves a lot of repetitive “boilerplate” — the __init__, the self.x = x mapping, a readable printout. For classes whose main job is holding data, the standard-library @dataclass decorator writes all of that for you.
Key Characteristics:
Auto __init__: it reads your type hints (name: str) and builds the constructor.
Readable printout: printing the object shows Researcher(name='Asha', ...) instead of a cryptic memory address.
Safe mutable defaults: never write subjects=[] as a default. In an ordinary function signature every call would secretly share one list, and @dataclass rejects a list default outright with a ValueError. field(default_factory=list) gives each object its own fresh list.
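The shared-list trap is easier to believe once you have seen it. This sketch first shows the problem in a plain function, then the default_factory fix in a dataclass; add_subject_shared and Roster are hypothetical names used only here.

```python
from dataclasses import dataclass, field


def add_subject_shared(name, subjects=[]):   # the default list is created only once
    subjects.append(name)
    return subjects


first = add_subject_shared("Rex")
second = add_subject_shared("Bolt")
print(first is second)   # both calls returned the very same list object
print(second)


@dataclass
class Roster:
    """Hypothetical dataclass: each instance gets its own list."""
    subjects: list = field(default_factory=list)


a, b = Roster(), Roster()
a.subjects.append("Rex")
print(a.subjects, b.subjects)   # b is unaffected by the change to a
```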
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import uuid4


@dataclass
class Modality:
    """A lightweight record describing a recording modality."""
    name: str
    base_storage_gb: float
    description: str
    modality_id: str = field(default_factory=lambda: str(uuid4()))


@dataclass
class Researcher:
    name: str                                          # required
    email: str                                         # required
    phone: Optional[str] = None                        # optional
    subjects: List[str] = field(default_factory=list)  # fresh list per object
    sessions: List[str] = field(default_factory=list)
    researcher_id: str = field(default_factory=lambda: str(uuid4()))

    def schedule_recording(self, subject_name: str, modality: str, date_list: List[str]):
        if subject_name not in self.subjects:
            print(f"Error: {self.name} does not study '{subject_name}'")
            return
        for date in date_list:
            self.sessions.append(f"{date} | {subject_name} | {modality}")
            print(f"Confirmed for {date}")


@dataclass
class Subject:
    name: str
    species: str
    body_mass_kg: float
    age: float
    owner_id: str
    subject_id: str = field(default_factory=lambda: str(uuid4()))

    def get_size_category(self):
        if self.body_mass_kg < 5:
            return "Small"
        elif self.body_mass_kg < 30:
            return "Medium"
        return "Large"

# --- Execution: notice how clean instantiation and printing become ---
david = Researcher(name="David", email="david@university.edu", subjects=["Bolt"])
print("Automatic representation:")
print(david) # readable, thanks to @dataclass
print("\nTesting an encapsulated method:")
david.schedule_recording("Bolt", "video", ["2026-08-15"])
video = Modality("video", 12.0, "High-frame-rate pose capture")
print("\nModality record:", video)
Section 4: Summary & Capstone Challenge
You have moved from flat, procedural code to a structured, object-oriented design. Thinking in entities rather than loose variables gives you a foundation that scales.
What you mastered:
Blueprints & instantiation: strict Researcher/Subject classes replace fragile dictionaries, guaranteeing a consistent schema.
Object composition: passing whole objects (a Subject carries its Researcher) lets RecordingSession records bind everyone together.
Encapsulation: internal validation and @property protect each object’s state from bad data.
Rapid prototyping: @dataclass removes boilerplate for data-holding classes.
Capstone Challenge: Recording-Lab Capacity Management
Your study runs in a physical recording lab with a fixed number of capture stations. Right now the system allows unlimited check-ins; reality has limits. Build the management layer that governs the space.
Requirements:
Blueprint: define a @dataclass named RecordingLab.
State: give it these fields:
    lab_name: str
    max_capacity: int (e.g. 8 simultaneous capture stations)
    checked_in_subjects: List[Subject] (use field(default_factory=list))
check_in_subject(self, subject: Subject):
    Validation 1: if len(self.checked_in_subjects) has reached max_capacity, reject with a “Capacity reached” warning.
    Validation 2: if the subject is already checked in, print a warning.
    Otherwise: append the Subject and print a success message.
checkout_subject(self, subject_id: str): find the matching subject, remove it, and free up a station.
This exercises object composition, internal state validation, and modelling a real-world constraint — exactly the skills you will lean on when your own data-collection pipeline has to respect equipment and storage limits.
End of Notebook 05. You now have the full foundational toolkit — data, automation, visualization, computational thinking, and object-oriented design — ready to apply to the computer-vision, audio, and statistics modules ahead.