Python Essentials for Research in Behavioral Sciences: Organizing Code with Classes & Decorators

Harvard University

So far you have learned to store data in variables and act on it with functions.

  • Variables hold information (strings, numbers, lists, dictionaries).

  • Functions perform actions — taking inputs, doing work, returning outputs.

This works beautifully for short scripts. But research code grows. The moment you are tracking many things — each of which has both data (a subject’s age, mass, species) and behaviour (schedule a recording, validate an entry) — keeping the data in loose dictionaries and the logic in separate floating functions starts to crack.

In this notebook we meet the tool built for exactly this problem: the Class. We will feel the pain of the dictionary-and-function approach first, then refactor it into clean, reliable classes, and finally meet decorators — the @ symbols that let us protect and supercharge our classes. Throughout, we build one running example drawn straight from this course: a small system for managing a behavioural study.

How to Use This Notebook Well

  • Run every code cell yourself, top to bottom — later cells depend on classes defined earlier.

  • When you meet a new keyword (__init__, self, @property), pause and re-read the explanation before moving on.

  • Mini Challenges and Checkpoints are for you to attempt — they are where the ideas stick.

  • Notice the recurring rhythm: we state a problem, write the messy version, then refactor it properly. Learning to recognise that “this is getting fragile — time for a class” instinct is the real goal.


Section 1: The Problem — Dictionaries & Floating Functions

Our running example: a Behavioral Study Data Management System

By Week 3 of this course you will be collecting data — videos and audio of your subjects. Behind every study is a quiet bookkeeping problem: who is being studied, by whom, and which recordings exist for each subject. Let us try to build that system with only the tools you already have — dictionaries and functions — and watch where it strains.

We need to track Researchers, the Subjects they study, and the Recording Sessions they schedule.

High-level workflow

  1. A researcher registers in the system.

  2. The system creates a Researcher profile (name, email, lab phone).

  3. The researcher registers one or more subjects (the animals or people being studied).

  4. The researcher schedules video or audio recording sessions for a subject.

  5. The system stores and manages these relationships.

Notice the data is already more complex than a few strings: researchers need unique IDs, they own lists of subjects (nested data), and they take actions (scheduling) that must be validated against allowed recording types.

1.1. Setup and Logic

First we define the system’s “rules”: a list of valid recording modalities, and the functions that act on a researcher’s data. Notice these functions are floating — they belong to no particular researcher; they just operate on whatever dictionary you hand them.

from uuid import uuid4   # generates a Universally Unique Identifier (a long random string)

# GLOBAL CONFIGURATION: the recording modalities our system supports.
# This list lives OUTSIDE any researcher; every function must "look it up" to validate.
MODALITY_TYPES = ["video", "audio"]

# A floating function to schedule recordings.
# It belongs to no researcher -- it is a "contractor" that operates on a dictionary
# passed in. Forget an argument, or mistype a key, and it breaks.
def schedule_recording(researcher_dict, subject_name, modality, dates):
    # Step A: validation against the global list
    if modality not in MODALITY_TYPES:
        print(f"Error: '{modality}' is not a valid modality.")
        return
    # Step B: update the data -- append one session string per date
    for date in dates:
        entry = f"{date} | {subject_name} | {modality}"
        researcher_dict["sessions"].append(entry)
        print(f"Success: scheduled {entry} for {researcher_dict['name']}")

# Another floating function, separate from the first, to update a phone number.
def update_phone(researcher_dict, new_number):
    researcher_dict["phone"] = new_number
    print(f"Updated {researcher_dict['name']}'s phone to {new_number}.")

1.2. Creating Our First Researcher (The Data)

Now we hand-build a researcher as a dictionary. Here we act as a careful manual data-entry clerk: a single typo in a key ("session" instead of "sessions") will make our functions fail later. We must also remember to call uuid4() ourselves and to initialise empty lists by hand.

researcher_01 = {
    "id": str(uuid4()),            # forget this line and there is no ID!
    "name": "Asha",
    "email": "asha@university.edu",
    "phone": "555-0100",
    # "subjects" is nested data: a list of dictionaries, one per studied individual.
    "subjects": [
        {"name": "Rex", "species": "Greyhound", "age": 3.0, "body_mass_kg": 27.0}
    ],
    # "sessions" MUST be initialised as an empty list, or schedule_recording() will crash.
    "sessions": []
}

print("--- Researcher Profile Created ---")
print(f"Researcher: {researcher_01['name']}")
print(f"ID assigned: {researcher_01['id']}")
print(f"Initial sessions: {researcher_01['sessions']}")

1.3. Executing an Action (Connecting Logic to Data)

The logic (schedule_recording) and the data (researcher_01) are separate entities. To make them work together, we must “inject” the dictionary into the function as an argument. The function then reaches inside, finds the "sessions" list, and modifies it.

print(f"--- Scheduling for {researcher_01['name']} ---")

# We must "plug" the dictionary INTO the function so it knows whose sessions to update.
schedule_recording(researcher_01, "Rex", "video", ["2026-03-01"])
schedule_recording(researcher_01, "Rex", "audio", ["2026-03-02", "2026-03-03"])

print("-" * 30)
print(f"{researcher_01['name']} now has {len(researcher_01['sessions'])} sessions.")

# A second floating function, called the same awkward way.
update_phone(researcher_01, "555-9999")
print(f"Updated phone: {researcher_01['phone']}")

1.4. Adding a New Researcher (The Scaling Problem)

What happens when a second researcher joins the lab? We must repeat the entire manual dictionary — and keep its structure identical. With 100 researchers we copy-paste this block 100 times. And if we ever rename "subjects" to "participants", we must hunt down and fix every single dictionary by hand.

researcher_02 = {
    "id": str(uuid4()),
    "name": "Diego",            # must stay "name" -- not "Name" or "full_name"
    "email": "diego@university.edu",
    "phone": None,              # Diego gave no phone; we keep the key with None
    "subjects": [
        {"name": "Finch-03", "species": "Zebra finch", "age": 1.0, "body_mass_kg": 0.013}
    ],
    "sessions": []
}

print(f"--- Scheduling for {researcher_02['name']} ---")
schedule_recording(researcher_02, "Finch-03", "audio", ["2026-04-10", "2026-04-11"])
print("\nResearcher 02 sessions:", researcher_02["sessions"])

1.5. Why This Approach Breaks Down

Our prototype works, but it is fragile. The cracks are structural:

  1. No schema enforcement. Nothing forces every researcher to have the same keys. If researcher_01 has "phone" but researcher_02 does not, code downstream hits a KeyError.

  2. Manual redundancy. Rename one key and you must edit every dictionary and every function that touches it — an open invitation to bugs.

  3. Data inconsistency. Nothing stops "subjects" in one record and "subject" in another. Retrieval logic becomes fragile guesswork.

  4. Decoupled logic and state. The function has no intrinsic connection to the data; it only acts on whatever you explicitly pass in.

  5. High cognitive load. To write even one line, you must keep the entire nested structure — every key, every type — in your head.

And consuming the data is even harder than building it. Suppose you want “the first subject’s name and the most recent session.” You must reach into the "subjects" list, check it is not empty, grab element [0], read its "name", then reach into "sessions" and grab element [-1] — checking at every step that the key and index exist so the program does not crash.
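The retrieval described above might look like the following sketch. The helper name first_subject_and_last_session is hypothetical (it is not part of the course code), and the two sample dictionaries are made up to mirror the researcher records built earlier.

```python
# Hypothetical helper (not from the notebook): defensive lookup on a researcher dictionary.
def first_subject_and_last_session(researcher_dict):
    # .get() returns a fallback instead of raising KeyError when a key is missing
    subjects = researcher_dict.get("subjects", [])
    sessions = researcher_dict.get("sessions", [])
    # An empty list has no element [0] or [-1], so check before indexing
    first_name = subjects[0].get("name") if subjects else None
    last_session = sessions[-1] if sessions else None
    return first_name, last_session

sample = {
    "name": "Asha",
    "subjects": [{"name": "Rex", "species": "Greyhound"}],
    "sessions": ["2026-03-01 | Rex | video", "2026-03-02 | Rex | audio"],
}
incomplete = {"name": "Diego"}   # the "subjects" and "sessions" keys were forgotten

result_ok = first_subject_and_last_session(sample)
result_missing = first_subject_and_last_session(incomplete)
print(result_ok)
print(result_missing)
```

Every guard in the helper exists only because the dictionary gives no guarantee about its own shape.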

There is a better way. We need something that bundles the data and its behaviour into one reliable unit. That is a Class.


Section 2: The Solution — Classes & Object-Oriented Design

2.1. Concept: From Functions to Objects

A Class is a formal blueprint for a kind of thing. It guarantees every object built from it starts with the right attributes, automates repetitive setup (like ID generation), and keeps the relevant logic attached as methods.

This buys us two things immediately:

  • Schema consistency — every Researcher is guaranteed to have a name, an email, a subject list.

  • Syntactic clarity — we read data with dot notation (researcher.name) instead of error-prone string keys (researcher["name"]).

Table 1 — Structural shift

| Concept | Functional approach | Object-Oriented approach |
|---|---|---|
| Data storage | loose variables / dictionaries | Attributes (variables attached to an object) |
| Logic / action | floating functions | Methods (functions attached to an object) |
| Organisation | decoupled blocks | Encapsulation (one structured unit) |

Table 2 — Core terminology

| Idea | Python term | Meaning |
|---|---|---|
| Blueprint | class | the definition of a type (the concept of a “Researcher”) |
| Instance | object | a specific realisation (the researcher “Asha”) |
| State | attribute | a variable inside an object (asha.email) |
| Behaviour | method | a function belonging to an object (asha.update_phone()) |

Table 3 — Modelling our study system

| Entity | What its class manages |
|---|---|
| Researcher | identity, contact info, and the subjects + sessions they own |
| Subject | the studied individual’s record (name, species, age, body mass) and its owner link |
| RecordingSession | a single capture: date, modality, derived storage size, status |
| Modality | the recording-type rules (video / audio) and storage estimates |

2.2. Class Design Best Practices

Before writing a class, two conventions and one design question.

Naming (PEP 8):

  • Classes use PascalCase — every word capitalised, no underscores: Researcher, RecordingSession.

  • Attributes, methods, variables use snake_case — lowercase with underscores: body_mass_kg, update_phone().

Where should the logic live? A useful rule of thumb, in the spirit of the Single Responsibility Principle, is that a method belongs in the class whose data it primarily touches. update_phone() changes a researcher’s contact info, so it lives in Researcher. Categorising a subject by body mass touches the subject’s data, so get_size_category() lives in Subject.

Which class do we build first? We answer with dependencies. A Researcher can exist before registering any subject (their subject list simply starts empty). But a Subject cannot exist as an orphan — it needs an owner to be meaningful. Because Subject depends on Researcher, we define Researcher first.

2.3. Implementing the Researcher Class

Below is the standard anatomy of a Python class. Three improvements over the dictionary approach are worth naming as you read:

  • Automated setup: the researcher_id is generated automatically; no one can forget it.

  • Schema enforcement: creating a Researcher requires a name and email, so every instance is consistent.

  • Encapsulated logic: schedule_recording is now a method that validates against the object’s own subject list before changing its own session history.

from uuid import uuid4

MODALITY_TYPES = ["video", "audio"]

class Researcher:
    """A person running a behavioural study. Owns subjects and recording sessions."""

    def __init__(self, name: str, email: str,
                 subject_names: list | None = None, phone: str | None = None):
        """The constructor: sets up the object's initial state."""
        # DYNAMIC ATTRIBUTES (provided at creation)
        self.name = name
        self.email = email
        self.phone = phone
        # DATA STRUCTURES (initialised automatically so they always exist)
        self.subjects = subject_names if subject_names is not None else []
        self.sessions = []
        # AUTOMATED METADATA (generated, never forgotten)
        self.researcher_id = str(uuid4())

    def update_phone(self, new_number: str):
        """Update this researcher's phone number."""
        self.phone = new_number
        print(f"[Update] {self.name}'s phone is now {self.phone}")

    def schedule_recording(self, subject_name: str, modality: str, date_list: list):
        """Record one or more sessions, validating against THIS researcher's subjects."""
        print(f"--- Scheduling {modality} for {subject_name} ---")
        # VALIDATION: does this researcher actually study that subject?
        if subject_name not in self.subjects:
            print(f"Error: {self.name} has no registered subject '{subject_name}'. Register it first.")
            return
        if modality not in MODALITY_TYPES:
            print(f"Error: '{modality}' is not a valid modality.")
            return
        # PROCESSING: append one structured record per date
        for date in date_list:
            entry = f"{date} | {subject_name} | {modality}"
            self.sessions.append(entry)
            print(f"Confirmed for {date}")

2.4. Concept: Deconstructing the Class Anatomy

Several new pieces appeared above. Here is what each one does.

The module import (uuid). from uuid import uuid4 pulls a ready-made function from Python’s standard library to generate unique ID strings — so we never hand-assign IDs.

The constructor __init__. A special method Python runs automatically the moment you create an object. The double underscores make it a “dunder” (double-underscore) method, reserved by the language. Its job is to set up the object’s starting state.

The instance reference self. Think of the class as a blank form and self as the specific copy currently being filled in. When you write asha = Researcher("Asha", ...), Python passes that new object in as self, so self.name = name writes “Asha” into Asha’s memory, not anyone else’s.
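One way to see what self is: calling a method on an object is shorthand for calling the function on the class and passing the object in explicitly. The sketch below uses DemoResearcher, a hypothetical cut-down stand-in for Researcher, so that it runs on its own.

```python
class DemoResearcher:
    """Hypothetical stand-in for Researcher, kept tiny so the snippet runs alone."""

    def __init__(self, name, phone=None):
        self.name = name
        self.phone = phone

    def update_phone(self, new_number):
        self.phone = new_number

asha_demo = DemoResearcher("Asha")
diego_demo = DemoResearcher("Diego")

asha_demo.update_phone("555-0100")                   # Python passes asha_demo in as self
DemoResearcher.update_phone(diego_demo, "555-0200")  # same kind of call, written out explicitly

print(asha_demo.phone, diego_demo.phone)
```

Both forms do the same thing; the dot form is simply the one you will write in practice.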

Type hints & defaults. In (name: str, email: str, phone: str | None = None):

  • : str is a type hint, i.e. documentation that name should be a string. Python does not enforce it at runtime.

  • str | None means the value may be a string or None.

  • = None is a default — omit phone and Python fills in None instead of crashing.

Parameters vs. Attributes (a critical distinction)

These often share a name inside __init__ but play different roles:

  • Parameters are the temporary inputs in the function signature (def __init__(self, name, ...)). They carry data in, then vanish when __init__ finishes.

  • Attributes are the permanent storage attached with self (self.name). They live for the whole life of the object and are read later with dot notation.

The single line that bridges them is self.name = name, which means: “take the value in the temporary parameter name, and store it permanently in this object’s attribute self.name.”

Core rule: parameters are temporary inputs to a function; attributes are the persistent state of an object.
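The distinction can be checked directly. In this sketch MiniSubject is a hypothetical class written only for illustration: one parameter is stored on self and the other is deliberately not.

```python
class MiniSubject:
    """Hypothetical stand-in class, used only to contrast parameters and attributes."""

    def __init__(self, name, note):
        self.name = name   # parameter copied into an attribute, so it persists
        # 'note' is never stored on self, so it is gone once __init__ returns

rex_demo = MiniSubject("Rex", "temporary remark")

has_name = hasattr(rex_demo, "name")   # the attribute survives
has_note = hasattr(rex_demo, "note")   # the parameter did not become an attribute
print(has_name, has_note)
```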

2.5. Concept: Instantiation and Dot Notation

Creating an object from a class is called instantiation. Instead of hand-building a dictionary, we call the class by name and pass the initialisation arguments; Python runs __init__ and hands back a new object.

A note on how we pass arguments. Python allows positional arguments (Researcher("Asha", "asha@university.edu")), which rely on order. Keyword arguments (Researcher(name="Asha", email="asha@university.edu")) are often preferred because they are unambiguous: you cannot accidentally swap two strings, and a misspelled keyword fails immediately with a TypeError instead of silently storing the wrong value.
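To make the difference concrete, here is a minimal sketch with a hypothetical two-field class, DemoProfile, standing in for Researcher.

```python
class DemoProfile:
    """Hypothetical two-field class used to compare argument styles."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

# Positional: the call runs, but the two strings land in the wrong fields
swapped = DemoProfile("asha@university.edu", "Asha")

# Keyword: order no longer matters
safe = DemoProfile(email="asha@university.edu", name="Asha")

# A misspelled keyword is rejected at the call site
try:
    DemoProfile(name="Asha", emial="asha@university.edu")
    typo_caught = False
except TypeError as err:
    typo_caught = True
    print("TypeError:", err)

print(swapped.name, "|", safe.name)
```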

Key Characteristics:

  • Read attributes with dot notation: researcher.name, not researcher["name"].

  • Each object has its own isolated state — Asha’s name and Diego’s name live in separate memory.

  • The class now behaves like a custom data type, as predictable as int or list.

# INSTANTIATION using keyword arguments (clear and safe)
researcher_01 = Researcher(name="Asha", email="asha@university.edu", phone="555-0100")
researcher_01.subjects = ["Rex", "Bolt"]   # register subject names for now

researcher_02 = Researcher(name="Diego", email="diego@university.edu")
researcher_02.subjects = ["Finch-03"]

# Reading state via dot notation
print(f"Name : {researcher_01.name}")
print(f"Email: {researcher_01.email}")
print(f"ID   : {researcher_01.researcher_id}")

2.6. Calling Methods on an Object

Now we trigger the methods with dot notation. The object manages its own state and enforces its own validation — no floating functions, no data injection.

System note: if you re-run the instantiation cell, the researcher_id changes every time, because uuid4() mints a fresh string on each new object.

print(f"--- {researcher_01.name}: initial phone = {researcher_01.phone} ---")
researcher_01.update_phone("555-9999")

print("\n--- Scheduling sessions ---")
# VALID: "Rex" is in researcher_01's own subject list
researcher_01.schedule_recording("Rex", "video", ["2026-03-01", "2026-03-02"])

# BLOCKED: "Finch-03" belongs to researcher_02; this object's validation catches it
researcher_01.schedule_recording("Finch-03", "audio", ["2026-04-10"])

print(f"\n--- {researcher_01.name}'s session history ---")
for record in researcher_01.sessions:
    print(" -", record)

Mini Challenge 2.6. Create a third researcher with your own name and email, register two subjects, schedule one video session for each, then print how many sessions they now hold.

2.7. Concept: A Dynamic Lifecycle (adding a Subject later)

Real data is rarely static. A researcher might sign up today and register a subject next week. Because we defaulted subject_names to None (and converted it to an empty list inside __init__), a researcher can safely exist with no subjects, then gain them over time.

Now we also build the Subject class itself — the studied individual. It carries its measurements and a get_size_category() method that bins it by body mass, the kind of derived label you will compute constantly in this course.

from uuid import uuid4

class Subject:
    """A studied individual (animal or person) linked to a researcher by owner_id."""

    def __init__(self, name: str, species: str, age: float,
                 body_mass_kg: float, owner_id: str):
        self.name = name
        self.species = species
        self.age = age
        self.body_mass_kg = body_mass_kg
        self.owner_id = owner_id              # foreign key linking to a Researcher
        self.subject_id = str(uuid4())

    def get_size_category(self):
        """Bin the subject by body mass -- a simple derived label."""
        if self.body_mass_kg < 5:
            return "Small"
        elif self.body_mass_kg < 30:
            return "Medium"
        else:
            return "Large"

# --- Dynamic lifecycle demo ---
researcher_03 = Researcher(name="Mei", email="mei@university.edu")
print(f"Created {researcher_03.name}; subjects = {researcher_03.subjects}")

# Attempt a booking with no subjects yet -> validation blocks it
researcher_03.schedule_recording("Sparrow-12", "video", ["2026-07-01"])

# Register the subject NAME, then re-attempt -> now it works
researcher_03.subjects.append("Sparrow-12")
print(f"\n{researcher_03.name} registered a subject: {researcher_03.subjects}")
researcher_03.schedule_recording("Sparrow-12", "video", ["2026-07-01"])

# Create the formal Subject profile, linked by the researcher's ID
sparrow = Subject(name="Sparrow-12", species="House sparrow", age=1.0,
                  body_mass_kg=0.03, owner_id=researcher_03.researcher_id)
print(f"\nSubject profile: {sparrow.name} ({sparrow.species}), "
      f"size = {sparrow.get_size_category()}, owner = {sparrow.owner_id[:8]}...")

2.8. Concept: The RecordingSession Class & Object Composition

So far a “session” was just a text string like "2026-03-01 | Rex | video". A string cannot estimate its own storage size or carry a status. Let us promote it to a real RecordingSession object.

Notice the class derives its storage_gb automatically from the modality — video sessions are far larger than audio. This is encapsulated business logic: the object computes its own properties, so they are always consistent.

Then we will see object composition: instead of storing a flat string in a researcher’s sessions list, we store the entire RecordingSession object — and gain access to all its attributes through chained dot notation (researcher.sessions[0].storage_gb).

from uuid import uuid4

# SYSTEM CONSTANTS (PEP 8: constants in ALL_CAPS) -- estimated storage per session.
VIDEO_GB_PER_SESSION = 12.0
AUDIO_GB_PER_SESSION = 0.5

class RecordingSession:
    """One capture, connecting a Researcher, a Subject, a modality, and a date."""

    def __init__(self, researcher_id: str, subject_id: str, modality: str, date: str):
        self.researcher_id = researcher_id
        self.subject_id = subject_id
        self.modality = modality.lower()      # standardise input
        self.date = date

        # DERIVED STATE: storage estimate is computed from the modality.
        if self.modality == "video":
            self.storage_gb = VIDEO_GB_PER_SESSION
        elif self.modality == "audio":
            self.storage_gb = AUDIO_GB_PER_SESSION
        else:
            self.storage_gb = 0.0
            print(f"[Warning] Unrecognised modality: {self.modality}")

        self.session_id = str(uuid4())
        self.status = "Scheduled"             # default state at creation

Now we compose the objects together — a researcher, a subject, and a session — and inspect them through dot notation.

# Create the core entities
researcher_01 = Researcher(name="Asha", email="asha@university.edu", phone="555-0100")
rex = Subject(name="Rex", species="Greyhound", age=3.0,
              body_mass_kg=27.0, owner_id=researcher_01.researcher_id)
researcher_01.subjects.append(rex.name)      # register the name for validation

# Create a formal RecordingSession object linking researcher + subject
session_01 = RecordingSession(researcher_id=researcher_01.researcher_id,
                              subject_id=rex.subject_id,
                              modality="video", date="2026-03-05")

# OBJECT COMPOSITION: store the OBJECT itself, not a string
researcher_01.sessions.append(session_01)

print(f"--- System status for {researcher_01.name} ---")
print(f"Registered subjects: {len(researcher_01.subjects)} | Sessions: {len(researcher_01.sessions)}")

print("\n--- Inspecting the stored session object ---")
first = researcher_01.sessions[0]            # this is a RecordingSession object
print(f"Date    : {first.date}")
print(f"Modality: {first.modality.capitalize()}")
print(f"Storage : {first.storage_gb:.1f} GB")
print(f"Status  : {first.status}")
print(f"Ref ID  : {first.session_id[:8]}...")
print(f"\nSubject size category: {rex.get_size_category()}")

2.9. Concept: Automating Composition (passing whole objects)

Until now we linked entities with ID strings (owner_id). OOP allows something more powerful: passing the whole object as a reference. This grants instant access to the owner’s attributes and methods, and lets us automate the bookkeeping.

Three upgrades in the refactor below:

  1. Two-way linking. When we create a Subject, we pass the actual Researcher object. Inside its constructor the subject appends itself to the owner’s subject list — automatic registration.

  2. A factory method. Researcher.schedule_recording now manufactures RecordingSession objects internally and stores them, instead of appending strings.

  3. System logging. Each __init__ prints a line, creating an automatic audit trail.

from uuid import uuid4

VIDEO_GB_PER_SESSION = 12.0
AUDIO_GB_PER_SESSION = 0.5

class Researcher:
    """Central hub: owns Subject objects and manufactures RecordingSession objects."""

    def __init__(self, name: str, email: str, phone: str | None = None):
        self.name = name
        self.email = email
        self.phone = phone
        self.subjects = []      # now stores actual Subject objects
        self.sessions = []      # now stores actual RecordingSession objects
        self.researcher_id = str(uuid4())
        print(f"[System] Created Researcher: {self.name} | {self.email} | {self.phone}")

    def update_phone(self, new_number: str):
        self.phone = new_number
        print(f"[Update] {self.name}'s phone -> {self.phone}")

    def schedule_recording(self, subject_name: str, modality: str, date_list: list):
        """Factory method: builds RecordingSession objects internally."""
        print(f"\n--- Request: {modality.capitalize()} for {subject_name} ---")
        # VALIDATION: search this researcher's list of Subject objects by name
        target = None
        for s in self.subjects:
            if s.name == subject_name:
                target = s
                break
        if target is None:
            print(f"Error: {self.name} has no registered subject '{subject_name}'.")
            return
        # COMPOSITION: manufacture and store a RecordingSession per date
        for date in date_list:
            self.sessions.append(
                RecordingSession(researcher=self, subject=target,
                                 modality=modality, date=date))


class Subject:
    """A studied individual that registers ITSELF to its owner on creation."""

    def __init__(self, name: str, species: str, body_mass_kg: float,
                 age: float, owner: Researcher):
        self.name = name
        self.species = species
        self.body_mass_kg = body_mass_kg
        self.age = age
        self.owner = owner               # store the whole Researcher object
        self.subject_id = str(uuid4())
        # AUTOMATED REGISTRATION: add myself to my owner's subject list
        self.owner.subjects.append(self)
        print(f"[System] Created Subject: {self.name} ({self.species}) "
              f"-> owner {self.owner.name}")

    def get_size_category(self):
        if self.body_mass_kg < 5:
            return "Small"
        elif self.body_mass_kg < 30:
            return "Medium"
        return "Large"


class RecordingSession:
    """Transactional record connecting a Researcher, a Subject, and a modality."""

    def __init__(self, researcher: Researcher, subject: Subject, modality: str, date: str):
        self.researcher = researcher
        self.subject = subject
        self.modality = modality.lower()
        self.date = date
        self.storage_gb = (VIDEO_GB_PER_SESSION if self.modality == "video"
                           else AUDIO_GB_PER_SESSION if self.modality == "audio" else 0.0)
        self.session_id = str(uuid4())
        self.status = "Scheduled"
        print(f"  [Session] {self.modality.capitalize()} on {self.date} "
              f"| {self.subject.name} | {self.storage_gb:.1f} GB | {self.status}")

# --- SYSTEM TEST: watch the automated logging tell the story ---
# 1. Create the researcher
asha = Researcher("Asha", "asha@university.edu", "555-0100")

# 2. Create the subject -> it auto-registers to Asha
rex = Subject("Rex", "Greyhound", 27.0, 3.0, owner=asha)

# 3. Schedule recordings -> the factory builds RecordingSession objects
asha.schedule_recording("Rex", "video", ["2026-03-10", "2026-03-11"])

# 4. Verify the composed state
print(f"\nAsha now owns {len(asha.subjects)} subject(s) and {len(asha.sessions)} session(s).")
total_gb = sum(s.storage_gb for s in asha.sessions)
print(f"Estimated storage for this study so far: {total_gb:.1f} GB")

Checkpoint 2. You have now built three linked classes and seen a subject register itself and a researcher manufacture sessions. In one sentence each, explain: what self refers to, why we store whole objects instead of ID strings, and what “encapsulation” buys us. If any is fuzzy, revisit that subsection.


Section 3: Decorators — Protecting and Supercharging Classes

3.1. Concept: What Is a Decorator?

As a system matures, we often need to modify or control a function’s behaviour without rewriting its internals. A decorator does exactly that: it “wraps” a function or method with extra behaviour. Think of it as a gatekeeper that intercepts a call, runs some logic before or after, then lets the original run.

You spot a decorator by the @ symbol sitting directly above a function, method, or class definition.
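Before meeting the built-in decorators, it may help to see a hand-written one. The sketch below is an illustration only: log_call and size_category are hypothetical names, and the thresholds simply reuse those from get_size_category() earlier in the notebook.

```python
from functools import wraps

def log_call(func):
    """Hypothetical decorator: prints a line before and after the wrapped call."""
    @wraps(func)                      # keeps the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"[Before] calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"[After] {func.__name__} returned {result!r}")
        return result
    return wrapper

@log_call                             # same as: size_category = log_call(size_category)
def size_category(body_mass_kg):
    """Bin a body mass using the same thresholds as get_size_category()."""
    if body_mass_kg < 5:
        return "Small"
    elif body_mass_kg < 30:
        return "Medium"
    return "Large"

label = size_category(27.0)
```

The function body is untouched; the extra printing comes entirely from the wrapper.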

3.2. The Decorator Ecosystem

Anyone can write a decorator, so there is no fixed list of them. They fall into three groups by where they come from:

| Source | Availability | Examples |
|---|---|---|
| Built-in | no import needed | @property, @classmethod, @staticmethod |
| Standard library | import, no install | @dataclass (from dataclasses), @lru_cache (from functools) |
| Third-party | pip install + import | @app.get(...) (FastAPI), @tool (LangChain) |

A few you will actually meet:

  • @property — lets a method be read like a plain attribute (researcher.email), giving controlled, read-only access.

  • @<attr>.setter — the gatekeeper that runs whenever someone assigns to an attribute, so you can validate before saving.

  • @dataclass — auto-writes the boilerplate (__init__, a readable printout) for data-holding classes.

  • @lru_cache — remembers expensive results so repeated calls with the same inputs return instantly (handy for heavy computations on your data).

  • @tool (AI frameworks) — exposes a Python function to a large language model so an AI agent can call it. Increasingly relevant as you integrate LLMs into research workflows.
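As a taste of one of these, here is a minimal @lru_cache sketch. The function estimated_storage_gb and the call_log list are hypothetical; the per-session sizes reuse the constants from Section 2.8. Note that cached functions need hashable arguments (strings and numbers are fine, lists are not).

```python
from functools import lru_cache

call_log = []   # records every time the function body really runs

@lru_cache(maxsize=None)
def estimated_storage_gb(modality: str, n_sessions: int) -> float:
    """Hypothetical helper; per-session sizes match the constants used earlier."""
    call_log.append((modality, n_sessions))
    per_session = {"video": 12.0, "audio": 0.5}[modality]
    return per_session * n_sessions

first = estimated_storage_gb("video", 10)    # computed: the body runs
second = estimated_storage_gb("video", 10)   # same inputs: served from the cache

print(first, second, len(call_log))
print(estimated_storage_gb.cache_info())
```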

3.3. Concept: @property and @<attr>.setter for Data Integrity

Right now nothing stops a typo from corrupting our data — researcher.email = 12345 would silently succeed. We can prevent this with encapsulation: store the real value in a protected attribute (by convention, a leading underscore: _email) and route every read and write through a @property getter and an @email.setter gatekeeper. The user still writes researcher.email = "..." as if it were an ordinary attribute, but our validation runs first.

from uuid import uuid4

class Researcher:
    def __init__(self, name: str, email: str, phone: str | None = None):
        self.name = name
        self._email = None            # fallback so the getter still works if validation rejects the input
        self.email = email            # this immediately triggers the setter below!
        self.phone = phone
        self.subjects = []
        self.sessions = []
        self.researcher_id = str(uuid4())
        print(f"[System] Created Researcher: {self.name}")

    # THE GETTER: lets us read 'researcher.email' (no parentheses)
    @property
    def email(self):
        return self._email

    # THE SETTER (gatekeeper): runs on every assignment to 'self.email'
    @email.setter
    def email(self, value: str):
        print(f"--- [Validation] attempting to set email = {value} ---")
        if not isinstance(value, str):
            print("  Rejected: email must be a string.")
            return
        if "@" not in value or "." not in value:
            print(f"  Rejected: '{value}' is not a valid email format.")
            return
        self._email = value           # passed validation -> save to protected attribute
        print("  Accepted: email validated and saved.")

# --- TESTING THE GATEKEEPER ---
r = Researcher("Alex", "alex@university.edu")    # A. valid email at creation

print("\n[Test 1] assign an invalid string (no '@'):")
r.email = "alex_at_university.edu"               # B. rejected

print("\n[Test 2] assign a number:")
r.email = 123456                                 # C. rejected

print(f"\n[Final state] {r.name}'s email is still: {r.email}")   # protected
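
The setter above prints a message and returns, which is easy to follow in a notebook. A common variant raises an exception instead, so calling code cannot miss the rejection. The sketch below is one possible version under that assumption; StrictResearcher is a hypothetical name, and the validation rule is the same simple one used above.

```python
class StrictResearcher:
    """Hypothetical variant of Researcher whose setter raises instead of printing."""

    def __init__(self, name: str, email: str):
        self.name = name
        self.email = email            # goes through the setter, so bad input fails here

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        if not isinstance(value, str):
            raise TypeError("email must be a string")
        if "@" not in value or "." not in value:
            raise ValueError(f"{value!r} is not a valid email format")
        self._email = value

strict_r = StrictResearcher("Alex", "alex@university.edu")

try:
    strict_r.email = "alex_at_university.edu"   # no '@', so the setter raises
    rejected = False
except ValueError as err:
    rejected = True
    print("Rejected:", err)

print(strict_r.email)   # the earlier valid value is untouched
```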

3.4. Concept: Cutting Boilerplate with @dataclass

Writing a robust class involves a lot of repetitive “boilerplate” — the __init__, the self.x = x mapping, a readable printout. For classes whose main job is holding data, the standard-library @dataclass decorator writes all of that for you.

Key Characteristics:

  • Auto __init__ — it reads your type hints (name: str) and builds the constructor.

  • Readable printout — printing the object shows Researcher(name='Asha', ...) instead of a cryptic memory address.

  • Safe mutable defaults — never write subjects=[] as a default. In an ordinary __init__ all instances would secretly share one list, and @dataclass refuses such a default outright with a ValueError. field(default_factory=list) gives each object its own fresh list.
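
The shared-list pitfall can be seen in a few lines. The sketch below uses three hypothetical classes (SharedListLab, SafeLab, BrokenLab) that are not part of the study system; they exist only to contrast a mutable default argument with default_factory.

```python
from dataclasses import dataclass, field
from typing import List


class SharedListLab:
    """Plain class with a mutable default argument (the pitfall)."""

    def __init__(self, subjects: List[str] = []):   # one list, created once
        self.subjects = subjects


a = SharedListLab()
b = SharedListLab()
a.subjects.append("Bolt")
print(b.subjects)          # b sees a's subject: both share the same list


@dataclass
class SafeLab:
    """Dataclass that builds a fresh list for every instance."""

    subjects: List[str] = field(default_factory=list)


c = SafeLab()
d = SafeLab()
c.subjects.append("Bolt")
print(d.subjects)          # d's list is still empty

# @dataclass refuses a bare list default at class-definition time.
try:
    @dataclass
    class BrokenLab:
        subjects: List[str] = []
except ValueError as err:
    print(f"ValueError: {err}")
```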

from dataclasses import dataclass, field
from typing import List, Optional
from uuid import uuid4

@dataclass
class Modality:
    """A lightweight record describing a recording modality."""
    name: str
    base_storage_gb: float
    description: str
    modality_id: str = field(default_factory=lambda: str(uuid4()))

@dataclass
class Researcher:
    name: str                                  # required
    email: str                                 # required
    phone: Optional[str] = None                # optional
    subjects: List[str] = field(default_factory=list)   # fresh list per object
    sessions: List[str] = field(default_factory=list)
    researcher_id: str = field(default_factory=lambda: str(uuid4()))

    def schedule_recording(self, subject_name: str, modality: str, date_list: List[str]):
        if subject_name not in self.subjects:
            print(f"Error: {self.name} does not study '{subject_name}'")
            return
        for date in date_list:
            self.sessions.append(f"{date} | {subject_name} | {modality}")
            print(f"Confirmed for {date}")

@dataclass
class Subject:
    name: str
    species: str
    body_mass_kg: float
    age: float
    owner_id: str
    subject_id: str = field(default_factory=lambda: str(uuid4()))

    def get_size_category(self) -> str:
        if self.body_mass_kg < 5:
            return "Small"
        if self.body_mass_kg < 30:
            return "Medium"
        return "Large"

# --- Execution: notice how clean instantiation and printing become ---
david = Researcher(name="David", email="david@university.edu", subjects=["Bolt"])
print("Automatic representation:")
print(david)                                   # readable, thanks to @dataclass

print("\nTesting an encapsulated method:")
david.schedule_recording("Bolt", "video", ["2026-08-15"])

video = Modality("video", 12.0, "High-frame-rate pose capture")
print("\nModality record:", video)
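
The @dataclass version of Researcher above no longer validates email, because the generated __init__ assigns the field directly. One way to keep a check, offered as a sketch and not as the notebook's chosen design, is the __post_init__ hook, which a dataclass calls right after its generated __init__. The name ValidatedResearcher is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import uuid4


@dataclass
class ValidatedResearcher:
    """Hypothetical dataclass that checks its email once, at creation."""

    name: str
    email: str
    phone: Optional[str] = None
    subjects: List[str] = field(default_factory=list)
    researcher_id: str = field(default_factory=lambda: str(uuid4()))

    def __post_init__(self) -> None:
        # Runs automatically after the generated __init__ has set every field.
        if not isinstance(self.email, str):
            raise TypeError("email must be a string")
        if "@" not in self.email or "." not in self.email:
            raise ValueError(f"{self.email!r} is not a valid email format")


ok = ValidatedResearcher(name="David", email="david@university.edu")
print(ok.email)

try:
    ValidatedResearcher(name="David", email="david_at_university.edu")
except ValueError as err:
    print(f"Rejected: {err}")
```

Note that __post_init__ runs only at creation. A later assignment such as ok.email = 123 is not checked, so this hook complements the @property setter shown earlier and does not replace it.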

Section 4: Summary & Capstone Challenge

You have moved from flat, procedural code to a structured, object-oriented design. Thinking in entities rather than loose variables gives you a foundation that scales.

What you mastered:

  1. Blueprints & instantiation — strict Researcher / Subject classes replace fragile dictionaries, guaranteeing a consistent schema.

  2. Object composition — passing whole objects (a Subject carries its Researcher) lets RecordingSession records bind everyone together.

  3. Encapsulation — internal validation and @property protect each object’s state from bad data.

  4. Rapid prototyping — @dataclass removes boilerplate for data-holding classes.

Capstone Challenge: Recording-Lab Capacity Management

Your study runs in a physical recording lab with a fixed number of capture stations. Right now the system allows unlimited check-ins; reality has limits. Build the management layer that governs the space.

Requirements:

  1. Blueprint — define a @dataclass named RecordingLab.

  2. State — give it:

    • lab_name: str

    • max_capacity: int (e.g. 8 simultaneous capture stations)

    • checked_in_subjects: List[Subject] (use field(default_factory=list))

  3. check_in_subject(self, subject: Subject)

    • Validation 1: if len(self.checked_in_subjects) has reached max_capacity, reject with a “Capacity reached” warning.

    • Validation 2: if the subject is already checked in, print a warning.

    • Otherwise: append the Subject and print a success message.

  4. checkout_subject(self, subject_id: str) — find the matching subject, remove it, and free up a station.

This exercises object composition, internal state validation, and modelling a real-world constraint — exactly the skills you will lean on when your own data-collection pipeline has to respect equipment and storage limits.
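
As a hint for requirement 4 only, and not a solution: one common way to find an object in a list by its ID is a generator expression passed to next() with a default of None. The Tag class, the find_tag function, and the station_tags list below are hypothetical stand-ins, so RecordingLab itself is still yours to write.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for any record that carries an ID."""
    tag_id: str
    label: str


station_tags: List[Tag] = [Tag("t1", "north bench"), Tag("t2", "south bench")]


def find_tag(tags: List[Tag], tag_id: str) -> Optional[Tag]:
    # next() returns the first match, or None when nothing matches.
    return next((t for t in tags if t.tag_id == tag_id), None)


match = find_tag(station_tags, "t2")
if match is not None:
    station_tags.remove(match)      # list.remove deletes the first equal item
print([t.label for t in station_tags])

print(find_tag(station_tags, "missing"))   # None: handle this case explicitly
```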


End of Notebook 05. You now have the full foundational toolkit — data, automation, visualization, computational thinking, and object-oriented design — ready to apply to the computer-vision, audio, and statistics modules ahead.