Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Python Essentials for Research in Behavioral Sciences: Computational Thinking — Data Structures, Algorithms & Fluency

Harvard University

In Notebooks 01 and 02 you learned to store data, automate repetitive work with loops and conditionals, and wrap logic into functions. That is already enough to get research done. This notebook is about getting fluent and efficient — writing code that is compact, readable, and fast enough to handle real research data.

Why does this matter? Imagine you record 200 minutes of video at 30 frames per second. That is 360,000 frames, each with dozens of numbers. A small, careless choice — like searching a list over and over — can turn a two-second job into a two-hour one. The difference is not the computer; it is how you think about the problem. That way of thinking is called computational thinking, and it rests on two pillars:

  1. Choosing the right data structure — the container you store data in determines how fast you can ask questions of it.

  2. Recognising algorithmic patterns — most problems are variations of a handful of patterns (counting, sorting, windowing, ...). Once you can name the pattern, the solution follows.

To build these skills we will solve a series of LeetCode / HackerRank-style problems, framed around the kind of data you will meet in this course. By the end you will also see why bundling data together with the functions that act on it leads naturally to Classes, which is exactly where Notebook 05 picks up.

How to Use This Notebook Well

  • Run every code cell yourself, top to bottom. Reading code is not the same as understanding it.

  • When you meet a new function, pause and print() the intermediate values. Curiosity here pays off.

  • For each problem, try it on paper or in your head first, then read the solution.

  • Many problems are shown in two versions: a first, explicit version that mirrors how you would reason it out, and a second, more Pythonic version. Compare them — the gap between the two is exactly the fluency we are building.

  • Mini Challenges and Checkpoints are for you. Do not skip them; they are where the learning sticks.

A note on framing: a few classic problems (a staircase, a cipher) are not about behaviour at all. They are here because they are clean, famous exercises for a specific pattern. Treat them as gym reps — once the pattern is in your fingers, you will reuse it on messy research data.


Section 0: Setup — Check Your Interpreter

Just as in the earlier notebooks, we begin by confirming which Python is running this notebook. If a later cell complains that a package is missing, the first question is always: am I in the environment I think I am in?

import sys

# sys.executable -> the full path to the Python program running this notebook
# sys.version    -> the exact version string (we want 3.10 or newer for this course)
print("Interpreter path:", sys.executable)
print("Python version  :", sys.version)

Section 1: Fluency Boosters — Writing Less, Saying More

Fluency is the difference between spelling out every word and speaking in sentences. The four tools in this section — richer slicing, comprehensions, string methods, and sorting with a key — let you express common operations in a single, readable line instead of a multi-line loop. None of them are new concepts; each is a compact form of something you already know.

1.1. Concept: Slicing, Deeper

You already know that my_list[0] grabs the first element. Slicing generalises this to grab a range of elements with the syntax sequence[start:stop:step].

Key Characteristics:

  • start is included, stop is excluded (so [0:2] gives elements 0 and 1).

  • Any of the three parts can be omitted: [:3] means “from the beginning”, [3:] means “to the end”.

  • A negative index counts from the end: [-1] is the last element.

  • A negative step, [::-1], walks the sequence backwards — the cleanest way to reverse.

Slicing works the same way on lists, tuples, and strings, because all three are ordered sequences.

# A few sequences to play with
my_tuple = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
my_list  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

print("Whole tuple     :", my_tuple)
print("Single element  :", my_tuple[1])      # second element (index 1)
print("Last element    :", my_tuple[-1])     # counting from the end
print("First two       :", my_tuple[0:2])    # indices 0 and 1 (stop excluded)
print("A middle window :", my_tuple[3:6])    # indices 3, 4, 5
print("Reversed        :", my_tuple[::-1])   # negative step = walk backwards
print("Every 2nd item  :", my_tuple[::2])    # step of 2

Two common ways to reverse the first six items of a list — one with slicing, one with the .reverse() method. Notice that slicing leaves the original list untouched, while .reverse() changes the list in place.

# Method A: slicing -> returns a NEW reversed list, original unchanged
reversed_slice = my_list[0:6][::-1]
print("Slice reverse :", reversed_slice)
print("Original list :", my_list[0:6], "(unchanged)")

# Method B: .reverse() -> modifies the list IN PLACE (no new list returned)
sub = my_list[0:6]
sub.reverse()
print("In-place      :", sub)

Mini Challenge 1.1. Given frames = list(range(0, 100)) (a stand-in for frame numbers), use a single slice to grab every 10th frame, and a second slice to grab the last 5 frames in reverse order.

1.2. Concept: Comprehensions — The Compact Loop

A comprehension builds a new list (or dictionary, or set) from an existing iterable in a single expression. It is the compact form of a for loop whose only job is to collect results.

The pattern reads almost like English:

[ <what to keep>  for <item> in <iterable>  if <optional condition> ]

Key Characteristics:

  • The result is a brand-new list; the original is untouched.

  • The optional if at the end filters items.

  • It is usually faster and easier to read than the equivalent loop — as long as it stays on one logical line. If the logic gets complicated, prefer a normal loop.

# Goal: from a list of participant ages, keep only the adults (age >= 18).

ages = [12, 25, 17, 40, 8, 33, 18]

# --- The explicit loop you already know ---
adults_loop = []
for age in ages:
    if age >= 18:
        adults_loop.append(age)
print("Loop      :", adults_loop)

# --- The same thing as a list comprehension ---
adults_comp = [age for age in ages if age >= 18]
print("Comp      :", adults_comp)

# Comprehensions can transform, not just filter.
# Convert every age to 'age in months':
ages_in_months = [age * 12 for age in ages]
print("In months :", ages_in_months)

A dictionary comprehension uses the same idea but builds key: value pairs. Here we map each participant ID to a pass/fail label in one line.

participant_ids = ["P01", "P02", "P03", "P04"]
scores          = [88, 42, 91, 67]

# Build {id: "pass"/"fail"} where pass means score >= 60.
# zip() pairs the two lists together, item by item (more on zip in Notebook 02).
results = {pid: ("pass" if s >= 60 else "fail")
           for pid, s in zip(participant_ids, scores)}

print(results)
print("Did P02 pass? ->", results["P02"])

Mini Challenge 1.2. You have reaction_times = [0.45, 1.20, 0.33, 0.98, 1.51] (seconds). Use one list comprehension to keep only times under 1.0 s, and one dictionary comprehension to label each trial index (0, 1, 2, ...) as "fast" or "slow" using the same 1.0 s threshold.

1.3. Concept: The String-Method Toolkit

In behavioural research, metadata often hides inside text: a filename like athlete03_session2_pushup.mp4 encodes the subject, the session, and the task. String methods are built-in tools attached to every string that let you inspect and reshape that text.

A few you will reach for constantly:

  • .isupper(), .islower(), .isdigit(), .isalpha()ask a question about the characters (return True/False).

  • .startswith(...), .endswith(...), and the in keyword — test for substrings.

  • .lower(), .upper(), .strip()clean and normalise text.

  • .split(sep)break a string into a list of pieces.

  • "sep".join(list) — the inverse: glue a list of strings back together.

# Counting "words" in a camelCase identifier.
# In camelCase, every new word (after the first) starts with a capital letter.
identifier = "saveChangesInTheEditor"

num_words = 1  # the first word has no leading capital, so we start the count at 1
for char in identifier:
    if char.isupper():      # each capital marks the start of a new word
        num_words += 1

print(f"'{identifier}' contains {num_words} words.")
# Parsing metadata out of a filename with .split() and indexing.
filename = "athlete03_session2_pushup.mp4"

stem = filename.split(".")[0]          # drop the extension -> "athlete03_session2_pushup"
parts = stem.split("_")                # break on underscores -> ['athlete03','session2','pushup']
subject, session, task = parts         # unpack the three pieces into named variables

print("Subject:", subject)
print("Session:", session)
print("Task   :", task)

# .join() is the inverse of .split(): rebuild a clean label.
label = " | ".join(parts)
print("Label  :", label)

Mini Challenge 1.3. Given path = "trial_07_subjectB_correct.csv", extract the trial number, the subject, and the outcome. Then build a tidy label "subjectB - trial 07 - correct" using .join() or an f-string.

1.4. Concept: Sorting with a Key

sorted(iterable) returns a new, sorted list. The real power comes from the key= argument: a function that tells Python what to sort by. The same idea works for min() and max().

Key Characteristics:

  • sorted(...) returns a new list; the list method .sort() sorts in place.

  • key= takes a function that is applied to each item to produce the value used for comparison.

  • reverse=True flips the order (largest first).

  • A lambda is a tiny, throwaway one-line function — perfect for a key.

# Sort research records by a field buried inside each dictionary.
records = [
    {"id": "P01", "score": 88, "rt": 0.45},
    {"id": "P02", "score": 42, "rt": 1.20},
    {"id": "P03", "score": 91, "rt": 0.33},
]

# Sort by score, highest first. The key function pulls out the 'score' of each record.
by_score = sorted(records, key=lambda r: r["score"], reverse=True)
print("Ranked by score:")
for r in by_score:
    print("  ", r["id"], "->", r["score"])

# min()/max() accept the same key. Who reacted fastest (smallest reaction time)?
fastest = min(records, key=lambda r: r["rt"])
print("Fastest responder:", fastest["id"])

Checkpoint 1. You should now be able to: take a slice with a step, rewrite a collecting loop as a comprehension, pull fields out of a string with .split(), and sort a list of dictionaries by any field. If any of those feels shaky, re-run that subsection before moving on — Section 2 leans on all four.


Section 2: Choosing the Right Data Structure

Every container in Python is good at some things and slow at others. Picking the right one is the single biggest lever you have over how fast — and how readable — your analysis code is. This section compares the four workhorses and then shows, with a stopwatch, why the choice matters.

2.1. Concept: List vs. Tuple vs. Set vs. Dict

StructureLooks likeOrdered?Duplicates?Best at
list[1, 2, 2, 3]yesyesan ordered, changeable sequence
tuple(1, 2, 3)yesyesa fixed record that must not change
set{1, 2, 3}nonofast membership tests; removing duplicates
dict{"a": 1}yes*keys uniquefast lookup of a value by a key

*Dictionaries remember insertion order in modern Python, but you still access values by key, not by position.

The two that beginners under-use are set and dict — and they are exactly the two that make code fast. The next subsections show why.

# The same five readings, in four containers — notice what each one keeps.
readings_list  = [22, 45, 22, 38, 45]
readings_tuple = (22, 45, 22, 38, 45)
readings_set   = {22, 45, 22, 38, 45}            # duplicates collapse automatically
readings_dict  = {"sensorA": 22, "sensorB": 45}  # value looked up BY NAME

print("list :", readings_list)
print("tuple:", readings_tuple)
print("set  :", readings_set, "  <- duplicates removed, order not guaranteed")
print("dict :", readings_dict)
print("Value for sensorB:", readings_dict["sensorB"])

2.2. Concept: Sets for Fast Membership

The question “is this item in the collection?” comes up everywhere — is this a valid label? have we seen this ID before? With a list, Python must walk through every element to answer, so the cost grows with the size of the list. With a set, the answer is effectively instant, no matter how large the set is. (This near-instant lookup is what computer scientists call O(1); we will unpack that in 2.4.)

Use a set when you mainly need to (a) test membership quickly, or (b) remove duplicates.

# A controlled vocabulary of allowed behaviour labels.
valid_behaviors = {"walk", "run", "rest", "groom", "feed"}   # a set

# Membership tests with `in` are fast and read like English.
print("'run'  valid? ", "run"  in valid_behaviors)
print("'fly'  valid? ", "fly"  in valid_behaviors)

# Instantly remove duplicates from a messy annotation log by converting to a set.
annotation_log = ["walk", "walk", "rest", "run", "rest", "walk"]
unique_labels = set(annotation_log)
print("Unique labels used:", unique_labels)
print("How many distinct behaviours? ", len(unique_labels))

2.3. Concept: Dictionaries for Lookup and Counting

A dictionary maps a key to a value, and looks the value up almost instantly. Two patterns dominate research code:

  • Lookup — store something once, retrieve it by name many times (e.g. a subject’s group assignment).

  • Counting — tally how often each item appears (e.g. how many times each behaviour occurred).

Two tools make this clean:

  • dict.get(key, default) — fetch a value, but return a fallback instead of crashing if the key is missing.

  • collections.Counter — a ready-made dictionary built specifically for counting.

# Counting behaviour occurrences — the manual way and the easy way.
annotation_log = ["walk", "walk", "rest", "run", "rest", "walk", "feed"]

# --- Manual: build the tally yourself with .get() ---
counts = {}
for label in annotation_log:
    # .get(label, 0) returns the current count, or 0 if we have not seen it yet
    counts[label] = counts.get(label, 0) + 1
print("Manual counts :", counts)

# --- The easy way: collections.Counter does exactly this ---
from collections import Counter
counts_auto = Counter(annotation_log)
print("Counter       :", dict(counts_auto))

# Counter even ranks for you: the two most common behaviours.
print("Most common   :", counts_auto.most_common(2))

Mini Challenge 2.3. Given votes = ["A", "B", "A", "C", "B", "A"], build a count dictionary two ways — once with .get() and once with Counter — and print which option received the most votes.

2.4. Concept: A First Taste of Big-O

Big-O is shorthand for how the running time grows as the data grows. You do not need any maths to use it well — just an intuition for three cases:

  • O(1) — constant. The work does not depend on the size of the data. Looking up a key in a dict, or testing membership in a set.

  • O(n) — linear. The work grows in step with the data. Scanning a list once.

  • O(n²) — quadratic. The work grows with the square of the data — a loop inside a loop. Often a sign you should be using a set or dict instead. At 360,000 frames, O(n²) is catastrophic; O(n) is fine.

The classic trap: calling list.count(x) inside a loop over the same list. Each .count() is itself an O(n) scan, so the whole thing becomes O(n²). Doing the tally with a dictionary is O(n). Let us measure the difference.

import time
import random

# A biggish list of labels to make the difference visible.
data = [random.choice(["walk", "run", "rest", "groom", "feed"]) for _ in range(20000)]
vocab = set(data)

# --- SLOW: O(n^2) -- .count() rescans the whole list for every label ---
start = time.perf_counter()
counts_slow = {label: data.count(label) for label in vocab}
slow = time.perf_counter() - start

# --- FAST: O(n) -- one pass builds the whole tally ---
start = time.perf_counter()
counts_fast = Counter(data)
fast = time.perf_counter() - start

print("Same answer? ", counts_slow == dict(counts_fast))
print(f"O(n^2) .count() : {slow*1000:8.2f} ms")
print(f"O(n)   Counter  : {fast*1000:8.2f} ms")
print(f"Speed-up        : {slow/fast:6.0f}x faster")

Checkpoint 2. The takeaway is not “memorise Big-O notation” — it is a habit: whenever you find yourself searching a list inside a loop, ask whether a set or dict would let you do it in one pass. That single reflex will save you hours on real datasets.


Section 3: Algorithmic Patterns for Research Data

Most problems you will face are variations of a few patterns. Learn to recognise the pattern and you have already done 80% of the work. This section walks through six patterns, each with a short, classic exercise and a note on where it reappears in this course.

3.1. Pattern: Counting and Frequency

When to reach for it: any time the question is “how many” or “what fraction” — the share of trials that were correct, the proportion of frames in each behaviour, the most frequent event.

We will warm up on the HackerRank classic Plus Minus: given a list of numbers, report the fraction that are positive, negative, and zero. The same logic answers “what fraction of trials were rewarded / punished / neutral?”.

def fraction_by_sign(arr):
    """Return the fraction of positive, negative, and zero values in arr."""
    n = len(arr)
    pos = neg = zero = 0          # initialise all three tallies

    for num in arr:               # a single O(n) pass
        if num > 0:
            pos += 1
        elif num < 0:
            neg += 1
        else:
            zero += 1

    # f"{value:.6f}" formats with exactly 6 decimal places
    print(f"Positive: {pos / n:.6f}")
    print(f"Negative: {neg / n:.6f}")
    print(f"Zero    : {zero / n:.6f}")

# Read as: outcomes coded +1 (reward), -1 (punish), 0 (neutral)
fraction_by_sign([1, 1, 0, -1, -1])

A close cousin — Birthday Cake Candles: how many values equal the maximum? This is the “how many tied for the top” question (e.g. how many athletes hit the best score).

def count_tallest(candles):
    """How many candles are at the maximum height?"""
    max_height = max(candles)            # one pass to find the best
    return candles.count(max_height)     # one pass to count ties (fine: only called once)

print("Number tied for tallest:", count_tallest([3, 2, 1, 3]))

3.2. Pattern: Sorting-Based Solutions

When to reach for it: when order unlocks the answer — the top-k results, the median, the smallest gap. Sorting once (O(n log n)) often turns a hard question into an obvious one.

The exercise is Mini-Max Sum: given five numbers, find the smallest and largest sums obtainable from any four of them. Once the list is sorted, the answer is trivial: drop the largest for the min, drop the smallest for the max.

def min_max_sum(arr):
    """Smallest and largest sums of any four of the five values."""
    s = sorted(arr)              # ascending order
    min_sum = sum(s[:-1])        # everything except the LARGEST
    max_sum = sum(s[1:])         # everything except the SMALLEST
    print("min:", min_sum, " max:", max_sum)

min_max_sum([1, 3, 5, 7, 9])     # -> min: 16  max: 24

3.3. Pattern: Windowing over Sequences

When to reach for it: when meaning lives in a fixed-size chunk of a sequence — a 3-frame smoothing window, a beat of music, a repeating code block. You step through the sequence one window at a time.

This is the pattern behind much of your later pose work: a moving window over frames is how you smooth jitter and detect events. We will practise on Mars Exploration: a signal should be the message "SOS" repeated; count how many characters were corrupted in transmission.

def count_corrupted(signal):
    """Count positions where a repeated 'SOS' message differs from 'SOS'."""
    errors = 0
    # Step through the signal in fixed windows of length 3.
    for i in range(0, len(signal), 3):
        chunk = signal[i:i + 3]         # one window: 'SOS', then the next three, ...
        if chunk[0] != 'S':
            errors += 1
        if chunk[1] != 'O':
            errors += 1
        if chunk[2] != 'S':
            errors += 1
    return errors

print("Corrupted characters:", count_corrupted("SOSSPSSQSSOR"))

3.4. Pattern: Stacks

When to reach for it: when the most recent item is the one you need to compare against or undo — matching brackets, parsing, “cancel out adjacent duplicates”. A stack is just a list used from one end: .append() to push, .pop() to remove the last item.

The exercise: repeatedly remove adjacent matching letters until none remain ("aaabccddd" -> "abd"). Each incoming letter either cancels the top of the stack or sits on top of it.

def super_reduce(s):
    """Remove adjacent equal pairs repeatedly using a stack."""
    stack = []
    for char in s:
        if stack and stack[-1] == char:   # top of stack matches -> they cancel
            stack.pop()
        else:
            stack.append(char)            # otherwise, push this char on top
    return "".join(stack) if stack else "Empty String"

print(super_reduce("aaabccddd"))   # -> 'abd'
print(super_reduce("aa"))          # -> 'Empty String'

3.5. Pattern: Nested and 2D Data

When to reach for it: whenever data is a grid — a frame of pixels, a matrix of joint coordinates, a table of trials. The key skill is moving cleanly between a Python list-of-lists and a NumPy 2D array.

First, a gotcha: max() on a list-of-lists does not give the largest number. It compares the inner lists to each other (by their first element). To get the overall maximum you need a two-step “max of each row, then max of those”.

import numpy as np

grid = [[4, 5, 6, 7, 8],
        [9, 10, 11, 12, 13],
        [14, 15, 16, 17, 18]]

# Shape: rows = len(grid), columns = len(grid[0])
print("Rows   :", len(grid))
print("Columns:", len(grid[0]))
print("NumPy shape:", np.array(grid).shape)   # (rows, columns)

# WRONG: compares whole rows, not individual numbers
# print(max(grid))  # -> [14, 15, 16, 17, 18], NOT 18 alone

# RIGHT: max of each row, then the max of those. map(max, grid) applies
# max() to every row; the outer max() then finds the biggest of the row-maxes.
overall_max = max(map(max, grid))
print("Overall maximum:", overall_max)

A clean 2D exercise — Diagonal Difference: the absolute difference between the sums of a square matrix’s two diagonals. The primary diagonal is arr[i][i]; the secondary is arr[i][n-1-i].

def diagonal_difference(arr):
    n = len(arr)
    primary = secondary = 0
    for i in range(n):
        primary   += arr[i][i]          # top-left -> bottom-right
        secondary += arr[i][n - 1 - i]  # top-right -> bottom-left
    return abs(primary - secondary)

square = [[11, 2, 4],
          [4, 5, 6],
          [10, 8, -12]]
print("Diagonal difference:", diagonal_difference(square))

3.6. Pattern: Loops vs. Recursion (a brief aside)

Almost everything in this course is best written as a loop. But you will occasionally meet recursion — a function that calls itself on a smaller piece of the problem until it hits a base case that stops it. It is worth recognising, even if you rarely write it.

The two snippets below count to 5 identically. Note the recursive version’s base case (current >= stop): without it, the function would call itself forever.

# Iterative: a plain loop (your default choice)
def count_loop(n):
    for i in range(n):
        print("Count:", i)

# Recursive: the function calls itself with current+1 until the base case
def count_recursive(current, stop):
    if current >= stop:      # BASE CASE: stop here, do not recurse further
        return
    print("Count:", current)
    count_recursive(current + 1, stop)   # the recursive step

print("--- loop ---");      count_loop(5)
print("--- recursion ---"); count_recursive(0, 5)

Checkpoint 3. For each pattern, try to state in one sentence when you would use it. If you can say “I’d use windowing when the answer lives in a fixed-size chunk of a sequence,” the pattern is yours.


Section 4: Practice Problems (LeetCode / HackerRank-style)

Now you assemble the tools. For each problem: read it, plan an approach in words, then write it. Where two solutions are shown, study the gap between the explicit version (how you reason it out) and the Pythonic version (how fluency expresses it).

Problem 1 — Staircase

Print a right-aligned staircase of height n built from #, where row i has i hashes padded on the left by spaces. Pattern: string building.

def staircase(n):
    for i in range(1, n + 1):
        # (n - i) spaces for right-alignment, then i hashes
        print(" " * (n - i) + "#" * i)

staircase(4)

Problem 2 — Time Conversion

Convert a 12-hour time string ("07:05:45AM") to 24-hour format. The only tricky cases are 12 AM -> 00 and 12 PM -> 12. Pattern: string parsing + careful conditionals.

def to_24_hour(s):
    period = s[-2:]          # 'AM' or 'PM'
    hour   = int(s[:2])      # first two characters as an integer
    if period == "AM":
        if hour == 12:       # 12 AM is midnight -> 00
            hour = 0
    else:                    # PM
        if hour != 12:       # every PM except 12 PM shifts by 12
            hour += 12
    # f"{hour:02d}" pads the hour to two digits; s[2:-2] keeps ':mm:ss'
    return f"{hour:02d}{s[2:-2]}"

print(to_24_hour("07:05:45AM"))   # 07:05:45
print(to_24_hour("12:00:00AM"))   # 00:00:00
print(to_24_hour("12:00:00PM"))   # 12:00:00

Problem 3 — Password Strength (explicit vs. Pythonic)

A password is strong if it is at least 6 characters and contains a digit, a lowercase letter, an uppercase letter, and a special character. Return the minimum number of characters to add to make it strong. Pattern: track several conditions in one pass.

This is the showcase comparison of the notebook — the same problem solved twice.

# --- Version A: explicit. Collect characters of each type into buckets. ---
def min_to_strong_explicit(n, password):
    is_upper = is_lower = is_digit = is_special = ""
    for char in password:
        if char.isupper():
            is_upper += char
        if char.islower():
            is_lower += char
        if char.isdigit():
            is_digit += char
        if not char.isalnum():      # not a letter or number -> special
            is_special += char

    buckets = [is_upper, is_lower, is_digit, is_special]
    missing = sum(1 for b in buckets if b == "")   # how many types are absent

    # We must satisfy BOTH the missing-types need AND the length-6 need.
    if n > 5:
        return missing
    return max(missing, 6 - n)

print("Explicit:", min_to_strong_explicit(6, "SMa12#"))
# --- Version B: Pythonic. Booleans, a set for O(1) special-char lookup, one pass. ---
def min_to_strong(n, password):
    special_chars = set("!@#$%^&*()-+")       # set membership is O(1)
    has_digit = has_lower = has_upper = has_special = False

    for char in password:                     # a single pass
        if char.isdigit():
            has_digit = True
        elif char.islower():
            has_lower = True
        elif char.isupper():
            has_upper = True
        elif char in special_chars:
            has_special = True

    # In Python, True == 1 and False == 0, so we can ADD booleans.
    missing = 4 - (has_digit + has_lower + has_upper + has_special)
    return max(missing, 6 - n)                # satisfy types AND length at once

print("Pythonic:", min_to_strong(6, "SMa12#"))

Read the two side by side. Version B is shorter not because it is clever, but because it uses the right tools: booleans as 0/1, a set for fast lookup, and a single max() to combine the two requirements. That is fluency.

Problem 4 — Caesar Cipher

Shift every letter forward by k, wrapping z -> a, leaving non-letters untouched. Pattern: index arithmetic with the modulo (%) operator for wrap-around.

def caesar_cipher(s, k):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for ch in s:
        if ch.isalpha():
            was_upper = ch.isupper()
            idx = alphabet.index(ch.lower())     # position 0-25
            new_ch = alphabet[(idx + k) % 26]    # % 26 wraps past 'z' back to 'a'
            out.append(new_ch.upper() if was_upper else new_ch)
        else:
            out.append(ch)                       # spaces, dashes, digits pass through
    return "".join(out)

print(caesar_cipher("middle-Outz", 3))   # -> 'plggoh-Rxwc'

Problem 5 — Longest Alternating Substring

Given a string, choose two letters; delete all others; you are left with a string of just those two. What is the longest such string that alternates (no two neighbours equal)? Pattern: try all pairs (a double loop), then validate — and notice the cost.

def is_alternating(seq):
    """True if no two adjacent items are equal."""
    return all(seq[i] != seq[i + 1] for i in range(len(seq) - 1))

def longest_alternating(s):
    unique = list(set(s))           # the distinct letters present
    best = 0
    # Try every unordered PAIR of distinct letters: a double loop -> O(pairs * n).
    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            a, b = unique[i], unique[j]
            filtered = [c for c in s if c == a or c == b]   # keep only these two
            if is_alternating(filtered):
                best = max(best, len(filtered))
    return best

print("Longest alternating length:", longest_alternating("beabeefeab"))   # -> 5

Mini Challenge 4. Re-frame Plus Minus (Problem set, Section 3.1) for your own data: given a list of frame-by-frame angle changes, report the fraction of frames where the angle increased, decreased, or stayed the same. You already have the pattern — only the story changes.


Section 5: Appendix (Optional) — Pattern Matching with Regex

Regular expressions (regex) are a mini-language for describing text patterns. They are powerful but dense, so we keep them optional: reach for them only when plain string methods (Section 1.3) are not enough. Skip this on a first pass and come back when you actually need to parse messy text.

The re module is the tool. Three jobs cover most needs: find all matches, split on a pattern, and extract a captured group.

import re

raw_text = """The DNA sequence was analyzed. The sequence showed a mutation in the DNA.
This mutation might be significant for the DNA study."""

# findall: every "word", lower-cased. [\w'-]+ means runs of letters/digits/_/'/-
all_words = re.findall(r"[\w'-]+", raw_text.lower())
print("Word count :", len(all_words))

# Combine with Section 2.3: count word frequencies with Counter.
from collections import Counter
print("Top 3 words:", Counter(all_words).most_common(3))
# Splitting a transcript into sentences WITHOUT losing the punctuation.
# The lookbehind (?<=[.!?]) splits *after* . ! or ? followed by a space.
transcript = "Hello class! Is this a test, Dr. Mandal? Yes, it is."
sentences = re.split(r"(?<=[.!?]) +", transcript)
for s in sentences:
    print("-", s)
# Extracting the 11-character video ID from several YouTube URL formats.
# (...) captures a group; {11} means exactly 11 of the allowed characters.
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/shorts/S0Nib79VNUE",
]
pattern = r"(?:v=|/)([0-9A-Za-z_-]{11})"
for url in urls:
    m = re.search(pattern, url)
    print(m.group(1) if m else "no match", "  <-", url)

Section 6: Bridge to Notebook 05 — From Functions to Classes

Look back at how we have been working. We keep a piece of data — a research sample, an event — as a dictionary, and we write separate functions that each take that dictionary as input. Here is the recurring shape:

# A small dataset: each animal is a dictionary of measurements.
animals = [
    {"id": 1, "age": 12, "weight": 4.2},
    {"id": 2, "age": 8,  "weight": 3.1},
    {"id": 3, "age": 15, "weight": 6.5},
    {"id": 4, "age": 20, "weight": 2.2},
]

# Function 1: select adults that are also light enough for the study.
def select_samples(data, min_age=10, max_weight=5.0):
    return [a for a in data if a["age"] >= min_age and a["weight"] < max_weight]

# Function 2: summarise a sample.
def describe(animal):
    return f"Animal {animal['id']}: age {animal['age']}, weight {animal['weight']} kg"

selected = select_samples(animals)
print("Selected IDs:", [a["id"] for a in selected])
for a in selected:
    print("  ", describe(a))

Notice the friction. The data ({"id":..., "age":...}) and the behaviour (select_samples, describe) live apart. We must keep passing the same dictionaries into every function, and nothing stops us from typo-ing a key (a["wieght"]) or building a malformed animal in the first place. As a project grows, this gets fragile.

The natural question is: what if each animal carried its own behaviour with it — so that animal.describe() just worked, and an animal could only ever be created with valid fields? That bundling of data + the functions that act on it into a single object is exactly what a Class is, and it is where Notebook 05 begins.

Weekly Challenge. Combine three patterns from this notebook into one small pipeline on a behavioural dataset of your choice (real or invented):

  1. Parse a list of filenames like "subjectA_session1_walk.mp4" into structured records (Section 1.3).

  2. Count how many sessions each subject contributed, using a dictionary or Counter (Section 2.3).

  3. Rank the subjects from most to fewest sessions with sorted(..., key=...) (Section 1.4).

Write it first with explicit loops, then rewrite the collecting steps as comprehensions. Keep both versions — the difference is the fluency you have gained.


End of Notebook 04. Next: Notebook 05 — Introduction to Python Classes & Decorators, where the data and behaviour you have been juggling finally move into one place.