Goal: Extract body landmark coordinates (keypoints) for every person in each video frame on GPU and save them to a CSV.
This notebook is designed for undergraduate students in LS100 and mirrors the structure of your previous MediaPipe notebook (setup → verify → run → export), but uses a CUDA-native pipeline so it works reliably on Colab GPUs.


What you’ll get

  • An annotated output video (skeletons drawn).

  • A CSV file with one row per person per frame: frame_idx, timestamp_ms, track_id, 17 COCO keypoints (x, y, confidence) and detection confidence.

What you need to do

  • Provide a path to a video (Drive or uploaded file).

  • Optionally choose a different YOLOv8 pose model variant for accuracy/speed trade-offs.

1) Runtime & GPU Check

Instruction: Make sure your Colab runtime has a GPU enabled.
Menu: Runtime → Change runtime type → Hardware accelerator: GPU → Save.

Then run the cell below to confirm.


# DO NOT CHANGE ANYTHING IN THIS CELL
# Quick GPU check
!nvidia-smi || echo "nvidia-smi not available (GPU runtime not enabled?)"

import torch, platform
print("PyTorch CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU name:", torch.cuda.get_device_name(0))
print("Python:", platform.python_version())

2) Install Dependencies

Instruction: Run this once per fresh Colab session. Installs the Ultralytics package (YOLOv8).


# DO NOT CHANGE ANYTHING IN THIS CELL
# Install Ultralytics (YOLOv8) and supporting libs
%pip -q install ultralytics==8.3.30 opencv-python-headless pandas numpy tqdm
import ultralytics
from ultralytics import YOLO
print("Ultralytics version:", ultralytics.__version__)

3) Imports

Instruction: No edits needed.


# DO NOT CHANGE ANYTHING IN THIS CELL
import os, sys, cv2, math, time, pathlib, json
import numpy as np
import pandas as pd
from tqdm import tqdm

import torch
from ultralytics import YOLO

4) Set Input/Output Paths

Choose one of the options below:

  • Option A: Mount Google Drive and set INPUT_VIDEO to your file in Drive.

  • Option B: Upload a file from your computer.

Students: Use one option. Comment out the other.


# === Option A: Google Drive (Recommended for larger videos) ===
# INSTRUCTION: Uncomment this block if your video is in Drive.
# from google.colab import drive
# drive.mount('/content/drive')
# INPUT_VIDEO = "/content/drive/MyDrive/your_folder/your_video.mp4"  # <-- EDIT THIS PATH
# OUTPUT_DIR = "/content/drive/MyDrive/ls100_pose_outputs"           # <-- EDIT THIS PATH

# === Option B: Local upload ===
# INSTRUCTION: Use the UI prompt to upload a file; set INPUT_VIDEO automatically.
INPUT_VIDEO = None
OUTPUT_DIR = "/content/pose_outputs"
os.makedirs(OUTPUT_DIR, exist_ok=True)

try:
    from google.colab import files
    print("Upload a video file (mp4/mov/avi):")
    uploaded = files.upload()
    if uploaded:
        INPUT_VIDEO = list(uploaded.keys())[0]
        print("Using uploaded file as INPUT_VIDEO:", INPUT_VIDEO)
except Exception as e:
    print("Colab file upload not available in this environment:", e)

if INPUT_VIDEO is None:
    print("\n⚠️ IMPORTANT: If you did not upload a file, set INPUT_VIDEO to a valid path above (Option A).")
else:
    print("INPUT_VIDEO =", INPUT_VIDEO)
print("OUTPUT_DIR  =", OUTPUT_DIR)

5) Choose Pose Model Variant

Instruction: Pick a model for speed/accuracy:

  • yolov8n-pose.pt → nano (fastest, least accurate)

  • yolov8s-pose.pt → small (good balance)

  • yolov8m/l/x-pose.pt → larger, slower, more accurate

Start with nano or small for classroom demos.


# STUDENTS: You may change ONLY the model filename below.
MODEL_WEIGHTS = "yolov8s-pose.pt"  # try 'yolov8n-pose.pt' for speed, or 'yolov8m-pose.pt' / 'yolov8l-pose.pt' / 'yolov8x-pose.pt' for accuracy

# DO NOT CHANGE ANYTHING BELOW
assert torch.cuda.is_available(), "GPU not available. Enable GPU runtime in Colab (Runtime → Change runtime type)."
device = 0  # GPU index
model = YOLO(MODEL_WEIGHTS)
model.to('cuda')  # move model to GPU
print("Loaded model on CUDA:", MODEL_WEIGHTS)

6) COCO Keypoint Mapping (17 points)

We will export 17 COCO keypoints per person:

['nose','left_eye','right_eye','left_ear','right_ear','left_shoulder','right_shoulder',
 'left_elbow','right_elbow','left_wrist','right_wrist','left_hip','right_hip',
 'left_knee','right_knee','left_ankle','right_ankle']

Each keypoint will have x, y, conf in the CSV.


# DO NOT CHANGE ANYTHING IN THIS CELL
COCO_KP_NAMES = [
    "nose","left_eye","right_eye","left_ear","right_ear",
    "left_shoulder","right_shoulder","left_elbow","right_elbow",
    "left_wrist","right_wrist","left_hip","right_hip",
    "left_knee","right_knee","left_ankle","right_ankle"
]
NUM_KP = len(COCO_KP_NAMES)
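
To make the CSV layout concrete, the sketch below rebuilds the header you should expect from the keypoint names: four metadata columns followed by x, y, conf for each of the 17 keypoints. The names META_COLUMNS and expected_csv_columns are hypothetical helpers introduced for illustration only; they are not part of the notebook's pipeline.

```python
# Illustrative sketch: derive the expected CSV header from the keypoint names.
# META_COLUMNS and expected_csv_columns are hypothetical names, not notebook code.
COCO_KP_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
META_COLUMNS = ["frame_idx", "timestamp_ms", "track_id", "det_conf"]


def expected_csv_columns(kp_names):
    """Return metadata columns followed by <name>_x, <name>_y, <name>_conf."""
    columns = list(META_COLUMNS)
    for name in kp_names:
        columns += [f"{name}_x", f"{name}_y", f"{name}_conf"]
    return columns


columns = expected_csv_columns(COCO_KP_NAMES)
print(len(columns))   # 4 metadata columns + 17 * 3 keypoint columns = 55
print(columns[:7])
```

Comparing this list with the columns of your exported CSV is a quick way to confirm the file has the shape you expect before starting an analysis.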

7) Helper — Video Metadata

We read FPS (frames per second) to compute timestamp_ms = frame_idx * 1000 / fps.


# DO NOT CHANGE ANYTHING IN THIS CELL
def get_video_fps_frames(path):
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open video: {path}")
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
    cap.release()
    return fps, frames
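
The timestamp formula above can be checked by hand with a few frame indices. This is a minimal sketch; frame_to_timestamp_ms is a hypothetical helper name, and the calculation assumes a constant frame rate, which may not hold for every video.

```python
# Illustrative sketch of timestamp_ms = frame_idx * 1000 / fps.
# frame_to_timestamp_ms is a hypothetical helper name used only here.
def frame_to_timestamp_ms(frame_idx, fps):
    """Convert a zero-based frame index to a rounded timestamp in milliseconds."""
    # Guard against a zero or missing FPS value to avoid division by zero.
    return int(round(frame_idx * 1000.0 / max(fps, 1e-6)))


for idx in (0, 1, 30, 45):
    print(idx, frame_to_timestamp_ms(idx, 30.0))
# 0 0
# 1 33
# 30 1000
# 45 1500
```

If the video file reports no FPS, the helper in this section falls back to 30.0, so timestamps for such files are approximate.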

8) Core Function — Track & Export Keypoints to CSV

This function:

  1. Runs tracking with ByteTrack (stable track IDs per person).

  2. Saves an annotated video to OUTPUT_DIR.

  3. Builds a CSV with columns:

    • frame_idx, timestamp_ms, track_id, det_conf

    • For each keypoint: <name>_x, <name>_y, <name>_conf

Do not modify this function unless instructed.


# DO NOT CHANGE ANYTHING IN THIS CELL
def process_video_to_csv(
    input_video: str,
    output_dir: str,
    model: YOLO,
    conf: float = 0.25,
    iou: float = 0.5,
    imgsz: int = 640,
    tracker: str = "bytetrack.yaml",
    project_name: str = "ls100_pose",  # currently unused: outputs are written to output_dir/run_name
    run_name: str = "run1",
) -> dict:
    """Run YOLOv8-Pose tracking on a video and export keypoints to CSV.

    Returns a dict with paths: {'csv_path', 'annotated_video_dir', 'annotated_video' (if found)}
    """
    os.makedirs(output_dir, exist_ok=True)
    fps, total_frames = get_video_fps_frames(input_video)
    print(f"Video FPS: {fps:.2f}, frames: {total_frames}")

    # Run tracking (GPU). Ultralytics writes annotated video(s) to project/name directory.
    results = model.track(
        source=input_video,
        conf=conf,
        iou=iou,
        imgsz=imgsz,
        device=0,
        verbose=False,
        stream=False,
        save=True,
        project=output_dir,  # write directly into OUTPUT_DIR/<run_name>
        name=run_name,
        tracker=tracker,
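        persist=True,
        exist_ok=True  # reuse OUTPUT_DIR/<run_name> on reruns so the video lookup below finds the new file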
    )

    # 'results' is a list of per-frame Results
    if not isinstance(results, list):
        # Some versions may return a generator if stream=True; we set stream=False above.
        results = list(results)

    rows = []
    for frame_idx, r in enumerate(tqdm(results, desc="Extracting keypoints")):
        ts_ms = int(round(frame_idx * 1000.0 / max(fps, 1e-6)))

        # No detections
        if r.keypoints is None or r.keypoints.xy is None or len(r.keypoints.xy) == 0:
            continue

        kps_xy = r.keypoints.xy  # shape: [num_people, 17, 2]
        kps_conf = getattr(r.keypoints, 'conf', None)  # may be None or [num_people, 17]
        boxes = r.boxes
        ids = None
        if boxes is not None and getattr(boxes, 'id', None) is not None:
            try:
                ids = boxes.id.cpu().numpy().astype(int)
            except Exception:
                ids = None

        det_conf = None
        if boxes is not None and getattr(boxes, 'conf', None) is not None:
            try:
                det_conf = boxes.conf.cpu().numpy()
            except Exception:
                det_conf = None

        num_people = kps_xy.shape[0]
        for i in range(num_people):
            track_id = int(ids[i]) if ids is not None and i < len(ids) else -1
            this_det_conf = float(det_conf[i]) if det_conf is not None and i < len(det_conf) else None

            row = {
                "frame_idx": frame_idx,
                "timestamp_ms": ts_ms,
                "track_id": track_id,
                "det_conf": this_det_conf
            }

            # keypoints
            kpi = kps_xy[i].cpu().numpy() if hasattr(kps_xy[i], 'cpu') else kps_xy[i]
            kpci = None
            if kps_conf is not None:
                kpci = kps_conf[i].cpu().numpy() if hasattr(kps_conf[i], 'cpu') else kps_conf[i]

            # Fill x,y,(conf) for each named keypoint
            for kp_idx, kp_name in enumerate(COCO_KP_NAMES):
                x = float(kpi[kp_idx, 0])
                y = float(kpi[kp_idx, 1])
                row[f"{kp_name}_x"] = x
                row[f"{kp_name}_y"] = y
                if kpci is not None and kp_idx < kpci.shape[0]:
                    row[f"{kp_name}_conf"] = float(kpci[kp_idx])
                else:
                    row[f"{kp_name}_conf"] = None

            rows.append(row)

    df = pd.DataFrame(rows)
    csv_path = os.path.join(output_dir, f"{run_name}_keypoints.csv")
    df.to_csv(csv_path, index=False)
    print(f"Saved CSV: {csv_path}  ({len(df):,} rows)")

    # Try to find the annotated video saved by Ultralytics
    ann_dir = os.path.join(output_dir, run_name)
    ann_video = None
    if os.path.isdir(ann_dir):
        # heuristics: find first video file in ann_dir
        for f in os.listdir(ann_dir):
            if f.lower().endswith((".mp4", ".avi", ".mov", ".mkv")):
                ann_video = os.path.join(ann_dir, f)
                break

    return {
        "csv_path": csv_path,
        "annotated_video_dir": ann_dir if os.path.isdir(ann_dir) else None,
        "annotated_video": ann_video
    }

9) Run on Your Video

Instruction: Set RUN_NAME (used to name outputs) and run.
The annotated video is written to OUTPUT_DIR/RUN_NAME/ and the CSV to OUTPUT_DIR/.


# STUDENTS: You may change RUN_NAME. Leave other defaults unless instructed.
RUN_NAME = "demo_run"

# DO NOT CHANGE ANYTHING BELOW
assert INPUT_VIDEO is not None and os.path.exists(INPUT_VIDEO), "Set INPUT_VIDEO to a valid video path."
out = process_video_to_csv(
    input_video=INPUT_VIDEO,
    output_dir=OUTPUT_DIR,
    model=model,
    conf=0.25,
    iou=0.5,
    imgsz=640,
    tracker="bytetrack.yaml",
    project_name="ls100_pose",
    run_name=RUN_NAME
)
print("\nOutputs:")
for k, v in out.items():
    print(f" - {k}: {v}")

10) Preview the CSV

Instruction: Display the first 5 rows.


# DO NOT CHANGE ANYTHING IN THIS CELL
csv_path = out["csv_path"]
df = pd.read_csv(csv_path)
df.head()

11) Optional — Download Outputs

If you’re on Colab, you can download the CSV and annotated video to your computer.


# STUDENTS: You may run this to download files locally.
try:
    from google.colab import files
    files.download(out["csv_path"])
    if out["annotated_video"]:
        files.download(out["annotated_video"])
except Exception as e:
    print("Colab download not available in this environment:", e)

12) Tips & Troubleshooting

  • Speed vs. Accuracy: Switch to yolov8n-pose.pt for faster processing of long videos or scenes with many people.

  • Tracking IDs: We use ByteTrack for stable track_id. If you see -1, a track wasn’t assigned for that frame.

  • Coordinates: Keypoints are reported in pixel coordinates of the original video frame. Ultralytics letterboxes each frame to imgsz=640 for inference and scales the predictions back afterwards, so reversing the letterboxing by hand should not be necessary. If in doubt, check that x and y stay within the video's width and height.

  • Multiple People: One row per person per frame. Use track_id to follow the same person across frames.

  • Memory: With stream=False, the results for every frame are held in memory at once, so long videos can exhaust RAM. Consider trimming the video or using yolov8n-pose.pt.

  • Drive Paths: Spaces in a path are fine inside a Python string. Quoting only matters in shell commands (lines starting with !). If a path still fails, check it for typos or rename the file.
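
The tips on track_id and multiple people can be sketched as follows: group rows by track_id, skip rows where no track was assigned (-1), and treat low-confidence keypoints as missing. The sample data is made up, and MIN_KP_CONF and wrist_tracks are hypothetical names with an assumed threshold of 0.5; adjust them for your footage. The sketch uses only the standard library so it runs anywhere. In pandas, df[df["track_id"] == 1] selects the same rows.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the exported layout (only one keypoint shown for brevity).
SAMPLE_CSV = """frame_idx,track_id,left_wrist_x,left_wrist_y,left_wrist_conf
0,1,100.0,200.0,0.90
0,2,300.0,220.0,0.85
1,1,104.0,198.0,0.20
1,2,298.0,221.0,0.88
2,-1,50.0,60.0,0.40
"""

MIN_KP_CONF = 0.5  # assumed threshold, not a value prescribed by the notebook


def wrist_tracks(csv_text, min_conf=MIN_KP_CONF):
    """Group left-wrist positions by track_id; low-confidence points become None."""
    tracks = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        track_id = int(row["track_id"])
        if track_id < 0:  # -1 means no track was assigned for this detection
            continue
        if float(row["left_wrist_conf"]) >= min_conf:
            point = (float(row["left_wrist_x"]), float(row["left_wrist_y"]))
        else:
            point = None  # treat unreliable keypoints as missing
        tracks[track_id].append((int(row["frame_idx"]), point))
    return dict(tracks)


tracks = wrist_tracks(SAMPLE_CSV)
for frame_idx, point in tracks[1]:
    print(frame_idx, point)
```

Marking unreliable keypoints as missing, rather than keeping their coordinates, prevents occluded joints from producing spurious jumps in later calculations.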


Assignment Ideas (LS100)

  • Compute joint angles over time (e.g., elbow, knee) from keypoints and visualize trends.

  • Segment activities based on pose dynamics (walking vs. running).

  • Compare left/right symmetry in movement for different subjects.
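
As a starting point for the joint-angle idea, the sketch below computes the angle at a middle joint from three keypoints using the dot product. The function names joint_angle_deg and elbow_angle and the sample row are hypothetical and the coordinates are made up; in practice you would apply the same calculation to each row of your CSV.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at point b (degrees) between the segments b->a and b->c.

    Each point is an (x, y) pair. Returns NaN if either segment has zero length.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return float("nan")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # clamp against floating-point drift
    return math.degrees(math.acos(cos_t))


def elbow_angle(row, side="left"):
    """Elbow angle from one CSV row (a dict keyed by the <name>_x / <name>_y columns)."""
    def point(name):
        return (float(row[f"{side}_{name}_x"]), float(row[f"{side}_{name}_y"]))
    return joint_angle_deg(point("shoulder"), point("elbow"), point("wrist"))


# Made-up example: upper arm vertical, forearm horizontal.
row = {
    "left_shoulder_x": 100, "left_shoulder_y": 100,
    "left_elbow_x": 100, "left_elbow_y": 200,
    "left_wrist_x": 200, "left_wrist_y": 200,
}
print(round(elbow_angle(row, "left"), 1))
```

The same function works for the knee by passing hip, knee, and ankle points. Comparing the left and right values frame by frame gives a simple symmetry measure. Because the image y-axis points down, the sign of a rotation is flipped, but the size of the angle is unaffected.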