
This notebook splits a video into consecutive chunks, each holding a fixed number of frames, and saves them in a subfolder named after the source video.

Example: user/xyz/documents/videos/example.mp4 → outputs into user/xyz/documents/videos/example/ as example_chunk01.mp4, example_chunk02.mp4, …
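The naming rule from the example can be sketched with pathlib alone. chunk_output_path is a hypothetical helper (not part of the notebook) that mirrors the f-string the splitter uses. It only builds paths and never touches the disk. Chunks are always written as .mp4, whatever the input extension.

```python
from pathlib import Path

def chunk_output_path(input_path: Path, index: int) -> Path:
    """Return the path of chunk `index` (1-based) for `input_path`.

    Hypothetical helper: mirrors the rule <parent>/<stem>/<stem>_chunkXX.mp4.
    """
    stem = input_path.stem                    # "example"
    output_dir = input_path.parent / stem     # .../videos/example
    return output_dir / f"{stem}_chunk{index:02d}.mp4"

src = Path("user/xyz/documents/videos/example.mp4")
# as_posix() keeps forward slashes so the output looks the same on every OS.
print(chunk_output_path(src, 1).as_posix())   # user/xyz/documents/videos/example/example_chunk01.mp4
print(chunk_output_path(src, 12).as_posix())  # user/xyz/documents/videos/example/example_chunk12.mp4
```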

Overview

What this does

  • Creates an output subdirectory named after your video’s base filename (without extension).

  • Splits the video into consecutive chunks, each containing a user-defined number of frames.

  • Writes a final, shorter remainder chunk if the total frame count isn’t a multiple of your chunk size.

  • Names chunks as <video_stem>_chunkXX.mp4, with XX zero-padded to two digits (e.g., example_chunk01.mp4).
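The chunk layout follows from integer arithmetic: a video with N frames split into chunks of k frames yields ceil(N / k) chunks, and the last one holds N mod k frames when that remainder is non-zero. The sketch below, built around a hypothetical plan_chunks helper, lists the half-open frame ranges [start, end) each chunk would cover.

```python
from typing import List, Tuple

def plan_chunks(total_frames: int, chunk_num_frame: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges, one per chunk (hypothetical helper)."""
    if chunk_num_frame <= 0:
        raise ValueError("chunk_num_frame must be a positive integer.")
    # Every chunk starts at a multiple of chunk_num_frame; the last one is clipped.
    return [
        (start, min(start + chunk_num_frame, total_frames))
        for start in range(0, total_frames, chunk_num_frame)
    ]

ranges = plan_chunks(2500, 1000)
print(ranges)                    # [(0, 1000), (1000, 2000), (2000, 2500)]
print(len(ranges), 2500 % 1000)  # 3 500
```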

Why frames (not time)?
This guarantees an exact frame count in every chunk except possibly the last, which is helpful when you need deterministic splits for annotation, ML, or analysis workflows.
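A quick calculation shows why time-based cuts are less predictable. Assuming a constant frame rate (variable-frame-rate files do not map this cleanly), a chunk of k frames lasts k / fps seconds, while a fixed time span usually covers a fractional number of frames that has to be rounded somewhere. The helper names below are illustrative only.

```python
def chunk_duration_seconds(chunk_num_frame: int, fps: float) -> float:
    """Playback duration of one chunk, assuming a constant frame rate."""
    return chunk_num_frame / fps

def frames_for_duration(seconds: float, fps: float) -> float:
    """Number of frames a time span covers; usually not a whole number."""
    return seconds * fps

print(f"{chunk_duration_seconds(1000, 30.0):.3f} s")      # 33.333 s
print(f"{frames_for_duration(10.0, 29.97):.2f} frames")   # 299.70 frames
```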


Requirements

  • Python 3.8+

  • OpenCV (cv2) installed with a working backend (FFmpeg/GStreamer depending on OS).

  • Sufficient disk space to write chunked files.

Tip (macOS/Linux): If your OpenCV lacks codecs, install/enable FFmpeg. On macOS, brew install ffmpeg can help; on Linux, install via your package manager.
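To see whether your OpenCV build includes FFmpeg, you can inspect the summary returned by cv2.getBuildInformation(). The parser below is a minimal sketch that assumes the summary contains an "FFMPEG:" line in its Video I/O section, as typical OpenCV 4.x builds print. The layout may differ between versions, so treat a None result as "unknown" rather than "missing".

```python
from typing import Optional

def ffmpeg_enabled(build_info: str) -> Optional[bool]:
    """Parse OpenCV's build summary for its 'FFMPEG:' entry.

    Returns True/False if the entry is found, None if it is absent
    (assumption: the entry is formatted like '    FFMPEG:   YES').
    """
    for line in build_info.splitlines():
        stripped = line.strip()
        if stripped.startswith("FFMPEG:"):
            return stripped.split(":", 1)[1].strip().upper().startswith("YES")
    return None

try:
    import cv2  # only needed for the live check
    print("OpenCV", cv2.__version__, "FFmpeg:", ffmpeg_enabled(cv2.getBuildInformation()))
except ImportError:
    print("OpenCV is not installed; see the install cell.")
```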

Install OpenCV (if needed)

Uncomment and run the cell below if you need to install OpenCV. If you’re in a restricted network, install locally on your machine.


#!pip install --upgrade opencv-python

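# pip cannot install FFmpeg itself (the PyPI package named "ffmpeg" does not provide the FFmpeg binaries).
# If you need an FFmpeg-enabled backend, install FFmpeg system-wide instead:
#!brew install ffmpeg        # macOS (Homebrew)
#!sudo apt install ffmpeg    # Debian/Ubuntu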

1) Set Your Inputs

  • input_path: Absolute or relative path to your video file.

  • chunk_num_frame: Number of frames per chunk (positive integer).

  • codec: FourCC code for the output encoder (default mp4v; avc1 or H264 may give smaller files if an H.264 encoder is available on your system).
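A FourCC is four ASCII characters packed into a 32-bit integer, first character in the lowest byte, which is what cv2.VideoWriter_fourcc(*codec) computes. The hypothetical fourcc_to_int helper below reproduces that packing and rejects codes that are not exactly four characters long. A well-formed FourCC does not guarantee that a matching encoder is installed; only opening a cv2.VideoWriter tells you that.

```python
def fourcc_to_int(code: str) -> int:
    """Pack a 4-character FourCC code into a 32-bit integer.

    Byte order matches cv2.VideoWriter_fourcc(*code): the first character
    ends up in the lowest byte (c1 | c2 << 8 | c3 << 16 | c4 << 24).
    """
    if len(code) != 4:
        raise ValueError(f"FourCC must be exactly 4 characters, got {code!r}")
    return int.from_bytes(code.encode("ascii"), "little")

print(hex(fourcc_to_int("mp4v")))  # 0x7634706d

# Optional cross-check against OpenCV, if it is installed.
try:
    import cv2
    assert cv2.VideoWriter_fourcc(*"mp4v") == fourcc_to_int("mp4v")
except ImportError:
    pass
```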


from pathlib import Path

# >>> EDIT THESE <<<
input_path = Path("/Users/souvikmandal/Documents/example.mp4")
chunk_num_frame = 1000
codec = "mp4v"   # alternatives: "avc1", "H264" (requires proper system codecs)

# No edits needed below
input_path = input_path.expanduser().resolve()
input_path

2) Core Function

The function below reads frames sequentially and re-encodes them into chunk files with the same FPS and resolution as the source video (falling back to 30 FPS if the source FPS cannot be read).


import sys
import cv2

def split_video_by_frames(input_path: Path, chunk_num_frame: int, codec: str = "mp4v") -> Path:
    """Split a video into consecutive chunks by frame count.

    Args:
        input_path: Path to the input video.
        chunk_num_frame: Number of frames per chunk (must be > 0).
        codec: FourCC for output encoding (e.g., 'mp4v', 'avc1', 'H264').

    Returns:
        Path to the output directory where chunks are saved.
    """
    if not input_path.exists() or not input_path.is_file():
        raise FileNotFoundError(f"Input file not found: {input_path}")
    if chunk_num_frame <= 0:
        raise ValueError("chunk_num_frame must be a positive integer.")

    stem = input_path.stem                      # e.g., "example"
    parent_dir = input_path.parent              # e.g., user/xyz/documents/videos
    output_dir = parent_dir / stem              # e.g., user/xyz/documents/videos/example
    output_dir.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(str(input_path))
    if not cap.isOpened():
        raise RuntimeError(f"Could not open video: {input_path}")

    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    if fps <= 0 or width <= 0 or height <= 0:
        print("[WARN] Could not read video metadata reliably; FPS falls back to 30.0, "
              "and an unknown frame size may make the writer fail.", file=sys.stderr)

    fourcc = cv2.VideoWriter_fourcc(*codec)
    out_fps = fps if fps > 0 else 30.0  # fall back to 30 FPS if the source FPS is unknown
    chunk_idx = 0                       # index of the most recently opened chunk
    frames_in_current_chunk = 0
    total_frames = 0
    writer = None                       # no chunk is open until a frame is read

    def open_writer(index: int) -> cv2.VideoWriter:
        """Open a VideoWriter for chunk `index` (1-based) in output_dir."""
        out_path = output_dir / f"{stem}_chunk{index:02d}.mp4"
        new_writer = cv2.VideoWriter(str(out_path), fourcc, out_fps, (width, height))
        if not new_writer.isOpened():
            raise RuntimeError(f"Could not open writer for: {out_path}")
        print(f"[INFO] Writing: {out_path}")
        return new_writer

    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of video (or a decode error)

            # Open the next chunk lazily, only once there is a frame to put in it.
            # This avoids leaving an empty trailing file when the total frame
            # count is an exact multiple of chunk_num_frame.
            if writer is None:
                chunk_idx += 1
                writer = open_writer(chunk_idx)
                frames_in_current_chunk = 0

            writer.write(frame)
            frames_in_current_chunk += 1
            total_frames += 1

            # Close the chunk as soon as it holds chunk_num_frame frames.
            if frames_in_current_chunk >= chunk_num_frame:
                writer.release()
                writer = None
    finally:
        # Release the remainder chunk (or the open chunk if an error occurred).
        if writer is not None:
            writer.release()
        cap.release()

    print("\n[SUMMARY]")
    print(f"  Input video: {input_path}")
    print(f"  Output dir : {output_dir}")
    print(f"  Total frames processed: {total_frames}")
    return output_dir
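The loop above routes frames without knowing the total frame count in advance, which is convenient because CAP_PROP_FRAME_COUNT may be only an estimate for some files. The rule it implements is simple: 0-based frame i lands in chunk i // chunk_num_frame + 1. The generator below (assign_chunks, a hypothetical name) sketches that rule on any iterable and, like the real loop, handles one item at a time instead of buffering a whole chunk.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

def assign_chunks(frames: Iterable[T], chunk_num_frame: int) -> Iterator[Tuple[int, T]]:
    """Yield (chunk_index, frame) pairs with 1-based chunk indices.

    Streams one item at a time, so the total length is never needed.
    """
    if chunk_num_frame <= 0:
        raise ValueError("chunk_num_frame must be a positive integer.")
    for i, frame in enumerate(frames):
        yield i // chunk_num_frame + 1, frame

# Simulate a 2,500-frame video split into 1,000-frame chunks.
counts = Counter(idx for idx, _ in assign_chunks(range(2500), 1000))
print(dict(counts))  # {1: 1000, 2: 1000, 3: 500}
```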

3) Run the Splitter

Run the cell below to split your video using the parameters defined earlier.


out_dir = split_video_by_frames(input_path, chunk_num_frame, codec)
out_dir

4) Verify Outputs

List the chunked files to confirm.


sorted(out_dir.glob("*.mp4"))
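The cell above sorts file names lexicographically. That matches chunk order while indices stay within two digits, but once a video produces 100 or more chunks, example_chunk100.mp4 sorts before example_chunk11.mp4. The sketch below sorts by the parsed numeric index instead and adds an optional frame counter for spot checks. Both helpers are hypothetical, and count_frames decodes every frame, so it is slow on long files.

```python
import re
from pathlib import Path
from typing import Iterable, List

_CHUNK_RE = re.compile(r"_chunk(\d+)\.mp4$")

def sort_chunks(paths: Iterable[Path]) -> List[Path]:
    """Sort chunk files by their numeric index rather than lexicographically."""
    def key(p: Path) -> int:
        m = _CHUNK_RE.search(p.name)
        return int(m.group(1)) if m else -1  # non-matching files sort first
    return sorted(paths, key=key)

def count_frames(path: Path) -> int:
    """Count frames by decoding them all (slow but exact); requires OpenCV."""
    import cv2
    cap = cv2.VideoCapture(str(path))
    n = 0
    try:
        while cap.read()[0]:
            n += 1
    finally:
        cap.release()
    return n

names = ["example_chunk100.mp4", "example_chunk02.mp4", "example_chunk11.mp4"]
print([p.name for p in sort_chunks(Path(name) for name in names)])
# ['example_chunk02.mp4', 'example_chunk11.mp4', 'example_chunk100.mp4']
```

For example, iterating over sort_chunks(out_dir.glob("*.mp4")) and printing count_frames(p) for each path should show chunk_num_frame frames per chunk, with a smaller count only for the last one.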