duck-preview/notes/01-mvp-preview.md
2026-05-12 19:35:04 +02:00

PRD — Realtime Camera Preview Application (PySide6)

1. Overview

Desktop application written in Python using PySide6 for realtime camera preview, performance analysis and future computer vision integration.

The current phase focuses exclusively on:

  • camera communication,
  • frame acquisition,
  • rendering performance,
  • telemetry and diagnostics.

AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.


2. Goals

Primary Goal

Create a low-latency, modular and extensible realtime video application capable of:

  • stable camera preview,
  • smooth rendering,
  • accurate performance measurements,
  • future AI pipeline integration.

Secondary Goals

  • Understand bottlenecks in the video pipeline.
  • Establish baseline performance metrics.
  • Validate architecture before adding AI workloads.
  • Create reusable infrastructure for future CV modules.

3. Key Architectural Decisions

3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture

Decision

Use:

  • QCamera
  • QMediaCaptureSession
  • QVideoSink
  • QVideoWidget

instead of OpenCV as the primary camera/rendering backend.

Reasoning

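QtMultimedia delegates capture and display to the platform's multimedia stack (AVFoundation on macOS), which provides:

  • native GPU-accelerated rendering,
  • a lower-latency preview pipeline than capturing with OpenCV and converting each frame for display.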

Benefits:

  • fewer frame copies,
  • smoother rendering,
  • better realtime behavior,
  • better integration with the Qt event loop,
  • improved maintainability for GUI applications.

OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.
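
The decision above can be sketched as a minimal preview that wires the four listed classes together. This is an illustration under assumptions, not the project's implementation: the function name `run_preview`, the window size and the frame counter are hypothetical, and the Qt imports are deferred into the function so the module still loads where PySide6 is not installed.

```python
import sys


def run_preview() -> int:
    """Open the default camera and show it in a QVideoWidget until the window closes."""
    # Imported here so this module can be loaded on machines without PySide6.
    from PySide6.QtMultimedia import QCamera, QMediaCaptureSession, QMediaDevices
    from PySide6.QtMultimediaWidgets import QVideoWidget
    from PySide6.QtWidgets import QApplication

    app = QApplication(sys.argv)

    # Default device; if no camera is attached, nothing will be displayed.
    camera = QCamera(QMediaDevices.defaultVideoInput())
    session = QMediaCaptureSession()
    session.setCamera(camera)

    preview = QVideoWidget()
    session.setVideoOutput(preview)  # the session delivers frames to the widget

    # Observe delivered frames through the widget's QVideoSink (hypothetical counter).
    frame_count = {"n": 0}
    preview.videoSink().videoFrameChanged.connect(
        lambda _frame: frame_count.update(n=frame_count["n"] + 1)
    )

    preview.resize(960, 540)
    preview.show()
    camera.start()
    return app.exec()
```

Calling `sys.exit(run_preview())` from the entry point starts the preview. On macOS the process also needs camera permission before frames arrive.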


3.2 Separate Video Rendering From Processing

Decision

Video preview must be independent from future AI processing.

Reasoning

Realtime UX is more important than processing every frame.

The application must:

  • keep preview responsive,
  • avoid GUI blocking,
  • allow frame dropping,
  • support asynchronous processing later.

Future AI modules must never block:

  • camera acquisition,
  • rendering,
  • UI thread.
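
One way to allow frame dropping without blocking acquisition is a single-slot buffer in which the producer always overwrites the previous frame. The sketch below is a hypothetical helper (the name `LatestFrameSlot` and its `dropped` counter are not from this document) and uses plain integers in place of real frames.

```python
import threading
from typing import Any, Optional


class LatestFrameSlot:
    """Single-slot buffer: the producer overwrites, the consumer takes the newest frame."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self.dropped = 0  # frames overwritten before any consumer took them

    def put(self, frame: Any) -> None:
        # The lock is held only for the swap, so the producer never waits on processing.
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self) -> Optional[Any]:
        # Returns the newest frame once, or None if nothing new has arrived.
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


slot = LatestFrameSlot()
for frame_id in (1, 2, 3):  # the camera side delivers three frames quickly
    slot.put(frame_id)
latest = slot.take()        # a slow consumer only sees the newest one
```

A future AI worker would call `take()` at its own pace; the `dropped` counter could feed the dropped-frames metric.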

3.3 Layer-Based Rendering Architecture

Decision

Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.

Reasoning

Drawing directly on video frames:

  • increases CPU usage,
  • introduces additional memory copies,
  • reduces rendering performance.

Separate overlay layers allow:

  • smooth preview,
  • independent overlay refresh rates,
  • future bbox rendering,
  • debug overlays,
  • annotations,
  • interactive tools.
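
Keeping overlays on a separate layer means overlay geometry has to be mapped onto the area the video occupies in the widget. A minimal sketch, assuming the preview keeps the video's aspect ratio (letterboxing) and that boxes are stored in normalized [0, 1] frame coordinates; `Rect`, `fit_video_rect` and `to_widget_coords` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float


def fit_video_rect(widget_w: float, widget_h: float, video_w: float, video_h: float) -> Rect:
    """Area the video occupies inside the widget when scaled with aspect ratio kept."""
    scale = min(widget_w / video_w, widget_h / video_h)
    w, h = video_w * scale, video_h * scale
    # Center the scaled video; the remaining space is letterbox padding.
    return Rect((widget_w - w) / 2, (widget_h - h) / 2, w, h)


def to_widget_coords(box: Rect, video_rect: Rect) -> Rect:
    """Map a box given in normalized [0, 1] frame coordinates to widget pixels."""
    return Rect(
        video_rect.x + box.x * video_rect.w,
        video_rect.y + box.y * video_rect.h,
        box.w * video_rect.w,
        box.h * video_rect.h,
    )


# A 1920x1080 stream shown in a 960x1000 widget.
video_rect = fit_video_rect(960, 1000, 1920, 1080)
pixel_box = to_widget_coords(Rect(0.25, 0.5, 0.5, 0.25), video_rect)
```

Because the box never touches pixel data, the overlay can be repainted on resize or at its own refresh rate without copying a frame.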

3.4 Modular Application Design

Decision

Application must be modular and dependency-injection friendly.

Reasoning

Future AI pipeline will introduce:

  • multiprocessing,
  • frame subscribers,
  • OCR,
  • YOLO,
  • telemetry,
  • external integrations.

Loose coupling improves:

  • testability,
  • maintainability,
  • scalability,
  • replacement of components.
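
As an illustration of dependency-injection-friendly design, a component can depend on a small interface and receive the concrete implementation from outside. The names below (`FrameSource`, `PreviewController`, `FakeSource`) are hypothetical and only show the pattern.

```python
from typing import List, Protocol


class FrameSource(Protocol):
    """Interface the controller needs; a camera service would implement it."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


class PreviewController:
    """Depends only on the FrameSource interface, not on a concrete camera class."""

    def __init__(self, source: FrameSource) -> None:
        self._source = source
        self.running = False

    def restart(self) -> None:
        # Stop first if already running, then start again.
        if self.running:
            self._source.stop()
        self._source.start()
        self.running = True


class FakeSource:
    """Test double that records calls instead of opening a device."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def stop(self) -> None:
        self.calls.append("stop")


fake = FakeSource()
controller = PreviewController(fake)
controller.restart()
controller.restart()
```

The controller can be tested without a camera, and a different source can be swapped in later without changing the controller.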

4. Functional Requirements

4.1 Camera Preview

Application must:

  • display realtime camera preview,
  • support camera switching,
  • support resolution selection,
  • support FPS selection,
  • support reconnect/restart.

Preview should prioritize:

  • low latency,
  • smooth rendering,
  • GUI responsiveness.
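
Resolution and FPS selection usually comes down to picking the closest format the camera actually supports. The sketch below uses a hypothetical `CameraFormat` record and `pick_format` helper with made-up format values; in Qt the candidates would come from `QCameraDevice.videoFormats()` and the result would be applied with `QCamera.setCameraFormat()`.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CameraFormat:
    width: int
    height: int
    max_fps: float


def pick_format(
    formats: Sequence[CameraFormat], width: int, height: int, fps: float
) -> CameraFormat:
    """Return the supported format closest to the requested resolution and frame rate."""
    if not formats:
        raise ValueError("camera reports no formats")

    def distance(f: CameraFormat) -> Tuple[int, float]:
        # Pixel-count mismatch dominates; frame-rate mismatch breaks ties.
        return (abs(f.width * f.height - width * height), abs(f.max_fps - fps))

    return min(formats, key=distance)


supported = [
    CameraFormat(640, 480, 30.0),
    CameraFormat(1280, 720, 30.0),
    CameraFormat(1280, 720, 60.0),
    CameraFormat(1920, 1080, 30.0),
]
chosen = pick_format(supported, 1280, 720, 60.0)    # exact match exists
fallback = pick_format(supported, 1600, 900, 60.0)  # unsupported size, nearest is used
```

Ranking by pixel count first is one possible policy; an application that values frame rate over resolution would swap the two terms.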

4.2 Performance Monitoring

Application must include a telemetry/performance module.

Metrics should include:

  • realtime FPS,
  • frame time,
  • frame acquisition time,
  • rendering time,
  • dropped frames,
  • idle time,
  • queue latency,
  • CPU usage,
  • optional memory usage.

Metrics should update in realtime.
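
Realtime FPS and frame time can be derived from a rolling window of frame timestamps. A minimal sketch, assuming one `tick()` per delivered frame; the class name `FrameRateMeter` and the window size are hypothetical, and synthetic timestamps are used instead of a clock so the numbers are reproducible.

```python
from collections import deque
from typing import Deque, Optional


class FrameRateMeter:
    """Rolling FPS and frame time over the last `window` frame timestamps."""

    def __init__(self, window: int = 60) -> None:
        self._stamps: Deque[float] = deque(maxlen=window)

    def tick(self, timestamp_s: float) -> None:
        # Call once per delivered frame, e.g. with time.perf_counter().
        self._stamps.append(timestamp_s)

    def frame_time_ms(self) -> Optional[float]:
        # Mean interval between consecutive frames in the window.
        if len(self._stamps) < 2:
            return None
        span = self._stamps[-1] - self._stamps[0]
        return 1000.0 * span / (len(self._stamps) - 1)

    def fps(self) -> Optional[float]:
        frame_time = self.frame_time_ms()
        if not frame_time:
            return None
        return 1000.0 / frame_time


meter = FrameRateMeter(window=4)
for stamp in (0.0, 0.5, 1.0, 1.5, 2.0):  # synthetic: one frame every 0.5 s
    meter.tick(stamp)
```

With a window of four, only the last four timestamps count, so the meter reports a 500 ms frame time and 2 FPS here. A larger window smooths the overlay readout at the cost of slower reaction to changes.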


4.3 Overlay System

Application must support transparent overlays rendered above video.

Initial use:

  • performance metrics display.

Future use:

  • bounding boxes,
  • object labels,
  • debug visualizations,
  • OCR results.

Overlay system must not modify original frames.


4.4 GUI

GUI must remain intentionally minimal.

Layout

Main window:

  • video preview only.

Top menu:

  • camera selection,
  • resolution selection,
  • FPS selection,
  • debug options,
  • telemetry options.

Overlay:

  • semi-transparent performance box.

5. Non-Functional Requirements

Performance

Application should:

  • minimize frame copies,
  • avoid unnecessary color conversions,
  • avoid blocking operations in GUI thread,
  • support realtime preview at target camera FPS.

Extensibility

Architecture must support the following future additions without major redesign:

  • YOLO,
  • OCR,
  • multiprocessing,
  • recording,
  • snapshots,
  • streaming,
  • remote sinks,
  • video file playback.


Maintainability

Codebase should:

  • use clear module boundaries,
  • define explicit interfaces,
  • avoid tightly coupled UI/business logic,
  • support isolated testing.

6. Proposed High-Level Architecture

Camera Service
    ↓
Frame Dispatcher
    ├── Video Renderer
    ├── Telemetry Collector
    ├── Overlay Manager
    └── Future AI Subscribers

Video Renderer
    ↓
QVideoWidget

Overlay Layer
    ↓
Metrics / Future BBoxes
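
The Frame Dispatcher in the diagram above can be sketched as a simple fan-out to named subscribers. This is a synchronous illustration with hypothetical names (`FrameDispatcher`, `subscribe`, `publish`), and strings stand in for frames; a real dispatcher would likely hand frames to slow subscribers through a queue or single-slot buffer rather than call them inline.

```python
from typing import Any, Callable, Dict, List

FrameCallback = Callable[[Any], None]


class FrameDispatcher:
    """Fans each frame out to named subscribers; a failing subscriber does not stop the rest."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, FrameCallback] = {}
        self.errors: Dict[str, int] = {}  # failure count per subscriber name

    def subscribe(self, name: str, callback: FrameCallback) -> None:
        self._subscribers[name] = callback

    def unsubscribe(self, name: str) -> None:
        self._subscribers.pop(name, None)

    def publish(self, frame: Any) -> None:
        # Iterate over a copy so subscribers may unsubscribe during delivery.
        for name, callback in list(self._subscribers.items()):
            try:
                callback(frame)
            except Exception:
                self.errors[name] = self.errors.get(name, 0) + 1


dispatcher = FrameDispatcher()
telemetry_seen: List[Any] = []
overlay_seen: List[Any] = []


def broken_subscriber(frame: Any) -> None:
    raise RuntimeError("simulated subscriber failure")


dispatcher.subscribe("telemetry", telemetry_seen.append)
dispatcher.subscribe("broken", broken_subscriber)
dispatcher.subscribe("overlay", overlay_seen.append)

for frame in ("frame-1", "frame-2"):
    dispatcher.publish(frame)
```

Isolating subscriber failures keeps the renderer and telemetry running even if a future AI subscriber raises.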

7. Future Expansion (Out of Scope)

The following features are intentionally excluded from current implementation:

  • YOLO inference,
  • OCR,
  • multiprocessing workers,
  • tracking,
  • recording,
  • networking.

Architecture must remain prepared for these additions.


8. Success Criteria

The first implementation phase is successful if:

  • camera preview is smooth and stable,
  • rendering latency is low,
  • telemetry data is accurate,
  • GUI remains responsive,
  • overlay system works correctly,
  • architecture supports future frame subscribers.