# PRD — Realtime Camera Preview Application (PySide6)

# 1. Overview

A desktop application, written in Python with PySide6, for realtime camera preview, performance analysis, and future computer vision integration.

The current phase focuses exclusively on:

* camera communication,
* frame acquisition,
* rendering performance,
* telemetry and diagnostics.

AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.

---

# 2. Goals

## Primary Goal

Create a low-latency, modular and extensible realtime video application capable of:

* stable camera preview,
* smooth rendering,
* accurate performance measurements,
* future AI pipeline integration.

## Secondary Goals

* Understand bottlenecks in the video pipeline.
* Establish baseline performance metrics.
* Validate architecture before adding AI workloads.
* Create reusable infrastructure for future CV modules.

---

# 3. Key Architectural Decisions

## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture

### Decision

Use:

* QCamera
* QMediaCaptureSession
* QVideoSink
* QVideoWidget

instead of OpenCV as the primary camera/rendering backend.

### Reasoning

QtMultimedia builds on native platform facilities:

* camera access through the platform's native framework (e.g. AVFoundation on macOS),
* GPU-accelerated video rendering,

which together provide a lower-latency preview pipeline.

Benefits:

* fewer frame copies,
* smoother rendering,
* better realtime behavior,
* better integration with the Qt event loop,
* improved maintainability for GUI applications.

OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.

---

## 3.2 Separate Video Rendering From Processing

### Decision

Video preview must be independent from future AI processing.

### Reasoning

Realtime UX is more important than processing every frame.

The application must:

* keep preview responsive,
* avoid GUI blocking,
* allow frame dropping,
* support asynchronous processing later.

Future AI modules must never block:

* camera acquisition,
* rendering,
* UI thread.
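One common way to satisfy these rules is a single-slot "latest frame" mailbox. The producer overwrites the slot without ever waiting, and a slow consumer only sees the newest frame, so stale frames are dropped instead of queued. The sketch below is a minimal, thread-safe version built only on the standard library; `LatestFrameSlot` and its methods are hypothetical names, not part of PySide6.

```python
import threading
from typing import Any, Optional


class LatestFrameSlot:
    """Single-slot mailbox: put() never waits on the consumer; unread frames are overwritten."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Optional[Any] = None
        self._has_frame = False
        self.dropped = 0  # frames overwritten before any consumer read them

    def put(self, frame: Any) -> None:
        # Producer side (acquisition/render path): holds the lock only for a few assignments.
        with self._cond:
            if self._has_frame:
                self.dropped += 1
            self._frame = frame
            self._has_frame = True
            self._cond.notify()

    def take(self, timeout: Optional[float] = None) -> Optional[Any]:
        # Consumer side (worker thread): waits for a fresh frame; returns None on timeout.
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_frame, timeout=timeout):
                return None
            frame, self._frame, self._has_frame = self._frame, None, False
            return frame
```

In a Qt build, `put()` would typically be called from a slot connected to `QVideoSink.videoFrameChanged` and `take()` from a worker thread, so neither acquisition nor the UI thread ever waits on processing. The `dropped` counter can feed the "dropped frames" metric directly.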

---

## 3.3 Layer-Based Rendering Architecture

### Decision

Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.

### Reasoning

Drawing directly on video frames:

* increases CPU usage,
* introduces additional memory copies,
* reduces rendering performance.

Separate overlay layers allow:

* smooth preview,
* independent overlay refresh rates,
* future bbox rendering,
* debug overlays,
* annotations,
* interactive tools.
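A minimal sketch of a layer stack with independent refresh rates, assuming each layer rebuilds its own draw list on its own interval and never receives the video frame. `OverlayLayer`, `OverlayStack` and the string "draw commands" are hypothetical placeholders; a real implementation would hold Qt geometry (e.g. rectangles and text) and paint it over the video.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OverlayLayer:
    name: str
    refresh_interval_s: float        # how often this layer's content is rebuilt
    build: Callable[[], List[str]]   # returns placeholder draw commands; never sees the video frame
    z_order: int = 0
    last_refresh_s: float = float("-inf")
    items: List[str] = field(default_factory=list)


class OverlayStack:
    def __init__(self) -> None:
        self._layers: Dict[str, OverlayLayer] = {}

    def add(self, layer: OverlayLayer) -> None:
        self._layers[layer.name] = layer

    def _ordered(self) -> List[OverlayLayer]:
        return sorted(self._layers.values(), key=lambda layer: layer.z_order)

    def tick(self, now_s: float) -> List[str]:
        """Rebuild only the layers whose interval has elapsed; return their names."""
        refreshed = []
        for layer in self._ordered():
            if now_s - layer.last_refresh_s >= layer.refresh_interval_s:
                layer.items = layer.build()
                layer.last_refresh_s = now_s
                refreshed.append(layer.name)
        return refreshed

    def draw_list(self) -> List[str]:
        """Cached items of all layers in z-order, ready to paint over the video."""
        out: List[str] = []
        for layer in self._ordered():
            out.extend(layer.items)
        return out
```

For example, a metrics layer can be rebuilt four times per second while a debug layer refreshes once per second, and both are painted from cache on every repaint without touching frame pixels.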

---

## 3.4 Modular Application Design

### Decision

Application must be modular and dependency-injection friendly.

### Reasoning

Future AI pipeline will introduce:

* multiprocessing,
* frame subscribers,
* OCR,
* YOLO,
* telemetry,
* external integrations.

Loose coupling improves:

* testability,
* maintainability,
* scalability,
* replacement of components.
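A minimal sketch of what "dependency-injection friendly" could mean here, assuming constructor injection against `typing.Protocol` interfaces. `FrameSource`, `FrameSink`, `PreviewController` and the two test doubles are hypothetical names; the point is that the controller can be exercised without a camera, and a camera-backed or video-file-backed source can be swapped in later.

```python
from typing import Callable, List, Protocol


class FrameSource(Protocol):
    """Anything that delivers frames: a QCamera wrapper, a video file reader, or a test fake."""

    def start(self, on_frame: Callable[[object], None]) -> None: ...

    def stop(self) -> None: ...


class FrameSink(Protocol):
    def on_frame(self, frame: object) -> None: ...


class PreviewController:
    """Depends only on the two interfaces; concrete implementations are injected by the caller."""

    def __init__(self, source: FrameSource, sink: FrameSink) -> None:
        self._source = source
        self._sink = sink

    def start(self) -> None:
        self._source.start(self._sink.on_frame)

    def stop(self) -> None:
        self._source.stop()


class FakeFrameSource:
    """Test double: synchronously emits a fixed list of frames when started."""

    def __init__(self, frames: List[object]) -> None:
        self._frames = frames
        self.running = False

    def start(self, on_frame: Callable[[object], None]) -> None:
        self.running = True
        for frame in self._frames:
            on_frame(frame)

    def stop(self) -> None:
        self.running = False


class RecordingSink:
    """Test double: remembers every frame it receives."""

    def __init__(self) -> None:
        self.seen: List[object] = []

    def on_frame(self, frame: object) -> None:
        self.seen.append(frame)
```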

---

# 4. Functional Requirements

## 4.1 Camera Preview

Application must:

* display realtime camera preview,
* support camera switching,
* support resolution selection,
* support FPS selection,
* support reconnect/restart.

Preview should prioritize:

* low latency,
* smooth rendering,
* GUI responsiveness.
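Resolution and FPS selection has to be resolved against the modes the device actually supports. In Qt 6 these come from `QCameraDevice.videoFormats()`, where each `QCameraFormat` exposes `resolution()`, `minFrameRate()` and `maxFrameRate()`, and the chosen mode is applied with `QCamera.setCameraFormat()`. The sketch below uses a hypothetical `CameraFormat` stand-in so the policy can be tested without hardware; the policy itself (exact resolution first, otherwise the nearest pixel count that supports the requested FPS) is an assumption, not a requirement from this PRD.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CameraFormat:
    # Hypothetical stand-in mirroring the QCameraFormat fields used for selection.
    width: int
    height: int
    min_fps: float
    max_fps: float


def pick_format(formats: List[CameraFormat], width: int, height: int,
                fps: float) -> Optional[CameraFormat]:
    """Prefer an exact resolution match that supports `fps`; else the closest pixel count."""
    candidates = [f for f in formats if f.min_fps <= fps <= f.max_fps]
    if not candidates:
        return None  # caller should report "unsupported FPS" rather than silently degrade
    target = width * height
    return min(
        candidates,
        # False sorts before True, so exact matches win; ties break on pixel-count distance.
        key=lambda f: ((f.width, f.height) != (width, height),
                       abs(f.width * f.height - target)),
    )
```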

---

## 4.2 Performance Monitoring

Application must include a telemetry/performance module.

Metrics should include:

* realtime FPS,
* frame time,
* frame acquisition time,
* rendering time,
* dropped frames,
* idle time,
* queue latency,
* CPU usage,
* optional memory usage.

Metrics should update in realtime.
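FPS and frame time can be derived from frame arrival timestamps over a sliding window. The sketch below assumes the timestamps are taken with a monotonic clock such as `time.perf_counter()` in the slot that receives each frame; `FrameTimingStats` is a hypothetical name. Note that arrival times measure delivery to the application, not end-to-end sensor-to-screen latency, so the two should be reported as separate metrics.

```python
from collections import deque
from typing import Deque, Dict


class FrameTimingStats:
    """Sliding-window FPS and frame-interval statistics from arrival timestamps in seconds."""

    def __init__(self, window_s: float = 1.0) -> None:
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def on_frame(self, t: float) -> None:
        self._stamps.append(t)
        # Drop timestamps that are older than the window relative to the newest frame.
        while self._stamps and t - self._stamps[0] > self.window_s:
            self._stamps.popleft()

    def snapshot(self) -> Dict[str, float]:
        zero = {"fps": 0.0, "frame_time_ms": 0.0, "max_frame_time_ms": 0.0}
        if len(self._stamps) < 2:
            return zero
        stamps = list(self._stamps)
        intervals = [b - a for a, b in zip(stamps, stamps[1:])]
        span = stamps[-1] - stamps[0]
        if span <= 0:
            return zero  # duplicate timestamps; avoid dividing by zero
        return {
            "fps": len(intervals) / span,
            "frame_time_ms": 1000.0 * span / len(intervals),  # mean interval
            "max_frame_time_ms": 1000.0 * max(intervals),     # worst stall in the window
        }
```

Reporting the worst interval alongside the mean matters because a preview can average 30 FPS while still showing visible hitches.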

---

## 4.3 Overlay System

Application must support transparent overlays rendered above video.

Initial use:

* performance metrics display.

Future use:

* bounding boxes,
* object labels,
* debug visualizations,
* OCR results.

Overlay system must not modify original frames.
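Because overlays are not drawn into frames, bounding boxes must be kept in frame coordinates and mapped to widget coordinates at paint time. A minimal sketch, assuming the preview is scaled with its aspect ratio preserved and centered (letterboxed), which is the usual preview behavior; `frame_to_widget` is a hypothetical helper.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def frame_to_widget(rect: Rect, frame_size: Tuple[int, int],
                    widget_size: Tuple[int, int]) -> Rect:
    """Map a rect in frame pixels to widget pixels for a centered, aspect-preserving fit."""
    fw, fh = frame_size
    ww, wh = widget_size
    scale = min(ww / fw, wh / fh)    # largest uniform scale at which the whole frame fits
    off_x = (ww - fw * scale) / 2.0  # width of each left/right letterbox bar
    off_y = (wh - fh * scale) / 2.0  # height of each top/bottom letterbox bar
    x, y, w, h = rect
    return (off_x + x * scale, off_y + y * scale, w * scale, h * scale)
```

For example, a 640×480 frame shown in a 1280×720 widget is scaled by 1.5 with 160 px bars on each side, so a box at (100, 100, 50, 50) is painted at (310, 150, 75, 75). Recomputing this on every paint keeps overlays aligned when the window is resized.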

---

## 4.4 GUI

GUI must remain intentionally minimal.

### Layout

Main window:

* video preview only.

Top menu:

* camera selection,
* resolution selection,
* FPS selection,
* debug options,
* telemetry options.

Overlay:

* semi-transparent performance box.

---

# 5. Non-Functional Requirements

## Performance

Application should:

* minimize frame copies,
* avoid unnecessary color conversions,
* avoid blocking operations in GUI thread,
* support realtime preview at target camera FPS.

---

## Extensibility

Architecture must support the following future additions without major redesign:

* YOLO,
* OCR,
* multiprocessing,
* recording,
* snapshots,
* streaming,
* remote sinks,
* video file playback.

---

## Maintainability

Codebase should:

* use clear module boundaries,
* define explicit interfaces,
* avoid tightly coupled UI/business logic,
* support isolated testing.

---

# 6. Proposed High-Level Architecture

```text
Camera Service
    ↓
Frame Dispatcher
    ├── Video Renderer
    ├── Telemetry Collector
    ├── Overlay Manager
    └── Future AI Subscribers

Video Renderer
    ↓
QVideoWidget

Overlay Layer
    ↓
Metrics / Future BBoxes
```
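A minimal sketch of the Frame Dispatcher as a fan-out hub, assuming subscribers are plain callables that register and unregister at runtime. `FrameDispatcher` and its methods are hypothetical names. Subscribers run synchronously inside `publish()`, so they must return quickly; anything heavy (future AI work) should sit behind a frame-dropping queue. A failing subscriber is isolated so it cannot stop delivery to the renderer or telemetry.

```python
import logging
from typing import Callable, Dict

Subscriber = Callable[[object], None]
log = logging.getLogger("frame_dispatcher")


class FrameDispatcher:
    """Fan-out hub between the camera service and its subscribers."""

    def __init__(self) -> None:
        self._subs: Dict[int, Subscriber] = {}
        self._next_token = 0
        self.errors: Dict[int, int] = {}  # failure count per subscriber token

    def subscribe(self, callback: Subscriber) -> int:
        token = self._next_token
        self._next_token += 1
        self._subs[token] = callback
        return token

    def unsubscribe(self, token: int) -> None:
        self._subs.pop(token, None)

    def publish(self, frame: object) -> None:
        # Iterate over a snapshot so callbacks may (un)subscribe during delivery.
        for token, callback in list(self._subs.items()):
            try:
                callback(frame)
            except Exception:
                # One failing subscriber must not block delivery to the others.
                self.errors[token] = self.errors.get(token, 0) + 1
                log.exception("frame subscriber %d failed", token)
```

In a Qt build, `publish()` might be connected to `QVideoSink.videoFrameChanged`, with the telemetry collector and overlay manager registered as subscribers.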

---

# 7. Future Expansion (Out of Scope)

The following features are intentionally excluded from current implementation:

* YOLO inference,
* OCR,
* multiprocessing workers,
* tracking,
* recording,
* networking.

Architecture must remain prepared for these additions.

---

# 8. Success Criteria

The first implementation phase is successful if:

* camera preview is smooth and stable,
* rendering latency is low,
* telemetry data is accurate,
* GUI remains responsive,
* overlay system works correctly,
* architecture supports future frame subscribers.