PRD — Realtime Camera Preview Application (PySide6)
1. Overview
Desktop application written in Python using PySide6 for realtime camera preview, performance analysis and future computer vision integration.
The current phase focuses exclusively on:
- camera communication,
- frame acquisition,
- rendering performance,
- telemetry and diagnostics.
AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.
2. Goals
Primary Goal
Create a low-latency, modular and extensible realtime video application capable of:
- stable camera preview,
- smooth rendering,
- accurate performance measurements,
- future AI pipeline integration.
Secondary Goals
- Understand bottlenecks in the video pipeline.
- Establish baseline performance metrics.
- Validate architecture before adding AI workloads.
- Create reusable infrastructure for future CV modules.
3. Key Architectural Decisions
3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture
Decision
Use:
- QCamera
- QMediaCaptureSession
- QVideoSink
- QVideoWidget
instead of OpenCV as the primary camera/rendering backend.
Reasoning
QtMultimedia delegates capture and rendering to the platform's multimedia stack (for example, AVFoundation on macOS), which provides:
- GPU-accelerated rendering,
- a lower-latency preview pipeline.
Benefits:
- fewer frame copies,
- smoother rendering,
- better realtime behavior,
- better integration with Qt event loop,
- improved maintainability for GUI applications.
OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.
3.2 Separate Video Rendering From Processing
Decision
Video preview must be independent from future AI processing.
Reasoning
Realtime UX is more important than processing every frame.
The application must:
- keep preview responsive,
- avoid GUI blocking,
- allow frame dropping,
- support asynchronous processing later.
Future AI modules must never block:
- camera acquisition,
- rendering,
- UI thread.
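One way to allow frame dropping without blocking acquisition is a single-item mailbox in which the producer always overwrites the stored frame. The sketch below is an assumption about how this could be done, using only the standard library; the class name LatestFrameSlot and its methods are hypothetical.

```python
import threading
from typing import Any, Optional


class LatestFrameSlot:
    """Single-item mailbox: the producer overwrites, the consumer takes the newest frame."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self.dropped = 0  # frames overwritten before any consumer read them

    def publish(self, frame: Any) -> None:
        """Called from the capture side; never waits for a slow consumer."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self) -> Optional[Any]:
        """Called from the processing side; returns None if nothing new arrived."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```

With this shape, a slow consumer only ever sees the most recent frame, and the dropped counter can feed the dropped-frames metric. The lock is held only for a reference swap, so the capture side is not delayed by processing time.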
3.3 Layer-Based Rendering Architecture
Decision
Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.
Reasoning
Drawing directly on video frames:
- increases CPU usage,
- introduces additional memory copies,
- reduces rendering performance.
Separate overlay layers allow:
- smooth preview,
- independent overlay refresh rates,
- future bbox rendering,
- debug overlays,
- annotations,
- interactive tools.
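A possible data model for such layers keeps overlay items in normalized coordinates, so they never reference frame pixels and can be redrawn at any widget size. This is a sketch under that assumption; OverlayBox, OverlayLayer and OverlayManager are hypothetical names, and the actual painting (for example with a transparent widget above the video) is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OverlayBox:
    """Rectangle in normalized coordinates (0..1), independent of frame size."""
    x: float
    y: float
    w: float
    h: float
    label: str = ""

    def to_pixels(self, view_w: int, view_h: int) -> tuple[int, int, int, int]:
        """Scale to the current view size as (x, y, width, height) in pixels."""
        return (round(self.x * view_w), round(self.y * view_h),
                round(self.w * view_w), round(self.h * view_h))


@dataclass
class OverlayLayer:
    name: str
    visible: bool = True
    items: list[OverlayBox] = field(default_factory=list)


class OverlayManager:
    """Holds named layers; layers are drawn in the order they were created."""

    def __init__(self) -> None:
        self._layers: dict[str, OverlayLayer] = {}

    def layer(self, name: str) -> OverlayLayer:
        """Return the named layer, creating it on first use."""
        if name not in self._layers:
            self._layers[name] = OverlayLayer(name)
        return self._layers[name]

    def draw_list(self, view_w: int, view_h: int) -> list[tuple[str, tuple[int, int, int, int]]]:
        """Labelled pixel rectangles for all visible layers."""
        return [(box.label, box.to_pixels(view_w, view_h))
                for layer in self._layers.values() if layer.visible
                for box in layer.items]
```

Because each layer owns its own item list and visibility flag, a debug layer can be toggled or refreshed at a different rate than bounding boxes without touching the video frames.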
3.4 Modular Application Design
Decision
Application must be modular and dependency-injection friendly.
Reasoning
Future AI pipeline will introduce:
- multiprocessing,
- frame subscribers,
- OCR,
- YOLO,
- telemetry,
- external integrations.
Loose coupling improves:
- testability,
- maintainability,
- scalability,
- replacement of components.
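To make the dependency-injection requirement concrete, the sketch below defines a frame source interface with typing.Protocol and injects it into a controller, so a fake camera can replace the real one in tests. FrameSource, FakeCamera and PreviewController are hypothetical names chosen for illustration; a QCamera-backed class would implement the same two methods.

```python
from typing import Any, Callable, Iterable, Protocol


class FrameSource(Protocol):
    """Anything that can push frames to a callback."""

    def start(self, on_frame: Callable[[Any], None]) -> None: ...

    def stop(self) -> None: ...


class FakeCamera:
    """Test double that replays canned frames synchronously."""

    def __init__(self, frames: Iterable[Any]) -> None:
        self._frames = list(frames)
        self.running = False

    def start(self, on_frame: Callable[[Any], None]) -> None:
        self.running = True
        for frame in self._frames:
            on_frame(frame)

    def stop(self) -> None:
        self.running = False


class PreviewController:
    """Depends only on the FrameSource interface, not on a concrete camera."""

    def __init__(self, source: FrameSource) -> None:
        self._source = source
        self.frames_seen = 0

    def _on_frame(self, frame: Any) -> None:
        self.frames_seen += 1

    def run(self) -> None:
        self._source.start(self._on_frame)

    def shutdown(self) -> None:
        self._source.stop()
```

Passing the source through the constructor is what allows isolated testing: the controller can be exercised without camera hardware or a running Qt event loop.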
4. Functional Requirements
4.1 Camera Preview
Application must:
- display realtime camera preview,
- support camera switching,
- support resolution selection,
- support FPS selection,
- support reconnect/restart.
Preview should prioritize:
- low latency,
- smooth rendering,
- GUI responsiveness.
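Resolution and FPS selection can be reduced to choosing the closest supported format. The sketch below shows one possible selection rule on plain data; CameraFormat and pick_format are hypothetical names. In PySide6 the candidate list would presumably be built from QCameraDevice.videoFormats() (using each QCameraFormat's resolution() and maxFrameRate()) and the chosen format applied with QCamera.setCameraFormat().

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CameraFormat:
    """Plain stand-in for one supported camera mode."""
    width: int
    height: int
    max_fps: float


def pick_format(formats: Iterable[CameraFormat], width: int, height: int,
                fps: float) -> Optional[CameraFormat]:
    """Prefer modes that reach the requested FPS, then the closest resolution."""
    candidates = list(formats)
    if not candidates:
        return None

    def score(fmt: CameraFormat) -> tuple[float, int, float]:
        fps_shortfall = max(0.0, fps - fmt.max_fps)   # 0 when the mode is fast enough
        size_error = abs(fmt.width - width) + abs(fmt.height - height)
        fps_distance = abs(fmt.max_fps - fps)          # tie-breaker
        return (fps_shortfall, size_error, fps_distance)

    return min(candidates, key=score)
```

Ranking FPS shortfall first reflects the priority on smooth, low-latency preview over exact resolution; a different ordering of the score tuple would express a different policy.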
4.2 Performance Monitoring
Application must include a telemetry/performance module.
Metrics should include:
- realtime FPS,
- frame time,
- frame acquisition time,
- rendering time,
- dropped frames,
- idle time,
- queue latency,
- CPU usage,
- memory usage (optional).
Metrics should update in realtime.
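Realtime FPS and frame time can be derived from frame arrival timestamps alone. The following is a minimal sketch, assuming timestamps come from a monotonic clock such as time.perf_counter(); the class name FrameTimer and the window size of 120 samples are illustrative choices, not requirements.

```python
from collections import deque
from typing import Optional


class FrameTimer:
    """Rolling FPS and frame time computed from frame arrival timestamps (seconds)."""

    def __init__(self, window: int = 120) -> None:
        # Only the most recent `window` timestamps are kept.
        self._stamps: deque[float] = deque(maxlen=window)

    def tick(self, now: float) -> None:
        """Record the arrival of one frame."""
        self._stamps.append(now)

    def fps(self) -> Optional[float]:
        """Average frames per second over the window, or None with too little data."""
        if len(self._stamps) < 2:
            return None
        span = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / span if span > 0 else None

    def last_frame_time_ms(self) -> Optional[float]:
        """Interval between the two most recent frames, in milliseconds."""
        if len(self._stamps) < 2:
            return None
        return (self._stamps[-1] - self._stamps[-2]) * 1000.0
```

In the application, tick(time.perf_counter()) would be called once per delivered frame, and the overlay would read fps() and last_frame_time_ms() on its own refresh timer. Averaging over a window keeps the displayed value stable while still reacting within a few seconds.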
4.3 Overlay System
Application must support transparent overlays rendered above video.
Initial use:
- performance metrics display.
Future use:
- bounding boxes,
- object labels,
- debug visualizations,
- OCR results.
Overlay system must not modify original frames.
4.4 GUI
GUI must remain intentionally minimal.
Layout
Main window:
- video preview only.
Top menu:
- camera selection,
- resolution selection,
- FPS selection,
- debug options,
- telemetry options.
Overlay:
- semi-transparent performance box.
5. Non-Functional Requirements
Performance
Application should:
- minimize frame copies,
- avoid unnecessary color conversions,
- avoid blocking operations in GUI thread,
- support realtime preview at the target camera FPS.
Extensibility
Architecture must support future additions:
- YOLO,
- OCR,
- multiprocessing,
- recording,
- snapshots,
- streaming,
- remote sinks.
These additions must not require a major redesign.
Maintainability
Codebase should:
- use clear module boundaries,
- define explicit interfaces,
- avoid tightly coupled UI/business logic,
- support isolated testing.
6. Proposed High-Level Architecture
Camera Service
↓
Frame Dispatcher
├── Video Renderer
├── Telemetry Collector
├── Overlay Manager
└── Future AI Subscribers
Video Renderer
↓
QVideoWidget
Overlay Layer
↓
Metrics / Future BBoxes
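The Frame Dispatcher in the diagram above could be sketched as a small publish/subscribe hub. This is an assumption about its shape rather than a fixed design; FrameDispatcher and its method names are hypothetical. The sketch isolates subscriber failures so that one faulty module does not stop the others.

```python
import logging
from typing import Any, Callable

Subscriber = Callable[[Any], None]
logger = logging.getLogger(__name__)


class FrameDispatcher:
    """Fans each frame out to named subscribers and isolates their failures."""

    def __init__(self) -> None:
        self._subscribers: dict[str, Subscriber] = {}
        self.errors: dict[str, int] = {}  # failure count per subscriber name

    def subscribe(self, name: str, callback: Subscriber) -> None:
        self._subscribers[name] = callback

    def unsubscribe(self, name: str) -> None:
        self._subscribers.pop(name, None)

    def dispatch(self, frame: Any) -> None:
        """Deliver one frame to every subscriber, in subscription order."""
        # Iterate over a copy so subscribers may unsubscribe during dispatch.
        for name, callback in list(self._subscribers.items()):
            try:
                callback(frame)
            except Exception:
                self.errors[name] = self.errors.get(name, 0) + 1
                logger.exception("frame subscriber %r failed", name)
```

Delivery here is synchronous, so this form only protects against exceptions, not against slow subscribers. Future AI subscribers would need a callback that hands the frame off and returns immediately in order to respect the rule that processing never blocks acquisition or rendering.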
7. Future Expansion (Out of Scope)
The following features are intentionally excluded from current implementation:
- YOLO inference,
- OCR,
- multiprocessing workers,
- tracking,
- recording,
- networking.
Architecture must remain prepared for these additions.
8. Success Criteria
The first implementation phase is successful if:
- camera preview is smooth and stable,
- rendering latency is low,
- telemetry data is accurate,
- GUI remains responsive,
- overlay system works correctly,
- architecture supports future frame subscribers.