# PRD — Realtime Camera Preview Application (PySide6)

# 1. Overview

Desktop application written in Python using PySide6 for realtime camera preview, performance analysis and future computer vision integration.

The current phase focuses exclusively on:

* camera communication,
* frame acquisition,
* rendering performance,
* telemetry and diagnostics.

AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.

---

# 2. Goals

## Primary Goal

Create a low-latency, modular and extensible realtime video application capable of:

* stable camera preview,
* smooth rendering,
* accurate performance measurements,
* future AI pipeline integration.

## Secondary Goals

* Understand bottlenecks in the video pipeline.
* Establish baseline performance metrics.
* Validate architecture before adding AI workloads.
* Create reusable infrastructure for future CV modules.

---

# 3. Key Architectural Decisions

## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture

### Decision

Use:

* QCamera
* QMediaCaptureSession
* QVideoSink
* QVideoWidget

instead of OpenCV as the primary camera/rendering backend.

### Reasoning

QtMultimedia uses native multimedia frameworks:

* AVFoundation on macOS,
* native GPU accelerated rendering,
* lower latency preview pipeline.

Benefits:

* fewer frame copies,
* smoother rendering,
* better realtime behavior,
* better integration with Qt event loop,
* improved maintainability for GUI applications.

OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.

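The classes listed in the decision could be wired together roughly as follows. This is a minimal sketch, not the project's implementation: the function name `build_preview` and the `on_frame` callback are hypothetical, and the PySide6 imports are placed inside the function only so that the module can be loaded on a machine without Qt installed.

```python
def build_preview(on_frame=None):
    """Wire the default camera to a QVideoWidget.

    A QApplication must already exist when this is called.
    `on_frame` is an optional callable receiving each QVideoFrame.
    """
    # Imported lazily so this module can be loaded without Qt installed.
    from PySide6.QtMultimedia import QCamera, QMediaCaptureSession, QMediaDevices
    from PySide6.QtMultimediaWidgets import QVideoWidget

    camera = QCamera(QMediaDevices.defaultVideoInput())
    session = QMediaCaptureSession()
    session.setCamera(camera)

    widget = QVideoWidget()
    session.setVideoOutput(widget)  # Qt renders the frames itself

    if on_frame is not None:
        # The widget exposes the QVideoSink it renders through; the signal
        # reports each frame without taking rendering away from Qt.
        widget.videoSink().videoFrameChanged.connect(on_frame)

    camera.start()
    # Return all three objects so the caller keeps them alive.
    return camera, session, widget
```

In an application, one would create a `QApplication`, call `build_preview()`, show the returned widget and enter the event loop. The callback runs on the GUI thread, so it should do no more than record a timestamp or hand the frame off.
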
---

## 3.2 Separate Video Rendering From Processing

### Decision

Video preview must be independent from future AI processing.

### Reasoning

Realtime UX is more important than processing every frame.

The application must:

* keep preview responsive,
* avoid GUI blocking,
* allow frame dropping,
* support asynchronous processing later.

Future AI modules must never block:

* camera acquisition,
* rendering,
* UI thread.

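One common way to allow frame dropping without blocking the producer is a single-slot mailbox that always holds only the newest frame. The sketch below illustrates that idea under stated assumptions; the class name `LatestFrameSlot` and its `dropped` counter are hypothetical and not part of any Qt API.

```python
import threading
from typing import Any, Optional


class LatestFrameSlot:
    """Single-slot mailbox: the producer never waits, stale frames are dropped."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self.dropped = 0  # frames overwritten before a consumer took them

    def put(self, frame: Any) -> None:
        """Called by the acquisition side; overwrites any unconsumed frame."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self) -> Optional[Any]:
        """Called by a processing worker; returns the newest frame or None."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```

A slow consumer then sees only the most recent frame, and the `dropped` counter can feed the dropped-frames metric. The lock is held only for a reference swap, so the acquisition side is never delayed by processing time.
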
---

## 3.3 Layer-Based Rendering Architecture

### Decision

Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.

### Reasoning

Drawing directly on video frames:

* increases CPU usage,
* introduces additional memory copies,
* reduces rendering performance.

Separate overlay layers allow:

* smooth preview,
* independent overlay refresh rates,
* future bbox rendering,
* debug overlays,
* annotations,
* interactive tools.

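Because overlays live on their own layer, their geometry has to be mapped from frame space to widget space at paint time. The sketch below shows one way to do that, assuming the preview is scaled with its aspect ratio preserved (letterboxed) and that overlay boxes are stored in normalized 0..1 frame coordinates. The helper names `fit_rect` and `to_view` are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def fit_rect(frame_w: int, frame_h: int, view_w: int, view_h: int) -> Rect:
    """Rectangle occupied by the video inside the view, aspect ratio kept."""
    scale = min(view_w / frame_w, view_h / frame_h)
    w, h = frame_w * scale, frame_h * scale
    # Center the scaled video; the remainder is letterbox padding.
    return ((view_w - w) / 2, (view_h - h) / 2, w, h)


def to_view(box: Rect, video: Rect) -> Rect:
    """Map a normalized (0..1) box in frame space to view pixels."""
    bx, by, bw, bh = box
    vx, vy, vw, vh = video
    return (vx + bx * vw, vy + by * vh, bw * vw, bh * vh)


video = fit_rect(1000, 500, 500, 500)
print(video)                                   # (0.0, 125.0, 500.0, 250.0)
print(to_view((0.5, 0.5, 0.25, 0.5), video))   # (250.0, 250.0, 125.0, 125.0)
```

Keeping overlay items in normalized coordinates means they survive resolution changes and window resizes without touching the frame data.
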
---

## 3.4 Modular Application Design

### Decision

Application must be modular and dependency-injection friendly.

### Reasoning

Future AI pipeline will introduce:

* multiprocessing,
* frame subscribers,
* OCR,
* YOLO,
* telemetry,
* external integrations.

Loose coupling improves:

* testability,
* maintainability,
* scalability,
* replacement of components.

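As an illustration of what "dependency-injection friendly" can mean in practice, the sketch below passes collaborators into a controller through its constructor and describes them with a `typing.Protocol`. All names here (`FrameSource`, `PreviewController`, `FakeSource`) are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Protocol


class FrameSource(Protocol):
    """Anything that can be started and stopped, e.g. a camera wrapper."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


class PreviewController:
    """Depends only on the FrameSource interface, not on Qt classes."""

    def __init__(self, source: FrameSource, log: Callable[[str], None]) -> None:
        self._source = source
        self._log = log
        self.running = False

    def restart(self) -> None:
        if self.running:
            self._source.stop()
        self._source.start()
        self.running = True
        self._log("camera restarted")


class FakeSource:
    """Test double that records calls instead of touching hardware."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def stop(self) -> None:
        self.calls.append("stop")
```

With this shape, the restart logic can be exercised in an isolated test by injecting `FakeSource()` and a list's `append` method as the logger, while the real application injects a camera-backed source.
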
---

# 4. Functional Requirements

## 4.1 Camera Preview

Application must:

* display realtime camera preview,
* support camera switching,
* support resolution selection,
* support FPS selection,
* support reconnect/restart.

Preview should prioritize:

* low latency,
* smooth rendering,
* GUI responsiveness.

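Resolution and FPS selection usually reduce to choosing one of the formats a device reports. In QtMultimedia those come from `QCameraDevice.videoFormats()`, whose `QCameraFormat` entries expose `resolution()` and `maxFrameRate()`, and the choice is applied with `QCamera.setCameraFormat()`. The sketch below keeps the selection logic free of Qt so it can be tested in isolation; `FormatInfo`, `pick_format` and the selection policy itself are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FormatInfo:
    """Plain copy of the fields of interest from a camera format."""

    width: int
    height: int
    max_fps: float


def pick_format(
    formats: Iterable[FormatInfo], width: int, height: int, fps: float
) -> Optional[FormatInfo]:
    """Pick the format with the requested resolution that can reach `fps`.

    Assumed policy: among matching formats, prefer the lowest max_fps that
    still satisfies the request. Returns None when nothing matches.
    """
    candidates = [
        f for f in formats
        if (f.width, f.height) == (width, height) and f.max_fps >= fps
    ]
    return min(candidates, key=lambda f: f.max_fps, default=None)
```

Returning `None` instead of silently falling back lets the menu code decide whether to disable an unsupported combination or offer the nearest alternative.
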
---

## 4.2 Performance Monitoring

Application must include a telemetry/performance module.

Metrics should include:

* realtime FPS,
* frame time,
* frame acquisition time,
* rendering time,
* dropped frames,
* idle time,
* queue latency,
* CPU usage,
* optional memory usage.

Metrics should update in realtime.

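Realtime FPS and frame time can be derived from frame arrival timestamps over a sliding window. The sketch below is one possible shape for that part of the telemetry module; the class name `FrameStats` and the default window size are assumptions, and the other metrics in the list are not covered.

```python
from collections import deque
from typing import Deque


class FrameStats:
    """Rolling FPS and mean frame time over the last `window` frames."""

    def __init__(self, window: int = 120) -> None:
        self._stamps: Deque[float] = deque(maxlen=window)

    def on_frame(self, timestamp_s: float) -> None:
        """Record the arrival time of one frame, in seconds."""
        self._stamps.append(timestamp_s)

    @property
    def fps(self) -> float:
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        # N timestamps delimit N - 1 frame intervals.
        return (len(self._stamps) - 1) / span if span > 0 else 0.0

    @property
    def frame_time_ms(self) -> float:
        fps = self.fps
        return 1000.0 / fps if fps > 0 else 0.0
```

In the application, `on_frame` would typically be fed `time.perf_counter()` from the frame callback. Passing the timestamp in, rather than reading the clock inside the class, keeps the calculation deterministic in tests.
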
---

## 4.3 Overlay System

Application must support transparent overlays rendered above video.

Initial use:

* performance metrics display.

Future use:

* bounding boxes,
* object labels,
* debug visualizations,
* OCR results.

Overlay system must not modify original frames.

---

## 4.4 GUI

GUI must remain intentionally minimal.

### Layout

Main window:

* video preview only.

Top menu:

* camera selection,
* resolution selection,
* FPS selection,
* debug options,
* telemetry options.

Overlay:

* semi-transparent performance box.

---

# 5. Non-Functional Requirements

## Performance

Application should:

* minimize frame copies,
* avoid unnecessary color conversions,
* avoid blocking operations in GUI thread,
* support realtime preview at target camera FPS.

---

## Extensibility

Architecture must support the following future additions without major redesign:

* YOLO,
* OCR,
* multiprocessing,
* recording,
* snapshots,
* streaming,
* remote sinks,
* video file playback.

---

## Maintainability

Codebase should:

* use clear module boundaries,
* define explicit interfaces,
* avoid tightly coupled UI/business logic,
* support isolated testing.

---

# 6. Proposed High-Level Architecture

```text
Camera Service
    ↓
Frame Dispatcher
├── Video Renderer
├── Telemetry Collector
├── Overlay Manager
└── Future AI Subscribers

Video Renderer
    ↓
QVideoWidget

Overlay Layer
    ↓
Metrics / Future BBoxes
```

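The fan-out from Frame Dispatcher to its subscribers in the diagram above could be sketched as a small publish/subscribe object. This is an illustrative outline only: the class shape, the `errors` counter and the decision to swallow subscriber exceptions are assumptions, and a real dispatcher would also need a threading policy.

```python
from typing import Any, Callable, List

Subscriber = Callable[[Any], None]


class FrameDispatcher:
    """Delivers each frame to every registered subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []
        self.errors = 0  # subscriber failures, useful as a telemetry value

    def subscribe(self, callback: Subscriber) -> Callable[[], None]:
        """Register a callback; the returned function removes it again."""
        self._subscribers.append(callback)

        def unsubscribe() -> None:
            if callback in self._subscribers:
                self._subscribers.remove(callback)

        return unsubscribe

    def publish(self, frame: Any) -> None:
        # Iterate over a copy so subscribers may unsubscribe during delivery.
        for callback in list(self._subscribers):
            try:
                callback(frame)
            except Exception:
                # One failing subscriber must not stop the others.
                self.errors += 1
```

Subscribers here are called synchronously, so anything slow should only hand the frame off (for example into a single-slot buffer) and return immediately.
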
---

# 7. Future Expansion (Out of Scope)

The following features are intentionally excluded from current implementation:

* YOLO inference,
* OCR,
* multiprocessing workers,
* tracking,
* recording,
* networking.

Architecture must remain prepared for these additions.

---

# 8. Success Criteria

The first implementation phase is successful if:

* camera preview is smooth and stable,
* rendering latency is low,
* telemetry data is accurate,
* GUI remains responsive,
* overlay system works correctly,
* architecture supports future frame subscribers.