init commit

2026-05-11 19:05:24 +02:00
commit be85d7ca31
2 changed files with 350 additions and 0 deletions

notes/01-mvp-preview.md Normal file

@@ -0,0 +1,318 @@
# PRD — Realtime Camera Preview Application (PySide6)
# 1. Overview
A desktop application written in Python with PySide6 for realtime camera preview, performance analysis, and future computer vision integration.
The current phase focuses exclusively on:
* camera communication,
* frame acquisition,
* rendering performance,
* telemetry and diagnostics.
AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.
---
# 2. Goals
## Primary Goal
Create a low-latency, modular and extensible realtime video application capable of:
* stable camera preview,
* smooth rendering,
* accurate performance measurements,
* future AI pipeline integration.
## Secondary Goals
* Understand bottlenecks in the video pipeline.
* Establish baseline performance metrics.
* Validate architecture before adding AI workloads.
* Create reusable infrastructure for future CV modules.
---
# 3. Key Architectural Decisions
## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture
### Decision
Use:
* QCamera
* QMediaCaptureSession
* QVideoSink
* QVideoWidget
instead of OpenCV as the primary camera/rendering backend.
### Reasoning
QtMultimedia builds on the platform's native multimedia framework (for example, AVFoundation on macOS), which provides:
* GPU-accelerated rendering,
* a lower-latency preview pipeline.
Benefits:
* fewer frame copies,
* smoother rendering,
* better realtime behavior,
* better integration with Qt event loop,
* improved maintainability for GUI applications.
OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.
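
A minimal sketch of how the classes listed above could be wired together, assuming PySide6 6.x with the QtMultimedia modules installed. The helper name `build_preview` and its `on_frame` parameter are illustrative and not part of this PRD. The imports sit inside the function so the module can be loaded on a machine without a camera or display.

```python
def build_preview(on_frame=None):
    """Wire the default camera to a QVideoWidget (hypothetical helper).

    Call this only after a QApplication has been created. The caller must
    keep references to the returned objects so they are not garbage collected.
    """
    from PySide6.QtMultimedia import QCamera, QMediaCaptureSession, QMediaDevices
    from PySide6.QtMultimediaWidgets import QVideoWidget

    camera = QCamera(QMediaDevices.defaultVideoInput())
    session = QMediaCaptureSession()
    widget = QVideoWidget()

    session.setCamera(camera)
    session.setVideoOutput(widget)  # the session renders into the widget

    if on_frame is not None:
        # The widget exposes its QVideoSink; the signal carries a QVideoFrame.
        widget.videoSink().videoFrameChanged.connect(on_frame)

    camera.start()
    return camera, session, widget
```

In an application this would be called between constructing `QApplication` and entering `app.exec()`, followed by `widget.show()`. The optional callback is one possible tap point for telemetry; it should return quickly because it runs on the GUI thread.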
---
## 3.2 Separate Video Rendering From Processing
### Decision
Video preview must be independent from future AI processing.
### Reasoning
Realtime UX is more important than processing every frame.
The application must:
* keep preview responsive,
* avoid GUI blocking,
* allow frame dropping,
* support asynchronous processing later.
Future AI modules must never block:
* camera acquisition,
* rendering,
* UI thread.
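
One common way to allow frame dropping is a single-slot buffer in which a new frame overwrites an unread one, so a slow consumer always sees the newest frame and never delays the producer. The sketch below is an assumption about how this could look; `LatestFrameSlot` is a hypothetical name, and strings stand in for real frames.

```python
import threading


class LatestFrameSlot:
    """Single-slot mailbox: a new frame overwrites an unread one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._has_frame = False
        self.dropped = 0  # frames overwritten before a consumer read them

    def put(self, frame):
        """Called by the capture side; holds the lock only for a swap."""
        with self._lock:
            if self._has_frame:
                self.dropped += 1
            self._frame = frame
            self._has_frame = True

    def take(self):
        """Called by a worker; returns the newest frame or None if empty."""
        with self._lock:
            frame, self._frame = self._frame, None
            self._has_frame = False
            return frame


slot = LatestFrameSlot()
for n in range(3):
    slot.put(f"frame-{n}")
print(slot.take(), slot.dropped)  # frame-2 2
```

The `dropped` counter doubles as a source for the dropped-frames metric.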
---
## 3.3 Layer-Based Rendering Architecture
### Decision
Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.
### Reasoning
Drawing directly on video frames:
* increases CPU usage,
* introduces additional memory copies,
* reduces rendering performance.
Separate overlay layers allow:
* smooth preview,
* independent overlay refresh rates,
* future bbox rendering,
* debug overlays,
* annotations,
* interactive tools.
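
To illustrate independent overlay refresh rates, the sketch below keeps overlay items in layer objects that are separate from video frames and repaints each layer on its own schedule. `OverlayLayer` and `layers_to_repaint` are hypothetical names, and the timestamps are passed in explicitly so the logic can be checked without a GUI.

```python
from dataclasses import dataclass, field


@dataclass
class OverlayLayer:
    """Drawable items kept apart from video frames (hypothetical type)."""
    name: str
    refresh_hz: float  # how often this layer wants a repaint
    items: list = field(default_factory=list)
    _last_paint: float = field(default=float("-inf"), repr=False)

    def due(self, now: float) -> bool:
        """True when at least one refresh interval has elapsed."""
        return now - self._last_paint >= 1.0 / self.refresh_hz

    def mark_painted(self, now: float) -> None:
        self._last_paint = now


def layers_to_repaint(layers, now):
    """Return the layers due at time `now`, marking them as painted."""
    due = [layer for layer in layers if layer.due(now)]
    for layer in due:
        layer.mark_painted(now)
    return due


metrics = OverlayLayer("metrics", refresh_hz=2.0)  # text box, slow updates
boxes = OverlayLayer("bboxes", refresh_hz=30.0)    # future detections
layers = [metrics, boxes]

first = [layer.name for layer in layers_to_repaint(layers, now=0.0)]
later = [layer.name for layer in layers_to_repaint(layers, now=0.1)]
print(first, later)  # ['metrics', 'bboxes'] ['bboxes']
```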
---
## 3.4 Modular Application Design
### Decision
Application must be modular and dependency-injection friendly.
### Reasoning
Future AI pipeline will introduce:
* multiprocessing,
* frame subscribers,
* OCR,
* YOLO,
* telemetry,
* external integrations.
Loose coupling improves:
* testability,
* maintainability,
* scalability,
* replacement of components.
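
A small sketch of what dependency-injection friendly could mean in practice: the controller depends on an interface rather than on a concrete camera, so a test double can replace the hardware. All names here (`FrameSource`, `FakeCamera`, `PreviewController`) are hypothetical.

```python
from typing import Callable, Protocol


class FrameSource(Protocol):
    """Anything that can deliver frames to a callback (hypothetical interface)."""

    def start(self, on_frame: Callable[[object], None]) -> None: ...

    def stop(self) -> None: ...


class FakeCamera:
    """Test double that replays a fixed list of frames synchronously."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.running = False

    def start(self, on_frame):
        self.running = True
        for frame in self._frames:
            on_frame(frame)

    def stop(self):
        self.running = False


class PreviewController:
    """Depends on the FrameSource interface, not on a concrete camera."""

    def __init__(self, source: FrameSource):
        self._source = source
        self.frames_seen = 0

    def run(self):
        self._source.start(self._on_frame)

    def _on_frame(self, frame):
        self.frames_seen += 1


controller = PreviewController(FakeCamera(["a", "b", "c"]))
controller.run()
print(controller.frames_seen)  # 3
```

A QtMultimedia-backed source could implement the same two methods, which keeps the controller testable without a camera attached.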
---
# 4. Functional Requirements
## 4.1 Camera Preview
Application must:
* display realtime camera preview,
* support camera switching,
* support resolution selection,
* support FPS selection,
* support reconnect/restart.
Preview should prioritize:
* low latency,
* smooth rendering,
* GUI responsiveness.
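
Resolution and FPS selection could be implemented by choosing among the formats a camera reports. The helper below is a sketch: `pick_format` is a hypothetical name, and it works on any objects exposing `resolution()` and `maxFrameRate()`, the accessors that `QCameraFormat` provides. The `_Size` and `_Format` classes are stand-ins so the example runs without a camera. Comparing by pixel count is a simplification that ignores aspect ratio.

```python
def pick_format(formats, width, height, fps):
    """Return the format closest to the requested resolution and frame rate.

    Returns None when `formats` is empty.
    """
    def distance(fmt):
        size = fmt.resolution()
        pixel_gap = abs(size.width() * size.height() - width * height)
        fps_gap = abs(fmt.maxFrameRate() - fps)
        return (pixel_gap, fps_gap)  # resolution first, then frame rate

    return min(formats, key=distance, default=None)


class _Size:
    """Stand-in for QSize."""
    def __init__(self, w, h):
        self._w, self._h = w, h
    def width(self):
        return self._w
    def height(self):
        return self._h


class _Format:
    """Stand-in for QCameraFormat."""
    def __init__(self, w, h, fps):
        self._size, self._fps = _Size(w, h), fps
    def resolution(self):
        return self._size
    def maxFrameRate(self):
        return self._fps


available = [_Format(640, 480, 30.0), _Format(1280, 720, 30.0), _Format(1280, 720, 60.0)]
best = pick_format(available, 1280, 720, 60)
print(best.resolution().width(), best.maxFrameRate())  # 1280 60.0
```

With a real camera, the candidates would come from `camera.cameraDevice().videoFormats()` and the result would be applied with `camera.setCameraFormat(best)`.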
---
## 4.2 Performance Monitoring
Application must include a telemetry/performance module.
Metrics should include:
* realtime FPS,
* frame time,
* frame acquisition time,
* rendering time,
* dropped frames,
* idle time,
* queue latency,
* CPU usage,
* optional memory usage.
Metrics should update in realtime.
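
As an illustration of the first two metrics, realtime FPS and frame time can be derived from frame arrival timestamps held in a rolling window. `FrameTimer` is a hypothetical name; in the application the timestamps would likely come from `time.perf_counter()`, while the example feeds simulated values.

```python
from collections import deque


class FrameTimer:
    """Rolling FPS and frame-time statistics over the last `window` frames."""

    def __init__(self, window=120):
        self._stamps = deque(maxlen=window)

    def tick(self, timestamp):
        """Record the arrival time (in seconds) of one frame."""
        self._stamps.append(timestamp)

    def fps(self):
        """Average frames per second across the window, 0.0 if unknown."""
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / span if span > 0 else 0.0

    def last_frame_time_ms(self):
        """Interval between the two most recent frames, in milliseconds."""
        if len(self._stamps) < 2:
            return 0.0
        return (self._stamps[-1] - self._stamps[-2]) * 1000.0


timer = FrameTimer()
for i in range(5):
    timer.tick(i * 0.25)  # simulated frames every 250 ms
print(timer.fps(), timer.last_frame_time_ms())  # 4.0 250.0
```

A bounded window keeps the reading responsive to recent changes and caps memory use.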
---
## 4.3 Overlay System
Application must support transparent overlays rendered above video.
Initial use:
* performance metrics display.
Future use:
* bounding boxes,
* object labels,
* debug visualizations,
* OCR results.
Overlay system must not modify original frames.
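
Because overlays are drawn above the video rather than into it, overlay items need a mapping from frame coordinates to widget coordinates. The sketch below assumes the video is scaled to fit the widget with its aspect ratio preserved, and that boxes are stored as normalized `(x, y, w, h)` values in the range 0 to 1. Both function names are hypothetical.

```python
def video_rect(widget_w, widget_h, video_w, video_h):
    """Rectangle (x, y, w, h) of the aspect-fit video inside the widget."""
    scale = min(widget_w / video_w, widget_h / video_h)
    w, h = video_w * scale, video_h * scale
    return ((widget_w - w) / 2, (widget_h - h) / 2, w, h)


def to_widget_coords(box, rect):
    """Map a normalized (x, y, w, h) box to widget pixels."""
    rx, ry, rw, rh = rect
    x, y, w, h = box
    return (rx + x * rw, ry + y * rh, w * rw, h * rh)


rect = video_rect(800, 800, 1280, 720)  # letterboxed at top and bottom
print(rect)  # (0.0, 175.0, 800.0, 450.0)
print(to_widget_coords((0.5, 0.5, 0.25, 0.25), rect))
# (400.0, 400.0, 200.0, 112.5)
```

Storing normalized coordinates means the overlay stays aligned when the window is resized, without touching the original frames.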
---
## 4.4 GUI
GUI must remain intentionally minimal.
### Layout
Main window:
* video preview only.
Top menu:
* camera selection,
* resolution selection,
* FPS selection,
* debug options,
* telemetry options.
Overlay:
* semi-transparent performance box.
---
# 5. Non-Functional Requirements
## Performance
Application should:
* minimize frame copies,
* avoid unnecessary color conversions,
* avoid blocking operations in GUI thread,
* support realtime preview at target camera FPS.
---
## Extensibility
Architecture must support the following future additions without major redesign:
* YOLO,
* OCR,
* multiprocessing,
* recording,
* snapshots,
* streaming,
* remote sinks.
---
## Maintainability
Codebase should:
* use clear module boundaries,
* define explicit interfaces,
* avoid tightly coupled UI/business logic,
* support isolated testing.
---
# 6. Proposed High-Level Architecture
```text
Camera Service
      ↓
Frame Dispatcher
 ├── Video Renderer
 ├── Telemetry Collector
 ├── Overlay Manager
 └── Future AI Subscribers

Video Renderer
      ↓
QVideoWidget
      ↓
Overlay Layer
      ↓
Metrics / Future BBoxes
```
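
The Frame Dispatcher in the diagram could be a simple publish/subscribe object. The sketch below is one possible shape, not a prescribed design: `FrameDispatcher` and its methods are hypothetical, delivery is synchronous, and a failing subscriber is recorded instead of interrupting the others.

```python
class FrameDispatcher:
    """Fan-out of frames to independent subscribers (hypothetical class)."""

    def __init__(self):
        self._subscribers = {}
        self.errors = []  # (subscriber name, exception) pairs

    def subscribe(self, name, callback):
        self._subscribers[name] = callback

    def unsubscribe(self, name):
        self._subscribers.pop(name, None)

    def dispatch(self, frame):
        """Deliver one frame; a failing subscriber does not stop the others."""
        for name, callback in list(self._subscribers.items()):
            try:
                callback(frame)
            except Exception as exc:
                self.errors.append((name, exc))


seen = []
dispatcher = FrameDispatcher()
dispatcher.subscribe("telemetry", lambda f: seen.append(("telemetry", f)))
dispatcher.subscribe("broken", lambda f: 1 / 0)
dispatcher.subscribe("overlay", lambda f: seen.append(("overlay", f)))
dispatcher.dispatch("frame-0")
print(seen, len(dispatcher.errors))
# [('telemetry', 'frame-0'), ('overlay', 'frame-0')] 1
```

Because delivery here is synchronous, a slow subscriber would stall the rest. Future AI subscribers would therefore need to hand frames off to their own thread or process rather than process them inside the callback.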
---
# 7. Future Expansion (Out of Scope)
The following features are intentionally excluded from the current implementation phase:
* YOLO inference,
* OCR,
* multiprocessing workers,
* tracking,
* recording,
* networking.
Architecture must remain prepared for these additions.
---
# 8. Success Criteria
The first implementation phase is successful if:
* camera preview is smooth and stable,
* rendering latency is low,
* telemetry data is accurate,
* GUI remains responsive,
* overlay system works correctly,
* architecture supports future frame subscribers.