# PRD: Realtime Camera Preview Application (PySide6)

# 1. Overview

A desktop application written in Python with PySide6 for realtime camera preview, performance analysis, and future computer vision integration.

The current phase focuses exclusively on:

* camera communication,
* frame acquisition,
* rendering performance,
* telemetry and diagnostics.

AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase so that the video pipeline can be isolated and optimized before computational workloads are introduced.

---
# 2. Goals

## Primary Goal

Create a low-latency, modular and extensible realtime video application capable of:

* stable camera preview,
* smooth rendering,
* accurate performance measurements,
* future AI pipeline integration.

## Secondary Goals

* Understand bottlenecks in the video pipeline.
* Establish baseline performance metrics.
* Validate architecture before adding AI workloads.
* Create reusable infrastructure for future CV modules.

---
# 3. Key Architectural Decisions

## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture

### Decision

Use:

* QCamera
* QMediaCaptureSession
* QVideoSink
* QVideoWidget

instead of OpenCV as the primary camera/rendering backend.

### Reasoning

QtMultimedia builds on native multimedia frameworks (AVFoundation on macOS), which gives it:

* native GPU-accelerated rendering,
* a lower-latency preview pipeline.

Benefits:

* fewer frame copies,
* smoother rendering,
* better realtime behavior,
* better integration with the Qt event loop,
* improved maintainability for GUI applications.

OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.

---
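As a rough illustration of the decision in section 3.1, the QtMultimedia classes could be wired together as shown below. This is a minimal sketch, not the project's implementation: the function name `run_preview`, the window size, and the frame counter are assumptions made for illustration, and the PySide6 imports are deferred into the function so that the module can be imported on a machine without Qt installed. The sketch uses the `QVideoSink` owned by the `QVideoWidget` rather than creating a separate sink.

```python
import sys


def run_preview() -> int:
    """Open the default camera and show it in a QVideoWidget (hypothetical helper)."""
    # Deferred imports: only needed when the preview is actually started.
    from PySide6.QtMultimedia import QCamera, QMediaCaptureSession, QMediaDevices
    from PySide6.QtMultimediaWidgets import QVideoWidget
    from PySide6.QtWidgets import QApplication

    app = QApplication(sys.argv)

    device = QMediaDevices.defaultVideoInput()
    if device.isNull():
        print("No camera found", file=sys.stderr)
        return 1

    camera = QCamera(device)
    session = QMediaCaptureSession()
    widget = QVideoWidget()

    # The capture session connects the camera to the rendering widget.
    session.setCamera(camera)
    session.setVideoOutput(widget)

    # Observe frames through the widget's sink without touching the render path.
    counter = {"frames": 0}

    def on_frame(_frame) -> None:
        counter["frames"] += 1

    widget.videoSink().videoFrameChanged.connect(on_frame)

    widget.resize(960, 540)  # assumed initial window size
    widget.show()
    camera.start()
    return app.exec()
```

A script entry point could start the preview with `sys.exit(run_preview())`.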
## 3.2 Separate Video Rendering From Processing

### Decision

Video preview must be independent from future AI processing.

### Reasoning

Realtime UX is more important than processing every frame.

The application must:

* keep preview responsive,
* avoid GUI blocking,
* allow frame dropping,
* support asynchronous processing later.

Future AI modules must never block:

* camera acquisition,
* rendering,
* UI thread.

---
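One common way to satisfy the frame-dropping requirement in section 3.2 is a single-slot mailbox between the producer and a slow consumer: a new frame overwrites an unconsumed one, so the producer never waits for processing to finish. The sketch below uses only the standard library; the class name `LatestFrameSlot` and its `dropped` counter are hypothetical and are not taken from this document.

```python
import threading
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LatestFrameSlot(Generic[T]):
    """Single-item mailbox: a new frame replaces an unconsumed one."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Optional[T] = None
        self._has_frame = False
        self.dropped = 0  # frames overwritten before a consumer took them

    def publish(self, frame: T) -> None:
        """Called by the producer; holds the lock only long enough to swap the frame."""
        with self._cond:
            if self._has_frame:
                self.dropped += 1
            self._frame = frame
            self._has_frame = True
            self._cond.notify()

    def take(self, timeout: Optional[float] = None) -> Optional[T]:
        """Called by a worker; returns the newest frame, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_frame, timeout):
                return None
            frame, self._frame = self._frame, None
            self._has_frame = False
            return frame
```

A future AI worker would loop on `take()` in its own thread or process, while the camera callback only ever calls `publish()`.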
## 3.3 Layer-Based Rendering Architecture

### Decision

Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.

### Reasoning

Drawing directly on video frames:

* increases CPU usage,
* introduces additional memory copies,
* reduces rendering performance.

Separate overlay layers allow:

* smooth preview,
* independent overlay refresh rates,
* future bbox rendering,
* debug overlays,
* annotations,
* interactive tools.

---
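Rendering overlays on a separate layer (section 3.3) means overlay coordinates have to be mapped onto the area where the video is actually drawn. The sketch below assumes the preview keeps the frame's aspect ratio and letterboxes it inside the widget, and that boxes are stored in normalized frame coordinates (0 to 1). The names `Rect`, `fitted_video_rect`, and `to_widget` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float


def fitted_video_rect(frame_w: int, frame_h: int, widget_w: int, widget_h: int) -> Rect:
    """Area of the widget covered by the video when aspect ratio is preserved."""
    scale = min(widget_w / frame_w, widget_h / frame_h)
    w, h = frame_w * scale, frame_h * scale
    # Center the video; the remaining space is letterbox padding.
    return Rect((widget_w - w) / 2, (widget_h - h) / 2, w, h)


def to_widget(norm: Rect, video: Rect) -> Rect:
    """Map a normalized (0..1) box in frame space to widget pixels."""
    return Rect(
        video.x + norm.x * video.w,
        video.y + norm.y * video.h,
        norm.w * video.w,
        norm.h * video.h,
    )
```

Because the mapping depends only on the frame and widget sizes, the overlay can be repainted at its own rate without reading or copying pixel data.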
## 3.4 Modular Application Design

### Decision

Application must be modular and dependency-injection friendly.

### Reasoning

Future AI pipeline will introduce:

* multiprocessing,
* frame subscribers,
* OCR,
* YOLO,
* telemetry,
* external integrations.

Loose coupling improves:

* testability,
* maintainability,
* scalability,
* replacement of components.

---
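To make the dependency-injection idea in section 3.4 concrete, the sketch below has a controller depend on a small camera interface rather than on a concrete backend, so a fake camera can stand in during tests. All names (`CameraSource`, `FakeCamera`, `PreviewController`) and the use of `bytes` as the frame type are assumptions for illustration only.

```python
from typing import Callable, List, Protocol

FrameCallback = Callable[[bytes], None]


class CameraSource(Protocol):
    """Interface the rest of the application depends on."""

    def start(self, on_frame: FrameCallback) -> None: ...

    def stop(self) -> None: ...


class FakeCamera:
    """Test double that replays canned frames synchronously."""

    def __init__(self, frames: List[bytes]) -> None:
        self._frames = frames
        self.running = False

    def start(self, on_frame: FrameCallback) -> None:
        self.running = True
        for frame in self._frames:
            on_frame(frame)

    def stop(self) -> None:
        self.running = False


class PreviewController:
    """Receives its camera from the caller instead of constructing one."""

    def __init__(self, camera: CameraSource) -> None:
        self._camera = camera
        self.frames_seen = 0

    def run(self) -> None:
        self._camera.start(self._on_frame)

    def shutdown(self) -> None:
        self._camera.stop()

    def _on_frame(self, frame: bytes) -> None:
        self.frames_seen += 1
```

A QtMultimedia-backed class with the same two methods could replace `FakeCamera` without any change to `PreviewController`.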
# 4. Functional Requirements

## 4.1 Camera Preview

Application must:

* display realtime camera preview,
* support camera switching,
* support resolution selection,
* support FPS selection,
* support reconnect/restart.

Preview should prioritize:

* low latency,
* smooth rendering,
* GUI responsiveness.

---
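Resolution and FPS selection (section 4.1) usually comes down to choosing the closest mode the camera actually supports. The sketch below shows one possible policy: match the pixel count first, then minimize any frame-rate shortfall. `CameraMode` and `pick_mode` are hypothetical names; with QtMultimedia the mode list could be built from `QCameraDevice.videoFormats()`, which is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CameraMode:
    width: int
    height: int
    max_fps: float


def pick_mode(modes: Iterable[CameraMode], width: int, height: int, fps: float) -> CameraMode:
    """Return the supported mode closest to the requested resolution and FPS."""
    candidates = list(modes)
    if not candidates:
        raise ValueError("camera reports no supported modes")

    wanted_pixels = width * height

    def cost(mode: CameraMode) -> Tuple[float, float]:
        # Relative difference in pixel count, then how far the mode falls short of the FPS.
        pixel_error = abs(mode.width * mode.height - wanted_pixels) / wanted_pixels
        fps_shortfall = max(0.0, fps - mode.max_fps) / fps
        return (pixel_error, fps_shortfall)

    return min(candidates, key=cost)
```

Other policies are equally valid, for example preferring frame rate over resolution; the point is that the choice lives in one small, testable function.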
## 4.2 Performance Monitoring

Application must include a telemetry/performance module.

Metrics should include:

* realtime FPS,
* frame time,
* frame acquisition time,
* rendering time,
* dropped frames,
* idle time,
* queue latency,
* CPU usage,
* optional memory usage.

Metrics should update in realtime.

---
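A few of the metrics in section 4.2 (realtime FPS, frame time, dropped frames) can be derived from frame arrival timestamps alone. The sketch below keeps a sliding window of timestamps; the class name `FrameStats`, the default window of 120 frames, and the injectable clock are assumptions chosen to keep the example small and testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class FrameStats:
    """Sliding-window FPS and frame-time estimate from arrival timestamps."""

    def __init__(self, window: int = 120, clock: Callable[[], float] = time.perf_counter) -> None:
        self._stamps: Deque[float] = deque(maxlen=window)
        self._clock = clock
        self.dropped = 0

    def on_frame(self) -> None:
        """Record the arrival of one frame."""
        self._stamps.append(self._clock())

    def on_drop(self, count: int = 1) -> None:
        """Record frames that were skipped rather than rendered or processed."""
        self.dropped += count

    @property
    def fps(self) -> float:
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        # N timestamps delimit N - 1 frame intervals.
        return (len(self._stamps) - 1) / span if span > 0 else 0.0

    @property
    def frame_time_ms(self) -> float:
        fps = self.fps
        return 1000.0 / fps if fps > 0 else 0.0
```

Calling `on_frame()` from the frame callback and reading `fps` from a UI timer keeps the measurement itself off the rendering path.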
## 4.3 Overlay System

Application must support transparent overlays rendered above video.

Initial use:

* performance metrics display.

Future use:

* bounding boxes,
* object labels,
* debug visualizations,
* OCR results.

Overlay system must not modify original frames.

---
## 4.4 GUI

GUI must remain intentionally minimal.

### Layout

Main window:

* video preview only.

Top menu:

* camera selection,
* resolution selection,
* FPS selection,
* debug options,
* telemetry options.

Overlay:

* semi-transparent performance box.

---
# 5. Non-Functional Requirements

## Performance

Application should:

* minimize frame copies,
* avoid unnecessary color conversions,
* avoid blocking operations in GUI thread,
* support realtime preview at target camera FPS.

---
## Extensibility

Architecture must support the following future additions without major redesign:

* YOLO,
* OCR,
* multiprocessing,
* recording,
* snapshots,
* streaming,
* remote sinks.

---
## Maintainability

Codebase should:

* use clear module boundaries,
* define explicit interfaces,
* avoid tightly coupled UI/business logic,
* support isolated testing.

---
# 6. Proposed High-Level Architecture

```text
Camera Service
↓
Frame Dispatcher
├── Video Renderer
├── Telemetry Collector
├── Overlay Manager
└── Future AI Subscribers

Video Renderer
↓
QVideoWidget

Overlay Layer
↓
Metrics / Future BBoxes
```

---
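The Frame Dispatcher in the diagram of section 6 can be read as a fan-out point: each frame is offered to every registered subscriber, and a failing subscriber must not affect the others. The sketch below is one possible shape for that component; the class name matches the diagram, but the method names and the per-subscriber error counter are assumptions.

```python
from typing import Callable, Dict, Generic, TypeVar

Frame = TypeVar("Frame")


class FrameDispatcher(Generic[Frame]):
    """Fans each frame out to named subscribers and isolates their failures."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Callable[[Frame], None]] = {}
        self.errors: Dict[str, int] = {}  # failure count per subscriber name

    def subscribe(self, name: str, callback: Callable[[Frame], None]) -> None:
        self._subscribers[name] = callback

    def unsubscribe(self, name: str) -> None:
        self._subscribers.pop(name, None)

    def dispatch(self, frame: Frame) -> None:
        # Iterate over a copy so subscribers may unsubscribe during dispatch.
        for name, callback in list(self._subscribers.items()):
            try:
                callback(frame)
            except Exception:
                # A broken subscriber is counted and skipped, not propagated.
                self.errors[name] = self.errors.get(name, 0) + 1
```

In this sketch callbacks run synchronously on the dispatching thread, so a slow subscriber would still need its own hand-off (for example a single-slot buffer) to avoid delaying the others.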
# 7. Future Expansion (Out of Scope)

The following features are intentionally excluded from the current implementation:

* YOLO inference,
* OCR,
* multiprocessing workers,
* tracking,
* recording,
* networking.

The architecture must remain prepared for these additions.

---
# 8. Success Criteria

The first implementation phase is successful if:

* camera preview is smooth and stable,
* rendering latency is low,
* telemetry data is accurate,
* GUI remains responsive,
* overlay system works correctly,
* architecture supports future frame subscribers.