init commit
32
.gitignore
vendored
Normal file
@@ -0,0 +1,32 @@
# Python
__pycache__/
*.py[cod]
*$py.class
.pytest_cache/
.ruff_cache/
.mypy_cache/

# Virtual environments
.venv/
.venv-*/
venv/
env/

# Local/runtime data
captures/photos/*
captures/videos/*
!captures/photos/.gitkeep
!captures/videos/.gitkeep

!models/.gitkeep

# OS/editor
.DS_Store
.idea/
.vscode/

# Ultralytics/runtime caches
runs/
*.onnx
*.engine
*.log
318
notes/01-mvp-preview.md
Normal file
@@ -0,0 +1,318 @@
# PRD — Realtime Camera Preview Application (PySide6)

## 1. Overview

Desktop application written in Python using PySide6 for realtime camera preview, performance analysis and future computer vision integration.

The current phase focuses exclusively on:

* camera communication,
* frame acquisition,
* rendering performance,
* telemetry and diagnostics.

AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.

---

# 2. Goals

## Primary Goal

Create a low-latency, modular and extensible realtime video application capable of:

* stable camera preview,
* smooth rendering,
* accurate performance measurements,
* future AI pipeline integration.

## Secondary Goals

* Understand bottlenecks in the video pipeline.
* Establish baseline performance metrics.
* Validate architecture before adding AI workloads.
* Create reusable infrastructure for future CV modules.

---

# 3. Key Architectural Decisions

## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture

### Decision

Use:

* QCamera
* QMediaCaptureSession
* QVideoSink
* QVideoWidget

instead of OpenCV as the primary camera/rendering backend.

### Reasoning

QtMultimedia uses native multimedia frameworks:

* AVFoundation on macOS,
* native GPU accelerated rendering,
* lower latency preview pipeline.

Benefits:

* fewer frame copies,
* smoother rendering,
* better realtime behavior,
* better integration with Qt event loop,
* improved maintainability for GUI applications.

OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.

---

## 3.2 Separate Video Rendering From Processing

### Decision

Video preview must be independent from future AI processing.

### Reasoning

Realtime UX is more important than processing every frame.

The application must:

* keep preview responsive,
* avoid GUI blocking,
* allow frame dropping,
* support asynchronous processing later.

Future AI modules must never block:

* camera acquisition,
* rendering,
* UI thread.
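
One way to satisfy "allow frame dropping" without ever blocking acquisition is a single-slot mailbox: the camera side overwrites the slot, and a slower consumer always picks up the newest frame. The sketch below is illustrative only. `LatestFrameSlot`, `publish`, `take` and `dropped` are hypothetical names that do not come from this PRD, it assumes a single consumer thread, and it treats `None` as "no frame".

```python
import threading
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LatestFrameSlot(Generic[T]):
    """Hypothetical single-slot mailbox (not part of the PRD).

    The producer overwrites the slot; a single consumer takes the newest frame.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._frame: Optional[T] = None
        self.dropped = 0  # frames overwritten before the consumer read them

    def publish(self, frame: T) -> None:
        """Called from the acquisition side; never waits for the consumer."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1  # the previous frame was never consumed
            self._frame = frame
            self._ready.set()

    def take(self, timeout: Optional[float] = None) -> Optional[T]:
        """Called from a worker thread; returns None if no frame arrives in time."""
        if not self._ready.wait(timeout):
            return None
        with self._lock:
            frame, self._frame = self._frame, None
            self._ready.clear()
            return frame


slot: LatestFrameSlot[str] = LatestFrameSlot()
for name in ("frame-1", "frame-2", "frame-3"):
    slot.publish(name)   # the producer never blocks
newest = slot.take(0)    # the consumer only sees the newest frame
```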

---

## 3.3 Layer-Based Rendering Architecture

### Decision

Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.

### Reasoning

Drawing directly on video frames:

* increases CPU usage,
* introduces additional memory copies,
* reduces rendering performance.

Separate overlay layers allow:

* smooth preview,
* independent overlay refresh rates,
* future bbox rendering,
* debug overlays,
* annotations,
* interactive tools.

---

## 3.4 Modular Application Design

### Decision

Application must be modular and dependency-injection friendly.

### Reasoning

Future AI pipeline will introduce:

* multiprocessing,
* frame subscribers,
* OCR,
* YOLO,
* telemetry,
* external integrations.

Loose coupling improves:

* testability,
* maintainability,
* scalability,
* replacement of components.
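
A minimal sketch of what "dependency-injection friendly" could look like in Python, using `typing.Protocol` for the interfaces and constructor injection for wiring. All names here (`FrameSource`, `TelemetrySink`, `PreviewController`, `FakeSource`, `InMemoryTelemetry`) are hypothetical and chosen for illustration; the PRD does not prescribe them.

```python
from typing import List, Protocol, Tuple


class FrameSource(Protocol):
    """Hypothetical interface for anything that produces frames."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


class TelemetrySink(Protocol):
    """Hypothetical interface for anything that records metrics."""

    def record(self, name: str, value: float) -> None: ...


class PreviewController:
    """Depends only on the interfaces; concrete classes are injected."""

    def __init__(self, source: FrameSource, telemetry: TelemetrySink) -> None:
        self._source = source
        self._telemetry = telemetry

    def start_preview(self) -> None:
        self._source.start()
        self._telemetry.record("preview_started", 1.0)


class FakeSource:
    """Test double standing in for a real camera service."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class InMemoryTelemetry:
    """Test double that keeps recorded metrics in a list."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, float]] = []

    def record(self, name: str, value: float) -> None:
        self.events.append((name, value))


# The controller can be exercised in isolation, with no camera or GUI present.
source = FakeSource()
telemetry = InMemoryTelemetry()
PreviewController(source, telemetry).start_preview()
```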

---

# 4. Functional Requirements

## 4.1 Camera Preview

Application must:

* display realtime camera preview,
* support camera switching,
* support resolution selection,
* support FPS selection,
* support reconnect/restart.

Preview should prioritize:

* low latency,
* smooth rendering,
* GUI responsiveness.

---

## 4.2 Performance Monitoring

Application must include a telemetry/performance module.

Metrics should include:

* realtime FPS,
* frame time,
* frame acquisition time,
* rendering time,
* dropped frames,
* idle time,
* queue latency,
* CPU usage,
* optional memory usage.

Metrics should update in realtime.
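
As a rough illustration of the first two metrics, realtime FPS and frame time can be derived from a rolling window of frame timestamps. This is a sketch under assumptions: `FrameRateMeter`, its `window` parameter (which should be at least 2) and the injectable `clock` are hypothetical names, and the other metrics in the list would need their own collectors.

```python
import time
from collections import deque
from typing import Callable, Deque


class FrameRateMeter:
    """Hypothetical rolling-window FPS meter (names are not from the PRD)."""

    def __init__(
        self,
        window: int = 60,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self._clock = clock  # injectable so tests can use a fake clock
        self._stamps: Deque[float] = deque(maxlen=window)

    def tick(self) -> None:
        """Call once per delivered frame."""
        self._stamps.append(self._clock())

    @property
    def fps(self) -> float:
        """Average frames per second over the current window."""
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed

    @property
    def frame_time_ms(self) -> float:
        """Average time between frames, in milliseconds."""
        fps = self.fps
        return 1000.0 / fps if fps > 0 else 0.0


# Deterministic usage with a fake clock: one frame every 0.5 s.
fake_times = iter([0.0, 0.5, 1.0, 1.5])
meter = FrameRateMeter(window=4, clock=lambda: next(fake_times))
for _ in range(4):
    meter.tick()
```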

---

## 4.3 Overlay System

Application must support transparent overlays rendered above video.

Initial use:

* performance metrics display.

Future use:

* bounding boxes,
* object labels,
* debug visualizations,
* OCR results.

Overlay system must not modify original frames.

---

## 4.4 GUI

GUI must remain intentionally minimal.

### Layout

Main window:

* video preview only.

Top menu:

* camera selection,
* resolution selection,
* FPS selection,
* debug options,
* telemetry options.

Overlay:

* semi-transparent performance box.

---

# 5. Non-Functional Requirements

## Performance

Application should:

* minimize frame copies,
* avoid unnecessary color conversions,
* avoid blocking operations in GUI thread,
* support realtime preview at target camera FPS.

---

## Extensibility

Architecture must support future additions:

* YOLO,
* OCR,
* multiprocessing,
* recording,
* snapshots,
* streaming,
* remote sinks.

Without major redesign.

---

## Maintainability

Codebase should:

* use clear module boundaries,
* define explicit interfaces,
* avoid tightly coupled UI/business logic,
* support isolated testing.

---

# 6. Proposed High-Level Architecture

```text
Camera Service
↓
Frame Dispatcher
├── Video Renderer
├── Telemetry Collector
├── Overlay Manager
└── Future AI Subscribers

Video Renderer
↓
QVideoWidget

Overlay Layer
↓
Metrics / Future BBoxes
```
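
The Frame Dispatcher node in the diagram can be read as a small publish/subscribe hub. The following sketch shows one possible shape for it, assuming synchronous callbacks; `FrameDispatcher`, `subscribe`, `dispatch` and `errors` are hypothetical names, and a real implementation would likely hand slow subscribers their own queue or thread rather than calling them inline.

```python
from typing import Callable, Generic, List, TypeVar

F = TypeVar("F")


class FrameDispatcher(Generic[F]):
    """Hypothetical fan-out hub: one frame in, every subscriber notified."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[F], None]] = []
        self.errors = 0  # count of subscriber callbacks that raised

    def subscribe(self, callback: Callable[[F], None]) -> Callable[[], None]:
        """Register a callback and return a function that removes it again."""
        self._subscribers.append(callback)

        def unsubscribe() -> None:
            if callback in self._subscribers:
                self._subscribers.remove(callback)

        return unsubscribe

    def dispatch(self, frame: F) -> None:
        """Deliver one frame to all current subscribers."""
        # Iterate over a copy so callbacks may unsubscribe during dispatch.
        for callback in list(self._subscribers):
            try:
                callback(frame)
            except Exception:
                # A failing subscriber must not starve the others.
                self.errors += 1


rendered: List[str] = []
analysed: List[str] = []

dispatcher: FrameDispatcher[str] = FrameDispatcher()
dispatcher.subscribe(rendered.append)                 # stands in for the renderer
stop_analysis = dispatcher.subscribe(analysed.append)  # stands in for an AI subscriber

dispatcher.dispatch("frame-a")
stop_analysis()                                       # subscribers can leave at any time
dispatcher.dispatch("frame-b")
```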

---

# 7. Future Expansion (Out of Scope)

The following features are intentionally excluded from current implementation:

* YOLO inference,
* OCR,
* multiprocessing workers,
* tracking,
* recording,
* networking.

Architecture must remain prepared for these additions.

---

# 8. Success Criteria

The first implementation phase is successful if:

* camera preview is smooth and stable,
* rendering latency is low,
* telemetry data is accurate,
* GUI remains responsive,
* overlay system works correctly,
* architecture supports future frame subscribers.