commit be85d7ca316307883f845bc68f2e5cb04226ce53
Author: bartool
Date:   Mon May 11 19:05:24 2026 +0200

    init commit

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..4d01637
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,32 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+
+# Virtual environments
+.venv/
+.venv-*/
+venv/
+env/
+
+# Local/runtime data
+captures/photos/*
+captures/videos/*
+!captures/photos/.gitkeep
+!captures/videos/.gitkeep
+
+!models/.gitkeep
+
+# OS/editor
+.DS_Store
+.idea/
+.vscode/
+
+# Ultralytics/runtime caches
+runs/
+*.onnx
+*.engine
+*.log
\ No newline at end of file
diff --git a/notes/01-mvp-preview.md b/notes/01-mvp-preview.md
new file mode 100644
index 0000000..a5cee3d
--- /dev/null
+++ b/notes/01-mvp-preview.md
@@ -0,0 +1,318 @@
+# PRD — Realtime Camera Preview Application (PySide6)
+
+## 1. Overview
+
+Desktop application written in Python using PySide6 for realtime camera preview, performance analysis and future computer vision integration.
+
+The current phase focuses exclusively on:
+
+* camera communication,
+* frame acquisition,
+* rendering performance,
+* telemetry and diagnostics.
+
+AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.
+
+---
+
+# 2. Goals
+
+## Primary Goal
+
+Create a low-latency, modular and extensible realtime video application capable of:
+
+* stable camera preview,
+* smooth rendering,
+* accurate performance measurements,
+* future AI pipeline integration.
+
+## Secondary Goals
+
+* Understand bottlenecks in the video pipeline.
+* Establish baseline performance metrics.
+* Validate architecture before adding AI workloads.
+* Create reusable infrastructure for future CV modules.
+
+---
+
+# 3. Key Architectural Decisions
+
+## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture
+
+### Decision
+
+Use:
+
+* QCamera
+* QMediaCaptureSession
+* QVideoSink
+* QVideoWidget
+
+instead of OpenCV as the primary camera/rendering backend.
+
+### Reasoning
+
+QtMultimedia uses native multimedia frameworks:
+
+* AVFoundation on macOS,
+* native GPU accelerated rendering,
+* lower latency preview pipeline.
+
+Benefits:
+
+* fewer frame copies,
+* smoother rendering,
+* better realtime behavior,
+* better integration with Qt event loop,
+* improved maintainability for GUI applications.
+
+OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.
+
+---
+
+## 3.2 Separate Video Rendering From Processing
+
+### Decision
+
+Video preview must be independent from future AI processing.
+
+### Reasoning
+
+Realtime UX is more important than processing every frame.
+
+The application must:
+
+* keep preview responsive,
+* avoid GUI blocking,
+* allow frame dropping,
+* support asynchronous processing later.
+
+Future AI modules must never block:
+
+* camera acquisition,
+* rendering,
+* UI thread.
+
+---
+
+## 3.3 Layer-Based Rendering Architecture
+
+### Decision
+
+Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.
+
+### Reasoning
+
+Drawing directly on video frames:
+
+* increases CPU usage,
+* introduces additional memory copies,
+* reduces rendering performance.
+
+Separate overlay layers allow:
+
+* smooth preview,
+* independent overlay refresh rates,
+* future bbox rendering,
+* debug overlays,
+* annotations,
+* interactive tools.
+
+---
+
+## 3.4 Modular Application Design
+
+### Decision
+
+Application must be modular and dependency-injection friendly.
+
+### Reasoning
+
+Future AI pipeline will introduce:
+
+* multiprocessing,
+* frame subscribers,
+* OCR,
+* YOLO,
+* telemetry,
+* external integrations.
+
+Loose coupling improves:
+
+* testability,
+* maintainability,
+* scalability,
+* replacement of components.
+
+---
+
+# 4. Functional Requirements
+
+## 4.1 Camera Preview
+
+Application must:
+
+* display realtime camera preview,
+* support camera switching,
+* support resolution selection,
+* support FPS selection,
+* support reconnect/restart.
+
+Preview should prioritize:
+
+* low latency,
+* smooth rendering,
+* GUI responsiveness.
+
+---
+
+## 4.2 Performance Monitoring
+
+Application must include a telemetry/performance module.
+
+Metrics should include:
+
+* realtime FPS,
+* frame time,
+* frame acquisition time,
+* rendering time,
+* dropped frames,
+* idle time,
+* queue latency,
+* CPU usage,
+* optional memory usage.
+
+Metrics should update in realtime.
+
+---
+
+## 4.3 Overlay System
+
+Application must support transparent overlays rendered above video.
+
+Initial use:
+
+* performance metrics display.
+
+Future use:
+
+* bounding boxes,
+* object labels,
+* debug visualizations,
+* OCR results.
+
+Overlay system must not modify original frames.
+
+---
+
+## 4.4 GUI
+
+GUI must remain intentionally minimal.
+
+### Layout
+
+Main window:
+
+* video preview only.
+
+Top menu:
+
+* camera selection,
+* resolution selection,
+* FPS selection,
+* debug options,
+* telemetry options.
+
+Overlay:
+
+* semi-transparent performance box.
+
+---
+
+# 5. Non-Functional Requirements
+
+## Performance
+
+Application should:
+
+* minimize frame copies,
+* avoid unnecessary color conversions,
+* avoid blocking operations in GUI thread,
+* support realtime preview at target camera FPS.
+
+---
+
+## Extensibility
+
+Architecture must support future additions:
+
+* YOLO,
+* OCR,
+* multiprocessing,
+* recording,
+* snapshots,
+* streaming,
+* remote sinks.
+
+Without major redesign.
+
+---
+
+## Maintainability
+
+Codebase should:
+
+* use clear module boundaries,
+* define explicit interfaces,
+* avoid tightly coupled UI/business logic,
+* support isolated testing.
+
+---
+
+# 6. Proposed High-Level Architecture
+
+```text
+Camera Service
+ ↓
+Frame Dispatcher
+ ├── Video Renderer
+ ├── Telemetry Collector
+ ├── Overlay Manager
+ └── Future AI Subscribers
+
+Video Renderer
+ ↓
+QVideoWidget
+
+Overlay Layer
+ ↓
+Metrics / Future BBoxes
+```
+
+---
+
+# 7. Future Expansion (Out of Scope)
+
+The following features are intentionally excluded from current implementation:
+
+* YOLO inference,
+* OCR,
+* multiprocessing workers,
+* tracking,
+* recording,
+* networking.
+
+Architecture must remain prepared for these additions.
+
+---
+
+# 8. Success Criteria
+
+The first implementation phase is successful if:
+
+* camera preview is smooth and stable,
+* rendering latency is low,
+* telemetry data is accurate,
+* GUI remains responsive,
+* overlay system works correctly,
+* architecture supports future frame subscribers.
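
Review note (not part of the commit): the Frame Dispatcher fan-out in section 6 and the "allow frame dropping" rule in section 3.2 could be sketched roughly as below. This is a minimal illustration in plain Python, assuming ordinary callbacks rather than Qt signals. `Frame`, `LatestSlot`, `FrameDispatcher` and `TelemetryCollector` are hypothetical names that appear neither in this commit nor in PySide6. A real implementation would presumably receive frames from `QVideoSink` and would need to keep slow subscribers off the GUI thread.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame record; a real one would wrap camera frame data."""
    index: int
    timestamp_s: float


class LatestSlot(Generic[T]):
    """Single-item mailbox: a new item overwrites an unread one.

    This is one simple way to allow frame dropping, so that a slow
    consumer only ever sees the most recent frame.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._item: Optional[T] = None
        self.dropped = 0  # number of items overwritten before being read

    def put(self, item: T) -> None:
        with self._lock:
            if self._item is not None:
                self.dropped += 1
            self._item = item

    def take(self) -> Optional[T]:
        """Return the pending item (or None) and clear the slot."""
        with self._lock:
            item, self._item = self._item, None
            return item


class FrameDispatcher:
    """Fans each frame out to every registered subscriber callback."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Frame], None]] = []

    def subscribe(self, callback: Callable[[Frame], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, frame: Frame) -> None:
        # Callbacks run synchronously here, so each one must return quickly.
        for callback in self._subscribers:
            callback(frame)


class TelemetryCollector:
    """Counts frames and derives an average FPS from frame timestamps."""

    def __init__(self) -> None:
        self.frame_count = 0
        self._first_ts: Optional[float] = None
        self._last_ts: Optional[float] = None

    def on_frame(self, frame: Frame) -> None:
        if self._first_ts is None:
            self._first_ts = frame.timestamp_s
        self._last_ts = frame.timestamp_s
        self.frame_count += 1

    def fps(self) -> float:
        if self.frame_count < 2 or self._last_ts == self._first_ts:
            return 0.0
        return (self.frame_count - 1) / (self._last_ts - self._first_ts)


# Usage: telemetry sees every frame; the future AI inbox keeps only the newest.
dispatcher = FrameDispatcher()
telemetry = TelemetryCollector()
ai_inbox: LatestSlot[Frame] = LatestSlot()
dispatcher.subscribe(telemetry.on_frame)
dispatcher.subscribe(ai_inbox.put)

for i in range(5):
    dispatcher.publish(Frame(index=i, timestamp_s=i * 0.5))

latest = ai_inbox.take()
```

In this sketch the future AI subscriber registers only `ai_inbox.put`, which returns immediately. A worker can then call `take()` at its own pace, and the `dropped` counter could feed the "dropped frames" metric listed in section 4.2.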