# PRD: Realtime Camera Preview Application (PySide6)

## 1. Overview

Desktop application written in Python using PySide6 for realtime camera preview, performance analysis and future computer vision integration.

The current phase focuses exclusively on:

* camera communication,
* frame acquisition,
* rendering performance,
* telemetry and diagnostics.

AI processing (YOLO/OCR) is intentionally excluded from the first implementation phase to isolate and optimize the video pipeline before introducing computational workloads.

---

# 2. Goals

## Primary Goal

Create a low-latency, modular and extensible realtime video application capable of:

* stable camera preview,
* smooth rendering,
* accurate performance measurements,
* future AI pipeline integration.

## Secondary Goals

* Understand bottlenecks in the video pipeline.
* Establish baseline performance metrics.
* Validate architecture before adding AI workloads.
* Create reusable infrastructure for future CV modules.

---

# 3. Key Architectural Decisions

## 3.1 Use PySide6 + QtMultimedia Instead of OpenCV VideoCapture

### Decision

Use the following QtMultimedia classes instead of OpenCV as the primary camera/rendering backend:

* QCamera,
* QMediaCaptureSession,
* QVideoSink,
* QVideoWidget.

### Reasoning

QtMultimedia builds on the platform's native multimedia frameworks (such as AVFoundation on macOS), which gives it:

* GPU-accelerated rendering,
* a lower-latency preview pipeline.

Expected benefits:

* fewer frame copies,
* smoother rendering,
* better realtime behavior,
* better integration with the Qt event loop,
* improved maintainability for GUI applications.

OpenCV remains optional for future image processing tasks but should not own the rendering pipeline.

---

## 3.2 Separate Video Rendering From Processing

### Decision

Video preview must be independent from future AI processing.

### Reasoning

Realtime UX is more important than processing every frame.

The application must:

* keep preview responsive,
* avoid GUI blocking,
* allow frame dropping,
* support asynchronous processing later.
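One way to satisfy the frame-dropping requirement is a single-slot mailbox between acquisition and any slower consumer: the producer always overwrites the slot, so a consumer only ever sees the newest frame. The sketch below is illustrative only. `LatestFrameSlot`, `put`, `take` and `dropped` are hypothetical names that do not come from this PRD or from Qt, and the frame is treated as an opaque object.

```python
import threading
from typing import Any, Optional


class LatestFrameSlot:
    """Single-slot mailbox: the producer overwrites, the consumer takes the newest frame."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._has_frame = False
        self.dropped = 0  # frames overwritten before a consumer took them

    def put(self, frame: Any) -> None:
        """Called from the acquisition side; never waits for the consumer."""
        with self._lock:
            if self._has_frame:
                self.dropped += 1  # the previous frame was never consumed
            self._frame = frame
            self._has_frame = True

    def take(self) -> Optional[Any]:
        """Called from the processing side; returns None if nothing new arrived."""
        with self._lock:
            if not self._has_frame:
                return None
            frame, self._frame = self._frame, None
            self._has_frame = False
            return frame


if __name__ == "__main__":
    slot = LatestFrameSlot()
    for i in range(5):  # producer outruns the consumer
        slot.put(f"frame-{i}")
    print(slot.take(), "dropped:", slot.dropped)
```

The lock is held only for a pointer swap, so acquisition is never delayed by a slow consumer, and the `dropped` counter can feed the dropped-frames metric.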
Future AI modules must never block:

* camera acquisition,
* rendering,
* the UI thread.

---

## 3.3 Layer-Based Rendering Architecture

### Decision

Bounding boxes and overlays must be rendered on separate layers instead of modifying video frames.

### Reasoning

Drawing directly on video frames:

* increases CPU usage,
* introduces additional memory copies,
* reduces rendering performance.

Separate overlay layers allow:

* smooth preview,
* independent overlay refresh rates,
* future bbox rendering,
* debug overlays,
* annotations,
* interactive tools.

---

## 3.4 Modular Application Design

### Decision

Application must be modular and dependency-injection friendly.

### Reasoning

Future AI pipeline will introduce:

* multiprocessing,
* frame subscribers,
* OCR,
* YOLO,
* telemetry,
* external integrations.

Loose coupling improves:

* testability,
* maintainability,
* scalability,
* replacement of components.

---

# 4. Functional Requirements

## 4.1 Camera Preview

Application must:

* display realtime camera preview,
* support camera switching,
* support resolution selection,
* support FPS selection,
* support reconnect/restart.

Preview should prioritize:

* low latency,
* smooth rendering,
* GUI responsiveness.

---

## 4.2 Performance Monitoring

Application must include a telemetry/performance module.

Metrics should include:

* realtime FPS,
* frame time,
* frame acquisition time,
* rendering time,
* dropped frames,
* idle time,
* queue latency,
* CPU usage,
* optional memory usage.

Metrics should update in realtime.

---

## 4.3 Overlay System

Application must support transparent overlays rendered above video.

Initial use:

* performance metrics display.

Future use:

* bounding boxes,
* object labels,
* debug visualizations,
* OCR results.

Overlay system must not modify original frames.

---

## 4.4 GUI

GUI must remain intentionally minimal.

### Layout

Main window:

* video preview only.
Top menu:

* camera selection,
* resolution selection,
* FPS selection,
* debug options,
* telemetry options.

Overlay:

* semi-transparent performance box.

---

# 5. Non-Functional Requirements

## Performance

Application should:

* minimize frame copies,
* avoid unnecessary color conversions,
* avoid blocking operations in the GUI thread,
* support realtime preview at the target camera FPS.

---

## Extensibility

Architecture must support the following future additions without major redesign:

* YOLO,
* OCR,
* multiprocessing,
* recording,
* snapshots,
* streaming,
* remote sinks,
* video file playback.

---

## Maintainability

Codebase should:

* use clear module boundaries,
* define explicit interfaces,
* avoid tightly coupled UI/business logic,
* support isolated testing.

---

# 6. Proposed High-Level Architecture

```text
Camera Service
      ↓
Frame Dispatcher
 ├── Video Renderer
 ├── Telemetry Collector
 ├── Overlay Manager
 └── Future AI Subscribers

Video Renderer
      ↓
QVideoWidget

Overlay Layer
      ↓
Metrics / Future BBoxes
```

---

# 7. Future Expansion (Out of Scope)

The following features are intentionally excluded from the current implementation:

* YOLO inference,
* OCR,
* multiprocessing workers,
* tracking,
* recording,
* networking.

Architecture must remain prepared for these additions.

---

# 8. Success Criteria

The first implementation phase is successful if:

* camera preview is smooth and stable,
* rendering latency is low,
* telemetry data is accurate,
* GUI remains responsive,
* overlay system works correctly,
* architecture supports future frame subscribers.
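
The "telemetry data is accurate" criterion depends on how FPS and frame time are derived. A minimal sketch, assuming a rolling window over the intervals between delivered frames, is shown below. `FrameRateMeter`, `tick` and the `window` and `clock` parameters are hypothetical names chosen for illustration, not part of this PRD or of Qt. The injectable clock exists only so the meter can be tested without real time passing.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class FrameRateMeter:
    """Rolling FPS and frame-time estimate over the last `window` frame intervals."""

    def __init__(self, window: int = 120,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._intervals: Deque[float] = deque(maxlen=window)
        self._last: Optional[float] = None

    def tick(self) -> None:
        """Call once per delivered frame."""
        now = self._clock()
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    @property
    def frame_time_ms(self) -> float:
        """Mean interval between frames in milliseconds (0.0 until two frames arrive)."""
        if not self._intervals:
            return 0.0
        return 1000.0 * sum(self._intervals) / len(self._intervals)

    @property
    def fps(self) -> float:
        """Frames per second averaged over the window (0.0 until two frames arrive)."""
        total = sum(self._intervals)
        return len(self._intervals) / total if total > 0 else 0.0
```

Averaging over a window keeps the on-screen number stable; a smaller window reacts faster to stalls at the cost of more jitter in the displayed value.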