Real-time person detection and facial recognition system deployed on edge devices for intelligent security monitoring with 99.2% daytime accuracy and sub-50ms inference on Jetson Nano.
12 months, Q1 2023 – Q1 2024
2 ML Engineers, 1 Edge Deployment Specialist, 1 Security Domain Consultant
UK-based commercial security integrator serving retail and hospitality
Pilot: 15 sites → Full rollout: 500+ locations
The client provides integrated security services to national retail chains and hospitality groups. Their existing CCTV infrastructure relied on motion-triggered recording with high false-alarm rates, leading to alert fatigue among monitoring staff and missed genuine incidents. They needed an AI layer that could run on their existing camera hardware with minimal per-site cost.
Developed an edge AI security camera system capable of real-time person detection and optional facial recognition for authorised personnel. The system processes video streams locally on embedded hardware, eliminating the need for cloud connectivity while maintaining high accuracy and low latency. The project was scoped specifically for person detection and face matching — not for crowd counting or general behaviour analytics.
NVIDIA Jetson Nano, Raspberry Pi 4, Custom ARM SoC
TensorFlow Lite, OpenVINO, ONNX Runtime
OpenCV, YOLOv7, FaceNet, DeepSORT
Docker, Remote OTA Updates, Edge Computing
A UK-based commercial security integrator serving retail and hospitality needed to upgrade their surveillance infrastructure with AI-powered person detection and face matching while maintaining strict GDPR compliance and reducing bandwidth costs. Key requirements included: on-device processing of all video and biometric data, alert latency well under one second, hardware costs low enough to support a 500-site rollout, and per-site rather than per-camera licensing.
The client had previously trialled a cloud-based video analytics service from a major CCTV vendor. While detection accuracy was adequate, the solution required continuous video upload to remote servers — creating unacceptable bandwidth costs (averaging 8 Mbps per camera) and raising GDPR concerns around off-site biometric processing. Latency between detection and alert was over 3 seconds due to round-trip network delays. The per-camera licensing model also made national rollout economically unviable. The client needed an on-premise, per-site-licensed alternative that could run on low-cost hardware.
Developed a lightweight YOLOv7-tiny-based person detection model optimised for edge deployment. The model was trained on a curated dataset combining COCO person annotations with 12,000 frames captured from the client's own camera feeds (covering indoor retail, outdoor car parks, and loading bays). Post-training quantisation to INT8 precision reduced model size by 75% while maintaining 99.2% person detection accuracy in daytime conditions.
Why YOLOv7-tiny: We evaluated YOLOv5s, YOLOv7-tiny, and EfficientDet-Lite. YOLOv7-tiny offered the best accuracy-to-latency ratio on Jetson Nano (45ms inference vs. 62ms for YOLOv5s and 85ms for EfficientDet-Lite at comparable mAP). Its architecture also quantises more cleanly to INT8 with minimal accuracy loss compared to EfficientDet's depthwise separable convolutions.
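The INT8 conversion above is handled by the TFLite converter in production, but the underlying per-tensor arithmetic is simple enough to sketch. The following is a minimal NumPy illustration of affine (asymmetric) quantisation, under the assumption of per-tensor scale/zero-point parameters; the real converter additionally calibrates activation ranges with a representative dataset.

```python
import numpy as np

def quantise_int8(weights: np.ndarray):
    """Affine (asymmetric) INT8 quantisation of one float32 tensor.

    Minimal sketch of the per-tensor arithmetic applied by
    post-training quantisation; not the production converter.
    """
    # Widen the range to include 0.0 so zero is exactly representable.
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0          # guard all-zero tensors
    zero_point = int(round(-128 - w_min / scale))   # maps w_min -> -128
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantise_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float32 tensor from its INT8 encoding."""
    return (q.astype(np.float32) - zero_point) * scale
```

Since int8 weights occupy 1 byte against 4 bytes for float32, this encoding is where the 75% size reduction quoted above comes from; the reconstruction error per weight is bounded by roughly half a quantisation step.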
Implemented a FaceNet-based recognition system for optional face matching at access-controlled entry points. The system enrols authorised personnel via a secure admin interface and stores only 128-dimensional encrypted feature vectors — never raw images. It achieves 97.5% true-positive matching in controlled indoor lighting and 93.1% in mixed outdoor conditions.
Why FaceNet over ArcFace: Although ArcFace achieves marginally higher accuracy on benchmark datasets, FaceNet's embedding model is 40% smaller (23MB vs. 39MB) and runs within the Jetson Nano's memory budget alongside the detection model. For the client's use case (matching against a database of fewer than 200 authorised personnel per site), FaceNet's discriminative power is more than sufficient.
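With fewer than 200 enrolled vectors per site, matching can be a plain linear scan over the gallery. The sketch below shows the core comparison, assuming L2-normalised 128-d embeddings; the 1.1 distance threshold and the function name are illustrative placeholders, as the production cut-off is tuned per deployment against the false-accept budget.

```python
import numpy as np

def match_face(probe: np.ndarray, gallery: dict, threshold: float = 1.1):
    """Match one 128-d probe embedding against enrolled personnel.

    FaceNet embeddings lie on the unit hypersphere, so Euclidean
    distance is a monotone proxy for cosine similarity. The threshold
    here is a hypothetical placeholder, not a production value.
    """
    probe = probe / np.linalg.norm(probe)
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        dist = float(np.linalg.norm(probe - emb / np.linalg.norm(emb)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject the match outright if even the nearest neighbour is too far.
    return best_name if best_dist < threshold else None
```

An O(n) scan over 200 vectors of 128 floats costs microseconds on the Jetson Nano's CPU, so no approximate-nearest-neighbour index is needed at this gallery size.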
Optimised the entire pipeline for ARM-based processors using TensorFlow Lite (Raspberry Pi 4) and TensorRT (Jetson Nano). Achieved inference times of 45ms per frame on NVIDIA Jetson Nano and 78ms on Raspberry Pi 4.
Why TensorRT on Jetson, TFLite on Pi: TensorRT leverages the Jetson Nano's 128-core Maxwell GPU for INT8 acceleration — something TFLite cannot exploit. On Raspberry Pi 4 (CPU-only), TFLite's XNNPACK delegate outperforms ONNX Runtime by approximately 15% on ARM Cortex-A72. This dual-runtime approach maximises performance on both hardware tiers without maintaining two separate model architectures.
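The dual-runtime dispatch can be decided once at boot. A hypothetical helper along these lines is sketched below; the detection heuristics (the L4T release file and the `trtexec` CLI, both shipped with Jetson's software stack) and the backend names are assumptions for illustration, not the production provisioning code.

```python
import os
import shutil

def pick_runtime() -> str:
    """Select the inference backend for the current edge unit.

    Hypothetical boot-time check: Jetson boards run NVIDIA L4T, which
    installs /etc/nv_tegra_release and the TensorRT tooling; any other
    ARM board (e.g. Raspberry Pi 4) falls back to TFLite with the
    XNNPACK delegate on CPU.
    """
    is_jetson = (
        os.path.exists("/etc/nv_tegra_release")
        or shutil.which("trtexec") is not None
    )
    return "tensorrt-int8" if is_jetson else "tflite-xnnpack"
```

Both backends consume the same exported model weights, which is how one model architecture serves two hardware tiers without a parallel training pipeline.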
Phase 1 pilot covered 15 flagship retail sites over 3 months, validating detection accuracy and false-alarm rates in real-world conditions. Phase 2 expanded to 120 sites in the South East, introducing the remote OTA update pipeline. Phase 3 completed the national rollout to 500+ locations with a fully automated remote deployment pipeline — each new site requires only a 2-hour on-site setup (mounting the edge unit, connecting to the existing camera feed, and running the automated calibration script).
All facial recognition processing occurs on-device — no biometric data leaves the edge unit. The system stores only encrypted feature vectors (not images) with automatic 30-day expiry. A privacy impact assessment was completed with the client's Data Protection Officer, and signage complying with ICO guidance is provided for all installation sites. The face matching feature is entirely optional and disabled by default — sites that do not require it run person detection only, with no biometric processing whatsoever.
The system consists of three main components: the YOLOv7-tiny person detection module, the optional FaceNet-based face-matching module, and the edge inference runtime (TensorRT on Jetson Nano, TensorFlow Lite on Raspberry Pi) with its remote OTA update pipeline.
Applied optimisation techniques to achieve real-time performance on low-cost hardware: INT8 post-training quantisation (75% smaller models), TensorRT GPU acceleration on the Jetson Nano, and TFLite's XNNPACK delegate on the Raspberry Pi's ARM CPU.
The system has known performance boundaries that were documented and communicated to the client during the pilot phase: the 99.2% detection accuracy applies to daytime conditions, with lower accuracy in low-light and night-time footage; face matching drops from 97.5% in controlled indoor lighting to 93.1% in mixed outdoor conditions; and the system is scoped to person detection and face matching only, not crowd counting or general behaviour analytics.
Person Detection Accuracy (daytime): 99.2%
Inference Time (Jetson Nano, after TensorRT optimisation): 45ms per frame
Bandwidth Reduction vs. Cloud Upload: >99% (8 Mbps per camera down to <50 Kbps of metadata)
Deployed Locations (national rollout): 500+
Motion-triggered recording: ~60% true-positive rate, 40+ false alarms/site/day
AI person detection: 99.2% true-positive rate, ~6 false alarms/site/day
Cloud video analytics: 3+ second alert latency, 8 Mbps/camera bandwidth
Edge processing: <100ms alert latency, metadata-only upload (<50 Kbps)
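The "metadata-only upload" above means that only detection results, never frames, leave a site. A payload along these lines illustrates the idea; the field names are illustrative, not the production schema.

```python
import json
import time

def detection_event(camera_id: str, bbox, confidence: float) -> bytes:
    """Serialise one detection as a compact metadata payload.

    Only coordinates and scores are uploaded -- never video -- which
    is what keeps the per-camera uplink under ~50 Kbps. The schema
    here is a sketch, not the production wire format.
    """
    event = {
        "camera": camera_id,
        "ts": round(time.time(), 3),
        "label": "person",
        "bbox": [round(v, 1) for v in bbox],  # pixel coords, 1 decimal
        "conf": round(confidence, 3),
    }
    return json.dumps(event, separators=(",", ":")).encode()
```

A payload like this is on the order of 120 bytes, so even ten detections per second per camera stays under 10 Kbps, against 8 Mbps for continuous video upload.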
The system is in active operation across all 500+ sites with ongoing support and iterative improvements: