Smart Security Camera System: Edge AI Solution

Real-time person detection and facial recognition system deployed on edge devices for intelligent security monitoring with 99.2% daytime accuracy and sub-50ms inference on Jetson Nano.

Project Context

Timeline

12 months, Q1 2023 – Q1 2024

Team

2 ML Engineers, 1 Edge Deployment Specialist, 1 Security Domain Consultant

Client

UK-based commercial security integrator serving retail and hospitality

Scope

Pilot: 15 sites → Full rollout: 500+ locations

The client provides integrated security services to national retail chains and hospitality groups. Their existing CCTV infrastructure relied on motion-triggered recording with high false-alarm rates, leading to alert fatigue among monitoring staff and missed genuine incidents. They needed an AI layer that could run on their existing camera hardware with minimal per-site cost.

Project Overview

Developed an edge AI security camera system capable of real-time person detection and optional facial recognition for authorised personnel. The system processes video streams locally on embedded hardware, eliminating the need for cloud connectivity while maintaining high accuracy and low latency. The project was scoped specifically for person detection and face matching — not for crowd counting or general behaviour analytics.

Hardware

NVIDIA Jetson Nano, Raspberry Pi 4, Custom ARM SoC

AI Framework

TensorFlow Lite, OpenVINO, ONNX Runtime

Computer Vision

OpenCV, YOLOv7, FaceNet, DeepSORT

Deployment

Docker, Remote OTA Updates, Edge Computing

The Challenge

The client needed to upgrade their surveillance infrastructure with AI-powered person detection and face matching while maintaining strict GDPR compliance and reducing bandwidth costs. Key requirements included:

  • Real-time person detection with 95%+ accuracy across day and night conditions
  • Optional facial recognition for authorised personnel at access-controlled areas
  • On-device processing to ensure no biometric data leaves the edge unit
  • Integration with existing video management systems (VMS) and alarm panels
  • Cost-effective deployment across 500+ retail and hospitality locations
  • Remote management and OTA firmware updates for all deployed units

What Had Been Tried Before

The client had previously trialled a cloud-based video analytics service from a major CCTV vendor. While detection accuracy was adequate, the solution required continuous video upload to remote servers — creating unacceptable bandwidth costs (averaging 8 Mbps per camera) and raising GDPR concerns around off-site biometric processing. Latency between detection and alert was over 3 seconds due to round-trip network delays. The per-camera licensing model also made national rollout economically unviable. The client needed an on-premise, per-site-licensed alternative that could run on low-cost hardware.

Our Solution

1. Custom AI Model Development

Developed a lightweight YOLOv7-tiny-based person detection model optimised for edge deployment. The model was trained on a curated dataset combining COCO person annotations with 12,000 frames captured from the client's own camera feeds (covering indoor retail, outdoor car parks, and loading bays). Post-training quantisation to INT8 precision reduced model size by 75% while maintaining 99.2% person detection accuracy in daytime conditions.

Why YOLOv7-tiny: We evaluated YOLOv5s, YOLOv7-tiny, and EfficientDet-Lite. YOLOv7-tiny offered the best accuracy-to-latency ratio on Jetson Nano (45ms inference vs. 62ms for YOLOv5s and 85ms for EfficientDet-Lite at comparable mAP). Its architecture also quantises more cleanly to INT8 with minimal accuracy loss compared to EfficientDet's depthwise separable convolutions.
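The 75% size reduction follows directly from the storage arithmetic: INT8 stores one byte per weight versus four for FP32. As a minimal sketch (the production pipeline used the TFLite and TensorRT converters, not hand-rolled code), symmetric per-tensor INT8 quantisation looks like this — the function names and tensor shape here are illustrative only:

```python
import numpy as np

def quantise_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantisation: w_q = round(w / scale)."""
    scale = np.abs(weights).max() / 127.0
    w_q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return w_q, scale

def dequantise(w_q: np.ndarray, scale: float) -> np.ndarray:
    return w_q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 3, 3, 3)).astype(np.float32)  # a conv kernel

w_q, scale = quantise_int8(w)

# INT8 storage is 1 byte per weight vs 4 bytes for FP32: a 75% reduction.
reduction = 1.0 - w_q.nbytes / w.nbytes
print(f"size reduction: {reduction:.0%}")   # 75%

# Round-trip rounding error is bounded by half the quantisation step.
err = np.abs(dequantise(w_q, scale) - w).max()
```

In practice the accuracy cost comes not from this rounding alone but from how it compounds through the network, which is why post-training quantisation needs a representative calibration dataset.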

2. Facial Recognition Pipeline

Implemented a FaceNet-based recognition system for optional face matching at access-controlled entry points. The system enrols authorised personnel via a secure admin interface and stores only 128-dimensional encrypted feature vectors — never raw images. It achieves 97.5% true-positive matching in controlled indoor lighting and 93.1% in mixed outdoor conditions.

Why FaceNet over ArcFace: Although ArcFace achieves marginally higher accuracy on benchmark datasets, FaceNet's embedding model is 40% smaller (23MB vs. 39MB) and runs within the Jetson Nano's memory budget alongside the detection model. For the client's use case (matching against a database of fewer than 200 authorised personnel per site), FaceNet's discriminative power is more than sufficient.
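The matching step itself is simple once embeddings exist: compare a probe vector against the enrolled gallery by Euclidean distance, as FaceNet does on unit-normalised embeddings. A sketch with synthetic vectors (the threshold of 1.0 and the names are illustrative assumptions, not the deployed tuning):

```python
import numpy as np

def normalise(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def match(probe: np.ndarray, gallery: np.ndarray, names: list,
          threshold: float = 1.0):
    """Return the closest enrolled identity, or None when no enrolled
    embedding is within the distance threshold (an unknown face)."""
    dists = np.linalg.norm(gallery - probe, axis=1)
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return names[best], float(dists[best])
    return None

rng = np.random.default_rng(1)
gallery = normalise(rng.normal(size=(3, 128)))   # three enrolled staff members
names = ["alice", "bob", "carol"]

# A noisy re-capture of an enrolled person still lands near their vector.
probe = normalise(gallery[1] + 0.02 * rng.normal(size=128))
result = match(probe, gallery, names)            # matches "bob"

# A random stranger sits far from every enrolled embedding in 128-d space.
stranger = normalise(rng.normal(size=128))
print(result, match(stranger, gallery, names))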

3. Edge Optimisation

Optimised the entire pipeline for ARM-based processors using TensorFlow Lite (Raspberry Pi 4) and TensorRT (Jetson Nano). Inference time on the Jetson Nano fell from a 45ms baseline to 30ms per frame after full TensorRT INT8 optimisation; the Raspberry Pi 4 runs at 78ms per frame.

Why TensorRT on Jetson, TFLite on Pi: TensorRT leverages the Jetson Nano's 128-core Maxwell GPU for INT8 acceleration — something TFLite cannot exploit. On Raspberry Pi 4 (CPU-only), TFLite's XNNPACK delegate outperforms ONNX Runtime by approximately 15% on ARM Cortex-A72. This dual-runtime approach maximises performance on both hardware tiers without maintaining two separate model architectures.

4. Phased Deployment & Remote Management

Phase 1 pilot covered 15 flagship retail sites over 3 months, validating detection accuracy and false-alarm rates in real-world conditions. Phase 2 expanded to 120 sites in the South East, introducing the remote OTA update pipeline. Phase 3 completed the national rollout to 500+ locations with a fully automated remote deployment pipeline — each new site requires only a 2-hour on-site setup (mounting the edge unit, connecting to the existing camera feed, and running the automated calibration script).

5. GDPR Compliance & Privacy Architecture

All facial recognition processing occurs on-device — no biometric data leaves the edge unit. The system stores only encrypted feature vectors (not images) with automatic 30-day expiry. A privacy impact assessment was completed with the client's Data Protection Officer, and signage complying with ICO guidance is provided for all installation sites. The face matching feature is entirely optional and disabled by default — sites that do not require it run person detection only, with no biometric processing whatsoever.
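The 30-day expiry is enforced as a scheduled purge over the on-device vector store. A minimal sketch of that retention policy using an in-memory SQLite table — the schema, IDs, and byte payloads are hypothetical, and the encryption of the vectors at rest is elided (the sketch assumes the blobs arrive already encrypted):

```python
import sqlite3, time

RETENTION_SECONDS = 30 * 24 * 3600   # the 30-day expiry in the privacy design

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE embeddings (
    person_id TEXT,
    vector    BLOB,   -- encrypted 128-d feature vector (never a raw image)
    enrolled  REAL    -- unix timestamp of enrolment
)""")

def enrol(person_id: str, encrypted_vector: bytes, now: float) -> None:
    db.execute("INSERT INTO embeddings VALUES (?, ?, ?)",
               (person_id, encrypted_vector, now))

def purge_expired(now: float) -> int:
    """Delete vectors older than the retention window; returns rows removed."""
    cur = db.execute("DELETE FROM embeddings WHERE enrolled < ?",
                     (now - RETENTION_SECONDS,))
    db.commit()
    return cur.rowcount

now = time.time()
enrol("staff-001", b"\x8f\x02", now - 31 * 24 * 3600)  # enrolled 31 days ago
enrol("staff-002", b"\x11\x9c", now)                   # enrolled today

removed = purge_expired(now)
remaining = db.execute("SELECT person_id FROM embeddings").fetchall()
print(removed, remaining)   # 1 [('staff-002',)]
```

Because deletion is a property of the schema rather than an application afterthought, the retention guarantee survives restarts and is easy to evidence in a privacy impact assessment.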

Technical Implementation

Architecture

The system consists of three main components:

  • Capture Module: Handles video input from IP cameras (RTSP) and USB devices, with automatic resolution and frame-rate negotiation
  • AI Processing Engine: Runs detection (YOLOv7-tiny) and optional recognition (FaceNet) models with DeepSORT multi-object tracking
  • Alert System: Manages notifications via the client's existing VMS and alarm panels, with configurable alert thresholds per zone
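The three components above form a linear pipeline: frames flow from capture through inference to alerting. The following skeleton sketches that dataflow with placeholder types — the dataclasses, the even-frame stand-in for detection, and the 0.9 threshold are all hypothetical, standing in for RTSP decoding, YOLOv7-tiny + DeepSORT, and the VMS integration respectively:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Frame:
    camera_id: str
    index: int

@dataclass
class Detection:
    camera_id: str
    frame_index: int
    label: str
    confidence: float

def capture(camera_id: str, n_frames: int) -> Iterator[Frame]:
    """Capture Module: in production this wraps an RTSP stream
    (e.g. via cv2.VideoCapture); here it yields placeholder frames."""
    for i in range(n_frames):
        yield Frame(camera_id, i)

def detect(frames: Iterable[Frame]) -> Iterator[Detection]:
    """AI Processing Engine: stands in for detection plus tracking.
    Pretends a person appears on even-numbered frames."""
    for f in frames:
        if f.index % 2 == 0:
            yield Detection(f.camera_id, f.index, "person", 0.97)

def alert(detections: Iterable[Detection], threshold: float = 0.9) -> list:
    """Alert System: forwards high-confidence detections per zone
    threshold; here it simply collects them."""
    return [d for d in detections if d.confidence >= threshold]

alerts = alert(detect(capture("cam-07", 6)))
print(len(alerts))   # 3 (frames 0, 2, 4)
```

Keeping each stage a generator means frames are processed one at a time with bounded memory, which matters on a 4GB Jetson Nano sharing RAM between CPU and GPU.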

Model Optimisation

Applied optimisation techniques to achieve real-time performance on low-cost hardware:

  • Structured pruning to remove 30% of convolutional filters with less than 1% accuracy impact
  • Post-training quantisation from FP32 to INT8 (TensorRT) and dynamic-range quantisation (TFLite)
  • Knowledge distillation from a full YOLOv7 teacher model to the YOLOv7-tiny student
  • TensorRT layer fusion and kernel auto-tuning for Jetson Nano's Maxwell GPU
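To illustrate the first item: structured pruning removes whole convolutional filters (entire output channels) so the layer stays dense, unlike unstructured weight pruning. A common criterion is to rank filters by L1 norm and drop the weakest; the sketch below assumes that criterion, since the exact ranking used in the project is not stated here:

```python
import numpy as np

def prune_filters(weights: np.ndarray, fraction: float = 0.3) -> np.ndarray:
    """Structured pruning sketch: rank conv filters by L1 norm and drop
    the lowest-scoring `fraction`, removing whole output channels.
    `weights` has shape (out_channels, in_channels, kh, kw)."""
    n_filters = weights.shape[0]
    n_keep = n_filters - int(round(n_filters * fraction))
    scores = np.abs(weights).sum(axis=(1, 2, 3))    # L1 norm per filter
    keep = np.sort(np.argsort(scores)[-n_keep:])    # strongest filters, in order
    return weights[keep]

rng = np.random.default_rng(2)
layer = rng.normal(size=(64, 32, 3, 3)).astype(np.float32)  # 64 output filters
pruned = prune_filters(layer, fraction=0.3)
print(layer.shape[0], "->", pruned.shape[0])   # 64 -> 45
```

After pruning, downstream layers must have their input channels sliced to match, and the network is fine-tuned briefly — that fine-tuning is what keeps the accuracy impact under 1%.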

Limitations & Edge Cases

The system has known performance boundaries that were documented and communicated to the client during the pilot phase:

  • Nighttime IR mode: Person detection accuracy drops to 96.8% (vs. 99.2% in daylight) due to reduced contrast in infrared imagery. This remains well above the client's 95% threshold.
  • Heavy rain: Effective detection range reduces from 30m to approximately 18m in heavy rainfall, as water droplets on the lens housing scatter IR illumination.
  • Crowded scenes: When more than 15 people are simultaneously visible in a single frame, processing latency increases to approximately 45ms on Jetson Nano (from the typical 30ms) but detection accuracy is maintained.
  • Maximum tracked individuals: The DeepSORT tracker supports a maximum of 25 simultaneously tracked individuals per camera. Beyond this, the oldest tracks are dropped.
  • Scope limitation: The system is not designed for crowd counting or behaviour analytics — it focuses exclusively on person detection and optional face matching.
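The oldest-track-dropped behaviour in the 25-track cap can be sketched as an ordered table layered on top of the tracker. This is a hypothetical illustration of the eviction policy only, not DeepSORT's internal data structure:

```python
from collections import OrderedDict

MAX_TRACKS = 25   # per-camera tracker capacity described above

class TrackTable:
    """When a new track would exceed capacity, the track that was
    created earliest is evicted (oldest-first)."""

    def __init__(self, capacity: int = MAX_TRACKS):
        self.capacity = capacity
        self.tracks = OrderedDict()    # insertion order == creation order

    def add(self, track_id: int, bbox: tuple) -> None:
        if track_id in self.tracks:
            self.tracks[track_id] = bbox      # update without changing age
            return
        if len(self.tracks) >= self.capacity:
            self.tracks.popitem(last=False)   # evict the oldest track
        self.tracks[track_id] = bbox

table = TrackTable()
for tid in range(27):                  # 27 people enter the scene
    table.add(tid, (0, 0, 10, 10))

print(len(table.tracks))               # capped at 25
print(min(table.tracks))               # tracks 0 and 1 evicted -> oldest is 2
```

Evicting by creation age rather than by last-seen time is the simpler policy; it means a long-standing track can be dropped even while still visible, which is acceptable here because frames beyond 15 people are already rare in the client's scenes.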

Results & Impact

99.2%

Person Detection Accuracy (daytime)

45ms → 30ms

Inference Time (Jetson Nano, after TensorRT optimisation)

75%

Bandwidth Reduction vs. Cloud Upload

500+

Deployed Locations (national rollout)

Before & After Comparison

Before

Motion-triggered recording: ~60% true-positive rate, 40+ false alarms/site/day

After

AI person detection: 99.2% true-positive rate, ~6 false alarms/site/day

Before

Cloud video analytics: 3+ second alert latency, 8 Mbps/camera bandwidth

After

Edge processing: <100ms alert latency, metadata-only upload (<50 Kbps)

Business Impact

  • Reduced false alarms by 85% (from ~40/site/day to ~6/site/day) through AI-based person detection replacing simple motion triggers
  • Decreased bandwidth costs by 75% by processing video on-device and transmitting only metadata and alert thumbnails
  • Improved alert-to-response time from over 3 seconds (cloud) to under 100ms (edge)
  • Achieved full GDPR compliance with on-device biometric processing and ICO-compliant signage
  • National rollout completed across 500+ sites with 2-hour average per-site installation time

Ongoing & Next Steps

The system is in active operation across all 500+ sites with ongoing support and iterative improvements:

  • In progress: Multi-camera person re-identification across adjacent cameras within a single site (e.g., tracking a person from car park to entrance)
  • Scheduled Q2 2025: Migration from YOLOv7-tiny to YOLOv8n for improved accuracy on the same hardware budget
  • Under evaluation: Integration with the client's access control system to correlate face match events with door entry logs
  • Ongoing: Monthly model performance reviews using a sample of anonymised detection logs to identify drift and edge cases
  • Exploring: Thermal camera integration for improved nighttime detection performance in outdoor environments