DIY facial recognition and identification for your home
- Ruwan Rajapakse
- Nov 16, 2025
- 16 min read
As part of a broader AI use case I’ve been exploring, I recently did a bit of R&D on facial recognition using low-cost components and lightweight methods. I’ll admit I didn’t have much confidence in “vibe coding” at the outset, thanks to a few negative experiences a couple of years ago when the tools were far less mature. But I was genuinely impressed by what I managed to build in just a couple of days last week, sitting in front of Claude and ChatGPT.
I powered through hardware quirks, library installations, network glitches and debug sessions to end up with a stable, robust smart camera that can identify and track who visits my cubicle, including multiple visitors at once. It's true I'm a former, somewhat mediocre programmer rather than a complete novice, but I haven't built anything this involved in longer than I care to admit.
The solution uses an ESP32-CAM and two small, purpose-built programs: a camera web server running on the device itself (written in Arduino Wiring/C++), and a facial recognition server running on a laptop or Raspberry Pi (written in Python). This sort of DIY facial-recognition setup has been around for more than five years, but from what I’ve seen, the underlying technologies and libraries have matured significantly, making the results far more stable and reliable. It shows promise for my broader use case.

If you’re interested in the code and configuration steps, they’re included further down.
What I really want to highlight here, though, is just how delightful the overall experience was. I resolved more than two dozen issues, ranging from library incompatibilities to network-routing headaches to tuning the system for my specific deployment environment, and still produced a robust solution in only two days. ChatGPT cleared many of the simpler hurdles, such as network configuration, but Claude seems to have the edge when it comes to deeper programming and troubleshooting. I'd recommend this way of working to anyone looking to assemble a quick, functional IoT/AI solution from popular components.
Here is the Arduino C++ program, which is a modification of the CameraWebServer example that ships with the ESP32 board package (File > Examples > ESP32 > Camera > CameraWebServer, with the ESP32 Wrover Module board selected). This is the first tab of the modified CameraWebServer codebase.
#include "esp_camera.h"
#include <WiFi.h>
#include <ESPmDNS.h>
#include "esp_wifi.h" // ← ADD THIS LINE for esp_wifi_set_ps
#include "board_config.h"
// ===========================
// WiFi credentials
// ===========================
const char* ssid = "network name";
const char* password = "network password";
// No static IP needed! Router assigns 192.168.1.100 via DHCP reservation
bool faceDetectionEnabled = false;
bool faceRecognitionEnabled = false;
camera_fb_t* fb_for_detection = nullptr;
void startCameraServer();
void setup() {
Serial.begin(115200);
Serial.setDebugOutput(true);
Serial.println("\n=================================");
Serial.println("UY Scuti ESP32-CAM Starting...");
Serial.println("=================================");
// ===========================
// Camera Configuration
// ===========================
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sccb_sda = SIOD_GPIO_NUM;
config.pin_sccb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG;
config.frame_size = FRAMESIZE_VGA;
config.fb_location = CAMERA_FB_IN_PSRAM;
config.jpeg_quality = 12;
config.grab_mode = CAMERA_GRAB_LATEST;
if (psramFound()) {
config.fb_count = 2;
} else {
config.fb_count = 1;
}
#if defined(CAMERA_MODEL_ESP_EYE)
pinMode(13, INPUT_PULLUP);
pinMode(14, INPUT_PULLUP);
#endif
Serial.println("\n=== Camera Initialization ===");
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("❌ Camera init failed: 0x%x\n", err);
delay(5000);
ESP.restart();
return;
}
Serial.println("✅ Camera initialized");
sensor_t *s = esp_camera_sensor_get();
if (s->id.PID == OV3660_PID) {
s->set_vflip(s, 1);
s->set_brightness(s, 1);
s->set_saturation(s, -2);
}
s->set_framesize(s, FRAMESIZE_VGA);
#if defined(CAMERA_MODEL_M5STACK_WIDE) || defined(CAMERA_MODEL_M5STACK_ESP32CAM)
s->set_vflip(s, 1);
s->set_hmirror(s, 1);
#endif
#if defined(CAMERA_MODEL_ESP32S3_EYE)
s->set_vflip(s, 1);
#endif
Serial.printf("PSRAM: %s\n", psramFound() ? "✅ Available" : "❌ Not found");
// ===========================
// WiFi Connection (DHCP)
// ===========================
Serial.println("\n=== WiFi Connection ===");
WiFi.mode(WIFI_STA);
WiFi.setSleep(false);
// Disable WiFi power saving for stable streaming
esp_wifi_set_ps(WIFI_PS_NONE);
WiFi.begin(ssid, password);
Serial.print("Connecting");
int attempts = 0;
while (WiFi.status() != WL_CONNECTED && attempts < 30) {
delay(500);
Serial.print(".");
attempts++;
}
if (WiFi.status() != WL_CONNECTED) {
Serial.println("\n❌ WiFi failed. Restarting...");
delay(5000);
ESP.restart();
return;
}
Serial.println("\n✅ WiFi connected!");
Serial.print("MAC Address: ");
Serial.println(WiFi.macAddress());
Serial.print("IP Address: ");
Serial.println(WiFi.localIP());
Serial.print("Signal: ");
Serial.print(WiFi.RSSI());
Serial.println(" dBm");
// ===========================
// mDNS Setup
// ===========================
Serial.println("\n=== mDNS Setup ===");
delay(500);
if (MDNS.begin("uyscuti")) {
Serial.println("✅ mDNS: http://uyscuti.local");
MDNS.addService("http", "tcp", 80);
} else {
Serial.println("⚠️ mDNS failed");
}
// ===========================
// Start Camera Server
// ===========================
Serial.println("\n=== Starting Camera Server ===");
startCameraServer();
Serial.println("✅ Server started");
// ===========================
// Connection Summary
// ===========================
Serial.println("\n=================================");
Serial.println("✅ CAMERA READY!");
Serial.println("=================================");
Serial.println("Access via:");
Serial.print(" • http://");
Serial.println(WiFi.localIP());
Serial.println(" • http://uyscuti.local");
Serial.print(" • Stream: http://");
Serial.print(WiFi.localIP());
Serial.println(":81/stream");
Serial.println("=================================\n");
}
void loop() {
static unsigned long lastCheck = 0;
if (millis() - lastCheck > 10000) {
lastCheck = millis();
if (WiFi.status() != WL_CONNECTED) {
Serial.println("⚠️ WiFi lost. Reconnecting...");
WiFi.reconnect();
} else {
Serial.printf("✓ WiFi OK | RSSI: %d dBm | IP: %s\n",
WiFi.RSSI(), WiFi.localIP().toString().c_str());
}
}
delay(100);
}
void clearFaceOverlay(camera_fb_t* fb) {
if (!fb) return;
uint16_t* pixels = (uint16_t*)fb->buf;
if (!pixels) return;
for (size_t i = 0; i < (fb->len / 2); i++) {
pixels[i] = 0;
}
}

The other tabs in the example codebase remain unchanged, except for uncommenting #define CAMERA_MODEL_AI_THINKER in the board_config.h tab.
If you encounter minor configuration problems when installing and connecting to the ESP32-CAM hardware through the Arduino IDE via your laptop's USB-C port, I am confident that a bit of assistance from Claude or even ChatGPT will help you overcome them with ease.
You would then have to set up a DHCP reservation with an infinite lease for the camera's IP address (192.168.1.100 in this example), restart your router, and test it. Also, make sure the reserved address falls within your router's DHCP range.
Then, your camera's control panel would become available on http://192.168.1.100 and the video stream on http://192.168.1.100:81/stream.
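Before building the recognition server, it's worth a quick smoke test of the stream from Python. Here is a minimal sketch, assuming the DHCP-reserved address above (adjust the URL to your own network), that grabs a single JPEG frame using the same start-of-image/end-of-image marker scan the full server below relies on:

# Smoke test: capture one JPEG frame from the ESP32-CAM MJPEG stream.
# Assumes the DHCP reservation described above; adjust the URL as needed.
import urllib.request

STREAM_URL = "http://192.168.1.100:81/stream"
stream = urllib.request.urlopen(STREAM_URL, timeout=10)
buffer = b""
while True:
    buffer += stream.read(4096)
    start = buffer.find(b"\xff\xd8")  # JPEG start-of-image marker
    end = buffer.find(b"\xff\xd9")    # JPEG end-of-image marker
    if start != -1 and end > start:
        with open("test_frame.jpg", "wb") as f:
            f.write(buffer[start:end + 2])
        print("Captured one frame to test_frame.jpg")
        break
stream.close()

If this saves a viewable image, the camera and network side of the project is done, and anything that goes wrong later lies in the recognition pipeline.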
Now, you must create and run this Python facial recognition server program on your server device (a laptop for testing, or a Raspberry Pi in your final implementation). I used Visual Studio Code for writing the Python program.
import cv2
import numpy as np
import os
import pickle
import urllib.request
import sys
import tkinter as tk
from tkinter import ttk, simpledialog, messagebox
from PIL import Image, ImageTk
import threading
import queue
import time
from collections import defaultdict, deque
from datetime import datetime
# -----------------------------
# CONFIG
# -----------------------------
VIDEO_URL = "http://192.168.1.100:81/stream" # your ESP32-CAM stream
KNOWN_FACES_DIR = "known_faces"
DATA_FILE = "face_encodings.pkl"
# Recognition thresholds - CRITICAL for accuracy
RECOGNITION_THRESHOLD_DLIB = 0.6 # dlib typical good match: 0.3-0.4
RECOGNITION_THRESHOLD_OPENCV = 0.25 # OpenCV normalized distance
# Multi-shot enrollment - FIXED TIMING
SHOTS_PER_PERSON = 5 # Capture 5 different shots per person
SHOT_DELAY = 10.0 # 10 SECONDS between captures - gives time to move!
COUNTDOWN_START = 3 # 3 second countdown before first shot
# Temporal smoothing - reduces name flickering
FACE_TRACKING_FRAMES = 15 # Increased from 10 - track faces over more frames for stability
CONFIDENCE_THRESHOLD = 25 # Lowered from 30 - minimum confidence to display name
# Debug mode
DEBUG_MODE = True # Show distance values in console
os.makedirs(KNOWN_FACES_DIR, exist_ok=True)
# -----------------------------
# MODELS
# -----------------------------
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
try:
import dlib
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_rec_model = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")
# Test if dlib actually works
test_img = np.zeros((100, 100, 3), dtype=np.uint8)
test_rect = dlib.rectangle(0, 0, 100, 100)
test_shape = shape_predictor(test_img, test_rect)
USE_DLIB = True
print("✅ Using dlib for face recognition")
except Exception as e:
print(f"⚠️ Dlib not available or not working ({e}), using OpenCV")
USE_DLIB = False
# -----------------------------
# KNOWN FACES (Multi-shot storage)
# -----------------------------
if os.path.exists(DATA_FILE):
with open(DATA_FILE, "rb") as f:
known_faces = pickle.load(f)
else:
known_faces = {} # Structure: {name: [embedding1, embedding2, ...]}
# -----------------------------
# FACE TRACKING
# -----------------------------
class FaceTracker:
"""Track faces across frames for temporal smoothing"""
def __init__(self, max_frames=FACE_TRACKING_FRAMES):
self.tracks = {} # {face_id: deque of (name, confidence)}
self.next_id = 0
self.max_frames = max_frames
self.face_positions = {} # {face_id: (x, y, w, h)}
def update(self, faces_info):
"""Update tracking with new frame data"""
current_positions = {}
used_ids = set()
# Match detected faces to existing tracks
for face_info in faces_info:
x, y, w, h = face_info['bbox']
center = (x + w//2, y + h//2)
# Find closest existing track
best_match_id = None
best_distance = float('inf')
for track_id, old_pos in self.face_positions.items():
if track_id in used_ids:
continue
old_x, old_y, old_w, old_h = old_pos
old_center = (old_x + old_w//2, old_y + old_h//2)
dist = np.sqrt((center[0] - old_center[0])**2 + (center[1] - old_center[1])**2)
# Increased tolerance for face movement (1.5x -> 2.0x face width)
if dist < w * 2.0 and dist < best_distance:
best_distance = dist
best_match_id = track_id
# Assign to track
if best_match_id is not None:
face_id = best_match_id
else:
face_id = self.next_id
self.next_id += 1
self.tracks[face_id] = deque(maxlen=self.max_frames)
used_ids.add(face_id)
current_positions[face_id] = (x, y, w, h)
# Add detection to track
self.tracks[face_id].append((face_info['name'], face_info['confidence']))
# Remove old tracks
self.face_positions = current_positions
old_tracks = set(self.tracks.keys()) - used_ids
for track_id in old_tracks:
del self.tracks[track_id]
# Get smoothed results
smoothed_faces = []
for face_id in used_ids:
track = self.tracks[face_id]
if len(track) == 0:
continue
# Vote on name (most common in recent frames)
name_votes = defaultdict(int)
confidence_sum = defaultdict(float)
for name, conf in track:
name_votes[name] += 1
confidence_sum[name] += conf
# Get most voted name - require at least 40% of frames to agree
best_name = max(name_votes.items(), key=lambda x: x[1])[0]
vote_percentage = name_votes[best_name] / len(track)
# If less than 40% agreement, mark as Unknown
if vote_percentage < 0.4 and best_name != "Unknown":
best_name = "Unknown"
avg_confidence = 0
else:
avg_confidence = confidence_sum[best_name] / name_votes[best_name]
# Find original face_info for this track
bbox = self.face_positions[face_id]
for face_info in faces_info:
if face_info['bbox'] == bbox:
smoothed_faces.append({
**face_info,
'name': best_name,
'confidence': int(avg_confidence)
})
break
return smoothed_faces
face_tracker = FaceTracker()
# -----------------------------
# GLOBAL STATE
# -----------------------------
current_frame = None
current_faces = []
detected_faces_for_enrollment = []
action_queue = queue.Queue()
running = True
# FIXED ENROLLMENT STATE
enrollment_in_progress = False
enrollment_name = ""
enrollment_shots = []
enrollment_countdown = 0
enrollment_start_time = 0
# -----------------------------
# HELPER FUNCTIONS
# -----------------------------
def normalize_lighting(image):
"""Normalize lighting conditions using CLAHE"""
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
l = clahe.apply(l)
lab = cv2.merge([l, a, b])
return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
def get_face_embedding_dlib(image, x, y, w, h):
"""Get face embedding using dlib"""
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
if rgb.dtype != np.uint8:
rgb = rgb.astype(np.uint8)
if not rgb.flags['C_CONTIGUOUS']:
rgb = np.ascontiguousarray(rgb)
rect = dlib.rectangle(x, y, x + w, y + h)
shape = shape_predictor(rgb, rect)
embedding = face_rec_model.compute_face_descriptor(rgb, shape)
return np.array(embedding)
def get_face_embedding_opencv(image, x, y, w, h):
"""Fallback: extract face region as feature (simplified, matches working version)"""
face = image[y:y+h, x:x+w]
face_resized = cv2.resize(face, (128, 128))
return face_resized.flatten().astype(np.float32) / 255.0
def get_face_embedding(image, x, y, w, h):
"""Get face embedding using available method"""
if USE_DLIB:
try:
return get_face_embedding_dlib(image, x, y, w, h)
except:
return get_face_embedding_opencv(image, x, y, w, h)
else:
return get_face_embedding_opencv(image, x, y, w, h)
def save_face_image(name, image, x, y, w, h, shot_number):
"""Save actual face image to disk"""
person_dir = os.path.join(KNOWN_FACES_DIR, name)
os.makedirs(person_dir, exist_ok=True)
# Extract face with padding
padding = int(w * 0.1)
y1 = max(0, y - padding)
y2 = min(image.shape[0], y + h + padding)
x1 = max(0, x - padding)
x2 = min(image.shape[1], x + w + padding)
face_img = image[y1:y2, x1:x2]
# Save with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"{name}_shot{shot_number}_{timestamp}.jpg"
filepath = os.path.join(person_dir, filename)
cv2.imwrite(filepath, face_img)
print(f" 💾 Saved: {filepath}")
return filepath
def save_face_multi_shot(name, shots_data):
"""Save multiple embeddings and images for one person"""
print(f"🔍 DEBUG: save_face_multi_shot called with name='{name}', shots={len(shots_data)}")
embeddings_list = []
# Create person directory
person_dir = os.path.join(KNOWN_FACES_DIR, name)
print(f"🔍 DEBUG: person_dir = {person_dir}")
if os.path.exists(person_dir):
# Clear old images
for old_file in os.listdir(person_dir):
os.remove(os.path.join(person_dir, old_file))
print(f"🔍 DEBUG: Cleared old files from {person_dir}")
else:
os.makedirs(person_dir, exist_ok=True)
print(f"🔍 DEBUG: Created directory {person_dir}")
# Save each shot
for idx, (img, x, y, w, h) in enumerate(shots_data, 1):
print(f"🔍 DEBUG: Processing shot {idx}: bbox=({x},{y},{w},{h})")
# Save image
filepath = save_face_image(name, img, x, y, w, h, idx)
print(f"🔍 DEBUG: Saved image to {filepath}")
# Save embedding
embedding = get_face_embedding(img, x, y, w, h)
embeddings_list.append(embedding)
print(f"🔍 DEBUG: Generated embedding {idx}")
# Save embeddings to pickle
known_faces[name] = embeddings_list
print(f"🔍 DEBUG: Added {name} to known_faces dict")
try:
with open(DATA_FILE, "wb") as f:
pickle.dump(known_faces, f)
print(f"🔍 DEBUG: Successfully saved to {DATA_FILE}")
except Exception as e:
print(f"❌ DEBUG: Failed to save pickle: {e}")
return False
print(f"✅ Saved {len(embeddings_list)} shots for '{name}'")
return True
def delete_face(name):
"""Delete a saved face and its images"""
if name in known_faces:
del known_faces[name]
with open(DATA_FILE, "wb") as f:
pickle.dump(known_faces, f)
# Delete image folder
person_dir = os.path.join(KNOWN_FACES_DIR, name)
if os.path.exists(person_dir):
import shutil
shutil.rmtree(person_dir)
print(f"✅ Deleted '{name}'")
return True
return False
def calculate_confidence(distance, threshold):
"""Calculate confidence percentage based on distance and threshold"""
if distance >= threshold:
return 0
# Scale: 0.0 = 100%, threshold = 0%
confidence = max(0, min(100, int((1 - distance/threshold) * 100)))
return confidence
def recognize_face(image, x, y, w, h):
"""Recognize face with multi-shot comparison"""
if not known_faces:
return "Unknown", 1.0, 0
embedding = get_face_embedding(image, x, y, w, h)
# Calculate distances to all known faces (comparing against all shots)
best_match = None
best_distance = float('inf')
second_best_distance = float('inf')
all_distances = []
for name, embeddings_data in known_faces.items():
# Handle both single embedding (old format) and list of embeddings (new format)
embeddings_list = embeddings_data if isinstance(embeddings_data, list) else [embeddings_data]
# Compare against all shots of this person
person_distances = []
for known_embedding in embeddings_list:
distance = np.linalg.norm(known_embedding - embedding)
person_distances.append(distance)
# Use the BEST (minimum) distance from all shots
min_distance = min(person_distances)
avg_distance = np.mean(person_distances)
# Weight: 70% best match, 30% average (to avoid outliers)
weighted_distance = 0.7 * min_distance + 0.3 * avg_distance
all_distances.append((name, weighted_distance, min_distance))
# Track best and second-best matches
if weighted_distance < best_distance:
second_best_distance = best_distance
best_distance = weighted_distance
best_match = name
elif weighted_distance < second_best_distance:
second_best_distance = weighted_distance
# Select appropriate threshold - auto-detect based on embedding size
embedding_size = len(embedding)
if USE_DLIB:
threshold = RECOGNITION_THRESHOLD_DLIB
else:
# OpenCV creates large embeddings with high distances
# Increased threshold for 1m distance (faces are smaller/lower resolution)
threshold = 120.0 # Increased from 110 to handle 1m distance
# Add confidence margin check: best match should be significantly better than second best
confidence_margin = 0.08 # Reduced from 0.10 - more lenient for 3+ people scenarios
if USE_DLIB:
confidence_margin = 0.05 # Tighter margin for dlib (more accurate)
# Debug output
if DEBUG_MODE and all_distances:
all_distances.sort(key=lambda x: x[1])
print(f"\n🔍 Recognition distances (threshold={threshold}):")
for name, dist, min_dist in all_distances[:3]:
print(f" {name}: weighted={dist:.3f}, best={min_dist:.3f}")
print(f" Confidence margin check: {best_distance:.3f} vs {second_best_distance:.3f}")
if best_distance < threshold:
# Check if match is confident enough (significantly better than second best)
if second_best_distance == float('inf') or (best_distance * (1 + confidence_margin) < second_best_distance):
confidence = calculate_confidence(best_distance, threshold)
# Lower confidence threshold to 30% (was 40%)
if confidence >= 30:
return best_match, best_distance, confidence
return "Unknown", best_distance, 0
def sort_faces_left_to_right(faces):
"""Sort faces by x-coordinate (left to right) for consistent ordering"""
return sorted(faces, key=lambda f: f['bbox'][0])
# -----------------------------
# VIDEO PROCESSING THREAD
# -----------------------------
def reconnect_stream(url, max_retries=3):
"""Attempt to reconnect to the stream"""
for attempt in range(max_retries):
try:
stream = urllib.request.urlopen(url, timeout=10)
return stream
except Exception as e:
if attempt < max_retries - 1:
time.sleep(2)
return None
def video_processing_thread():
global current_frame, current_faces, detected_faces_for_enrollment, running
global enrollment_in_progress, enrollment_name, enrollment_shots
global enrollment_countdown, enrollment_start_time
stream = reconnect_stream(VIDEO_URL)
if stream is None:
print("❌ Failed to connect to stream")
running = False
return
byte_buffer = b''
stream_errors = 0
max_stream_errors = 10
print("✅ Connected to ESP32-CAM stream")
while running:
# Check for actions from GUI
try:
action = action_queue.get_nowait()
if action['type'] == 'start_enrollment':
enrollment_in_progress = True
enrollment_name = action['name']
enrollment_shots = []
enrollment_countdown = COUNTDOWN_START
enrollment_start_time = time.time()
print(f"📸 Starting enrollment for '{enrollment_name}'...")
print(f"⏱️ {COUNTDOWN_START} second countdown, then {SHOTS_PER_PERSON} shots with {SHOT_DELAY}s between each")
elif action['type'] == 'delete':
delete_face(action['name'])
except queue.Empty:
pass
try:
chunk = stream.read(4096)
if not chunk:
raise Exception("Empty chunk")
byte_buffer += chunk
stream_errors = 0
except Exception as e:
stream_errors += 1
if stream_errors >= max_stream_errors:
print("❌ Too many stream errors")
break
stream = reconnect_stream(VIDEO_URL, max_retries=2)
if stream is None:
break
byte_buffer = b''
continue
a = byte_buffer.find(b'\xff\xd8')
b = byte_buffer.find(b'\xff\xd9')
if a != -1 and b != -1:
jpg = byte_buffer[a:b+2]
byte_buffer = byte_buffer[b+2:]
if len(jpg) < 100:
continue
try:
frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
except:
continue
if frame is None or frame.size == 0:
continue
# Detect faces with adjusted parameters for 1m distance
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Reduced minSize to detect smaller faces at 1m distance
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))
# Store detected faces for enrollment
detected_faces_for_enrollment = []
face_results = []
# FIXED ENROLLMENT LOGIC
if enrollment_in_progress:
elapsed = time.time() - enrollment_start_time
# Countdown phase
if enrollment_countdown > 0:
countdown_remaining = COUNTDOWN_START - elapsed
if countdown_remaining <= 0:
enrollment_countdown = 0
enrollment_start_time = time.time() # Reset for shots
elapsed = 0
print("✅ Countdown complete! Starting capture...")
# Capture phase
elif len(enrollment_shots) < SHOTS_PER_PERSON:
if len(faces) == 1: # Only proceed if single face
# Check if it's time for next shot
shot_time = len(enrollment_shots) * SHOT_DELAY
if elapsed >= shot_time:
x, y, w, h = faces[0]
enrollment_shots.append((frame.copy(), x, y, w, h))
print(f" 📸 Captured shot {len(enrollment_shots)}/{SHOTS_PER_PERSON}")
# Check if we just completed all shots
if len(enrollment_shots) == SHOTS_PER_PERSON:
print(f"💾 All shots captured! Saving {len(enrollment_shots)} shots for '{enrollment_name}'...")
success = save_face_multi_shot(enrollment_name, enrollment_shots)
if success:
print(f"✅ Successfully enrolled '{enrollment_name}'")
else:
print(f"❌ Failed to enroll '{enrollment_name}'")
enrollment_in_progress = False
elif len(faces) > 1:
print(" ⚠️ Multiple faces detected - waiting for single face...")
else:
print(" ⚠️ No face detected - waiting...")
# Process each face for display
for (x, y, w, h) in faces:
detected_faces_for_enrollment.append((frame.copy(), x, y, w, h))
# Recognition (skip during enrollment)
if not enrollment_in_progress:
name, distance, confidence = recognize_face(frame, x, y, w, h)
else:
name, distance, confidence = "ENROLLING", 0, 0
face_results.append({
'name': name,
'distance': distance,
'confidence': confidence,
'bbox': (x, y, w, h),
'center_x': x + w//2
})
# Sort faces left to right
face_results = sort_faces_left_to_right(face_results)
# Apply temporal smoothing (skip during enrollment)
if not enrollment_in_progress:
face_results = face_tracker.update(face_results)
current_faces = face_results
# Draw faces on frame
for idx, face_info in enumerate(face_results):
x, y, w, h = face_info['bbox']
name = face_info['name']
confidence = face_info['confidence']
# ENROLLMENT VISUAL FEEDBACK
if enrollment_in_progress:
if len(faces) != 1:
# Warning: wrong number of faces
color = (0, 0, 255) # Red
label = "⚠️ SINGLE FACE ONLY!"
elif enrollment_countdown > 0:
# Countdown
elapsed = time.time() - enrollment_start_time
countdown_val = max(1, int(COUNTDOWN_START - elapsed + 1))
color = (0, 255, 255) # Yellow
label = f"GET READY... {countdown_val}"
else:
# Capturing
shots_taken = len(enrollment_shots)
elapsed = time.time() - enrollment_start_time
next_shot_in = max(0, (shots_taken + 1) * SHOT_DELAY - elapsed)
color = (0, 255, 0) # Green
label = f"📸 {shots_taken}/{SHOTS_PER_PERSON} - Next in {next_shot_in:.1f}s"
if next_shot_in < 0.5:
label = f"📸 CAPTURING {shots_taken+1}/{SHOTS_PER_PERSON}..."
thickness = 5
else:
# Normal recognition
if name != "Unknown":
if confidence >= 80:
color = (0, 255, 0) # Bright green
elif confidence >= 60:
color = (0, 200, 200) # Yellow-green
else:
color = (0, 165, 255) # Orange
else:
color = (0, 0, 255) # Red for unknown
thickness = 3
# Label
if len(face_results) > 1:
label = f"#{idx+1} {name}"
else:
label = name
if name != "Unknown":
label += f" ({confidence}%)"
# Draw box
cv2.rectangle(frame, (x, y), (x+w, y+h), color, thickness)
# Draw label background and text
label_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)[0]
cv2.rectangle(frame, (x, y-35), (x+label_size[0]+10, y), color, -1)
cv2.putText(frame, label, (x+5, y-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 2, cv2.LINE_AA)
current_frame = frame
try:
stream.close()
except:
pass
# -----------------------------
# GUI APPLICATION
# -----------------------------
class FaceRecognitionGUI:
def __init__(self, root):
self.root = root
self.root.title("UY Scuti Face Recognition System - Fixed Multi-Shot")
self.root.geometry("1000x700")
# Create main container
main_container = tk.Frame(root)
main_container.pack(fill="both", expand=True)
# Left side - Video feed
video_frame = tk.LabelFrame(main_container, text="📹 Live Video Feed", padx=5, pady=5)
video_frame.pack(side="left", padx=10, pady=10, fill="both", expand=True)
self.video_canvas = tk.Canvas(video_frame, width=640, height=480, bg="black")
self.video_canvas.pack()
# Right side - Controls
control_frame = tk.Frame(main_container, width=300)
control_frame.pack(side="right", padx=10, pady=10, fill="both")
# Title
title = tk.Label(control_frame, text="🎯 Control Panel", font=("Arial", 16, "bold"))
title.pack(pady=10)
# Status
self.status_var = tk.StringVar(value="🟢 System Running")
status_label = tk.Label(control_frame, textvariable=self.status_var, font=("Arial", 11))
status_label.pack(pady=5)
# Method indicator
method = "Dlib (High Accuracy)" if USE_DLIB else "OpenCV (Basic)"
method_label = tk.Label(control_frame, text=f"Method: {method}",
font=("Arial", 9), fg="blue")
method_label.pack(pady=2)
# Face count
self.face_count_var = tk.StringVar(value=self.get_face_count_text())
face_count = tk.Label(control_frame, textvariable=self.face_count_var, font=("Arial", 10))
face_count.pack(pady=5)
# Current detection
detection_frame = tk.LabelFrame(control_frame, text="Current Detection", padx=10, pady=10)
detection_frame.pack(padx=10, pady=10, fill="x")
self.detection_var = tk.StringVar(value="No face detected")
detection_label = tk.Label(detection_frame, textvariable=self.detection_var,
font=("Arial", 9), fg="blue", wraplength=250, justify="left")
detection_label.pack()
# Enrollment section
enroll_frame = tk.LabelFrame(control_frame, text=f"📝 Enroll New Person",
padx=10, pady=10)
enroll_frame.pack(padx=10, pady=10, fill="x")
tk.Label(enroll_frame, text="Name:").pack()
self.name_entry = tk.Entry(enroll_frame, font=("Arial", 11), width=22)
self.name_entry.pack(pady=5)
self.enroll_btn = tk.Button(enroll_frame, text=f"📸 Start {SHOTS_PER_PERSON}-Shot Capture",
command=self.start_multi_shot_enrollment,
bg="#4CAF50", fg="white", font=("Arial", 10, "bold"), pady=5)
self.enroll_btn.pack(pady=5)
self.enrollment_status_var = tk.StringVar(value="")
self.enrollment_status_label = tk.Label(enroll_frame, textvariable=self.enrollment_status_var,
font=("Arial", 12, "bold"), fg="orange",
wraplength=250, height=3)
self.enrollment_status_label.pack(pady=10)
# Instructions
instructions = tk.Label(enroll_frame,
text=f"💡 Tips:\n"
f"• Only ONE person in frame\n"
f"• {COUNTDOWN_START}s countdown first\n"
f"• Move head between shots\n"
f"• {SHOT_DELAY}s between captures",
font=("Arial", 8), fg="gray", justify="left")
instructions.pack(pady=5)
# Management section
manage_frame = tk.LabelFrame(control_frame, text="🔧 Manage Faces", padx=10, pady=10)
manage_frame.pack(padx=10, pady=10, fill="both", expand=True)
# Listbox with scrollbar
list_frame = tk.Frame(manage_frame)
list_frame.pack(fill="both", expand=True)
scrollbar = tk.Scrollbar(list_frame)
scrollbar.pack(side="right", fill="y")
self.face_listbox = tk.Listbox(list_frame, yscrollcommand=scrollbar.set,
font=("Arial", 10), height=8)
self.face_listbox.pack(side="left", fill="both", expand=True)
scrollbar.config(command=self.face_listbox.yview)
# Buttons
btn_frame = tk.Frame(manage_frame)
btn_frame.pack(pady=5)
delete_btn = tk.Button(btn_frame, text="🗑️ Delete", command=self.delete_face,
bg="#f44336", fg="white", font=("Arial", 9), width=10)
delete_btn.pack(side="left", padx=3)
refresh_btn = tk.Button(btn_frame, text="🔄 Refresh", command=self.refresh_list,
bg="#FF9800", fg="white", font=("Arial", 9), width=10)
refresh_btn.pack(side="left", padx=3)
# Start updates
self.refresh_list()
self.update_video()
self.update_status()
self.check_enrollment()
def get_face_count_text(self):
total_shots = sum(len(embeddings) for embeddings in known_faces.values())
return f"Saved: {len(known_faces)} people ({total_shots} shots)"
def update_video(self):
"""Update video canvas with current frame"""
global current_frame
if current_frame is not None:
frame_rgb = cv2.cvtColor(current_frame, cv2.COLOR_BGR2RGB)
h, w = frame_rgb.shape[:2]
target_w, target_h = 640, 480
scale = min(target_w/w, target_h/h)
new_w, new_h = int(w*scale), int(h*scale)
frame_resized = cv2.resize(frame_rgb, (new_w, new_h))
img = Image.fromarray(frame_resized)
imgtk = ImageTk.PhotoImage(image=img)
self.video_canvas.delete("all")
self.video_canvas.create_image(target_w//2, target_h//2, image=imgtk, anchor=tk.CENTER)
self.video_canvas.image = imgtk
if running:
self.root.after(30, self.update_video)
def update_status(self):
"""Update status information"""
if running:
self.face_count_var.set(self.get_face_count_text())
# Update current detection
if current_faces:
detection_text = ""
for idx, face_info in enumerate(current_faces[:3]):
name = face_info['name']
confidence = face_info['confidence']
if name == "ENROLLING":
detection_text = "🔴 ENROLLMENT IN PROGRESS"
break
elif name != "Unknown":
detection_text += f"#{idx+1}: ✅ {name} ({confidence}%)\n"
else:
detection_text += f"#{idx+1}: ❓ Unknown\n"
self.detection_var.set(detection_text.strip())
else:
self.detection_var.set("No face detected")
self.root.after(500, self.update_status)
else:
self.status_var.set("🔴 System Stopped")
def check_enrollment(self):
"""Check enrollment progress and update UI"""
global enrollment_in_progress, enrollment_shots, enrollment_countdown, enrollment_start_time
if enrollment_in_progress:
elapsed = time.time() - enrollment_start_time
# Countdown phase
if enrollment_countdown > 0:
countdown_val = max(1, int(COUNTDOWN_START - elapsed + 1))
self.enrollment_status_var.set(f"⏱️ Get ready... {countdown_val}")
self.enroll_btn.config(state="disabled")
# Capture phase
elif len(enrollment_shots) < SHOTS_PER_PERSON:
shots_taken = len(enrollment_shots)
next_shot_in = max(0, (shots_taken + 1) * SHOT_DELAY - elapsed)
self.enrollment_status_var.set(f"📸 Shot {shots_taken}/{SHOTS_PER_PERSON} - Next in {next_shot_in:.1f}s")
self.enroll_btn.config(state="disabled")
# Check if enrollment just completed
if not enrollment_in_progress and len(enrollment_shots) == SHOTS_PER_PERSON:
self.enrollment_status_var.set("✅ Enrollment complete!")
self.enroll_btn.config(state="normal")
self.name_entry.delete(0, tk.END)
self.refresh_list()
# Reset enrollment_shots to prevent repeated refresh
enrollment_shots.clear()
# Clear status after delay
self.root.after(3000, lambda: self.enrollment_status_var.set(""))
if running:
self.root.after(100, self.check_enrollment)
def start_multi_shot_enrollment(self):
"""Start multi-shot enrollment process"""
global enrollment_in_progress, enrollment_shots
name = self.name_entry.get().strip()
if not name:
messagebox.showwarning("No Name", "Please enter a name first!")
return
if not detected_faces_for_enrollment:
messagebox.showwarning("No Face", "No face detected! Position face in view.")
return
if len(detected_faces_for_enrollment) > 1:
messagebox.showwarning("Multiple Faces",
"Multiple faces detected!\nPlease ensure only ONE person is in frame.")
return
if name in known_faces:
if not messagebox.askyesno("Overwrite",
f"'{name}' already exists with {len(known_faces[name])} shots.\nOverwrite?"):
return
# Start enrollment
self.enroll_btn.config(state="disabled")
self.enrollment_status_var.set("Starting...")
action_queue.put({'type': 'start_enrollment', 'name': name})
messagebox.showinfo("Multi-Shot Capture",
f"Starting {SHOTS_PER_PERSON}-shot capture!\n\n"
f"📋 Process:\n"
f"1. {COUNTDOWN_START} second countdown\n"
f"2. {SHOTS_PER_PERSON} shots with {SHOT_DELAY}s between each\n"
f"3. Move your head slightly between shots\n\n"
f"💡 Stay in frame and keep only ONE person visible!")
def delete_face(self):
"""Delete selected face"""
selection = self.face_listbox.curselection()
if not selection:
messagebox.showwarning("No Selection", "Please select a face to delete!")
return
name = self.face_listbox.get(selection[0]).split(" (")[0] # Remove shot count
if messagebox.askyesno("Confirm Delete", f"Delete '{name}' and all saved images?"):
action_queue.put({'type': 'delete', 'name': name})
self.root.after(500, self.refresh_list)
messagebox.showinfo("Success", f"✅ Deleted '{name}' and all images!")
def refresh_list(self):
"""Refresh the face list"""
self.face_listbox.delete(0, tk.END)
for name in sorted(known_faces.keys()):
shot_count = len(known_faces[name])
self.face_listbox.insert(tk.END, f"{name} ({shot_count} shots)")
# -----------------------------
# MAIN
# -----------------------------
if __name__ == "__main__":
print("🚀 Starting UY Scuti Face Recognition System - Fixed Multi-Shot")
print(f"⚙️ Recognition method: {'Dlib' if USE_DLIB else 'OpenCV'}")
print(f"⚙️ Shots per person: {SHOTS_PER_PERSON}")
print(f"⚙️ Shot delay: {SHOT_DELAY}s")
print(f"⚙️ Countdown: {COUNTDOWN_START}s")
print(f"⚙️ Confidence threshold: {CONFIDENCE_THRESHOLD}%")
print(f"⚙️ Recognition threshold: {RECOGNITION_THRESHOLD_DLIB if USE_DLIB else RECOGNITION_THRESHOLD_OPENCV}")
# Start video processing in separate thread
video_thread = threading.Thread(target=video_processing_thread, daemon=True)
video_thread.start()
# Start GUI
root = tk.Tk()
app = FaceRecognitionGUI(root)
try:
root.mainloop()
except KeyboardInterrupt:
pass
running = False
video_thread.join(timeout=2)
print("✅ Program closed cleanly")As you can see, this program has almost a 1,000 lines of code in it, and this is where conversations with Claud can become invaluable when drafting, troubleshooting and fine tuning it for your particular purpose.
Before you run this code, you will need to install several dependencies into your Python environment, including OpenCV, NumPy, Pillow and dlib. Some of these installations, especially dlib, involve complexities that require trial-and-error experimentation on your particular server environment, so once again I recommend getting Claude's or ChatGPT's help to get these libraries set up.
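For reference, here is the sort of minimal requirements.txt that worked for me; exact package names and versions can vary by platform, and dlib additionally needs CMake and a C++ toolchain available at build time:

# requirements.txt (a minimal working set on my machine; versions may vary)
opencv-python
numpy
Pillow
dlib

Note that the script also loads two pre-trained dlib model files at startup, shape_predictor_68_face_landmarks.dat and dlib_face_recognition_resnet_model_v1.dat. At the time of writing, both are downloadable as .bz2 archives from http://dlib.net/files/ and should be extracted into the same folder as the script. If dlib fails to install or load, the program automatically falls back to the simpler OpenCV embedding, just with lower accuracy.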
The final outcome is a video stream with a surrounding desktop control interface, tuned to detect, enroll and identify multiple faces from a standard distance of around 1 meter.
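A closing pointer for anyone extending this: everything the system learns lives in face_encodings.pkl as a plain dictionary of {name: [embedding, embedding, ...]}, so building on top of it is easy. Here is a small sketch, assuming the file layout used above, that lists who is enrolled:

# Inspect the enrollment database written by the server above.
import pickle

with open("face_encodings.pkl", "rb") as f:
    known_faces = pickle.load(f)

for name, embeddings in known_faces.items():
    # Each person maps to one embedding per enrollment shot
    # (128-dimensional vectors when dlib is the active method).
    print(f"{name}: {len(embeddings)} shots of {len(embeddings[0])} dimensions")

From here, I leave it to your imagination to decide what to do next... enjoy!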