DIY facial recognition and identification for your home
- Ruwan Rajapakse
- Nov 16, 2025
- 16 min read
As part of a broader AI use case I’ve been exploring, I recently did a bit of R&D on facial recognition using low-cost components and lightweight methods. I’ll admit I didn’t have much confidence in “vibe coding” at the outset, thanks to a few negative experiences a couple of years ago when the tools were far less mature. But I was genuinely impressed by what I managed to build in just a couple of days last week, sitting in front of Claude and ChatGPT.
I powered through hardware quirks, library installations, network glitches and debug sessions to end up with a stable, robust smart camera that can identify and track who visits my cubicle, including multiple visitors at once. It's true I'm a former, somewhat mediocre programmer rather than a complete novice, but I haven't built anything this involved in longer than I care to admit.
The solution uses an ESP32-CAM and two small, purpose-built programs: a camera web server running on the device itself (written in Arduino Wiring/C++), and a facial recognition server running on a laptop or Raspberry Pi (written in Python). This sort of DIY facial-recognition setup has been around for more than five years, but from what I’ve seen, the underlying technologies and libraries have matured significantly, making the results far more stable and reliable. It shows promise for my broader use case.

If you’re interested in the code and configuration steps, they’re included further down.
What I really want to highlight here, though, is just how delightful the overall experience was. I resolved more than two dozen issues, ranging from library incompatibilities to network-routing headaches to tuning the system for my specific deployment environment, and still produced a robust solution in only two days. ChatGPT cleared many of the simpler hurdles, such as network configuration, but Claude seems to have the edge when it comes to deeper programming and troubleshooting. I'd recommend this way of working to anyone looking to assemble a quick, functional IoT/AI solution from popular components.
Here is the Arduino C++ program, which is a modification of the CameraWebServer example that ships with the ESP32 board package (File > Examples > ESP32 > Camera > CameraWebServer, with the ESP32 Wrover Module board selected). This is the first tab of the modified CameraWebServer codebase.
#include "esp_camera.h"
#include <WiFi.h>
#include <ESPmDNS.h>
#include "esp_wifi.h" // ← ADD THIS LINE for esp_wifi_set_ps
#include "board_config.h"
// ===========================
// WiFi credentials
// ===========================
const char* ssid = "network name";
const char* password = "network password";
// No static IP needed! Router assigns 192.168.1.100 via DHCP reservation
bool faceDetectionEnabled = false;
bool faceRecognitionEnabled = false;
camera_fb_t* fb_for_detection = nullptr;
void startCameraServer();
void setup() {
Serial.begin(115200);
Serial.setDebugOutput(true);
Serial.println("\n=================================");
Serial.println("UY Scuti ESP32-CAM Starting...");
Serial.println("=================================");
// ===========================
// Camera Configuration
// ===========================
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sccb_sda = SIOD_GPIO_NUM;
config.pin_sccb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG;
config.frame_size = FRAMESIZE_VGA;
config.fb_location = CAMERA_FB_IN_PSRAM;
config.jpeg_quality = 12;
config.grab_mode = CAMERA_GRAB_LATEST;
if (psramFound()) {
config.fb_count = 2;
} else {
config.fb_count = 1;
}
#if defined(CAMERA_MODEL_ESP_EYE)
pinMode(13, INPUT_PULLUP);
pinMode(14, INPUT_PULLUP);
#endif
Serial.println("\n=== Camera Initialization ===");
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("❌ Camera init failed: 0x%x\n", err);
delay(5000);
ESP.restart();
return;
}
Serial.println("✅ Camera initialized");
sensor_t *s = esp_camera_sensor_get();
if (s->id.PID == OV3660_PID) {
s->set_vflip(s, 1);
s->set_brightness(s, 1);
s->set_saturation(s, -2);
}
s->set_framesize(s, FRAMESIZE_VGA);
#if defined(CAMERA_MODEL_M5STACK_WIDE) || defined(CAMERA_MODEL_M5STACK_ESP32CAM)
s->set_vflip(s, 1);
s->set_hmirror(s, 1);
#endif
#if defined(CAMERA_MODEL_ESP32S3_EYE)
s->set_vflip(s, 1);
#endif
Serial.printf("PSRAM: %s\n", psramFound() ? "✅ Available" : "❌ Not found");
// ===========================
// WiFi Connection (DHCP)
// ===========================
Serial.println("\n=== WiFi Connection ===");
WiFi.mode(WIFI_STA);
WiFi.setSleep(false);
// Disable WiFi power saving for stable streaming
esp_wifi_set_ps(WIFI_PS_NONE);
WiFi.begin(ssid, password);
Serial.print("Connecting");
int attempts = 0;
while (WiFi.status() != WL_CONNECTED && attempts < 30) {
delay(500);
Serial.print(".");
attempts++;
}
if (WiFi.status() != WL_CONNECTED) {
Serial.println("\n❌ WiFi failed. Restarting...");
delay(5000);
ESP.restart();
return;
}
Serial.println("\n✅ WiFi connected!");
Serial.print("MAC Address: ");
Serial.println(WiFi.macAddress());
Serial.print("IP Address: ");
Serial.println(WiFi.localIP());
Serial.print("Signal: ");
Serial.print(WiFi.RSSI());
Serial.println(" dBm");
// ===========================
// mDNS Setup
// ===========================
Serial.println("\n=== mDNS Setup ===");
delay(500);
if (MDNS.begin("uyscuti")) {
Serial.println("✅ mDNS: http://uyscuti.local");
MDNS.addService("http", "tcp", 80);
} else {
Serial.println("⚠️ mDNS failed");
}
// ===========================
// Start Camera Server
// ===========================
Serial.println("\n=== Starting Camera Server ===");
startCameraServer();
Serial.println("✅ Server started");
// ===========================
// Connection Summary
// ===========================
Serial.println("\n=================================");
Serial.println("✅ CAMERA READY!");
Serial.println("=================================");
Serial.println("Access via:");
Serial.print(" • http://");
Serial.println(WiFi.localIP());
Serial.println(" • http://uyscuti.local");
Serial.print(" • Stream: http://");
Serial.print(WiFi.localIP());
Serial.println(":81/stream");
Serial.println("=================================\n");
}
void loop() {
static unsigned long lastCheck = 0;
if (millis() - lastCheck > 10000) {
lastCheck = millis();
if (WiFi.status() != WL_CONNECTED) {
Serial.println("⚠️ WiFi lost. Reconnecting...");
WiFi.reconnect();
} else {
Serial.printf("✓ WiFi OK | RSSI: %d dBm | IP: %s\n",
WiFi.RSSI(), WiFi.localIP().toString().c_str());
}
}
delay(100);
}
void clearFaceOverlay(camera_fb_t* fb) {
if (!fb) return;
uint16_t* pixels = (uint16_t*)fb->buf;
if (!pixels) return;
for (size_t i = 0; i < (fb->len / 2); i++) {
pixels[i] = 0;
}
}

The other tabs in the example codebase remain unchanged, except for uncommenting #define CAMERA_MODEL_AI_THINKER in the board_config.h tab.
If you encounter minor configuration problems when installing and connecting to the ESP32-CAM hardware through the Arduino IDE via your laptop's USB-C port, I am confident that a bit of assistance from Claude or even ChatGPT will help you overcome them with ease.
You would then have to set up a DHCP reservation with an infinite lease for the camera's IP address (192.168.1.100 in this example), restart your router, and test it. Also, make sure the reserved address falls within your router's DHCP range.
Then, your camera's control panel would become available on http://192.168.1.100 and the video stream on http://192.168.1.100:81/stream.
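Before building the recognition server, it's worth a quick smoke test of the stream from Python. Here is a minimal sketch, assuming the DHCP-reserved address above (adjust the URL to your own network), that grabs a single JPEG frame using the same start-of-image/end-of-image marker scan the full server below relies on:

# Smoke test: capture one JPEG frame from the ESP32-CAM MJPEG stream.
# Assumes the DHCP reservation described above; adjust the URL as needed.
import urllib.request

STREAM_URL = "http://192.168.1.100:81/stream"
stream = urllib.request.urlopen(STREAM_URL, timeout=10)
buffer = b""
while True:
    buffer += stream.read(4096)
    start = buffer.find(b"\xff\xd8")  # JPEG start-of-image marker
    end = buffer.find(b"\xff\xd9")    # JPEG end-of-image marker
    if start != -1 and end > start:
        with open("test_frame.jpg", "wb") as f:
            f.write(buffer[start:end + 2])
        print("Captured one frame to test_frame.jpg")
        break
stream.close()

If this saves a viewable image, the camera and network side of the project is done, and anything that goes wrong later lies in the recognition pipeline.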
Now, you must create and run this Python facial recognition server program on your server device (a laptop for testing, or a Raspberry Pi in your final implementation). I used Visual Studio Code for writing the Python program.
import cv2
import numpy as np
import os
import pickle
import urllib.request
import sys
import tkinter as tk
from tkinter import ttk, simpledialog, messagebox
from PIL import Image, ImageTk
import threading
import queue
import time
from collections import defaultdict, deque
from datetime import datetime
# -----------------------------
# CONFIG
# -----------------------------
VIDEO_URL = "http://192.168.1.100:81/stream" # your ESP32-CAM stream
KNOWN_FACES_DIR = "known_faces"
DATA_FILE = "face_encodings.pkl"
# Recognition thresholds - CRITICAL for accuracy
RECOGNITION_THRESHOLD_DLIB = 0.6 # dlib typical good match: 0.3-0.4
RECOGNITION_THRESHOLD_OPENCV = 0.25 # OpenCV normalized distance
# Multi-shot enrollment - FIXED TIMING
SHOTS_PER_PERSON = 5 # Capture 5 different shots per person
SHOT_DELAY = 10.0 # 10 SECONDS between captures - gives time to move!
COUNTDOWN_START = 3 # 3 second countdown before first shot
# Temporal smoothing - reduces name flickering
FACE_TRACKING_FRAMES = 15 # Increased from 10 - track faces over more frames for stability
CONFIDENCE_THRESHOLD = 25 # Lowered from 30 - minimum confidence to display name
# Debug mode
DEBUG_MODE = True # Show distance values in console
os.makedirs(KNOWN_FACES_DIR, exist_ok=True)
# -----------------------------
# MODELS
# -----------------------------
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
try:
import dlib
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_rec_model = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")
# Test if dlib actually works
test_img = np.zeros((100, 100, 3), dtype=np.uint8)
test_rect = dlib.rectangle(0, 0, 100, 100)
test_shape = shape_predictor(test_img, test_rect)
USE_DLIB = True
print("✅ Using dlib for face recognition")
except Exception as e:
print(f"⚠️ Dlib not available or not working ({e}), using OpenCV")
USE_DLIB = False
# -----------------------------
# KNOWN FACES (Multi-shot storage)
# -----------------------------
if os.path.exists(DATA_FILE):
with open(DATA_FILE, "rb") as f:
known_faces = pickle.load(f)
else:
known_faces = {} # Structure: {name: [embedding1, embedding2, ...]}
# -----------------------------
# FACE TRACKING
# -----------------------------
class FaceTracker:
"""Track faces across frames for temporal smoothing"""
def __init__(self, max_frames=FACE_TRACKING_FRAMES):
self.tracks = {} # {face_id: deque of (name, confidence)}
self.next_id = 0
self.max_frames = max_frames
self.face_positions = {} # {face_id: (x, y, w, h)}
def update(self, faces_info):
"""Update tracking with new frame data"""
current_positions = {}
used_ids = set()
# Match detected faces to existing tracks
for face_info in faces_info:
x, y, w, h = face_info['bbox']
center = (x + w//2, y + h//2)
# Find closest existing track
best_match_id = None
best_distance = float('inf')
for track_id, old_pos in self.face_positions.items():
if track_id in used_ids:
continue
old_x, old_y, old_w, old_h = old_pos
old_center = (old_x + old_w//2, old_y + old_h//2)
dist = np.sqrt((center[0] - old_center[0])**2 + (center[1] - old_center[1])**2)
# Increased tolerance for face movement (1.5x -> 2.0x face width)
if dist < w * 2.0 and dist < best_distance:
best_distance = dist
best_match_id = track_id
# Assign to track
if best_match_id is not None:
face_id = best_match_id
else:
face_id = self.next_id
self.next_id += 1
self.tracks[face_id] = deque(maxlen=self.max_frames)
used_ids.add(face_id)
current_positions[face_id] = (x, y, w, h)
# Add detection to track
self.tracks[face_id].append((face_info['name'], face_info['confidence']))
# Remove old tracks
self.face_positions = current_positions
old_tracks = set(self.tracks.keys()) - used_ids
for track_id in old_tracks:
del self.tracks[track_id]
# Get smoothed results
smoothed_faces = []
for face_id in used_ids:
track = self.tracks[face_id]
if len(track) == 0:
continue
# Vote on name (most common in recent frames)
name_votes = defaultdict(int)
confidence_sum = defaultdict(float)
for name, conf in track:
name_votes[name] += 1
confidence_sum[name] += conf
# Get most voted name - require at least 40% of frames to agree
best_name = max(name_votes.items(), key=lambda x: x[1])[0]
vote_percentage = name_votes[best_name] / len(track)
# If less than 40% agreement, mark as Unknown
if vote_percentage < 0.4 and best_name != "Unknown":
best_name = "Unknown"
avg_confidence = 0
else:
avg_confidence = confidence_sum[best_name] / name_votes[best_name]
# Find original face_info for this track
bbox = self.face_positions[face_id]
for face_info in faces_info:
if face_info['bbox'] == bbox:
smoothed_faces.append({
**face_info,
'name': best_name,
'confidence': int(avg_confidence)
})
break
return smoothed_faces
face_tracker = FaceTracker()
# -----------------------------
# GLOBAL STATE
# -----------------------------
current_frame = None
current_faces = []
detected_faces_for_enrollment = []
action_queue = queue.Queue()
running = True
# FIXED ENROLLMENT STATE
enrollment_in_progress = False
enrollment_name = ""
enrollment_shots = []
enrollment_countdown = 0
enrollment_start_time = 0
# -----------------------------
# HELPER FUNCTIONS
# -----------------------------
def normalize_lighting(image):
"""Normalize lighting conditions using CLAHE"""
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
l = clahe.apply(l)
lab = cv2.merge([l, a, b])
return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
def get_face_embedding_dlib(image, x, y, w, h):
"""Get face embedding using dlib"""
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
if rgb.dtype != np.uint8:
rgb = rgb.astype(np.uint8)
if not rgb.flags['C_CONTIGUOUS']:
rgb = np.ascontiguousarray(rgb)
rect = dlib.rectangle(x, y, x + w, y + h)
shape = shape_predictor(rgb, rect)
embedding = face_rec_model.compute_face_descriptor(rgb, shape)
return np.array(embedding)
def get_face_embedding_opencv(image, x, y, w, h):
"""Fallback: extract face region as feature (simplified, matches working version)"""
face = image[y:y+h, x:x+w]
face_resized = cv2.resize(face, (128, 128))
return face_resized.flatten().astype(np.float32) / 255.0
def get_face_embedding(image, x, y, w, h):
"""Get face embedding using available method"""
if USE_DLIB:
try:
return get_face_embedding_dlib(image, x, y, w, h)
except:
return get_face_embedding_opencv(image, x, y, w, h)
else:
return get_face_embedding_opencv(image, x, y, w, h)
def save_face_image(name, image, x, y, w, h, shot_number):
"""Save actual face image to disk"""
person_dir = os.path.join(KNOWN_FACES_DIR, name)
os.makedirs(person_dir, exist_ok=True)
# Extract face with padding
padding = int(w * 0.1)
y1 = max(0, y - padding)
y2 = min(image.shape[0], y + h + padding)
x1 = max(0, x - padding)
x2 = min(image.shape[1], x + w + padding)
face_img = image[y1:y2, x1:x2]
# Save with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"{name}_shot{shot_number}_{timestamp}.jpg"
filepath = os.path.join(person_dir, filename)
cv2.imwrite(filepath, face_img)
print(f" 💾 Saved: {filepath}")
return filepath
def save_face_multi_shot(name, shots_data):
"""Save multiple embeddings and images for one person"""
print(f"🔍 DEBUG: save_face_multi_shot called with name='{name}', shots={len(shots_data)}")
embeddings_list = []
# Create person directory
person_dir = os.path.join(KNOWN_FACES_DIR, name)
print(f"🔍 DEBUG: person_dir = {person_dir}")
if os.path.exists(person_dir):
# Clear old images
for old_file in os.listdir(person_dir):
os.remove(os.path.join(person_dir, old_file))
print(f"🔍 DEBUG: Cleared old files from {person_dir}")
else:
os.makedirs(person_dir, exist_ok=True)
print(f"🔍 DEBUG: Created directory {person_dir}")
# Save each shot
for idx, (img, x, y, w, h) in enumerate(shots_data, 1):
print(f"🔍 DEBUG: Processing shot {idx}: bbox=({x},{y},{w},{h})")
# Save image
filepath = save_face_image(name, img, x, y, w, h, idx)
print(f"🔍 DEBUG: Saved image to {filepath}")
# Save embedding
embedding = get_face_embedding(img, x, y, w, h)
embeddings_list.append(embedding)
print(f"🔍 DEBUG: Generated embedding {idx}")
# Save embeddings to pickle
known_faces[name] = embeddings_list
print(f"🔍 DEBUG: Added {name} to known_faces dict")
try:
with open(DATA_FILE, "wb") as f:
pickle.dump(known_faces, f)
print(f"🔍 DEBUG: Successfully saved to {DATA_FILE}")
except Exception as e:
print(f"❌ DEBUG: Failed to save pickle: {e}")
return False
print(f"✅ Saved {len(embeddings_list)} shots for '{name}'")
return True
def delete_face(name):
"""Delete a saved face and its images"""
if name in known_faces:
del known_faces[name]
with open(DATA_FILE, "wb") as f:
pickle.dump(known_faces, f)
# Delete image folder
person_dir = os.path.join(KNOWN_FACES_DIR, name)
if os.path.exists(person_dir):
import shutil
shutil.rmtree(person_dir)
print(f"✅ Deleted '{name}'")
return True
return False
def calculate_confidence(distance, threshold):
"""Calculate confidence percentage based on distance and threshold"""
if distance >= threshold:
return 0
# Scale: 0.0 = 100%, threshold = 0%
confidence = max(0, min(100, int((1 - distance/threshold) * 100)))
return confidence
def recognize_face(image, x, y, w, h):
"""Recognize face with multi-shot comparison"""
if not known_faces:
return "Unknown", 1.0, 0
embedding = get_face_embedding(image, x, y, w, h)
# Calculate distances to all known faces (comparing against all shots)
best_match = None
best_distance = float('inf')
second_best_distance = float('inf')
all_distances = []
for name, embeddings_data in known_faces.items():
# Handle both single embedding (old format) and list of embeddings (new format)
embeddings_list = embeddings_data if isinstance(embeddings_data, list) else [embeddings_data]
# Compare against all shots of this person
person_distances = []
for known_embedding in embeddings_list:
distance = np.linalg.norm(known_embedding - embedding)
person_distances.append(distance)
# Use the BEST (minimum) distance from all shots
min_distance = min(person_distances)
avg_distance = np.mean(person_distances)
# Weight: 70% best match, 30% average (to avoid outliers)
weighted_distance = 0.7 * min_distance + 0.3 * avg_distance
all_distances.append((name, weighted_distance, min_distance))
# Track best and second-best matches
if weighted_distance < best_distance:
second_best_distance = best_distance
best_distance = weighted_distance
best_match = name
elif weighted_distance < second_best_distance:
second_best_distance = weighted_distance
# Select appropriate threshold - auto-detect based on embedding size
embedding_size = len(embedding)
if USE_DLIB:
threshold = RECOGNITION_THRESHOLD_DLIB
else:
# OpenCV creates large embeddings with high distances
# Increased threshold for 1m distance (faces are smaller/lower resolution)
threshold = 120.0 # Increased from 110 to handle 1m distance
# Add confidence margin check: best match should be significantly better than second best
confidence_margin = 0.08 # Reduced from 0.10 - more lenient for 3+ people scenarios
if USE_DLIB:
confidence_margin = 0.05 # Tighter margin for dlib (more accurate)
# Debug output
if DEBUG_MODE and all_distances:
all_distances.sort(key=lambda x: x[1])
print(f"\n🔍 Recognition distances (threshold={threshold}):")
for name, dist, min_dist in all_distances[:3]:
print(f" {name}: weighted={dist:.3f}, best={min_dist:.3f}")
print(f" Confidence margin check: {best_distance:.3f} vs {second_best_distance:.3f}")
if best_distance < threshold:
# Check if match is confident enough (significantly better than second best)
if second_best_distance == float('inf') or (best_distance * (1 + confidence_margin) < second_best_distance):
confidence = calculate_confidence(best_distance, threshold)
# Lower confidence threshold to 30% (was 40%)
if confidence >= 30:
return best_match, best_distance, confidence
return "Unknown", best_distance, 0
def sort_faces_left_to_right(faces):
"""Sort faces by x-coordinate (left to right) for consistent ordering"""
return sorted(faces, key=lambda f: f['bbox'][0])
# -----------------------------
# VIDEO PROCESSING THREAD
# -----------------------------
def reconnect_stream(url, max_retries=3):
"""Attempt to reconnect to the stream"""
for attempt in range(max_retries):
try:
stream = urllib.request.urlopen(url, timeout=10)
return stream
except Exception as e:
if attempt < max_retries - 1:
time.sleep(2)
return None
def video_processing_thread():
global current_frame, current_faces, detected_faces_for_enrollment, running
global enrollment_in_progress, enrollment_name, enrollment_shots
global enrollment_countdown, enrollment_start_time
stream = reconnect_stream(VIDEO_URL)
if stream is None:
print("❌ Failed to connect to stream")
running = False
return
byte_buffer = b''
stream_errors = 0
max_stream_errors = 10
print("✅ Connected to ESP32-CAM stream")
while running:
# Check for actions from GUI
try:
action = action_queue.get_nowait()
if action['type'] == 'start_enrollment':
enrollment_in_progress = True
enrollment_name = action['name']
enrollment_shots = []
enrollment_countdown = COUNTDOWN_START
enrollment_start_time = time.time()
print(f"📸 Starting enrollment for '{enrollment_name}'...")
print(f"⏱️ {COUNTDOWN_START} second countdown, then {SHOTS_PER_PERSON} shots with {SHOT_DELAY}s between each")
elif action['type'] == 'delete':
delete_face(action['name'])
except queue.Empty:
pass
try:
chunk = stream.read(4096)
if not chunk:
raise Exception("Empty chunk")
byte_buffer += chunk
stream_errors = 0
except Exception as e:
stream_errors += 1
if stream_errors >= max_stream_errors:
print("❌ Too many stream errors")
break
stream = reconnect_stream(VIDEO_URL, max_retries=2)
if stream is None:
break
byte_buffer = b''
continue
a = byte_buffer.find(b'\xff\xd8')
b = byte_buffer.find(b'\xff\xd9')
if a != -1 and b != -1:
jpg = byte_buffer[a:b+2]
byte_buffer = byte_buffer[b+2:]
if len(jpg) < 100:
continue
try:
frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
except:
continue
if frame is None or frame.size == 0:
continue
# Detect faces with adjusted parameters for 1m distance
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Reduced minSize to detect smaller faces at 1m distance
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))
# Store detected faces for enrollment
detected_faces_for_enrollment = []
face_results = []
# FIXED ENROLLMENT LOGIC
if enrollment_in_progress:
elapsed = time.time() - enrollment_start_time
# Countdown phase
if enrollment_countdown > 0:
countdown_remaining = COUNTDOWN_START - elapsed
if countdown_remaining <= 0:
enrollment_countdown = 0
enrollment_start_time = time.time() # Reset for shots
elapsed = 0
print("✅ Countdown complete! Starting capture...")
# Capture phase
elif len(enrollment_shots) < SHOTS_PER_PERSON:
if len(faces) == 1: # Only proceed if single face
# Check if it's time for next shot
shot_time = len(enrollment_shots) * SHOT_DELAY
if elapsed >= shot_time:
x, y, w, h = faces[0]
enrollment_shots.append((frame.copy(), x, y, w, h))
print(f" 📸 Captured shot {len(enrollment_shots)}/{SHOTS_PER_PERSON}")
# Check if we just completed all shots
if len(enrollment_shots) == SHOTS_PER_PERSON:
print(f"💾 All shots captured! Saving {len(enrollment_shots)} shots for '{enrollment_name}'...")
success = save_face_multi_shot(enrollment_name, enrollment_shots)
if success:
print(f"✅ Successfully enrolled '{enrollment_name}'")
else:
print(f"❌ Failed to enroll '{enrollment_name}'")
enrollment_in_progress = False
elif len(faces) > 1:
print(" ⚠️ Multiple faces detected - waiting for single face...")
else:
print(" ⚠️ No face detected - waiting...")
# Process each face for display
for (x, y, w, h) in faces:
detected_faces_for_enrollment.append((frame.copy(), x, y, w, h))
# Recognition (skip during enrollment)
if not enrollment_in_progress:
name, distance, confidence = recognize_face(frame, x, y, w, h)
else:
name, distance, confidence = "ENROLLING", 0, 0
face_results.append({
'name': name,
'distance': distance,
'confidence': confidence,
'bbox': (x, y, w, h),
'center_x': x + w//2
})
# Sort faces left to right
face_results = sort_faces_left_to_right(face_results)
# Apply temporal smoothing (skip during enrollment)
if not enrollment_in_progress:
face_results = face_tracker.update(face_results)
current_faces = face_results
# Draw faces on frame
for idx, face_info in enumerate(face_results):
x, y, w, h = face_info['bbox']
name = face_info['name']
confidence = face_info['confidence']
# ENROLLMENT VISUAL FEEDBACK
if enrollment_in_progress:
if len(faces) != 1:
# Warning: wrong number of faces
color = (0, 0, 255) # Red
label = "⚠️ SINGLE FACE ONLY!"
elif enrollment_countdown > 0:
# Countdown
elapsed = time.time() - enrollment_start_time
countdown_val = max(1, int(COUNTDOWN_START - elapsed + 1))
color = (0, 255, 255) # Yellow
label = f"GET READY... {countdown_val}"
else:
# Capturing
shots_taken = len(enrollment_shots)
elapsed = time.time() - enrollment_start_time
next_shot_in = max(0, (shots_taken + 1) * SHOT_DELAY - elapsed)
color = (0, 255, 0) # Green
label = f"📸 {shots_taken}/{SHOTS_PER_PERSON} - Next in {next_shot_in:.1f}s"
if next_shot_in < 0.5:
label = f"📸 CAPTURING {shots_taken+1}/{SHOTS_PER_PERSON}..."
thickness = 5
else:
# Normal recognition
if name != "Unknown":
if confidence >= 80:
color = (0, 255, 0) # Bright green
elif confidence >= 60:
color = (0, 200, 200) # Yellow-green
else:
color = (0, 165, 255) # Orange
else:
color = (0, 0, 255) # Red for unknown
thickness = 3
# Label
if len(face_results) > 1:
label = f"#{idx+1} {name}"
else:
label = name
if name != "Unknown":
label += f" ({confidence}%)"
# Draw box
cv2.rectangle(frame, (x, y), (x+w, y+h), color, thickness)
# Draw label background and text
label_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)[0]
cv2.rectangle(frame, (x, y-35), (x+label_size[0]+10, y), color, -1)
cv2.putText(frame, label, (x+5, y-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 2, cv2.LINE_AA)
current_frame = frame
try:
stream.close()
except:
pass
# -----------------------------
# GUI APPLICATION
# -----------------------------
class FaceRecognitionGUI:
def __init__(self, root):
self.root = root
self.root.title("UY Scuti Face Recognition System - Fixed Multi-Shot")
self.root.geometry("1000x700")
# Create main container
main_container = tk.Frame(root)
main_container.pack(fill="both", expand=True)
# Left side - Video feed
video_frame = tk.LabelFrame(main_container, text="📹 Live Video Feed", padx=5, pady=5)
video_frame.pack(side="left", padx=10, pady=10, fill="both", expand=True)
self.video_canvas = tk.Canvas(video_frame, width=640, height=480, bg="black")
self.video_canvas.pack()
# Right side - Controls
control_frame = tk.Frame(main_container, width=300)
control_frame.pack(side="right", padx=10, pady=10, fill="both")
# Title
title = tk.Label(control_frame, text="🎯 Control Panel", font=("Arial", 16, "bold"))
title.pack(pady=10)
# Status
self.status_var = tk.StringVar(value="🟢 System Running")
status_label = tk.Label(control_frame, textvariable=self.status_var, font=("Arial", 11))
status_label.pack(pady=5)
# Method indicator
method = "Dlib (High Accuracy)" if USE_DLIB else "OpenCV (Basic)"
method_label = tk.Label(control_frame, text=f"Method: {method}",
font=("Arial", 9), fg="blue")
method_label.pack(pady=2)
# Face count
self.face_count_var = tk.StringVar(value=self.get_face_count_text())
face_count = tk.Label(control_frame, textvariable=self.face_count_var, font=("Arial", 10))
face_count.pack(pady=5)
# Current detection
detection_frame = tk.LabelFrame(control_frame, text="Current Detection", padx=10, pady=10)
detection_frame.pack(padx=10, pady=10, fill="x")
self.detection_var = tk.StringVar(value="No face detected")
detection_label = tk.Label(detection_frame, textvariable=self.detection_var,
font=("Arial", 9), fg="blue", wraplength=250, justify="left")
detection_label.pack()
# Enrollment section
enroll_frame = tk.LabelFrame(control_frame, text=f"📝 Enroll New Person",
padx=10, pady=10)
enroll_frame.pack(padx=10, pady=10, fill="x")
tk.Label(enroll_frame, text="Name:").pack()
self.name_entry = tk.Entry(enroll_frame, font=("Arial", 11), width=22)
self.name_entry.pack(pady=5)
self.enroll_btn = tk.Button(enroll_frame, text=f"📸 Start {SHOTS_PER_PERSON}-Shot Capture",
command=self.start_multi_shot_enrollment,
bg="#4CAF50", fg="white", font=("Arial", 10, "bold"), pady=5)
self.enroll_btn.pack(pady=5)
self.enrollment_status_var = tk.StringVar(value="")
self.enrollment_status_label = tk.Label(enroll_frame, textvariable=self.enrollment_status_var,
font=("Arial", 12, "bold"), fg="orange",
wraplength=250, height=3)
self.enrollment_status_label.pack(pady=10)
# Instructions
instructions = tk.Label(enroll_frame,
text=f"💡 Tips:\n"
f"• Only ONE person in frame\n"
f"• {COUNTDOWN_START}s countdown first\n"
f"• Move head between shots\n"
f"• {SHOT_DELAY}s between captures",
font=("Arial", 8), fg="gray", justify="left")
instructions.pack(pady=5)
# Management section
manage_frame = tk.LabelFrame(control_frame, text="🔧 Manage Faces", padx=10, pady=10)
manage_frame.pack(padx=10, pady=10, fill="both", expand=True)
# Listbox with scrollbar
list_frame = tk.Frame(manage_frame)
list_frame.pack(fill="both", expand=True)
scrollbar = tk.Scrollbar(list_frame)
scrollbar.pack(side="right", fill="y")
self.face_listbox = tk.Listbox(list_frame, yscrollcommand=scrollbar.set,
font=("Arial", 10), height=8)
self.face_listbox.pack(side="left", fill="both", expand=True)
scrollbar.config(command=self.face_listbox.yview)
# Buttons
btn_frame = tk.Frame(manage_frame)
btn_frame.pack(pady=5)
delete_btn = tk.Button(btn_frame, text="🗑️ Delete", command=self.delete_face,
bg="#f44336", fg="white", font=("Arial", 9), width=10)
delete_btn.pack(side="left", padx=3)
refresh_btn = tk.Button(btn_frame, text="🔄 Refresh", command=self.refresh_list,
bg="#FF9800", fg="white", font=("Arial", 9), width=10)
refresh_btn.pack(side="left", padx=3)
# Start updates
self.refresh_list()
self.update_video()
self.update_status()
self.check_enrollment()
def get_face_count_text(self):
total_shots = sum(len(embeddings) for embeddings in known_faces.values())
return f"Saved: {len(known_faces)} people ({total_shots} shots)"
def update_video(self):
"""Update video canvas with current frame"""
global current_frame
if current_frame is not None:
frame_rgb = cv2.cvtColor(current_frame, cv2.COLOR_BGR2RGB)
h, w = frame_rgb.shape[:2]
target_w, target_h = 640, 480
scale = min(target_w/w, target_h/h)
new_w, new_h = int(w*scale), int(h*scale)
frame_resized = cv2.resize(frame_rgb, (new_w, new_h))
img = Image.fromarray(frame_resized)
imgtk = ImageTk.PhotoImage(image=img)
self.video_canvas.delete("all")
self.video_canvas.create_image(target_w//2, target_h//2, image=imgtk, anchor=tk.CENTER)
self.video_canvas.image = imgtk
if running:
self.root.after(30, self.update_video)
def update_status(self):
"""Update status information"""
if running:
self.face_count_var.set(self.get_face_count_text())
# Update current detection
if current_faces:
detection_text = ""
for idx, face_info in enumerate(current_faces[:3]):
name = face_info['name']
confidence = face_info['confidence']
if name == "ENROLLING":
detection_text = "🔴 ENROLLMENT IN PROGRESS"
break
elif name != "Unknown":
detection_text += f"#{idx+1}: ✅ {name} ({confidence}%)\n"
else:
detection_text += f"#{idx+1}: ❓ Unknown\n"
self.detection_var.set(detection_text.strip())
else:
self.detection_var.set("No face detected")
self.root.after(500, self.update_status)
else:
self.status_var.set("🔴 System Stopped")
def check_enrollment(self):
"""Check enrollment progress and update UI"""
global enrollment_in_progress, enrollment_shots, enrollment_countdown, enrollment_start_time
if enrollment_in_progress:
elapsed = time.time() - enrollment_start_time
# Countdown phase
if enrollment_countdown > 0:
countdown_val = max(1, int(COUNTDOWN_START - elapsed + 1))
self.enrollment_status_var.set(f"⏱️ Get ready... {countdown_val}")
self.enroll_btn.config(state="disabled")
# Capture phase
elif len(enrollment_shots) < SHOTS_PER_PERSON:
shots_taken = len(enrollment_shots)
next_shot_in = max(0, (shots_taken + 1) * SHOT_DELAY - elapsed)
self.enrollment_status_var.set(f"📸 Shot {shots_taken}/{SHOTS_PER_PERSON} - Next in {next_shot_in:.1f}s")
self.enroll_btn.config(state="disabled")
# Check if enrollment just completed
if not enrollment_in_progress and len(enrollment_shots) == SHOTS_PER_PERSON:
self.enrollment_status_var.set("✅ Enrollment complete!")
self.enroll_btn.config(state="normal")
self.name_entry.delete(0, tk.END)
self.refresh_list()
# Reset enrollment_shots to prevent repeated refresh
enrollment_shots.clear()
# Clear status after delay
self.root.after(3000, lambda: self.enrollment_status_var.set(""))
if running:
self.root.after(100, self.check_enrollment)
def start_multi_shot_enrollment(self):
"""Start multi-shot enrollment process"""
global enrollment_in_progress, enrollment_shots
name = self.name_entry.get().strip()
if not name:
messagebox.showwarning("No Name", "Please enter a name first!")
return
if not detected_faces_for_enrollment:
messagebox.showwarning("No Face", "No face detected! Position face in view.")
return
if len(detected_faces_for_enrollment) > 1:
messagebox.showwarning("Multiple Faces",
"Multiple faces detected!\nPlease ensure only ONE person is in frame.")
return
if name in known_faces:
if not messagebox.askyesno("Overwrite",
f"'{name}' already exists with {len(known_faces[name])} shots.\nOverwrite?"):
return
# Start enrollment
self.enroll_btn.config(state="disabled")
self.enrollment_status_var.set("Starting...")
action_queue.put({'type': 'start_enrollment', 'name': name})
messagebox.showinfo("Multi-Shot Capture",
f"Starting {SHOTS_PER_PERSON}-shot capture!\n\n"
f"📋 Process:\n"
f"1. {COUNTDOWN_START} second countdown\n"
f"2. {SHOTS_PER_PERSON} shots with {SHOT_DELAY}s between each\n"
f"3. Move your head slightly between shots\n\n"
f"💡 Stay in frame and keep only ONE person visible!")
def delete_face(self):
"""Delete selected face"""
selection = self.face_listbox.curselection()
if not selection:
messagebox.showwarning("No Selection", "Please select a face to delete!")
return
name = self.face_listbox.get(selection[0]).split(" (")[0] # Remove shot count
if messagebox.askyesno("Confirm Delete", f"Delete '{name}' and all saved images?"):
action_queue.put({'type': 'delete', 'name': name})
self.root.after(500, self.refresh_list)
messagebox.showinfo("Success", f"✅ Deleted '{name}' and all images!")
def refresh_list(self):
"""Refresh the face list"""
self.face_listbox.delete(0, tk.END)
for name in sorted(known_faces.keys()):
shot_count = len(known_faces[name])
self.face_listbox.insert(tk.END, f"{name} ({shot_count} shots)")
# -----------------------------
# MAIN
# -----------------------------
if __name__ == "__main__":
print("🚀 Starting UY Scuti Face Recognition System - Fixed Multi-Shot")
print(f"⚙️ Recognition method: {'Dlib' if USE_DLIB else 'OpenCV'}")
print(f"⚙️ Shots per person: {SHOTS_PER_PERSON}")
print(f"⚙️ Shot delay: {SHOT_DELAY}s")
print(f"⚙️ Countdown: {COUNTDOWN_START}s")
print(f"⚙️ Confidence threshold: {CONFIDENCE_THRESHOLD}%")
print(f"⚙️ Recognition threshold: {RECOGNITION_THRESHOLD_DLIB if USE_DLIB else RECOGNITION_THRESHOLD_OPENCV}")
# Start video processing in separate thread
video_thread = threading.Thread(target=video_processing_thread, daemon=True)
video_thread.start()
# Start GUI
root = tk.Tk()
app = FaceRecognitionGUI(root)
try:
root.mainloop()
except KeyboardInterrupt:
pass
running = False
video_thread.join(timeout=2)
print("✅ Program closed cleanly")As you can see, this program has almost a 1,000 lines of code in it, and this is where conversations with Claud can become invaluable when drafting, troubleshooting and fine tuning it for your particular purpose.
Before you run this code, you will need to install several dependencies into your Python environment, including OpenCV, NumPy, Pillow and dlib. Some of these installations, especially dlib, involve complexities that require trial-and-error experimentation on your particular server environment, so once again I recommend getting Claude's or ChatGPT's help to get these libraries set up.
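For reference, here is the sort of minimal requirements.txt that worked for me; exact package names and versions can vary by platform, and dlib additionally needs CMake and a C++ toolchain available at build time:

# requirements.txt (a minimal working set on my machine; versions may vary)
opencv-python
numpy
Pillow
dlib

Note that the script also loads two pre-trained dlib model files at startup, shape_predictor_68_face_landmarks.dat and dlib_face_recognition_resnet_model_v1.dat. At the time of writing, both are downloadable as .bz2 archives from http://dlib.net/files/ and should be extracted into the same folder as the script. If dlib fails to install or load, the program automatically falls back to the simpler OpenCV embedding, just with lower accuracy.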
The final outcome is a video stream with a surrounding desktop control interface, tuned to detect, enroll and identify multiple faces from a standard distance of around 1 meter.
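A closing pointer for anyone extending this: everything the system learns lives in face_encodings.pkl as a plain dictionary of {name: [embedding, embedding, ...]}, so building on top of it is easy. Here is a small sketch, assuming the file layout used above, that lists who is enrolled:

# Inspect the enrollment database written by the server above.
import pickle

with open("face_encodings.pkl", "rb") as f:
    known_faces = pickle.load(f)

for name, embeddings in known_faces.items():
    # Each person maps to one embedding per enrollment shot
    # (128-dimensional vectors when dlib is the active method).
    print(f"{name}: {len(embeddings)} shots of {len(embeddings[0])} dimensions")

From here, I leave it to your imagination to decide what to do next... enjoy!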