PyCCTV: A CCTV camera application with person detection and remote monitoring over Wi-Fi
The entire source code is available on github.
In this post, I’ll go through how to create a simple CCTV camera application with remote monitoring capabilities.
Dependencies
Please ensure the following libraries are available.
import os
import cv2
import keras
import shutil
import argparse
import numpy as np
from PIL import Image
from flask import Flask, Response
from multiprocessing import Process, Value
‘Person’ detection using Yolo V3 in Keras.
Let us look at how to easily detect a person in an image using Yolo V3 in Keras.
Note that Yolo V3 can also localize the object in the image for us, but we’re not interested in it. All we are interested in is knowing whether a person is present in the image or not. We are not concerned with where in the image the person is present.
The following Python class is all we need.
class YoloV3:
"""
A Yolo V3 "person" detection implementation in Keras.
Read more about the Yolo V3 model and output interpretation here:
https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b
https://www.kdnuggets.com/2018/05/implement-yolo-v3-object-detector-pytorch-part-1.html
"""
def __init__(self, model):
"""
Load the Yolo V3 model from disk.
"""
self.yolo_v3 = keras.models.load_model(model, compile=False)
self.input_size = (416, 416)
self.threshold = 0.9
def _resize_image(self, image):
"""
Resize image to self.input_size
"""
iw, ih = image.size
w, h = self.input_size
scale = min(w/iw, h/ih)
nw = int(iw * scale)
nh = int(ih * scale)
image = image.resize((nw, nh), Image.BICUBIC)
new_image = Image.new('RGB', self.input_size, (128, 128, 128))
new_image.paste(image, ((w - nw)//2, (h - nh)//2))
return new_image
def _predict(self, image):
"""
Make a prediction for the given image using Yolo V3.
"""
# Resize and normalize the image.
image = self._resize_image(image)
image_data = np.array(image, dtype='float32')
# Shape of image_data is now (416, 416, 3).
image_data /= 255.
image_data = np.expand_dims(image_data, axis=0)
# Shape of image_data is now (1, 416, 416, 3).
# Return the Yolo V3 prediction for the given image.
return self.yolo_v3.predict(image_data)
@staticmethod
def _sigmoid(x):
return 1 / (1 + np.exp(-x))
def contains_person(self, image):
"""
Check whether the given image contains a person using Yolo V3.
"""
# Get the Yolo V3 predictions for the given image.
predictions = self._predict(image)
# Check whether a person is detected in the image with high
# confidence in any of the three predictions made by Yolo V3 at
# different scales.
#
# Yolo V3 makes predictions at 3 different scales. The predictions have
# shape - (1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255).
# The number 255 comes from 3 * 85 i.e, 3 anchor boxes and 85 values per
# anchor box which consist of 4 box coordinates + 1 object confidence +
# 80 class confidences (4 + 1 + 80 = 85) in order.
#
# "person" is the first of the 80 classes (COCO dataset).
obj_conf_pos = [4, 89, 174]
person_cls_pos = [5, 90, 175]
for pred in predictions:
x, y = pred.shape[1:3]
pred = pred[0]
for i in range(x):
for j in range(y):
for (obj, person) in zip(obj_conf_pos, person_cls_pos):
if self._sigmoid(pred[i, j, obj]) > self.threshold and \
self._sigmoid(pred[i, j, person]) > self.threshold:
return True
return False
Let us investigate some of the methods in greater detail.
The __init__
method loads the pretrained Keras Yolo V3 model from disk.
This particular model was trained on the COCO dataset
containing 80 classes of which ‘person’ is one of the classes.
The _resize_image
method is responsible for resizing the image to a dimension
suitable for feeding into the Yolo V3 model.
The _predict
method does exactly that. It runs a forward pass through the Yolo V3
model and returns the predictions made by the model on the given image.
To read more about the structure of the Yolo V3 model and how to interpret the predictions returned by the Yolo V3 model, there are excellent resources here and here.
The contains_person
method checks whether there is a “person” class detected anywhere
in the given image from the prediction that the Yolo V3 model returned for that image.
The Yolo V3 model returns 3 different predictions at different scales. Each of these
are responsible for predicting objects of small, medium or big size in the image.
The shapes of the returned predictions are - (1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255).
The second and third shape indices refer to the grids. The last index contains the
prediction for that grid. There are 3 anchor boxes
and 85 predicted values per anchor box
for a total of 85 * 3 = 255
predicted values per grid cell.
What are the 85 predicted values per anchor box, you ask ? The first four are the box
coordinates and dimensions
(which we don’t need in our application). The fifth is the
object confidence
(confidence that an object has been detected in that grid cell). The
last 80 are the class confidences
for the 80 classes of the COCO dataset (“person” is
the first class).
Note that unlike previous versions of Yolo, there is no softmax over the class confidences.
In fact, the object confidences and class confidences are not in the range 0-1 in Yolo V3.
The values will have to be passed through the _sigmoid
method which will then return values
in the range 0-1.
Getting images from webcam and remote monitoring using a Flask web server
Let us build a class to perform the following 2 tasks:
-
Continuously capture frames from the webcam. Check whether a person is present in the image. If yes, save the image to disk for future analysis.
-
Create a web server that will allow us to remotely monitor the webcam for a “person” in its field of view over Wi-Fi.
class PyCCTV:
"""
A CCTV camera application with "person" detection and
remote monitoring over Wi-Fi.
"""
def __init__(self, model, output):
# Cleanup the output directory.
shutil.rmtree(output, ignore_errors=True)
if not os.path.exists(output):
os.makedirs(output)
self.model = model
self.output = output
def _web_server(self, output, image_num):
"""
Flask web server for remote monitoring of webcam over Wi-Fi.
"""
app = Flask("PyCCTV")
@app.route('/')
def index():
return "Welcome to PyCCTV!"
def read_image_from_disk():
disk_image_name = "image_%05d.jpg" % (image_num.value - 1,)
disk_image_path = os.path.join(output, disk_image_name)
if os.path.exists(disk_image_path):
im = cv2.imread(disk_image_path)
return cv2.imencode('.jpg', im)[1].tobytes()
@app.route('/image.jpg')
def generate_response():
return Response(read_image_from_disk(), mimetype='image/jpeg')
app.run(host='0.0.0.0')
def _webcam(self, model, output, image_num):
"""
Continuously capture frames from the webcam and detect the presence
of a person in the frame using Yolo V3.
"""
yolo = YoloV3(model)
while True:
cam = cv2.VideoCapture(0) # ls /sys/class/video4linux
# Hardware defaults for Lenovo T440s
cam.set(3, 1280) # Width
cam.set(4, 720) # Height
cam.set(10, 128/255) # Brightness (max = 255)
cam.set(11, 32/255) # Contrast (max = 255)
cam.set(12, 64/100) # Saturation (max = 100)
cam.set(13, 0.5) # Hue (0 = -180, 1 = +180)
# Read a frame from the webcam.
ret, image = cam.read()
cam.release()
if not ret:
raise Exception('Camera module not operational')
# Convert from cv2 to PIL image.
cv2_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
pil_image = Image.fromarray(cv2_image)
# If the frame contains a person, save it to disk.
if yolo.contains_person(pil_image):
disk_image_name = "image_%05d.jpg" % (image_num.value,)
disk_image_path = os.path.join(output, disk_image_name)
cv2.imwrite(disk_image_path, image)
image_num.value += 1
def run(self):
"""
Run the PyCCTV application.
"""
# Shared variable to keep track of the most recent image.
image_num = Value('d', 1)
# Create two processes.
# 1. webcam - Continuously capture frames from webcam and check for
# the presence of a person in the frame.
# 2. web_server - A Flask web server for remote monitoring over Wi-Fi.
processes = []
processes.append(Process(target=self._webcam,
args=(self.model, self.output, image_num)))
processes.append(Process(target=self._web_server,
args=(self.output, image_num)))
for p in processes:
p.daemon = True
p.start()
# Gracefully handle Ctrl-C.
try:
for p in processes:
p.join()
except KeyboardInterrupt:
for p in processes:
p.terminate()
Let us walk through the methods in greater detail.
The __init__
method prepares the output directory to store frames captured
from the webcam that contain a “person” in them. Since we store, only images
that contain a person in them, we are saving a lot of space.
The run
method is responsible for launching to sub-processes - one process that
continuously captures frames from the webcam and checks to see if a person is in the
frame
and another process that implements the web server using Flask
.
The _webcam
method is the one which continuously captures frames from the webcam
and uses the Yolo V3 model to predict whether a person is in the frame. If yes,
the frame is stored to disk at the user specified location.
The _web_server
method implements the Flask web-server for real-time
remote monitoring of the webcam over home Wi-Fi. All it does is read the most
recent image (containing a person) from the disk and serves it as a jpeg image
over the web which can be viewed remotely over any mobile device
.
Now, all that is needed to run our CCTV application is to instantiate an object of the PyCCTV class and call the run() method on that object.
if __name__ == "__main__":
# Argument parser.
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
help="path to yolo v3 model")
ap.add_argument("-o", "--output", required=True,
help="path to output image directory")
args = vars(ap.parse_args())
# Start the PyCCTV application.
cctv = PyCCTV(args['model'], args['output'])
cctv.run()
Running the CCTV application
-
Download py_cctv.py from the github project page.
-
Download the pretrained Keras Yolo V3 model from here.
-
Run the following command.
$ py_cctv.py --model <path to yolo.h5> --output <path to output image directory>
Remote monitoring over Wi-Fi
-
Note down your IP address (
192.168.1.8
in my case).varun@lenovo:~$ ifconfig wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.1.8 netmask 255.255.255.0 broadcast 192.168.1.255 ...
-
Direct your web browser at
192.168.1.8:5000/image.jpg
.