PyCCTV: A CCTV camera application with person detection and remote monitoring over Wi-Fi

The entire source code is available on GitHub.

In this post, I’ll go through how to create a simple CCTV camera application with remote monitoring capabilities.

Dependencies

Please ensure the following libraries are available.

import os
import cv2
import keras
import shutil
import argparse
import numpy as np
from PIL import Image
from flask import Flask, Response
from multiprocessing import Process, Value

‘Person’ detection using Yolo V3 in Keras

Let us look at how to easily detect a person in an image using Yolo V3 in Keras.

Note that Yolo V3 can also localize objects in the image, but we don’t need that here. All we care about is whether a person is present in the image, not where in the image the person is.

The following Python class is all we need.

class YoloV3:
    """
    A Yolo V3 "person" detection implementation in Keras.

    Read more about the Yolo V3 model and output interpretation here:
    https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b
    https://www.kdnuggets.com/2018/05/implement-yolo-v3-object-detector-pytorch-part-1.html
    """

    def __init__(self, model):
        """
        Load the Yolo V3 model from disk.
        """
        self.yolo_v3 = keras.models.load_model(model, compile=False)
        self.input_size = (416, 416)
        self.threshold = 0.9


    def _resize_image(self, image):
        """
        Resize image to self.input_size
        """
        iw, ih = image.size
        w, h = self.input_size
        scale = min(w/iw, h/ih)
        nw = int(iw * scale)
        nh = int(ih * scale)
        image = image.resize((nw, nh), Image.BICUBIC)
        new_image = Image.new('RGB', self.input_size, (128, 128, 128))
        new_image.paste(image, ((w - nw)//2, (h - nh)//2))
        return new_image


    def _predict(self, image):
        """
        Make a prediction for the given image using Yolo V3.
        """

        # Resize and normalize the image.
        image = self._resize_image(image)
        image_data = np.array(image, dtype='float32')
        # Shape of image_data is now (416, 416, 3).
        image_data /= 255.
        image_data = np.expand_dims(image_data, axis=0)
        # Shape of image_data is now (1, 416, 416, 3).
        
        # Return the Yolo V3 prediction for the given image.
        return self.yolo_v3.predict(image_data)


    @staticmethod
    def _sigmoid(x):
        return 1 / (1 + np.exp(-x))


    def contains_person(self, image):
        """
        Check whether the given image contains a person using Yolo V3.
        """

        # Get the Yolo V3 predictions for the given image.
        predictions = self._predict(image)
        
        # Check whether a person is detected in the image with high
        # confidence in any of the three predictions made by Yolo V3 at
        # different scales.
        #
        # Yolo V3 makes predictions at 3 different scales. The predictions have
        # shape - (1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255).
        # The number 255 comes from 3 * 85 i.e, 3 anchor boxes and 85 values per
        # anchor box which consist of 4 box coordinates + 1 object confidence +
        # 80 class confidences (4 + 1 + 80 = 85) in order.
        #
        # "person" is the first of the 80 classes (COCO dataset).
        obj_conf_pos = [4, 89, 174]
        person_cls_pos = [5, 90, 175]

        for pred in predictions:
            x, y = pred.shape[1:3]
            pred = pred[0]
            
            for i in range(x):
                for j in range(y):
                    for (obj, person) in zip(obj_conf_pos, person_cls_pos):
                        if self._sigmoid(pred[i, j, obj]) > self.threshold and \
                           self._sigmoid(pred[i, j, person]) > self.threshold:
                            return True

        return False

Let us investigate some of the methods in greater detail.

The __init__ method loads the pretrained Keras Yolo V3 model from disk. This particular model was trained on the COCO dataset, which contains 80 classes, one of which is ‘person’.

The _resize_image method resizes the image to the 416 x 416 input size that the Yolo V3 model expects. It scales the image while preserving its aspect ratio and pastes the result at the center of a gray canvas.
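To make the resizing concrete, here is a minimal sketch of the arithmetic that _resize_image performs, without touching any pixels. The helper name letterbox_geometry is hypothetical and not part of py_cctv.py; it only reproduces the scale and offset calculation for a given frame size.

```python
def letterbox_geometry(iw, ih, w=416, h=416):
    """Hypothetical helper: mirrors the arithmetic in _resize_image."""
    # Use the smaller ratio so the whole frame fits inside the w x h canvas.
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    # Offsets that center the resized frame on the gray canvas.
    left = (w - nw) // 2
    top = (h - nh) // 2
    return nw, nh, left, top


# A 1280 x 720 webcam frame, the resolution used later in this post.
print(letterbox_geometry(1280, 720))
```

For a 1280 x 720 frame, the image is scaled to 416 x 234 and pasted 91 pixels from the top, which leaves gray bands above and below it.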

The _predict method normalizes the resized image to the range 0-1, adds a batch dimension, runs a forward pass through the Yolo V3 model and returns the predictions made by the model on the given image.
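The preprocessing can be tried on its own, without loading the model. The sketch below assumes an already resized 416 x 416 RGB frame, with an all-white array standing in for a real image. The helper name to_model_input is hypothetical.

```python
import numpy as np


def to_model_input(pixels):
    """Hypothetical helper: mirrors the preprocessing in _predict."""
    data = np.array(pixels, dtype='float32')  # (416, 416, 3), values 0-255
    data /= 255.                              # scale values to 0-1
    return np.expand_dims(data, axis=0)       # (1, 416, 416, 3)


# Stand-in for an already resized RGB frame (all-white image).
frame = np.full((416, 416, 3), 255, dtype=np.uint8)
batch = to_model_input(frame)
print(batch.shape, batch.dtype)
```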

To read more about the structure of the Yolo V3 model and how to interpret the predictions it returns, see the two articles linked in the docstring of the YoloV3 class above.

The contains_person method checks whether there is a “person” class detected anywhere in the given image from the prediction that the Yolo V3 model returned for that image.

The Yolo V3 model returns 3 predictions at different scales, each responsible for detecting objects of a different size (small, medium or large) in the image. The shapes of the returned predictions are (1, 13, 13, 255), (1, 26, 26, 255) and (1, 52, 52, 255). The second and third dimensions index the grid cells, and the last dimension holds the predicted values for that grid cell. There are 3 anchor boxes and 85 predicted values per anchor box, for a total of 85 * 3 = 255 predicted values per grid cell.

What are the 85 predicted values per anchor box? The first four are the box coordinates and dimensions (which we don’t need in our application). The fifth is the object confidence (confidence that an object has been detected in that grid cell). The last 80 are the class confidences for the 80 classes of the COCO dataset (“person” is the first class).
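The positions 4, 89, 174 and 5, 90, 175 used in contains_person follow directly from this layout. As a small sketch (the function name channel_positions is hypothetical), they can be derived rather than hard-coded:

```python
def channel_positions(num_anchors=3, num_classes=80, class_index=0):
    """Hypothetical helper: positions of the object confidence and of one
    class confidence inside the 255 values of a grid cell."""
    values_per_anchor = 4 + 1 + num_classes  # box + object conf + classes = 85
    obj = [a * values_per_anchor + 4 for a in range(num_anchors)]
    cls = [a * values_per_anchor + 5 + class_index for a in range(num_anchors)]
    return obj, cls


# "person" is class 0 of the COCO dataset.
obj_conf_pos, person_cls_pos = channel_positions()
print(obj_conf_pos, person_cls_pos)
```

This prints [4, 89, 174] [5, 90, 175], the same values as in the class above.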

Note that unlike previous versions of Yolo, there is no softmax over the class confidences. In fact, the raw object confidences and class confidences returned by the model are not in the range 0-1. They have to be passed through the _sigmoid method, which maps them to the range 0-1.
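The nested loops in contains_person can also be expressed with NumPy array operations. The following is a minimal sketch of that idea, not the code from py_cctv.py. The function name contains_person_vectorized is hypothetical, and the inputs are synthetic arrays with the same shapes as the model output instead of real predictions.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def contains_person_vectorized(predictions, threshold=0.9):
    """Hypothetical vectorized equivalent of the contains_person loops."""
    for pred in predictions:
        rows, cols = pred.shape[1:3]
        # Split the 255 values into (3 anchor boxes, 85 values per box).
        grid = pred[0].reshape(rows, cols, 3, 85)
        obj_conf = sigmoid(grid[..., 4])     # object confidence
        person_conf = sigmoid(grid[..., 5])  # "person" class confidence
        if np.any((obj_conf > threshold) & (person_conf > threshold)):
            return True
    return False


# Synthetic raw outputs: large negative values mean "nothing detected".
empty = [np.full((1, s, s, 255), -10.0, dtype='float32') for s in (13, 26, 52)]
hit = [p.copy() for p in empty]
hit[1][0, 7, 3, 89] = 10.0  # second anchor box: object confidence
hit[1][0, 7, 3, 90] = 10.0  # second anchor box: "person" confidence

print(contains_person_vectorized(empty), contains_person_vectorized(hit))
```

Both confidences must exceed the threshold for the same anchor box in the same grid cell, which is the same condition the loop version checks.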

Getting images from webcam and remote monitoring using a Flask web server

Let us build a class to perform the following 2 tasks:

Continuously capture frames from the webcam. Check whether a person is present in the image. If yes, save the image to disk for future analysis.
Create a web server that will allow us to remotely monitor the webcam for a “person” in its field of view over Wi-Fi.

class PyCCTV:
    """
    A CCTV camera application with "person" detection and
    remote monitoring over Wi-Fi.
    """
    
    def __init__(self, model, output):
        # Cleanup the output directory.
        shutil.rmtree(output, ignore_errors=True)
        if not os.path.exists(output):
            os.makedirs(output)

        self.model = model
        self.output = output


    def _web_server(self, output, image_num):
        """
        Flask web server for remote monitoring of webcam over Wi-Fi.
        """
        app = Flask("PyCCTV")

        @app.route('/')
        def index():
            return "Welcome to PyCCTV!"

        def read_image_from_disk():
            disk_image_name = "image_%05d.jpg" % (image_num.value - 1,)
            disk_image_path = os.path.join(output, disk_image_name)
            if os.path.exists(disk_image_path):
                im = cv2.imread(disk_image_path)
                return cv2.imencode('.jpg', im)[1].tobytes()

        @app.route('/image.jpg')
        def generate_response():
            return Response(read_image_from_disk(), mimetype='image/jpeg')

        app.run(host='0.0.0.0')


    def _webcam(self, model, output, image_num):
        """
        Continuously capture frames from the webcam and detect the presence
        of a person in the frame using Yolo V3.
        """        
        yolo = YoloV3(model)

        while True:
            cam = cv2.VideoCapture(0)  # ls /sys/class/video4linux

            # Hardware defaults for Lenovo T440s
            cam.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
            cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
            cam.set(cv2.CAP_PROP_BRIGHTNESS, 128/255)  # max = 255
            cam.set(cv2.CAP_PROP_CONTRAST, 32/255)     # max = 255
            cam.set(cv2.CAP_PROP_SATURATION, 64/100)   # max = 100
            cam.set(cv2.CAP_PROP_HUE, 0.5)             # 0 = -180, 1 = +180

            # Read a frame from the webcam.
            ret, image = cam.read()
            cam.release()

            if not ret:
                raise Exception('Camera module not operational')

            # Convert from cv2 to PIL image.
            cv2_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            pil_image = Image.fromarray(cv2_image)

            # If the frame contains a person, save it to disk.
            if yolo.contains_person(pil_image):
                disk_image_name = "image_%05d.jpg" % (image_num.value,)
                disk_image_path = os.path.join(output, disk_image_name)
                cv2.imwrite(disk_image_path, image)
                image_num.value += 1


    def run(self):
        """
        Run the PyCCTV application.
        """

        # Shared integer counter holding the number of the next image to be
        # saved. The most recent image on disk is numbered one less.
        image_num = Value('i', 1)


        # Create two processes.
        # 1. webcam     - Continuously capture frames from webcam and check for
        #                 the presence of a person in the frame.
        # 2. web_server - A Flask web server for remote monitoring over Wi-Fi.
        processes = []

        processes.append(Process(target=self._webcam,
                                 args=(self.model, self.output, image_num)))
        processes.append(Process(target=self._web_server,
                                 args=(self.output, image_num)))

        for p in processes:
            p.daemon = True
            p.start()

        # Gracefully handle Ctrl-C.
        try:
            for p in processes:
                p.join()
        except KeyboardInterrupt:
            for p in processes:
                p.terminate()

Let us walk through the methods in greater detail.

The __init__ method prepares the output directory to store frames captured from the webcam that contain a “person” in them. Since we store only images that contain a person, we save a lot of disk space.

The run method is responsible for launching two subprocesses: one that continuously captures frames from the webcam and checks whether a person is in the frame, and another that implements the web server using Flask.

The _webcam method is the one which continuously captures frames from the webcam and uses the Yolo V3 model to predict whether a person is in the frame. If yes, the frame is stored to disk at the user specified location.

The _web_server method implements the Flask web server for real-time remote monitoring of the webcam over home Wi-Fi. All it does is read the most recent image (containing a person) from disk and serve it as a JPEG image, which can be viewed remotely on any mobile device.
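How the web server finds the most recent image follows from the file naming scheme shared by the two processes. The sketch below isolates that logic. The helper names image_name and latest_image_path are hypothetical, and "frames" is a placeholder directory name.

```python
import os


def image_name(num):
    # Same naming scheme as in the PyCCTV class: image_00001.jpg, ...
    return "image_%05d.jpg" % (num,)


def latest_image_path(output, next_image_num):
    # The webcam process increments the counter after each write, so the
    # most recent file on disk is numbered one less than the counter.
    return os.path.join(output, image_name(next_image_num - 1))


print(os.path.basename(latest_image_path("frames", 42)))
```

Until the first detection, the counter is still 1 and image_00000.jpg does not exist, so the server has no image to serve yet.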

Now, all that is needed to run our CCTV application is to instantiate an object of the PyCCTV class and call the run() method on that object.

if __name__ == "__main__":

    # Argument parser.
    ap = argparse.ArgumentParser()
    ap.add_argument("-m", "--model", required=True,
                    help="path to yolo v3 model")
    ap.add_argument("-o", "--output", required=True,
                    help="path to output image directory")
    args = vars(ap.parse_args())


    # Start the PyCCTV application.
    cctv = PyCCTV(args['model'], args['output'])
    cctv.run()

Running the CCTV application

Download py_cctv.py from the GitHub project page.
Download the pretrained Keras Yolo V3 model from here.

Run the following command.

 $ python py_cctv.py --model <path to yolo.h5> --output <path to output image directory>

Remote monitoring over Wi-Fi

Note down your IP address (192.168.1.8 in my case).

 varun@lenovo:~$ ifconfig
 wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
     inet 192.168.1.8  netmask 255.255.255.0  broadcast 192.168.1.255
     ...

Point your web browser to http://192.168.1.8:5000/image.jpg (5000 is Flask's default port).