I was discussing with a friend a pet door that would only let certain animals out. We had a few basic ideas (RFID, magnets, etc.), but a big issue with a dumb sensor is the wrong animal sneaking out when the door opens.

Maybe a camera would work? Sounds like a good use case for machine learning. This project distinguishes between three pets.

Cody: Corgi/Monster mix
Malloc: Nimble hunter
Strcat: Loves strings, overflows buffers

This is my first attempt at applying machine learning to a real problem beyond coursework and MNIST/CIFAR. If you stumble on this and can recommend better ways to do things, please reach out!

Collecting Data

I hooked a webcam to a Raspberry Pi and mounted it with a view of the back door. Every minute it takes a picture and uploads it to Amazon S3.

I wrote some tools to quickly classify training images by hand and output a CSV.
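As a rough illustration of what such a label file might look like, here is a minimal sketch using only the standard library. The column names, the label set, and the helper names `write_labels` and `read_labels` are assumptions made for this example; the original tools are not shown.

```python
import csv
import io

# Hypothetical label names; the original tool's label set is not shown.
LABELS = ("empty", "cody", "malloc", "strcat")

def write_labels(rows, out):
    """Write (filename, label) pairs as CSV, rejecting unknown labels."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["filename", "label"])
    for filename, label in rows:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        writer.writerow([filename, label])

def read_labels(src):
    """Read the CSV back into a {filename: label} dict."""
    return {row["filename"]: row["label"] for row in csv.DictReader(src)}

# Round trip through an in-memory buffer instead of a real file.
buf = io.StringIO()
write_labels([("0001.jpg", "empty"), ("0002.jpg", "malloc")], buf)
buf.seek(0)
labels = read_labels(buf)
```

Rejecting unknown labels at write time keeps typos made during hand labelling out of the training set.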

The Software

The software is written with TensorFlow (via TFLearn) and scikit-learn.

It seemed like I could run into the problem of a network that always predicts an empty image. Since the scene is usually empty, always predicting "empty" would be right most of the time, so the network could reach a low loss without learning anything useful. To mitigate this, I decided to break the problem into three parts:

  1. Determine Night/Day
  2. Determine if Anything is Happening
  3. Determine the Specific Animal
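The three parts above can be chained so that each stage only runs when the previous one lets the image through. This is a sketch of the control flow only: the three callables are hypothetical stand-ins for the trained models, and setting night images aside instead of passing them on is an assumption, not something the project states.

```python
def classify(image, is_day, has_activity, which_animal):
    """Run the three stages in order, stopping at the first that settles it.

    The three arguments after `image` are hypothetical callables standing in
    for the trained models; they are not names from the original project.
    """
    if not is_day(image):
        return "night"  # assumption: night images are set aside here
    if not has_activity(image):
        return "empty"
    return which_animal(image)

# Toy stand-ins so the control flow can be exercised without trained models.
result = classify(
    image={"brightness": 140, "busy": True},
    is_day=lambda img: img["brightness"] > 60,
    has_activity=lambda img: img["busy"],
    which_animal=lambda img: "cody",
)
```

With this layout the animal classifier never sees empty scenes, which is the point of splitting the problem.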

Determining Night/Day

This uses logistic regression on the average brightness of the image.

Logistic regression handled this with ease.
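import numpy as np
from scipy import misc  # misc.imread needs SciPy < 1.2 with Pillow installed
from sklearn import linear_model

import constants  # project module that defines IMAGE_64_PATH

# filenames and Y (one label per file) are assumed to be loaded already
Y = np.array(Y)  # array of training answers, 1 or 0
X = []  # average brightness of each image
for filename in filenames:
    # load the image as 8-bit grayscale
    image = misc.imread(constants.IMAGE_64_PATH + '/' + filename, mode='L')
    X.append(image.mean())
# scikit-learn expects one row per sample and one column per feature
X = np.array(X).reshape(-1, 1)
# a large C means very weak regularization
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, Y)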
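With a single feature, the fitted model boils down to a logistic curve over brightness, and the day/night decision to a single brightness threshold. The sketch below shows that relationship with made-up parameters, assuming a label of 1 means day; `WEIGHT` and `BIAS` are illustrative values, not fitted ones.

```python
import math

def day_probability(brightness, weight, bias):
    """Logistic model: P(day) = 1 / (1 + exp(-(weight * brightness + bias)))."""
    return 1.0 / (1.0 + math.exp(-(weight * brightness + bias)))

# Hypothetical parameters chosen for illustration. With a fitted scikit-learn
# model they would be read from clf.coef_[0][0] and clf.intercept_[0].
WEIGHT, BIAS = 0.2, -12.0

# The decision flips where weight * brightness + bias == 0.
threshold = -BIAS / WEIGHT

p_dark = day_probability(20.0, WEIGHT, BIAS)
p_bright = day_probability(150.0, WEIGHT, BIAS)
```

Any image whose average brightness is above `threshold` gets a probability above 0.5 and is treated as day.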

Determining if Anything is Happening

The next step identifies if anyone is present in the image, or if it’s just an empty scene.

After a few attempts, I realized that this is a fixed-position camera on a pretty limited scene. The scene can change a lot between night and day, but animals will always be in the same places: table, windowsill, floor. They won’t just be floating midair.

Well, usually cats aren't floating midair.

Given this, a fully-connected neural network worked fine. It doesn’t need to know what a cat looks like, just that a pixel may correspond to activity.

import tflearn
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.estimator import regression

img_prep = ImagePreprocessing()
img_aug = ImageAugmentation()
# Specify shape of the data, image prep
# (the rest of this call is cut off in the source; passing img_prep and img_aug is assumed)
network = input_data(shape=[None, 52, 64],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)
# Since the camera position is fixed and the images are fairly similar, the network can rely on pixel positions.
# Using a fully connected network directly, no need for convolution.
network = fully_connected(network, 2048, activation='relu')
network = fully_connected(network, 2, activation='softmax')
# The remaining regression() arguments are cut off in the source; TFLearn's defaults apply here.
network = regression(network, optimizer='adam')
model = tflearn.DNN(network, tensorboard_verbose=0)

To handle issues of lighting and shadows changing throughout the day, I created an average image, which is the average of all daytime images. I subtracted this image from all training images, like zeroing a scale before weighing something. It’s not perfect, but it brings out a little more contrast in the photos. Notice the cat on the table becomes more visible when the average (center) is subtracted.
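A minimal NumPy sketch of the averaging and subtraction, run on synthetic 52x64 frames instead of real photos. The function names are made up for this example, and converting to float before subtracting is a choice made here to avoid 8-bit wraparound; the original preprocessing code is not shown.

```python
import numpy as np

def mean_image(images):
    """Average a stack of same-sized grayscale images, pixel by pixel."""
    return np.mean(np.stack(images).astype(np.float32), axis=0)

def subtract_mean(image, mean):
    """Subtract the mean image; the result is signed and centred near zero."""
    return image.astype(np.float32) - mean

# Synthetic 52x64 "daytime" frames: a flat background of varying brightness.
frames = [np.full((52, 64), v, dtype=np.uint8) for v in (90, 100, 110)]
mean = mean_image(frames)

# A frame at average brightness with a brighter patch standing in for a cat.
frame = np.full((52, 64), 100, dtype=np.uint8)
frame[20:30, 10:25] = 180
diff = subtract_mean(frame, mean)
```

After subtraction the static background sits near zero and only the patch that differs from the usual scene keeps a large value.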

Left to right: original image with a cat; average of all daytime scenes; original image with cat, average subtracted.

Determining Specific Animals

With the prep work out of the way, the final challenge is to identify the specific animal in the scene. This uses a convolutional neural network, trained only on images where animals are present.

Some results, with probabilities:

Strcat: 74%, Malloc: 22%, Cody: 2%
Malloc: 92%, Strcat: 7%, Cody: 0%
Cody: 78%, Strcat: 19%, Malloc: 1%

import tflearn
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_1d, max_pool_1d
from tflearn.layers.estimator import regression

img_prep = ImagePreprocessing()
img_aug = ImageAugmentation()
# Specify shape of the data, image prep
# (the rest of this call is cut off in the source; passing img_prep and img_aug is assumed)
network = input_data(shape=[None, 52, 64],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)
# conv_1d(incoming, nb_filter, filter_size)
# incoming: Tensor. Incoming 3-D Tensor.
# nb_filter: int. The number of convolutional filters.
# filter_size: int or list of ints. Size of filters.
network = conv_1d(network, 512, 3, activation='relu')
# max_pool_1d(incoming, kernel_size)
# incoming: Tensor. Incoming 3-D Layer.
# kernel_size: int or list of ints. Pooling kernel size.
network = max_pool_1d(network, 2)
network = conv_1d(network, 64, 3, activation='relu')
network = conv_1d(network, 64, 3, activation='relu')
network = max_pool_1d(network, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 4, activation='softmax')
# The remaining regression() arguments are cut off in the source; TFLearn's defaults apply here.
network = regression(network, optimizer='adam')
model = tflearn.DNN(network, tensorboard_verbose=0)
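One way a softmax output could be turned into a door decision is to take the most likely class and require a minimum confidence before acting on it. This is a sketch only: the class order in `CLASSES`, the `decide` helper, and the 0.9 threshold are all assumptions for illustration, since the post does not say what the four output units stand for.

```python
# Hypothetical class order; the original post does not say what the four
# output units stand for.
CLASSES = ("none", "cody", "malloc", "strcat")

def decide(probabilities, allowed, min_confidence=0.9):
    """Return (label, open_door) for one softmax output vector.

    The door opens only if the top class is in `allowed` and its probability
    reaches `min_confidence` (an illustrative threshold, not a tuned value).
    """
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    label = CLASSES[best]
    return label, label in allowed and probabilities[best] >= min_confidence

confident = decide([0.01, 0.00, 0.92, 0.07], allowed={"malloc"})
unsure = decide([0.02, 0.02, 0.22, 0.74], allowed={"strcat"})
```

A threshold like this trades missed openings for fewer wrong ones, which matters when a 74% guess is not enough to justify letting an animal out.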


It worked! We can correctly identify animals. I don’t have a full understanding, but it’s satisfying to see it work with a real problem. That said, I wouldn’t feel comfortable hooking it up to a pet door just yet.