I was discussing with a friend a pet door that would only let out certain animals. We had a few basic ideas (RFID, magnets, etc), but a big issue with a dumb sensor is the wrong animal sneaking out when the door opens.
Maybe a camera would work? Sounds like a good use case for machine learning. This project distinguishes between three pets.
Cody — Corgi/Monster mix | Malloc —Nimble hunter | Strcat — Loves strings, overflows buffers |
This is my first attempt applying machine learning to a real problem beyond coursework and MNIST/CIFAR- if you stumble on this and can recommend better ways to do things, please reach out!
Collecting Data
I hooked a webcam to a Raspberry Pi and mounted it with a view of the backdoor. Every minute it take a picture and uploads it to Amazon S3.
I wrote some tools to quickly classify training images by hand and output a CSV.
The Software
The software is written with Tensorflow (with TFLearn), and Scikit.
It seemed like I could run into a problem of a network that just always predicts an empty image. Since the scene is usually empty, predicting that the scene is empty would actually be a pretty low cost network. To mitigate this, I decided to break the problem into three parts:
- Determine Night/Day
- Determine if Anything is Happening
- Determine the Specific Animal
Determining Night/Day
This uses logistic regression on the average brightness of the image.
Logistic regression handled this with ease.
from sklearn import linear_model
Y = np.array(Y) # array of training answers, 1 or 0
X = np.array([]) # array of image brightness
for filename in filenames:
# load the images
image = misc.imread(constants.IMAGE_64_PATH + '/' + filename, mode='L')
avg_brightness = np.matrix(image).mean()
X = np.append(X, avg_brightness)
X = np.array([X]).transpose()
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, Y)
Determining if Anything is Happening
The next step identifies if anyone is present in the image, or if it’s just an empty scene.
After a few attempts, I ultimately realized that this is a fixed position camera on a pretty limited scene. The scene can change a lot between night and day, but animals will always be in the same place- table, windowsill, floor. They won’t just be floating midair.
Given this, a fully-connected neural network worked fine. It doesn’t need to know what a cat looks like- just that a pixel may correspond to activity.
import tflearn
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_1d, max_pool_1d
from tflearn.layers.estimator import regression
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
# Specify shape of the data, image prep
network = input_data(shape=[None, 52, 64],
data_preprocessing=img_prep,
data_augmentation=img_aug)
# Since the image position remains consistent and are fairly similar, this can be spatially aware.
# Using a fully connected network directly, no need for convolution.
network = fully_connected(network, 2048, activation='relu')
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam',
loss='categorical_crossentropy',
learning_rate=0.00003)
model = tflearn.DNN(network, tensorboard_verbose=0)
To handle issues of lighting and shadows changing throughout the day, I created an average image, which is the average of all daytime images. I subtracted this image from all training images, like zeroing a scale before weighing something. It’s not perfect, but it brings out a little more contrast in the photos. Notice the cat on the table becomes more visible when the average (center) is subtracted.
Original image with a cat | Average of all daytime scenes | Original image with cat subtracted |
With the prep work out of the way, the final challenge is to identify the specific animal in the scene. This uses a convolutional neural network, trained only with image where animals are present.
Some results, with probabilities:
Strcat: 74% Malloc 22% Cody 2% | Malloc: 92% Strcat 7% Cody 0% | Cody: 78% Strcat: 19% Malloc: 1% |
import tflearn
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_1d, max_pool_1d
from tflearn.layers.estimator import regression
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
# Specify shape of the data, image prep
network = input_data(shape=[None, 52, 64],
data_preprocessing=img_prep,
data_augmentation=img_aug)
# conv_2d incoming, nb_filter, filter_size
# incoming: Tensor. Incoming 4-D Tensor.
# nb_filter: int. The number of convolutional filters.
# filter_size: 'intor list ofints`. Size of filters.
network = conv_1d(network, 512, 3, activation='relu')
# (incoming, kernel_size)
# incoming: Tensor. Incoming 4-D Layer.
# kernel_size: 'intor list ofints`. Pooling kernel size.
network = max_pool_1d(network, 2)
network = conv_1d(network, 64, 3, activation='relu')
network = conv_1d(network, 64, 3, activation='relu')
network = max_pool_1d(network, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 4, activation='softmax')
network = regression(network, optimizer='adam',
loss='categorical_crossentropy',
learning_rate=0.0003)
model = tflearn.DNN(network, tensorboard_verbose=0)
Conclusion
It worked! We can correctly identify animals. I don’t have a full understanding, but it’s satisfying to see it work with a real problem. That said, I wouldn’t feel comfortable hooking it up to a pet door just yet.