A cat prediction model using past behaviour and weather observations

A black cat facing an outstretched pointing finger
Photo by Humberto Arellano on Unsplash

With months of historic location and temperature data captured, this blog covers how to train a machine learning (ML) model to predict where my cat would go throughout her day.

For the impatient, you can skip directly to the prediction web-app here.

Does past cat behaviour predict future sleeping location?

With some inexpensive hardware (and a cat ambivalent to data privacy concerns) I wanted to see if I could train a machine learning (ML) model to predict where Snowy the cat would go throughout her day.

Home-based location and temperature tracking allowed me to build up an extensive history of which rooms she used for her favourite sleeping spots. I had a theory that with sufficient data collected from her past I’d be able to train an ML model to predict where she was likely to go in the future.

Cat location prediction using Streamlit web apps (image by author)

Hardware for room level cat tracking

The first task was to collect a lot of data on where Snowy historically spent her time — along with environmental factors such as temperature and rainfall. I set an arbitrary target of collecting hourly updates for three months of movement in and around the house.

Cat locating with room beacons (image by author)

Cats aren’t great at data entry, so I needed an automated way of collecting her location. I asked Snowy to wear a Tile, a small, battery-powered Bluetooth tracker. This device on her collar simply transmits a regular and unique Bluetooth Low Energy (BLE) signal. I then used eight stationary receivers to listen for this BLE signal. These receivers were ESP32-based presence detection nodes, placed in named rooms in and around the house (6 inside, 2 outside).

Each receiving node is constantly looking for the unique BLE signal of Snowy’s Tile and measuring the received signal strength indicator (RSSI). The stronger the signal, the closer Snowy is to that node (either that, or she’s messing with the battery). If I got a few seconds of strong signal at the study sensor, for example, I could assume Snowy was in or very close to that room.
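The "strongest signal wins" idea can be sketched in a few lines of Python. This is only an illustration: the room names, the sample readings and the -80 dBm cut-off are made up for the example, not values from my setup.

```python
from collections import defaultdict
from statistics import mean


def nearest_room(readings, min_rssi=-80):
    """Return the room whose node heard the strongest average signal.

    readings: iterable of (room, rssi_dbm) tuples from a short time window.
    min_rssi: hypothetical cut-off; weaker averages count as "not nearby".
    """
    by_room = defaultdict(list)
    for room, rssi in readings:
        by_room[room].append(rssi)
    if not by_room:
        return None
    # RSSI is a negative dBm value, so the one closest to zero is the strongest.
    room, strength = max(
        ((r, mean(values)) for r, values in by_room.items()),
        key=lambda pair: pair[1],
    )
    return room if strength >= min_rssi else None


# A few seconds of illustrative readings from three nodes.
window = [("study", -58), ("study", -61), ("kitchen", -83), ("garden", -90)]
print(nearest_room(window))
```

Averaging over a short window rather than trusting a single reading smooths out the jitter that BLE signal strength tends to show.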

Ultimately I collected three months of location observations, with over 12 million location, temperature, humidity and rainfall observations (I may have gone over the top with data collection).

The question I’ve been trying to answer: can I use these historic observations to build a prediction model of where she is likely to go? And how confident can I be in a machine’s prediction of Snowy’s hiding spot?

ML Bootcamp

Supervised learning is the ML task of creating a function that maps an input to an output based on example input-output pairs. In my case, I want to take historic observations about cat location, temperature, time of day etc., as inputs and find patterns … a function (inference) that predicts future cat location.

Temperature, time and day — can it map to location? (image by author)

My assumption is that the problem can be generalised from this data; i.e. future data will follow some common pattern of past cat behaviour (for a cat, this assumption may be questionable).

Cat location prediction (image by author)

The training uses past information to build a model that is a deployable artefact. Once a candidate model is trained, it can be tested for prediction accuracy and finally deployed. In my case, I wish to create a web application to make predictions on where Snowy is likely to be napping.

What’s also important is that the model doesn’t have to output a single absolute location, but can give its answer in terms of a confidence. If it outputs P(location:study) near 1.0 it’s confident, but values near 0.5 mean it is unsure of Snowy’s location.
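To make the confidence idea concrete, here is a small sketch that takes a set of per-room probabilities and reports the most likely room along with a rough label. The probabilities and the 0.8 threshold are invented for illustration; they are not outputs of my model.

```python
def describe_prediction(probabilities, confident_at=0.8):
    """Pick the most likely room and label how sure the model is.

    probabilities: dict mapping room name -> probability (summing to about 1).
    confident_at: hypothetical threshold above which we call it "confident".
    """
    room = max(probabilities, key=probabilities.get)
    p = probabilities[room]
    label = "confident" if p >= confident_at else "unsure"
    return room, p, label


print(describe_prediction({"study": 0.92, "garden": 0.05, "bedroom": 0.03}))
print(describe_prediction({"study": 0.50, "garden": 0.45, "bedroom": 0.05}))
```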

Summarising data with dbt

My data platform Home Assistant stores each sensor update in the states table. This is really fine-grained, with updates added every few seconds from all the sensors (in my case, around 18,000 sensor updates a day). To simplify model training, I wanted to summarise the data into hourly updates — essentially a single (most prevalent) location, along with temperature and humidity readings.
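I did this step in SQL, but the same hourly roll-up can be sketched in pandas. The column names and sample rows below are assumptions for the example and do not match the Home Assistant schema.

```python
import pandas as pd

# Hypothetical raw sensor updates (illustrative values only).
raw = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2021-06-01 07:05", "2021-06-01 07:20", "2021-06-01 07:50",
                "2021-06-01 08:10", "2021-06-01 08:40",
            ]
        ),
        "location": ["bedroom", "kitchen", "kitchen", "garden", "garden"],
        "temperature": [18.0, 18.5, 19.0, 20.0, 21.0],
    }
)


def summarise_hourly(df):
    """Collapse fine-grained updates into one row per hour."""
    hour = df["ts"].dt.floor("h")  # truncate each timestamp to the start of its hour
    return (
        df.groupby(hour)
        .agg(
            # Most prevalent location in the hour (ties resolved alphabetically).
            location=("location", lambda s: s.mode().iloc[0]),
            temperature=("temperature", "mean"),
        )
        .reset_index()
    )


hourly = summarise_hourly(raw)
print(hourly)
```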

Summarising lots of data into hourly summaries (image by author)

Initially I was manually running a bunch of SQL statements (like this) to process the data. However, I found this fairly cumbersome, as I wanted to retrain the model with newer location and environmental conditions. I settled on the trusty data engineering tool dbt to manage the SQL transformations in my database and make retraining easier.

The dbt lineage graph showing the transformation of data (image by author)

dbt handles turning my select statements into tables and views, transforming data that is already inside my Postgres data warehouse.

Model training and evaluation

Once I had cleaned my data, I used a Scikit-learn random forest classifier as my predictive model. A random forest builds many decision trees, each from a random sample of the training data. For Snowy, this means the predicted location is decided by a vote across those trees: the location with the most votes wins.
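For a feel of what the training step looks like, here is a minimal sketch using Scikit-learn's RandomForestClassifier. The data is synthetic and the rule that generates the labels is invented, so it stands in for the real observations rather than reproducing my notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hourly observations (not real cat data).
rng = np.random.default_rng(42)
n = 2000
hour = rng.integers(0, 24, n)
day_of_week = rng.integers(0, 7, n)
temperature = rng.normal(18, 6, n)
is_raining = rng.integers(0, 2, n)

# Made-up behaviour: bedroom at night, garden when warm and dry, otherwise study.
location = np.where(
    (hour < 7) | (hour >= 21),
    "bedroom",
    np.where((temperature > 15) & (is_raining == 0), "garden", "study"),
)

X = np.column_stack([hour, day_of_week, temperature, is_raining])
X_train, X_test, y_train, y_test = train_test_split(
    X, location, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy on rows the model has not seen during training.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")

# Per-location probabilities for 2pm on a warm, dry day (day_of_week=2).
probs = model.predict_proba([[14, 2, 24.0, 0]])[0]
print(dict(zip(model.classes_, probs.round(2))))
```

Holding back a quarter of the rows for testing gives a rough check that the model is learning a pattern rather than memorising the training data.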

If you look at the Python notebook you can see the steps taken to assign a class label to inputs, based on the thousands of past observations of time of day, temperature and location that the model has been trained on.

Python code segment for visualizing feature importance (image by author)

One really cool thing about the Scikit-learn decision tree models is how easy it is to visualise what’s going on. By visualizing the model features (above) I can see that “hour of the day” is the most significant feature in the model.

Intuitively this makes sense: time of day is likely to have the most significant impact on where Snowy is likely to be. The second most significant feature in predicting Snowy’s location is outside air temperature. Again this makes sense, as too hot or too cold is likely to change whether she wants to be outside. What I found surprising was that the least significant feature was the is-raining feature. One possible explanation is that the feature only matters during daylight hours; rain won’t have an effect on the model when Snowy is sleeping inside at night.
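Feature importance comes straight off a fitted forest through its feature_importances_ attribute. The sketch below uses synthetic data where the label depends only on the hour, so the ranking is predictable; the feature names are my own labels for the example, and I print the scores rather than plot them to keep it short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["hour", "day_of_week", "temperature", "is_raining"]

# Synthetic observations (illustrative only).
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack(
    [
        rng.integers(0, 24, n),   # hour
        rng.integers(0, 7, n),    # day_of_week
        rng.normal(18, 6, n),     # temperature
        rng.integers(0, 2, n),    # is_raining
    ]
)
# Made-up labels driven purely by the hour of the day.
y = np.where(X[:, 0] < 7, "bedroom", "garden")

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importances are normalised to sum to 1; sort from most to least significant.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:12s} {score:.3f}")
```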

It’s also possible to visualize a decision tree from a random forest in Python using Scikit-Learn.

A visual decision tree showing the hour and day decision points (image by author)

Here in my displayed tree I can see the hour of the day is the initial decision point in the prediction, with 7:00am an interesting part of the algorithm. This is the time when alarm clocks go off in our household, and the cat is motivated to get up and look for food. Another interesting part of the tree is the “day of the week ≤ 5.5” split. This equates to the day of the week being Monday through Friday, and again this part of the algorithm makes sense as we (and the cat) generally get up a bit later on weekends.
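If you want to inspect a tree without drawing it, Scikit-learn's export_text prints the decision rules as plain text. This sketch pulls the first tree out of a small forest trained on synthetic data; the 7am rule is baked into the fake labels, so it is an illustration rather than my actual tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Synthetic data: the made-up rule is "bedroom before 7am, kitchen after".
rng = np.random.default_rng(1)
n = 500
hour = rng.integers(0, 24, n)
day_of_week = rng.integers(0, 7, n)
X = np.column_stack([hour, day_of_week])
y = np.where(hour < 7, "bedroom", "kitchen")

# max_features=None lets every split consider both features,
# which keeps this toy tree small and readable.
forest = RandomForestClassifier(
    n_estimators=10, max_depth=2, max_features=None, random_state=0
).fit(X, y)

# A forest is a list of fitted decision trees; take the first one.
first_tree = forest.estimators_[0]
rules = export_text(first_tree, feature_names=["hour", "day_of_week"])

# Trees inside a forest see encoded labels, so leaves print as class
# indices; forest.classes_ maps those indices back to room names.
print(rules)
print(list(forest.classes_))
```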

The cat predictor web-app in Streamlit

With the model created, I now wanted to build a web application to predict Snowy’s location based on a range of inputs. Streamlit is an open-source Python library that makes it easy to create web apps (without me having to learn a bunch of front-end frameworks). I added sliders and selection boxes for feature values, such as day and temperature.
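A stripped-down version of such a page might look like the sketch below. The widget labels, value ranges, feature order and the Monday=0 day encoding are all assumptions for the example, and the model passed in is whatever classifier you have trained. The Streamlit import sits inside run_app() so the helper function can be used without Streamlit installed.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_features(hour, day_name, temperature, is_raining):
    """Turn widget values into one row: [hour, day_of_week, temperature, is_raining]."""
    # Assumed encoding: Monday=0 ... Sunday=6.
    return [hour, DAYS.index(day_name), float(temperature), int(is_raining)]


def run_app(model=None):
    """Render the input widgets and, if a model is given, show its prediction."""
    import streamlit as st

    st.title("Where is the cat?")
    hour = st.slider("Hour of day", 0, 23, 7)
    day_name = st.selectbox("Day of week", DAYS)
    temperature = st.slider("Outside temperature (°C)", -5.0, 45.0, 20.0)
    is_raining = st.checkbox("Raining")

    row = build_features(hour, day_name, temperature, is_raining)
    st.write("Model input:", row)
    if model is not None:
        st.write("Predicted location:", model.predict([row])[0])


# To serve the page, call run_app(trained_model) as the last line of the
# script and launch it with: streamlit run app.py
```

Streamlit reruns the whole script each time a slider moves, so the prediction updates as you drag.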

Web application — with inputs as slider controls (image by author)

And voila, with a bit more Python code I’ve created a Cat Prediction App: a web-app that predicts the likely location of Snowy the cat. I found some excellent instructions to deploy my Streamlit app to Heroku.

So, can ML predict where my cat is now? Absolutely, and with surprising accuracy. Snowy is happily asleep in the garden, just as the prediction app said she would be.

Links to code

Hope you find this blog and code helpful for all your pet location prediction needs.


Can ML predict where my cat is now? was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

By sahil
