Relax, it doesn't matter how you get there

A new self-supervised approach for multi-timescale behavior analysis

Mehdi Azabou1 Michael Mendelson1 Nauman Ahad1 Maks Sorokin1 Shantanu Thakoor2 Carolina Urzay1 Eva L. Dyer1,3
1 Georgia Institute of Technology, 2 DeepMind, 3 Emory University

Introduction

Studying the dynamics of behavior data, especially in complex and naturalistic behavior, provides rich information about movement and decision making that can be used to build insights into the link between the brain and behavior.

→ Can we build models that can capture the dynamics of behavior, and help us derive insights from the data?

→ Can we leverage self-supervised learning to process large amounts of high-dimensional noisy behavior data, without relying on annotations?

We introduce BAMS, a novel self-supervised learning framework trained to predict the animal's next action(s). BAMS is equipped with two latent spaces: the first captures short-term dynamics, while the second captures long-term dynamics.
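The two-timescale idea can be illustrated with a minimal numpy sketch: two causal convolutions over the same behavior signal, one with a narrow receptive field (short-term) and one with a wide receptive field (long-term). This is only a toy illustration of the multi-timescale structure, not the actual BAMS architecture or its learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_conv(x, kernel):
    """1D causal convolution: the output at time t only sees inputs <= t."""
    k = len(kernel)
    x_pad = np.concatenate([np.zeros(k - 1), x])
    return np.convolve(x_pad, kernel, mode="valid")

# Toy 1D behavior signal (e.g. one keypoint coordinate over time).
T = 200
x = np.cumsum(rng.normal(size=T))

# Short-term encoder: small receptive field; long-term encoder: wide one.
z_short = causal_conv(x, rng.normal(size=5))
z_long = causal_conv(x, rng.normal(size=64))

# Both embeddings describe the same timesteps and are combined
# to predict the next action(s).
z = np.stack([z_short, z_long])
print(z.shape)  # (2, 200)
```

In the real model, each branch is a learned temporal convolutional encoder; the point here is only that the two branches see the same input at different temporal scales.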


🐭 Studying interacting mice

Dataset

The mouse triplet dataset (Link) is part of the Multi-Agent Behavior Challenge (MABe 2022). It consists of a set of trajectories from three mice interacting in an open-field arena. The recorded videos are processed to extract pose estimates and tracking data, which we use as the input to our model.

Below, we show a sample sequence.

Figure 1: Visualization of the pose data for a sample recording clip.

Note that the data is noisy, contains missing values, and suffers from frequent identity swaps. This is representative of the real-world data we are interested in studying.
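Missing keypoints are a typical preprocessing concern for this kind of tracking data. One common fix, sketched below with a hypothetical helper (not necessarily the pipeline used in the paper), is to linearly interpolate short NaN gaps in each keypoint trajectory:

```python
import numpy as np

def fill_missing(traj):
    """Linearly interpolate NaN gaps in a 1D keypoint trajectory."""
    traj = np.asarray(traj, dtype=float)
    nans = np.isnan(traj)
    idx = np.arange(len(traj))
    # Interpolate the missing indices from the observed ones.
    traj[nans] = np.interp(idx[nans], idx[~nans], traj[~nans])
    return traj

print(fill_missing([0.0, np.nan, 2.0, np.nan, np.nan, 5.0]))
# [0. 1. 2. 3. 4. 5.]
```

Identity swaps are harder to repair automatically; a model intended for real-world use has to be robust to some residual tracking errors.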

Results

To evaluate the quality of the learned representation, we use a set of 13 readout tasks designed after a wide range of behavioral analysis tasks.

The model produces a d-dimensional representation. To read out the desired label, we freeze the representation and train a single linear layer on top of the embedding. Good performance indicates that the label is linearly decodable from the learned embedding space, suggesting that the model captures factors that are known to be important.
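A minimal sketch of this linear-probe protocol, using random stand-in embeddings and a dependency-free ridge readout (a logistic probe would be equally standard; the paper's exact readout may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen d-dimensional embeddings of N frames, with binary
# labels that are linear in the embedding (so the probe should succeed).
N, d = 500, 16
Z = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = (Z @ w_true > 0).astype(float)

# Linear readout on the FROZEN embeddings: ridge regression on +/-1 targets.
lam = 1e-2
w = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ (2 * y - 1))
acc = ((Z @ w > 0) == (y > 0.5)).mean()
print(f"linear readout accuracy: {acc:.2f}")
```

The key point is that only the single linear layer is trained; the encoder never sees the labels, so accuracy measures what is already linearly decodable from the representation.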

We report the results for our model and other models below. Our model ranks first in the overall challenge and on all global tasks (Link to Leaderboard).

Figure 2: Linear evaluation results.

Visualizing the embedding spaces

We use the learned embedding spaces to visualize the dynamics of behavior: the first latent space captures short-term dynamics, the second captures long-term dynamics.

Figure 3: Projection of the embeddings using PCA after normalization.

There are two things we can note here:

- The long-term embedding is able to capture the different strains of mice. This is important because it shows that the behavior of the mice is not only driven by the environment, but also by their genetic background.

- While the long-term embedding quickly converges to a small region of the space, we can see more variability in the short-term embedding. This is expected because the short-term dynamics are used to capture momentary behavior like interactions between the mice.

We visualize the embeddings again, this time coloring the points by the light condition (lights on/off).

Figure 4: Projection of the embeddings using PCA after normalization.

- We can see that the long-term embedding is able to capture the effect of the lights on the behavior of the mice. This confirms what we know about mice being more active during the night and less active during the day.

- Interestingly, we find that the light condition is encoded along the first PC, while the mouse strain is encoded along the second PC.
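The normalize-then-project step behind these figures can be sketched in a few lines of numpy (this is an illustration of the general recipe, not the exact plotting code used for the figures):

```python
import numpy as np

def pca_project(Z, n_components=2):
    """Standardize embeddings per dimension, then project onto the top PCs."""
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-8)
    # SVD of the normalized matrix gives the principal directions in Vt.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

# Stand-in embeddings: 300 frames, 32-dimensional latent space.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 32))
pcs = pca_project(Z)
print(pcs.shape)  # (300, 2)
```

Coloring `pcs` by metadata (strain, light condition) is then a single scatter plot per factor.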

🪰 Studying fruit flies

The fruit fly groups dataset is the second dataset in the MABe benchmark. It consists of tracking data of a group of 9 to 11 flies interacting in a small dish.

Precise neural activity manipulations are performed on certain neurons which, when activated, induce certain types of behavior including courtship, avoidance, and female aggression. Additionally, the groups of flies are differentiated by various genetic mutations and tagged by sex. These, along with other behavioral factors, provide us with 50 different subtasks, both frame-level and sequence-level, that can be used to evaluate the learned representations.

Our model achieves state-of-the-art performance on the fly dataset. BAMS outperforms other models on both frame-level and sequence-level sub-tasks, and we note a significant boost in the average frame-level F1 score. This result further demonstrates that our approach generalizes to new datasets and scales to an even larger number of animals and frames.

🤖 Studying simulated legged robots

To test our model's ability to separate behavioral factors that vary in complexity and contain distinct multi-timescale dynamics, we introduce a new dataset generated from a heterogeneous population of quadrupeds traversing different terrains.

We use NVIDIA's Isaac Gym simulator with two robots that differ in morphology: ANYmal B and ANYmal C. To create heterogeneity in the population, we randomize the body mass of the robot as well as the target traversal velocity. We track a set of 24 proprioceptive features including linear and angular velocities of the robots' joints.
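The population randomization described above can be sketched as sampling a small per-episode configuration. The ranges and field names below are hypothetical, chosen only to illustrate the idea; they are not the values used to generate the dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# The two morphologies in the population.
robots = ["ANYmal B", "ANYmal C"]

def sample_episode():
    """Draw one randomized episode configuration (illustrative ranges)."""
    return {
        "robot": rng.choice(robots),               # morphology
        "added_mass_kg": rng.uniform(-5.0, 5.0),   # body-mass perturbation
        "target_vel_mps": rng.uniform(0.3, 1.0),   # commanded traversal speed
    }

episodes = [sample_episode() for _ in range(4)]
for ep in episodes:
    print(ep)
```

Each sampled configuration would then drive one simulated rollout, from which the 24 proprioceptive features are logged.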


Figure 5: Sketch of robots walking on different terrains.

→ Why this dataset?

Simulation-based data collection enables access to information that is generally inaccessible or hard to acquire in a real-world setting. Unlike the noisy measurements produced by the camera-based feature extractor in the mouse dataset, physics engines do not suffer from measurement noise; they provide accurate ground-truth information about the agent and the world state at no extra cost. Access to such information is at times critical for scrutinizing the capabilities of learning algorithms.

🎰 Studying human decision making

BAMS is also applicable to human decision making. Our work will appear at the 11th International IEEE EMBS Conference on Neural Engineering (NER'23), Baltimore, Maryland, April 2023.

Cite our work

If you find this useful for your research, please consider citing our work:
@inproceedings{azabou2023relax,
  title={Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis},
  author={Azabou, Mehdi and Mendelson, Michael and Ahad, Nauman and Sorokin, Maks and Thakoor, Shantanu and Urzay, Carolina and Dyer, Eva L.},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}

@misc{mendelson2023,
  doi={10.48550/ARXIV.2302.11023},
  url={https://arxiv.org/abs/2302.11023},
  author={Mendelson, Michael J and Azabou, Mehdi and Jacob, Suma and Grissom, Nicola and Darrow, David and Ebitz, Becket and Herman, Alexander and Dyer, Eva L.},
  title={Learning signatures of decision making from many individuals playing the same game},
  publisher={arXiv},
  year={2023}
}