Relax, it doesn't matter how you get there

A new self-supervised approach for multi-timescale behavior analysis

Mehdi Azabou1 Michael Mendelson1 Nauman Ahad1 Maks Sorokin1 Shantanu Thakoor2 Carolina Urzay1 Eva L. Dyer1,3
1 Georgia Institute of Technology, 2 DeepMind, 3 Emory University


Studying the dynamics of behavior, especially complex and naturalistic behavior, provides rich information about movement and decision making that can be used to build insights into the link between the brain and behavior.

→ Can we build models that can capture the dynamics of behavior, and help us derive insights from the data?

→ Can we leverage self-supervised learning to process large amounts of high-dimensional noisy behavior data, without relying on annotations?

We introduce BAMS, a novel self-supervised learning framework trained to predict the next action(s). BAMS is equipped with two latent spaces: the first captures short-term dynamics and the second captures long-term dynamics.
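To make the idea of two timescales concrete, here is a minimal sketch in Python with NumPy. It is not the BAMS architecture: it replaces the learned encoders with causal moving averages over a short and a long window, purely to illustrate how one feature stream can follow momentary changes while the other changes slowly. The function names and window sizes are hypothetical.

```python
import numpy as np


def causal_mean(x, window):
    """Causal moving average over the last `window` frames (current frame included).

    x has shape (T, D). Early frames average over the frames seen so far,
    so the output at time t never depends on frames after t.
    """
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(x, axis=0)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        start = max(0, t - window + 1)
        total = csum[t] - csum[start - 1] if start > 0 else csum[t]
        out[t] = total / (t - start + 1)
    return out


def two_timescale_embedding(x, short_window=5, long_window=200):
    """Return (short_term, long_term) features; a stand-in for two learned latent spaces."""
    # Hypothetical window sizes, chosen only for illustration.
    short_term = causal_mean(x, short_window)
    long_term = causal_mean(x, long_window)
    return short_term, long_term


# Usage on a synthetic random-walk "pose" sequence with 4 features.
rng = np.random.default_rng(0)
poses = np.cumsum(rng.normal(size=(1000, 4)), axis=0)
short_term, long_term = two_timescale_embedding(poses)
print(short_term.shape, long_term.shape)
```

In this toy version the long-window features vary slowly across a sequence, while the short-window features react quickly to recent frames.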

🐭 Studying interacting mice


The mouse triplet dataset (Link) is part of the Multi-Agent Behavior Challenge (MABe 2022). It consists of a set of trajectories from three mice interacting in an open-field arena. The recorded videos are processed to extract pose estimations and tracking data. We use these features as the input to our model.

Below, we show a sample sequence.

Figure 1: Visualization of the pose data for a sample recording clip.

Note that the data is noisy and contains missing values and frequent identity swaps. This is representative of the real-world data we are interested in studying.


To evaluate the quality of the learned representation, we use a set of 13 readout tasks that are modeled after a wide range of behavioral analysis tasks.

The model produces a d-dimensional representation. To read out the desired label, we freeze the representation and train a single linear layer on top of the embedding. Good performance indicates that the label is linearly decodable from the learned embedding space, suggesting that the model captures these factors, which are known to be important.
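The readout protocol above can be sketched as follows. This is a minimal illustration, not the challenge's evaluation code: it assumes the frozen embeddings are already available as a NumPy array, and it fits the linear layer in closed form with ridge regression rather than by gradient descent. The function names and the `l2` parameter are hypothetical.

```python
import numpy as np


def add_bias(embeddings):
    """Append a constant column so the linear layer has a bias term."""
    ones = np.ones((embeddings.shape[0], 1))
    return np.hstack([embeddings, ones])


def fit_linear_readout(embeddings, labels, l2=1e-3):
    """Fit a single linear layer on frozen embeddings via ridge regression.

    embeddings: (N, d) array, never updated here (the encoder stays frozen).
    labels: (N,) or (N, K) array of readout targets.
    """
    X = add_bias(embeddings)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ labels)


def predict(embeddings, weights):
    """Apply the fitted linear layer to embeddings."""
    return add_bias(embeddings) @ weights


# Usage on synthetic embeddings whose label is linearly decodable by construction.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 8))
y = Z @ rng.normal(size=8) + 0.5
W = fit_linear_readout(Z, y, l2=1e-8)
mse = float(np.mean((predict(Z, W) - y) ** 2))
print(mse)
```

A low error from such a probe only says the label is linearly accessible in the embedding; it does not say how the encoder came to represent it.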

We report the results for our model and other models below. Our model ranks first in the overall challenge and on all global tasks (Link to Leaderboard).

Figure 2: Linear evaluation results.

Visualizing the embedding spaces

We use the learned embedding spaces to visualize the dynamics of the behavior. We show the embedding space of the first latent space (short-term dynamics) and the second latent space (long-term dynamics).

Figure 3: Projection of the embeddings using PCA after normalization.

There are two things we can note here:

- The long-term embedding is able to capture the different strains of mice. This is important because it shows that the behavior of the mice is not only driven by the environment, but also by their genetic background.

- While the long-term embedding quickly converges to a small region of the space, we can see more variability in the short-term embedding. This is expected because the short-term dynamics are used to capture momentary behavior like interactions between the mice.
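A projection like the one in Figure 3 could be produced along these lines. This is a sketch under assumptions: the text only says the embeddings are normalized before PCA, so z-scoring each embedding dimension is a guess, and the function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Z-score each embedding dimension, then project onto the top principal components."""
    E = np.asarray(embeddings, dtype=float)
    mean = E.mean(axis=0)
    std = E.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # leave constant dimensions unscaled
    Z = (E - mean) / std
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


# Usage on synthetic embeddings; each row would be one time point or one sequence.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))
points_2d = pca_project(embeddings)
print(points_2d.shape)
```

The resulting 2D points can then be colored by any per-sample attribute, such as strain or light condition, to look for structure.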

We visualize the embeddings again, this time coloring the points by light condition (lights on/off).

Figure 4: Projection of the embeddings using PCA after normalization.

- We can see that the long-term embedding captures the effect of the lights on the behavior of the mice. This is consistent with what we know about mice being more active during the night and less active during the day.

- Interestingly, we find that the light condition is encoded along the first PC, while the mouse strain is encoded along the second PC.

🤖 Studying simulated legged robots

To test our model's ability to separate behavioral factors that vary in complexity and contain distinct multi-timescale dynamics, we introduce a new dataset generated from a heterogeneous population of quadrupeds traversing different terrains.

We use NVIDIA's Isaac Gym simulator with two robots that differ in their morphology, ANYmal B and ANYmal C. To create heterogeneity in the population, we randomize the body mass of the robot as well as the target traversal velocity. We track a set of 24 proprioceptive features including linear and angular velocities of the robots' joints.
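The population randomization described above could be sketched as follows. This is only an illustration of the sampling idea and does not call the Isaac Gym API. The `RobotConfig` class, the uniform sampling, the use of a mass scale factor, and both numeric ranges are hypothetical and are not taken from the dataset.

```python
import random
from dataclasses import dataclass

MORPHOLOGIES = ("ANYmal B", "ANYmal C")


@dataclass(frozen=True)
class RobotConfig:
    morphology: str
    body_mass_scale: float  # hypothetical: multiplier on the nominal body mass
    target_velocity: float  # hypothetical units and range


def sample_population(n, seed=0, mass_scale_range=(0.8, 1.2), velocity_range=(0.5, 1.5)):
    """Sample n robot configurations with randomized mass and target velocity."""
    rng = random.Random(seed)  # local generator keeps the sampling reproducible
    return [
        RobotConfig(
            morphology=rng.choice(MORPHOLOGIES),
            body_mass_scale=rng.uniform(*mass_scale_range),
            target_velocity=rng.uniform(*velocity_range),
        )
        for _ in range(n)
    ]


# Usage: draw a small heterogeneous population.
for config in sample_population(3, seed=42):
    print(config)
```

Because every factor of variation is sampled explicitly, it is known exactly for each robot and can serve as a ground-truth label when evaluating learned representations.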

Figure 5: Sketch of robots walking on different terrains.

→ Why this dataset?

Simulation-based data collection enables access to information that is generally inaccessible or hard to acquire in a real-world setting. Unlike the mouse dataset, where measurements come from a noisy camera-based feature extractor, physics engines do not introduce measurement noise. Instead, they provide accurate ground-truth information about the creature and the world state at no extra cost. Access to such information is at times critical for scrutinizing the capabilities of learning algorithms.

🎰 Studying human decision making

BAMS is also applicable to human decision making. Our work will appear at the 11th International IEEE EMBS Conference on Neural Engineering (NER'23), Baltimore, Maryland, April 2023.

Cite our work

If you find this useful for your research, please consider citing our work:
      doi = {10.48550/ARXIV.2303.08811},
      url = {},
      author = {Azabou, Mehdi and Mendelson, Michael and Ahad, Nauman and Sorokin, Maks and Thakoor, Shantanu and Urzay, Carolina and Dyer, Eva L.},
      title = {Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis},
      publisher = {arXiv},
      year = {2023}}
      doi = {10.48550/ARXIV.2302.11023},
      url = {},
      author = {Mendelson, Michael J and Azabou, Mehdi and Jacob, Suma and Grissom, Nicola and Darrow, David and Ebitz, Becket and Herman, Alexander and Dyer, Eva L.},
      title = {Learning signatures of decision making from many individuals playing the same game},
      publisher = {arXiv},
      year = {2023}}