Hippocampus as a Predictive Map

3 minute read


A cognitive map of the environment can be used to generate long-term reward predictions by simulating future states. This approach is akin to model-based reinforcement learning, where the cognitive map serves as an internal model for forecasting the outcomes of potential actions. However, the process is computationally demanding, since in principle it requires simulating trajectories over an unbounded horizon of future states. An alternative, model-free strategy maps long-term rewards through trial and error, learning a value function that associates each state directly with a long-term reward prediction. Yet this strategy becomes problematic in dynamic environments: when the distribution of rewards shifts, the entire value function must be relearned.
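
To make the contrast concrete, here is a minimal sketch (not from the post) of the trial-and-error strategy: tabular TD(0) learning of a value function on a toy five-state chain. The chain, reward placement, and learning parameters are invented purely for illustration.

```python
import numpy as np

# Toy chain: states 0..4, state 4 is terminal and delivers reward 1.
rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)  # long-term reward prediction for each state

for episode in range(2000):
    s = 0
    while s != n_states - 1:
        # Drift rightward with probability 0.7, otherwise step left.
        s_next = min(s + 1, n_states - 1) if rng.random() < 0.7 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s_next).
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(np.round(V, 2))  # values rise toward the rewarded end of the chain
```

If the reward were moved to the other end of the chain, every entry of this table would have to be relearned from fresh experience, which is exactly the fragility described above.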

A proposed solution is that the hippocampus builds a predictive map that represents each state in terms of its successor states. This successor representation (SR) framework rests on two key quantities:

1. Value function ($V(s)$): the expected cumulative reward an agent can anticipate when starting from state $s$. It is computed as the expected sum of discounted rewards $R(s_t)$ collected over all future time steps, with the discount factor $\gamma$ shrinking the contribution of rewards that arrive later.

2. Successor representation ($M$): the SR models how often, discounted by how far in the future, each state $s'$ is expected to be visited when starting from state $s$. It acts as a predictive model of future state occupancy given the current state.

Equations for clarification:

Equation (1)

$$V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t) \,\middle|\, s_0 = s\right]$$

- $V(s)$: value function of state $s$.
- $\mathbb{E}$: expectation taken over all possible future trajectories.
- $\gamma^t$: discount factor applied at time $t$, reducing the weight of distant rewards.
- $R(s_t)$: reward received in the state occupied at time $t$.
- $s_0$: the initial (current) state $s$.
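
As a concrete reading of Equation (1), the sketch below estimates $V(s)$ by averaging truncated discounted returns over Monte Carlo rollouts of a random-walk policy on a small ring of states. The ring world, reward vector, and rollout settings are illustrative assumptions, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 6, 0.9
horizon, n_rollouts = 100, 3000         # truncate the infinite sum at a long horizon
R = np.array([0., 0., 0., 1., 0., 0.])  # reward received when occupying each state

def estimate_value(s0):
    """Monte Carlo estimate of V(s0) = E[sum_t gamma^t R(s_t) | s_0 = s0]."""
    total = 0.0
    for _ in range(n_rollouts):
        s, ret = s0, 0.0
        for t in range(horizon):
            ret += (gamma ** t) * R[s]
            s = (s + rng.choice([-1, 1])) % n  # unbiased random walk on the ring
        total += ret
    return total / n_rollouts

print([round(estimate_value(s), 2) for s in range(n)])
```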

Equation (2)

$$V(s) = \sum_{s'} M(s, s') R(s')$$

- $V(s)$ and $M(s, s')$: the value function and the SR, respectively.
- $R(s')$: immediate reward upon entering state $s'$.

This expresses the value of a state as the inner product of its row of the SR with the reward function: each state's immediate reward is weighted by how often that state is expected to be visited in the discounted future.

Equation (3)

$$M(s, s') = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t \mathbb{I}(s_t = s') \,\middle|\, s_0 = s\right]$$

- This defines the SR as the expected sum of discounted future occurrences of state $s'$; the indicator $\mathbb{I}(s_t = s')$ equals 1 when the state occupied at time $t$ is $s'$ and 0 otherwise.
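
Under a fixed policy with state-transition matrix $T$, the sum in Equation (3) has the closed form $M = \sum_{t=0}^{\infty} (\gamma T)^t = (I - \gamma T)^{-1}$, which makes Equations (2) and (3) easy to check numerically. The sketch below reuses the same illustrative ring world and reward vector as above; none of these choices come from the post itself.

```python
import numpy as np

n, gamma = 6, 0.9
T = np.zeros((n, n))
for s in range(n):
    # Unbiased random walk on a ring of 6 states.
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)  # Equation (3): expected discounted occupancies
R = np.array([0., 0., 0., 1., 0., 0.])    # immediate reward upon entering each state
V = M @ R                                 # Equation (2): value = SR row times reward vector

print(np.round(M[0], 2))  # row 0: predicted discounted visits to each state from state 0
print(np.round(V, 2))     # should closely match the Monte Carlo estimates above
```

Each row of $M$ falls off with distance along the ring, which is the predictive-map picture in miniature: states that will be reached soon receive high expected occupancy, and the value of a state is just those occupancies weighted by reward.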

The hippocampus is a natural candidate for housing the SR because it encodes not just an animal's current location but also the locations it is likely to occupy next, as shaped by the animal's policy and the structure of the environment. This predictive mapping fits with the known properties of hippocampal place cells: traditionally interpreted as markers of current position, they can equally be read as forecasting future locations under the constraints of policy and environment.

Furthermore, grid cells may serve to smooth, or regularize, the SR by projecting it onto a low-dimensional basis derived from its eigendecomposition, a form of spectral regularization. Discarding the high-frequency components filters out noise, yielding a more stable and noise-resistant representation of the SR.
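
A rough sketch of what this could look like, assuming the same toy random-walk setup as before: eigendecompose the SR, keep only the few smoothest (largest-eigenvalue) components, and rebuild a low-rank, low-pass-filtered version of the map. This only illustrates the idea of spectral regularization; it is not the paper's actual grid-cell analysis.

```python
import numpy as np

n, gamma, k = 20, 0.95, 5  # ring size, discount, number of components to keep
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5  # random walk on a ring

M = np.linalg.inv(np.eye(n) - gamma * T)  # SR of the random walk (symmetric here)

# Eigendecomposition: for this symmetric SR the eigenvectors are Fourier-like modes,
# and the largest eigenvalues correspond to the smoothest (lowest-frequency) modes.
evals, evecs = np.linalg.eigh(M)
top = np.argsort(-evals)[:k]

# Low-rank reconstruction from the k smoothest components acts as a low-pass filter.
M_smooth = evecs[:, top] @ np.diag(evals[top]) @ evecs[:, top].T

print(np.round(M[10], 2))         # a row of the full SR
print(np.round(M_smooth[10], 2))  # the same row after spectral smoothing
```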

The perspective that the hippocampus encodes a predictive, rather than merely spatial, cognitive map offers a new understanding of its function in reinforcement learning and adaptive behavior. This predictive map encompasses expectations about future states, extending beyond spatial navigation to include a broader range of cognitive functions.