Hippocampus as a Predictive Map

3 minute read


A cognitive map of the environment can be used to generate long-term reward predictions by simulating future states. This approach is akin to model-based reinforcement learning, where the cognitive map serves as an internal model for forecasting the outcomes of potential actions. However, the process is computationally demanding, since in principle it requires simulating trajectories over an unbounded horizon of future states. An alternative, model-free strategy maps long-term rewards through trial and error, learning a value function that associates each state directly with a long-term reward prediction. Yet this strategy becomes problematic in dynamic environments: when the distribution of rewards shifts, the entire value function must be relearned.
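
To make the contrast concrete, here is a minimal sketch (not from the post) of the trial-and-error strategy: tabular TD(0) learning of a value function on a toy five-state chain. The chain, reward placement, and learning parameters are invented purely for illustration.

```python
import numpy as np

# Toy chain: states 0..4, state 4 is terminal and delivers reward 1.
rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)  # long-term reward prediction for each state

for episode in range(2000):
    s = 0
    while s != n_states - 1:
        # Drift rightward with probability 0.7, otherwise step left.
        s_next = min(s + 1, n_states - 1) if rng.random() < 0.7 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s_next).
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(np.round(V, 2))  # values rise toward the rewarded end of the chain
```

If the reward were moved to the other end of the chain, every entry of this table would have to be relearned from fresh experience, which is exactly the fragility described above.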

A proposed solution is that the hippocampus builds a predictive map that represents each state in terms of its successor states. This successor representation (SR) framework rests on two key quantities:

1. Value function ($V(s)$): the expected cumulative reward an agent can anticipate when starting from state $s$. It is computed as the expected sum of discounted rewards $R(s_t)$ collected over all future time steps, with the discount factor $\gamma$ shrinking the contribution of rewards that arrive later.

2. Successor representation ($M$): the SR models how often, discounted by how far in the future, each state $s'$ is expected to be visited when starting from state $s$. It acts as a predictive model of future state occupancy given the current state.

Equations for clarification:

Equation (1)

$$V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t) \,\middle|\, s_0 = s\right]$$

- $V(s)$: value function of state $s$.
- $\mathbb{E}$: expectation taken over all possible future trajectories.
- $\gamma^t$: discount factor applied at time $t$, reducing the weight of distant rewards.
- $R(s_t)$: reward received in the state occupied at time $t$.
- $s_0$: the initial (current) state $s$.
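
As a concrete reading of Equation (1), the sketch below estimates $V(s)$ by averaging truncated discounted returns over Monte Carlo rollouts of a random-walk policy on a small ring of states. The ring world, reward vector, and rollout settings are illustrative assumptions, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 6, 0.9
horizon, n_rollouts = 100, 3000         # truncate the infinite sum at a long horizon
R = np.array([0., 0., 0., 1., 0., 0.])  # reward received when occupying each state

def estimate_value(s0):
    """Monte Carlo estimate of V(s0) = E[sum_t gamma^t R(s_t) | s_0 = s0]."""
    total = 0.0
    for _ in range(n_rollouts):
        s, ret = s0, 0.0
        for t in range(horizon):
            ret += (gamma ** t) * R[s]
            s = (s + rng.choice([-1, 1])) % n  # unbiased random walk on the ring
        total += ret
    return total / n_rollouts

print([round(estimate_value(s), 2) for s in range(n)])
```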

Equation (2)

$$V(s) = \sum_{s'} M(s, s') R(s')$$

- $V(s)$ and $M(s, s')$: the value function and the SR, respectively.
- $R(s')$: immediate reward upon entering state $s'$.

This expresses the value of a state as the inner product of its row of the SR with the reward function: each state's immediate reward is weighted by how often that state is expected to be visited in the discounted future.

Equation (3)

$$M(s, s') = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t \mathbb{I}(s_t = s') \,\middle|\, s_0 = s\right]$$

- This defines the SR as the expected sum of discounted future occurrences of state $s'$; the indicator $\mathbb{I}(s_t = s')$ equals 1 when the state occupied at time $t$ is $s'$ and 0 otherwise.
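
Under a fixed policy with state-transition matrix $T$, the sum in Equation (3) has the closed form $M = \sum_{t=0}^{\infty} (\gamma T)^t = (I - \gamma T)^{-1}$, which makes Equations (2) and (3) easy to check numerically. The sketch below reuses the same illustrative ring world and reward vector as above; none of these choices come from the post itself.

```python
import numpy as np

n, gamma = 6, 0.9
T = np.zeros((n, n))
for s in range(n):
    # Unbiased random walk on a ring of 6 states.
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)  # Equation (3): expected discounted occupancies
R = np.array([0., 0., 0., 1., 0., 0.])    # immediate reward upon entering each state
V = M @ R                                 # Equation (2): value = SR row times reward vector

print(np.round(M[0], 2))  # row 0: predicted discounted visits to each state from state 0
print(np.round(V, 2))     # should closely match the Monte Carlo estimates above
```

Each row of $M$ falls off with distance along the ring, which is the predictive-map picture in miniature: states that will be reached soon receive high expected occupancy, and the value of a state is just those occupancies weighted by reward.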

The hippocampus is a natural candidate for housing the SR because it encodes not just an animal's current location but also the locations it is likely to occupy next, as shaped by the animal's policy and the structure of the environment. This predictive mapping fits with the known properties of hippocampal place cells: traditionally interpreted as markers of current position, they can equally be read as forecasting future locations under the constraints of policy and environment.

Furthermore, grid cells may serve to smooth, or regularize, the SR by projecting it onto a low-dimensional basis derived from its eigendecomposition, a form of spectral regularization. Discarding the high-frequency components filters out noise, yielding a more stable and noise-resistant representation of the SR.
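
A rough sketch of what this could look like, assuming the same toy random-walk setup as before: eigendecompose the SR, keep only the few smoothest (largest-eigenvalue) components, and rebuild a low-rank, low-pass-filtered version of the map. This only illustrates the idea of spectral regularization; it is not the paper's actual grid-cell analysis.

```python
import numpy as np

n, gamma, k = 20, 0.95, 5  # ring size, discount, number of components to keep
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5  # random walk on a ring

M = np.linalg.inv(np.eye(n) - gamma * T)  # SR of the random walk (symmetric here)

# Eigendecomposition: for this symmetric SR the eigenvectors are Fourier-like modes,
# and the largest eigenvalues correspond to the smoothest (lowest-frequency) modes.
evals, evecs = np.linalg.eigh(M)
top = np.argsort(-evals)[:k]

# Low-rank reconstruction from the k smoothest components acts as a low-pass filter.
M_smooth = evecs[:, top] @ np.diag(evals[top]) @ evecs[:, top].T

print(np.round(M[10], 2))         # a row of the full SR
print(np.round(M_smooth[10], 2))  # the same row after spectral smoothing
```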

The perspective that the hippocampus encodes a predictive, rather than merely spatial, cognitive map offers a new understanding of its function in reinforcement learning and adaptive behavior. This predictive map encompasses expectations about future states, extending beyond spatial navigation to include a broader range of cognitive functions.