Modern RL algorithms typically resort to function approximation (e.g., deep neural networks) when faced with large state and action spaces. However, these approximations come with significant trade-offs: they often suffer from optimization difficulties, offer limited theoretical guarantees, and demand substantial computation. This raises a natural question: can we retain the tractability that structured models such as linear MDPs enjoy, while still capturing general, nonlinear dynamics?
In this survey, we provide an answer to this question through the lens of spectral representations.
Inspired by linear MDPs, we consider a more general factorization. For any well-behaved transition operator $\mathbb{P}$ and reward function $r$, there exist two feature maps $\boldsymbol{\phi}: \mathcal{S}\times\mathcal{A}\to \mathcal{H}$, $\boldsymbol{\mu}:\mathcal{S}\to\mathcal{H}$, and a vector $\boldsymbol{\theta}_r\in\mathcal{H}$ for a suitable Hilbert space $\mathcal{H}$, such that $$ \begin{aligned} \mathbb{P}(s'|s, a) &=\langle\boldsymbol{\phi}(s, a), \boldsymbol{\mu}(s')\rangle_{\mathcal{H}},\\ r(s, a)&=\langle \boldsymbol{\phi}(s, a), \boldsymbol{\theta}_r\rangle_{\mathcal{H}}.\\ \end{aligned} $$ Such a decomposition always exists, so it is not a restrictive structural assumption. The factorization immediately implies the linearity of $Q$-value functions: $$ \begin{aligned} Q^\pi(s, a) &= r(s, a)+\gamma \int_{\mathcal{S}} \mathbb{P}(s'|s, a)V^\pi(s')\mathrm{d}s'\\ &=\langle \boldsymbol{\phi}(s, a), \boldsymbol{\theta}_r\rangle_{\mathcal{H}}+\gamma \langle \boldsymbol{\phi}(s, a), \int_{\mathcal{S}} \boldsymbol{\mu}(s')V^\pi(s')\mathrm{d}s' \rangle_{\mathcal{H}}\\ &=\bigg\langle\boldsymbol{\phi}(s, a),\underbrace{\boldsymbol{\theta}_r+\gamma \int_{\mathcal{S}} \boldsymbol{\mu}(s')V^\pi(s')\mathrm{d}s'}_{\boldsymbol{\eta}^\pi}\bigg\rangle_{\mathcal{H}} \end{aligned} $$ i.e., $Q$-value functions lie in the linear function space spanned by the same feature map $\boldsymbol{\phi}(s, a)$ (see the numerical sketch after the list below). This implies that:
• Theoretically, we can reduce the complexity of learning by restricting attention to linear $Q$-value functions;
• Practically, $\boldsymbol{\phi}(s, a)$ can serve as an effective representation for $Q$-value functions;
• Intuitively, $\boldsymbol{\phi}(s, a)$ encodes the essential information of the system dynamics.
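To make the linearity claim concrete, the following minimal NumPy sketch (hypothetical sizes and random factors, not taken from any of the cited papers) builds a small tabular MDP whose dynamics and reward factor exactly through a $d$-dimensional $\boldsymbol{\phi}$, runs policy evaluation, and checks that $Q^\pi$ equals $\boldsymbol{\phi}(s, a)^\top\boldsymbol{\eta}^\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d, gamma = 20, 4, 6, 0.95         # hypothetical sizes, for illustration only

# Random factors defining a valid kernel P(s'|s,a) = <phi(s,a), mu(s')>.
phi = rng.random((S * A, d))            # feature map phi(s, a), rows indexed by (s, a)
mu = rng.random((S, d))                 # factor mu(s')
phi /= (phi @ mu.T).sum(axis=1, keepdims=True)   # rescale phi so each row of P sums to 1
P = phi @ mu.T                          # (S*A, S) transition matrix

theta_r = rng.random(d)
r = phi @ theta_r                       # r(s, a) = <phi(s, a), theta_r>

pi = np.full((S, A), 1.0 / A)           # uniform policy

# Policy evaluation: Q <- r + gamma * P V, with V(s) = sum_a pi(a|s) Q(s, a).
Q = np.zeros(S * A)
for _ in range(2000):
    V = (Q.reshape(S, A) * pi).sum(axis=1)
    Q = r + gamma * P @ V

# Q^pi lies in span(phi): eta^pi = theta_r + gamma * sum_{s'} mu(s') V^pi(s').
eta = theta_r + gamma * mu.T @ V
print("max |Q - phi @ eta| =", np.abs(Q - phi @ eta).max())   # ~ machine precision
```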
Given its significance, we build a systematic framework for utilizing $\boldsymbol{\phi}(s, a)$ in RL; we refer to $\boldsymbol{\phi}(s, a)$ as the spectral representation. The simplest instantiation is the linear MDP: there exist $\boldsymbol{\varphi}: \mathcal{S}\times\mathcal{A}\to \mathbb{R}^d$ and $\boldsymbol{\nu}:\mathcal{S}\to\mathbb{R}^d$ such that $$ \begin{aligned} \mathbb{P}_{\text{Linear}}(s'|s, a) &=\boldsymbol{\varphi}(s, a)^\top\boldsymbol{\nu}(s'), \end{aligned} $$ and the spectral representation is $$ \boldsymbol{\phi}_{\text{Linear}}(s, a) = \boldsymbol{\varphi}(s, a). $$
• Equivalent to linear MDPs;
• The linear assumption limits expressiveness.
In the latent variable model, there exist a latent variable $z\in\mathcal{Z}$ and two conditional distributions $\varphi(z|s, a)$ and $\nu(s'|z)$ such that $$ \begin{aligned} \mathbb{P}_{\text{LV}}(s'|s, a) &=\int_{\mathcal{Z}} \varphi(z|s, a)\nu(s'|z)\mathrm{d}z. \end{aligned} $$ In this case, the spectral representation is $$ \boldsymbol{\phi}_{\text{LV}}(s, a) = [\varphi(z_1|s, a), \varphi(z_2|s, a), \ldots]^\top\quad \text{ for } z_i\in \mathcal{Z}. $$
• Infinite-dimensional, but can be approximated by Monte-Carlo sampling (see the sketch after this list);
• More expressive than the linear case.
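As a toy illustration of the Monte-Carlo approximation (with made-up one-dimensional Gaussian choices for $\varphi(z|s, a)$ and $\nu(s'|z)$, purely for exposition), the sketch below samples latent values $z_i$, forms importance-weighted finite-dimensional features, and compares the resulting inner product with the closed-form transition density.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical 1-D latent-variable model (illustration only):
#   varphi(z | s, a) = N(z; f(s, a), sig1^2),   nu(s' | z) = N(s'; z, sig2^2),
# whose exact transition density is N(s'; f(s, a), sig1^2 + sig2^2).
f_sa, sig1, sig2 = 0.3, 0.5, 0.4
s_next = 0.8
exact = norm.pdf(s_next, loc=f_sa, scale=np.hypot(sig1, sig2))

# Monte-Carlo spectral representation: draw z_i ~ q(z) and importance-weight so that
#   <phi_LV(s, a), mu_LV(s')> ~= integral of varphi(z|s,a) * nu(s'|z) dz.
m = 50_000
z = rng.normal(0.0, 2.0, size=m)                              # proposal q = N(0, 2^2)
q = norm.pdf(z, loc=0.0, scale=2.0)
phi_lv = norm.pdf(z, loc=f_sa, scale=sig1) / np.sqrt(m * q)   # entries varphi(z_i | s, a)
mu_lv = norm.pdf(s_next, loc=z, scale=sig2) / np.sqrt(m * q)  # entries nu(s' | z_i)

print(f"exact density = {exact:.4f}, Monte-Carlo estimate = {phi_lv @ mu_lv:.4f}")
```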
Energy-based models (EBMs) associate the transition probability with an energy function $E(s, a, s')$: $$ \begin{aligned} \mathbb{P}_{\text{EBM}}(s'|s, a)&=\frac{\exp(E(s, a, s'))}{Z(s, a)}=\frac{\exp(\boldsymbol{\varphi}(s, a)^\top\boldsymbol{\nu}(s'))}{Z(s, a)}, \end{aligned} $$ where $Z(s, a)$ is the normalizing partition function. Through random Fourier features, we can construct $$ \boldsymbol{\phi}_{\text{EBM}}(s, a) = \frac{\exp(\|\boldsymbol{\varphi}(s, a)\|^2/2)}{Z(s, a)}\boldsymbol{\zeta}_N(\boldsymbol{\varphi}(s, a)), $$ where $\boldsymbol{\zeta}_N(\boldsymbol{\varphi}(s, a))$ denotes the $N$-dimensional random Fourier feature of $\boldsymbol{\varphi}(s, a)$ (a numerical sketch follows the bullet below).
• Most flexible in terms of expressiveness.
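The random-feature construction rests on the identity $\exp(\mathbf{x}^\top\mathbf{y}) = \exp(\|\mathbf{x}\|^2/2)\exp(\|\mathbf{y}\|^2/2)\exp(-\|\mathbf{x}-\mathbf{y}\|^2/2)$ combined with random Fourier features for the Gaussian kernel. The NumPy sketch below (arbitrary vectors standing in for $\boldsymbol{\varphi}(s, a)$ and $\boldsymbol{\nu}(s')$, with dimensions chosen only for illustration) checks this approximation numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 8, 20_000                              # illustrative feature dim and # random features

def rff(x, W, b):
    """Random Fourier features: rff(x) @ rff(y) ~= exp(-||x - y||^2 / 2)."""
    return np.sqrt(2.0 / len(b)) * np.cos(W @ x + b)

W = rng.normal(size=(N, d))                   # w_i ~ N(0, I_d)
b = rng.uniform(0.0, 2.0 * np.pi, size=N)     # b_i ~ Uniform[0, 2*pi]

x = 0.3 * rng.normal(size=d)                  # stand-in for varphi(s, a)
y = 0.3 * rng.normal(size=d)                  # stand-in for nu(s')

# exp(x^T y) = exp(||x||^2 / 2) * exp(||y||^2 / 2) * exp(-||x - y||^2 / 2)
exact = np.exp(x @ y)
approx = np.exp(x @ x / 2) * np.exp(y @ y / 2) * (rff(x, W, b) @ rff(y, W, b))
print(f"exp(x^T y) = {exact:.4f}, random-feature approximation = {approx:.4f}")
```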
Learning the spectral representation by direct maximum likelihood estimation is intractable, since the normalization must hold for every state-action pair: $$ \begin{aligned} \max_{\boldsymbol{\phi}, \boldsymbol{\mu}} \ &\mathbb{E}_{(s, a, s') \sim \mathcal{D}}\left[\log \langle\boldsymbol{\phi}(s, a), \boldsymbol{\mu}(s')\rangle\right]\\ \text{s.t.}\ &\forall(s, a), \ \ \int_\mathcal{S} \langle\boldsymbol{\phi}(s, a), \boldsymbol{\mu}(s')\rangle\mathrm{d}s'=1. \end{aligned} $$ Fortunately, tractable alternatives exist for the different formulations.
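As one concrete illustration of such an alternative, here is a PyTorch sketch in the spirit of the spectral contrastive learning route paired with Speder in the table below; it is only an illustrative surrogate, and the exact objectives used by Speder, LV-Rep, Diff-SR, and CTRL-SR are given in the respective papers.

```python
import torch

def spectral_contrastive_loss(phi_sa: torch.Tensor, mu_sp: torch.Tensor) -> torch.Tensor:
    """Spectral-contrastive-style surrogate (illustrative sketch, not any paper's exact loss).

    phi_sa: (B, d) features phi(s_i, a_i);  mu_sp: (B, d) features mu(s'_i) of the
    next-states observed after (s_i, a_i). Other rows of mu_sp act as negatives.
    """
    pos = (phi_sa * mu_sp).sum(dim=-1).mean()          # E[<phi(s,a), mu(s')>], positives
    scores = phi_sa @ mu_sp.T                          # (B, B) all cross pairs
    off_diag = ~torch.eye(scores.shape[0], dtype=torch.bool, device=scores.device)
    neg = (scores[off_diag] ** 2).mean()               # E[<phi(s,a), mu(s''))>^2], negatives
    return -2.0 * pos + neg
```

Minimizing such an objective only requires sampled transitions and sidesteps the explicit normalization constraint above.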
Since the theory above shows that spectral representations suffice to represent $Q$-value functions, we build our $Q$-value functions as $Q_{\theta, \xi}(s, a)=Q_{\xi}(\boldsymbol{\phi}_\theta(s, a))$, where the specific form of $Q_\xi$ depends on the choice of spectral representation. This allows spectral representations to be seamlessly integrated into any RL pipeline, and each of the learning methods described in the previous section naturally yields an effective RL algorithm.
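A minimal PyTorch sketch of this critic design follows; the module layout, hidden sizes, and the choice of feeding the concatenated $(s, a)$ into $\boldsymbol{\phi}_\theta$ are illustrative assumptions rather than the architecture of any specific paper.

```python
import torch
import torch.nn as nn

class SpectralCritic(nn.Module):
    """Q_{theta, xi}(s, a) = Q_xi(phi_theta(s, a)): a Q-head on top of a spectral representation."""

    def __init__(self, phi: nn.Module, feature_dim: int, linear_head: bool = True):
        super().__init__()
        self.phi = phi                                   # spectral representation phi_theta
        self.q_head = (
            nn.Linear(feature_dim, 1) if linear_head     # linear Q on phi, as the theory suggests
            else nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        features = self.phi(torch.cat([s, a], dim=-1))   # phi_theta(s, a)
        return self.q_head(features).squeeze(-1)         # Q_xi(phi_theta(s, a))
```

Because the critic touches $\boldsymbol{\phi}_\theta$ only through its output features, the same module can be dropped into the critic update of standard actor-critic pipelines, such as the TD3 setup mentioned below.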
Extension to POMDPs: Assuming the POMDP satisfies the $L$-decodability condition, which implies that the $L$-step history $x_t=(o_{t-L+1}, a_{t-L+1}, \ldots, o_t)$ is a sufficient statistic for the current system state, we can derive spectral representations for POMDPs via the $L$-step decomposition: $$ \begin{aligned} \mathbb{P}^\pi_L(x_{t+L}|x_t, a_t) &= \langle\boldsymbol{\varphi}(x_t, a_t), \boldsymbol{\nu}^{\tilde{\pi}}(x_{t+L})\rangle_{\mathcal{H}},\\ r^\pi_L(x_t, a_t)&=\sum_{i=0}^{L-1}\gamma^ir_{t+i}=\langle\boldsymbol{\varphi}(x_t, a_t), \boldsymbol{\theta}_r^{\tilde{\pi}}\rangle_{\mathcal{H}}.\\ \end{aligned} $$ The $L$-step Bellman equation then gives $Q^\pi(x_t, a_t)=\big\langle\boldsymbol{\varphi}(x_t, a_t),\ \boldsymbol{\theta}_r^{\tilde{\pi}}+\gamma^L\int\boldsymbol{\nu}^{\tilde{\pi}}(x_{t+L})V^\pi(x_{t+L})\mathrm{d}x_{t+L}\big\rangle_{\mathcal{H}}$, so $\boldsymbol{\varphi}(x_t, a_t)$ again serves as a spectral representation that suffices to express the $Q$-value function.
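For completeness, a small helper (hypothetical array layout, illustration only) that assembles the $L$-step history $x_t$ from raw observation and action buffers, so that $x_t$ can be fed to $\boldsymbol{\varphi}$ exactly like a state:

```python
import numpy as np

def l_step_history(obs: np.ndarray, acts: np.ndarray, t: int, L: int) -> np.ndarray:
    """Build x_t = (o_{t-L+1}, a_{t-L+1}, ..., a_{t-1}, o_t) as one flat vector.

    obs: (T, obs_dim) observations, acts: (T, act_dim) actions; assumes t >= L - 1.
    Under L-decodability, x_t plays the role of the state input to varphi(x_t, a_t).
    """
    pieces = []
    for i in range(t - L + 1, t):
        pieces.extend([obs[i], acts[i]])
    pieces.append(obs[t])
    return np.concatenate(pieces)
```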
We consider the following instantiations of spectral representation-based RL algorithms:
Speder  | 1. Linear Formulation + ① Spectral Contrastive Learning
LV-Rep  | 2. Latent Variable Formulation + ② Variational Learning
Diff-SR | 3. Energy-Based Formulation + ③ Score Matching
CTRL-SR | 3. Energy-Based Formulation + ④ Noise Contrastive Learning
Empirically, these instantiations are built on top of the TD3 algorithm and evaluated on the dog-* and humanoid-* tasks.
Diff-SR and CTRL-SR require significantly less training time than model-based competitors, since the representation-based methods avoid the costly model-based planning procedure.
@inproceedings{ren2023latent,
title={Latent Variable Representation for Reinforcement Learning},
author={Ren, Tongzheng and Xiao, Chenjun and Zhang, Tianjun and Li, Na and Wang, Zhaoran and Schuurmans, Dale and Dai, Bo},
booktitle={International Conference on Learning Representations (ICLR)},
year={2023}
}
@inproceedings{zhang2022making,
title={Making Linear MDPs Practical via Contrastive Representation Learning},
author={Zhang, Tianjun and Ren, Tongzheng and Yang, Mengjiao and Gonzalez, Joseph and Schuurmans, Dale and Dai, Bo},
booktitle={International Conference on Machine Learning (ICML)},
pages={26447--26466},
year={2022},
}
@inproceedings{zhang2024provable,
title={Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning},
author={Zhang, Hongming and Ren, Tongzheng and Xiao, Chenjun and Schuurmans, Dale and Dai, Bo},
booktitle={International Conference on Machine Learning (ICML)},
pages={59759--59782},
year={2024},
}
@inproceedings{ren2023spectral,
title={Spectral Decomposition Representation for Reinforcement Learning},
author={Ren, Tongzheng and Zhang, Tianjun and Lee, Lisa and Gonzalez, Joseph E and Schuurmans, Dale and Dai, Bo},
booktitle={International Conference on Learning Representations (ICLR)},
year={2023}
}
@inproceedings{shribak2024diffusion,
title={Diffusion Spectral Representation for Reinforcement Learning},
author={Shribak, Dmitry and Gao, Chen-Xiao and Li, Yitong and Xiao, Chenjun and Dai, Bo},
booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}