Catching up on time series foundation models — MOMENT

Time series data is everywhere — from stock prices to heart rate monitors, weather patterns to website traffic. Analyzing this data effectively has always been a challenge, often requiring specialized models for each specific task or domain. But the recent interest (or hype) around large language models has also infected time series modeling. This has led to an explosion of time series foundation models, even though their ability to outperform classical methods is still an open question [1].

One promising new family of time-series foundation models is MOMENT, developed by researchers at Carnegie Mellon University and the University of Pennsylvania [2]. This work is a nice step in the right direction for developing these types of models. The authors don’t just show how their model performs relative to other foundation models; they also contribute a newly compiled dataset, a more complete set of benchmarking tasks and metrics, and a look at their model’s learned representations.

The core idea behind time-series foundation models

The core idea behind time-series foundation models, including MOMENT, is simple yet powerful: pre-train large models (usually large transformer models) on a diverse collection of time series data, enabling them to learn general patterns and characteristics that apply across various domains and tasks.

Key points of the MOMENT paper

  1. The Time Series Pile: The researchers compiled a large, diverse dataset of public time series data. This dataset spans domains like healthcare, engineering, and finance. Data is often a huge bottleneck in machine learning — compiling these datasets makes it easier for researchers to compare results and ensure reproducibility.
  2. Multi-task Pre-training: MOMENT models are pre-trained on a masked prediction task, similar to encoder models like BERT and to other time-series foundation models like PatchTST. The model learns to reconstruct masked portions of a time series (a minimal sketch of this recipe follows the list). This is different from other methods for training time-series foundation models, which aim to predict the next token(s) or time point(s) (i.e. forecasting).
  3. Flexible Architecture: The models use a transformer encoder based on the T5 architecture, with a lightweight reconstruction head. This allows for easy adaptation to various downstream tasks.
  4. Data Preprocessing: To handle the diverse nature of time series data, the researchers employed strategies like reversible instance normalization [3], independent modeling of multivariate series channels, and breaking series into fixed-length patches.
  5. Low-resource Performance: MOMENT shows impressive results on multiple tasks (forecasting, classification, anomaly detection, imputation) with minimal fine-tuning (e.g. zero-shot evaluation or linear probing), often matching or outperforming specialized models.
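
To make points 2–4 concrete, here is a minimal sketch of the masked-reconstruction recipe. It is not the authors’ implementation: the patch length, masking ratio, and model sizes are illustrative, positional encodings and RevIN’s learnable affine parameters are omitted, and MOMENT uses a T5 encoder rather than the vanilla PyTorch transformer below.

```python
import torch
import torch.nn as nn

class MaskedPatchPretrainer(nn.Module):
    """Minimal sketch of MOMENT-style pre-training: instance-normalize each
    series, split it into fixed-length patches, mask a random subset of
    patches, and train a transformer encoder plus a lightweight head to
    reconstruct the masked values."""

    def __init__(self, patch_len=8, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)            # patch -> token
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # learned [MASK] embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)             # lightweight reconstruction head

    def forward(self, x, mask_ratio=0.3):
        # x: (batch, seq_len) univariate series; multivariate channels are
        # modeled independently, so they can simply be folded into the batch.
        mu = x.mean(dim=1, keepdim=True)
        sigma = x.std(dim=1, keepdim=True) + 1e-5
        x_norm = (x - mu) / sigma                             # instance normalization

        patches = x_norm.unfold(1, self.patch_len, self.patch_len)  # (B, n_patches, patch_len)
        tokens = self.embed(patches)

        # replace a random subset of patch tokens with the mask token
        masked = torch.rand(tokens.shape[:2], device=x.device) < mask_ratio
        tokens = torch.where(masked.unsqueeze(-1), self.mask_token, tokens)

        recon = self.head(self.encoder(tokens))               # reconstruct every patch
        return ((recon - patches) ** 2)[masked].mean()        # score only the masked patches

# one toy pre-training step on random data
model = MaskedPatchPretrainer()
loss = model(torch.randn(32, 512))
loss.backward()
```

Pre-training then amounts to running steps like this over windows drawn from the many domains in the Time Series Pile.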

Model Evaluation

The authors designed a comprehensive benchmark covering five tasks:

  • Long-horizon forecasting
  • Short-horizon forecasting
  • Classification
  • Anomaly detection
  • Imputation

They assessed the models in limited data and computation settings, comparing them to state-of-the-art deep learning and statistical models for each task. Take a look at the (massive) tables in the paper.

Summary of results

The MOMENT models demonstrated impressive performance across a range of time series tasks, often rivaling or surpassing specialized models.

In long-horizon forecasting, linear probing, which keeps the pre-trained encoder frozen, swaps out the reconstruction head, and fits only a linear head on the encoded time series, proved competitive with state-of-the-art models like PatchTST. This might make MOMENT a good candidate for adaptation to new (possibly data-scarce) domains, as linear probing does not require much data or compute.
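
In practice, linear probing looks roughly like the sketch below: the pre-trained encoder stays frozen and only a linear (here, ridge) head is fit to map embeddings to the forecast horizon. The `encoder` stub, window and horizon lengths, and toy data are placeholders for illustration, not MOMENT’s actual interface.

```python
import numpy as np
import torch
from sklearn.linear_model import Ridge

# `encoder` stands in for a frozen pre-trained backbone that maps a
# (batch, seq_len) window to a fixed-size embedding; here it is just a stub.
encoder = torch.nn.Sequential(torch.nn.Linear(512, 128), torch.nn.GELU())
encoder.requires_grad_(False)

def embed(windows: np.ndarray) -> np.ndarray:
    with torch.no_grad():
        return encoder(torch.as_tensor(windows, dtype=torch.float32)).numpy()

# toy data: predict the next 96 steps from the previous 512
series = np.sin(np.arange(5000) / 25.0) + 0.1 * np.random.randn(5000)
X = np.stack([series[i : i + 512] for i in range(0, 4000, 8)])
y = np.stack([series[i + 512 : i + 608] for i in range(0, 4000, 8)])

# linear probing: the backbone stays frozen, only a linear head is fit
head = Ridge(alpha=1.0).fit(embed(X), y)
forecast = head.predict(embed(series[-512:][None, :]))   # shape (1, 96)
```

Because only the head has trainable parameters, the adaptation step needs little data and compute, which is exactly the appeal mentioned above.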

Short-horizon forecasting presented a greater challenge, especially in zero-shot settings. However, MOMENT still managed to (slightly) outperform traditional statistical methods like ARIMA on several datasets. This suggests that while there’s room for improvement, the model has already captured meaningful short-term patterns across diverse time series.

In classification tasks, MOMENT’s unsupervised representations rivaled many supervised deep learning models that were trained specifically for each dataset. For imputation tasks, which involve filling in missing values in time series data, linear probing of MOMENT achieved the lowest errors compared to other methods. MOMENT also consistently outperformed both deep learning models and LLM-based approaches in anomaly detection tasks. This is a critical capability for many real-world applications, from fault detection in industrial systems to identifying unusual patterns in financial transactions or medical data.

All three tasks demonstrate the model’s ability to learn generalizable features that are relevant across different types of time series, potentially reducing the need for task-specific model development. However, further work is required to better quantify and evaluate the generalization abilities of models like MOMENT before they go into production. Can we identify which domains time-series foundation models generalize to, given our knowledge of what they were trained on?
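
To make the anomaly-detection use case a bit more concrete: with a reconstruction model like MOMENT, a common recipe is to score each point by how poorly it is reconstructed and flag points whose error exceeds a threshold. The sketch below uses a moving-average filter as a stand-in for the model so it runs on its own; the paper’s actual protocol and thresholding are more involved.

```python
import numpy as np

# A model trained to reconstruct "normal" structure should reconstruct
# anomalous points poorly, so reconstruction error doubles as an anomaly
# score. `reconstruct` is a stand-in smoothing filter, not the real model.
def reconstruct(window: np.ndarray) -> np.ndarray:
    kernel = np.ones(9) / 9.0
    return np.convolve(window, kernel, mode="same")

series = np.sin(np.arange(2000) / 20.0)
series[1500:1505] += 4.0                        # injected spike anomaly

errors = (series - reconstruct(series)) ** 2    # pointwise reconstruction error
threshold = errors.mean() + 3 * errors.std()    # simple 3-sigma rule
anomalies = np.where(errors > threshold)[0]
print(anomalies)                                # indices in/around the injected spike
```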

Some of my favorite parts of the paper focus on interpretability. The researchers show that MOMENT captures intuitive characteristics of time series data, such as frequency, trend, and amplitude. Moreover, it can learn distinct class-specific representations without supervision. This level of interpretability is crucial for building trust in the model’s predictions and understanding its decision-making process. It also makes it easier to build downstream modules that take the model’s output as input.
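
A probe in this spirit can be sketched as follows: embed synthetic sinusoids whose frequency varies, project the embeddings to two dimensions, and check whether frequency organizes the projection. The `embed` stub (FFT magnitudes) is only there so the snippet runs on its own; to approximate the paper’s analysis you would swap in the frozen MOMENT encoder and vary trend and amplitude as well.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Stand-in embedding function so the probe is self-contained; replace with
# the frozen pre-trained encoder to run the real version of this analysis.
def embed(batch):
    return np.abs(np.fft.rfft(batch, axis=-1))

freqs = np.linspace(1, 32, 64)                   # cycles per 512-step window
t = np.arange(512)
waves = np.stack([np.sin(2 * np.pi * f * t / 512) for f in freqs])

pcs = PCA(n_components=2).fit_transform(embed(waves))
plt.scatter(pcs[:, 0], pcs[:, 1], c=freqs)
plt.colorbar(label="frequency")
plt.title("Do embeddings organize by frequency?")
plt.show()
```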

Scaling experiments demonstrated that increased model size and training data generally improved downstream performance. This aligns with observations in other domains like natural language processing, suggesting that time series analysis may benefit from the same “scaling laws” that have driven progress in other areas of AI.

Finally, one of the more intriguing findings was MOMENT’s cross-modal capabilities. The model can be repurposed for sequence modeling tasks beyond time series, performing comparably to same-sized language models on modalities such as text and images. This versatility hints at the potential for developing more general-purpose sequence models that incorporate time series, vision, and language into one model (a Large Time-Language-Vision Model — LaTiLaViM perhaps).

Conclusion and Directions

It’s important to note that this is still early-stage research. While the results are promising, more work is needed to fully understand the capabilities and limitations of these models in real-world applications. Some key areas for future research include:

  1. Exploring fine-tuning and adaptation approaches to unlock the full potential of pre-trained models on downstream tasks.
  2. Expanding the Time Series Pile to include more very long series and more variable-frequency data (Transformer models perform best with more data and more compute — scale it up!).
  3. Further probing and quantifying the models’ understanding of higher-level features and structures to build more interpretable time series models. A potentially interesting direction would be to apply the mechanistic interpretability work from Anthropic to time-series foundation models [4].
  4. Investigating how to effectively and responsibly deploy these models in high-stakes domains like healthcare and finance. How can we detect when these models fail? Are there similar failure modes in time-series foundation models as the “hallucinations” observed in LLMs?

MOMENT represents an exciting step forward for time series modeling, bringing together the power of foundation models with a more rigorous approach to model evaluation. While challenges remain, the strong zero-shot and few-shot capabilities demonstrated by these models could democratize and accelerate time series analysis across numerous domains. The researchers have open-sourced their data, code, and pre-trained models, inviting the community to build upon their work.
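
For anyone who wants to try the released models, usage looks roughly like the sketch below. Treat the specifics as assumptions rather than documentation: the momentfm package name, MOMENTPipeline, the task_name and x_enc arguments, and the embeddings field reflect my reading of the project README and may have changed, so check the repository for the current interface.

```python
# Hypothetical quick-start based on the project's README; argument and
# attribute names may differ in current releases, so check the repository.
import torch
from momentfm import MOMENTPipeline

model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={"task_name": "embedding"},  # other tasks: reconstruction, forecasting, classification
)
model.init()

# MOMENT expects windows of 512 time steps, shaped (batch, channels, seq_len)
x = torch.randn(16, 1, 512)
with torch.no_grad():
    output = model(x_enc=x)

print(output.embeddings.shape)  # one fixed-size embedding per series, e.g. (16, 1024)
```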

References

[1] Sarfraz, M.S., Chen, M., Layer, L., Peng, K., & Koulakis, M. (2024). Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? ArXiv, abs/2405.02678.

[2] Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. ArXiv, abs/2402.03885.

[3] Kim, T., Kim, J., Tae, Y., Park, C., Choi, J., & Choo, J. (2022). Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. International Conference on Learning Representations.

[4] Bricken, T., et al. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Transformer Circuits Thread.
