Transformer Family on Time series

2023/05/30 21:14:00 2023/05/30 21:15:00 transformers survey time series


TODO: Finish close readings of the key papers and of the anomaly-detection papers.


  1. N-BEATS (ICLR 2020)
  2. LogTrans (NeurIPS 2019)
  3. Informer (AAAI 2021 Best Paper)📒Informer: Solving the LSTF Problem with a Transformer Architecture - Zhihu
  4. Autoformer (NeurIPS 2021)📒A Close Reading of Autoformer - Zhihu
    • 🌟 Its Auto-Correlation Mechanism, based on the Wiener-Khinchin theorem, is interesting and worth a separate look.
  5. FEDformer (ICML 2022)📒Alibaba DAMO Academy's Latest FEDformer Surpasses SOTA Across the Board in Long-Range Time Series Forecasting - Zhihu
  6. Pyraformer (ICLR 2022)📒Time Series Forecasting @ Pyraformer - Zhihu
  7. Transformer embeddings of irregularly spaced events and their participants (ICLR 2022)
  8. TranAD (VLDB 2022)
  9. Probabilistic Transformer For Time Series Analysis (NeurIPS 2021)
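The Wiener-Khinchin idea behind Autoformer's Auto-Correlation Mechanism (item 4) is easy to check numerically: the autocorrelation of a series equals the inverse FFT of its power spectrum, so all L lags can be computed in O(L log L) instead of O(L^2). The NumPy sketch below is a minimal illustration under simplifying assumptions (a single univariate series, circular autocorrelation, plain top-k lag selection); the function names are hypothetical, and it leaves out Autoformer's time-delay aggregation and the rest of the model.

```python
import numpy as np


def autocorr_fft(x: np.ndarray) -> np.ndarray:
    """Circular autocorrelation R[tau] = sum_t x[t] * x[(t + tau) % L] for all lags.

    Wiener-Khinchin: R is the inverse FFT of the power spectrum |FFT(x)|^2,
    so all L lags cost O(L log L) instead of O(L^2).
    """
    X = np.fft.rfft(x)
    return np.fft.irfft(X * np.conj(X), n=len(x))


def autocorr_direct(x: np.ndarray) -> np.ndarray:
    """Same quantity by brute force, O(L^2), used only as a cross-check."""
    return np.array([np.dot(x, np.roll(x, -tau)) for tau in range(len(x))])


def top_k_lags(x: np.ndarray, k: int) -> np.ndarray:
    """The k lags (excluding lag 0) with the largest autocorrelation of the de-meaned series."""
    r = autocorr_fft(x - x.mean())
    r[0] = -np.inf  # lag 0 is always maximal and carries no period information
    return np.argsort(r)[::-1][:k]


# Usage: a noisy daily cycle sampled hourly over four days.
t = np.arange(96)
x = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(0).normal(size=96)
print(np.allclose(autocorr_fft(x), autocorr_direct(x)))  # True
lags = top_k_lags(x, k=3)  # expected at, or close to, multiples of the 24-step period
```

Autoformer picks k on the order of log L lags this way and then aggregates the sub-series rolled by those lags; only the lag-discovery step is shown here.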

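🌟 In addition, building on the paper "Are Transformers Effective for Time Series Forecasting?", we will discuss an important question: how effective Transformer networks really are for time series forecasting.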


Taxonomy of Transformers in Time Series

Yang, C., Mei, H., & Eisner, J. (2021). Transformer embeddings of irregularly spaced events and their participants. arXiv preprint arXiv:2201.00044.


Taxonomy of Transformers for time series modeling from the perspectives of network modifications and application domains

Network Modifications

Positional Encoding

  1. Learnable Positional Encoding
    1. One work introduces an embedding layer in the Transformer that learns an embedding vector for each position index jointly with the other model parameters.
    2. Another work uses an LSTM network to encode positional embeddings, which can better exploit the sequential ordering information in time series.
  2. Timestamp Encoding: Encoding calendar timestamps (e.g., second, minute, hour, week, month, and year) and special timestamps (e.g., holidays and events).
    1. Informer / Autoformer / FEDformer encode timestamps as additional positional encodings using learnable embedding layers.
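To make the two flavors concrete, here is a minimal NumPy sketch of an input embedding that sums a value projection, a learnable per-position table, and learnable calendar-field tables (hour, weekday, month). It is only loosely in the spirit of the Informer/Autoformer/FEDformer embeddings and does not reproduce any of them: the class name, the chosen fields, the table sizes, and the random initialization are all assumptions. In a real model the tables would be trained jointly with the rest of the network rather than left random.

```python
import numpy as np


class TimeSeriesEmbedding:
    """Hypothetical input embedding: value projection + learnable position table
    + learnable calendar-field tables. Tables are randomly initialised here; in a
    real model they would be trained jointly with the other parameters."""

    def __init__(self, n_vars: int, d_model: int, max_len: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_value = rng.normal(0.0, 0.02, (n_vars, d_model))
        # Learnable positional encoding: one vector per position index 0..max_len-1.
        self.pos_table = rng.normal(0.0, 0.02, (max_len, d_model))
        # Timestamp encoding: one table per calendar field (fields and sizes are assumed).
        self.time_tables = {
            "hour": rng.normal(0.0, 0.02, (24, d_model)),
            "weekday": rng.normal(0.0, 0.02, (7, d_model)),
            "month": rng.normal(0.0, 0.02, (12, d_model)),
        }

    def __call__(self, x: np.ndarray, marks: dict) -> np.ndarray:
        """x: (L, n_vars) values; marks: field name -> (L,) integer indices."""
        L = x.shape[0]
        out = x @ self.W_value + self.pos_table[:L]
        for field, table in self.time_tables.items():
            out = out + table[marks[field]]  # look up one embedding per time step
        return out


# Usage: 48 hourly steps, assumed to start at midnight of weekday 0 in month 0.
L = 48
t = np.arange(L)
marks = {"hour": t % 24, "weekday": (t // 24) % 7, "month": np.zeros(L, dtype=int)}
x = np.random.default_rng(1).normal(size=(L, 3))
emb = TimeSeriesEmbedding(n_vars=3, d_model=16, max_len=96)
h = emb(x, marks)  # shape (48, 16)
```

Summing the embeddings keeps the model width fixed, so positional and calendar information can be added without changing the Transformer layers that follow.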

Attention Module

Work on the attention module mainly aims to reduce the time and memory complexity of self-attention, which is $\mathcal{O}(N^2)$ in the sequence length $N$ for the vanilla module:

  1. Introducing a sparsity bias into the attention mechanism: LogTrans, Pyraformer
  2. Exploring the low-rank property of the self-attention matrix to speed up the computation: Informer, FEDformer
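The two strategies can be illustrated with a small NumPy sketch. The mask follows the LogSparse idea of letting each position attend only to itself and to earlier positions at exponentially growing distances; the query-selection function mirrors the max-minus-mean sparsity measure behind Informer's ProbSparse attention, with u = ceil(ln L) chosen here as an assumption. Function names are hypothetical, and the masked attention is computed densely for clarity, so it shows the sparsity pattern rather than the actual memory savings.

```python
import numpy as np


def logsparse_mask(L: int) -> np.ndarray:
    """Causal LogSparse-style mask: mask[i, j] is True if query i may attend to key j.

    Each position attends to itself and to earlier positions at distances
    1, 2, 4, 8, ..., so a row has O(log L) allowed keys instead of O(L).
    """
    mask = np.zeros((L, L), dtype=bool)
    for i in range(L):
        mask[i, i] = True
        step = 1
        while i - step >= 0:
            mask[i, i - step] = True
            step *= 2
    return mask


def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted to positions where mask is True.

    Dense for clarity: it still builds the full L x L score matrix.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # diagonal is always allowed, so max is finite
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V


def probsparse_active_queries(Q, K, u: int) -> np.ndarray:
    """Indices of the u queries with the largest max-minus-mean attention score.

    Informer additionally samples keys to estimate this measure and gives the
    unselected queries a cheap default output; both steps are omitted here.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    m = scores.max(axis=1) - scores.mean(axis=1)
    return np.argsort(m)[::-1][:u]


rng = np.random.default_rng(0)
L, d = 16, 8
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
mask = logsparse_mask(L)
out = masked_attention(Q, K, V, mask)
print(mask.sum(), "allowed pairs vs", L * (L + 1) // 2, "for full causal attention")
# prints: 65 allowed pairs vs 136 for full causal attention
top = probsparse_active_queries(Q, K, u=int(np.ceil(np.log(L))))
```

The gap between 65 and 136 grows with L, since the sparse pattern scales as O(L log L) against O(L^2) for full attention.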

Architecture-based Attention Innovation


  1. Introduce a hierarchical architecture into the Transformer to account for the multi-resolution nature of time series: Informer, Pyraformer
    1. Informer: inserts max-pooling layers with stride 2 between attention blocks, which down-sample the series to half its length (block-wise multi-resolution learning).
    2. Pyraformer: designs a C-ary tree-based attention mechanism, in which nodes at the finest scale correspond to the original time series, while nodes in the coarser scales represent series at lower resolutions.
      • Pyraformer developed both intra-scale and inter-scale attentions in order to better capture temporal dependencies across different resolutions.
      • The hierarchical architecture also brings efficient computation, particularly for long time series.
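Both hierarchical schemes come down to shrinking the time axis. The NumPy sketch below shows an Informer-style stride-2 max-pool (kernel 3 and padding 1 are assumed here, and the convolution and ELU that precede the pooling in Informer's distilling layer are left out) and a C-ary pyramid in which each coarser node summarizes C nodes of the scale below. Mean pooling in `build_pyramid` is an assumption standing in for Pyraformer's convolution-based coarser-scale construction, and the intra-/inter-scale pyramidal attention itself is not implemented; function names are hypothetical.

```python
import numpy as np


def distil_halve(x: np.ndarray, kernel: int = 3, stride: int = 2, pad: int = 1) -> np.ndarray:
    """Max-pool along time (axis 0) with stride 2, roughly halving the length of x: (L, d)."""
    L, d = x.shape
    padded = np.full((L + 2 * pad, d), -np.inf)  # -inf padding never wins the max
    padded[pad:pad + L] = x
    out_len = (L + 2 * pad - kernel) // stride + 1
    return np.stack([padded[i * stride:i * stride + kernel].max(axis=0) for i in range(out_len)])


def build_pyramid(x: np.ndarray, C: int) -> list:
    """C-ary pyramid: scale s+1 has one node per C consecutive nodes of scale s.

    Leftover nodes that do not fill a complete group of C are dropped.
    """
    scales = [x]
    while scales[-1].shape[0] >= C:
        cur = scales[-1]
        n = cur.shape[0] // C
        scales.append(cur[: n * C].reshape(n, C, -1).mean(axis=1))
    return scales


print(distil_halve(np.arange(8.0).reshape(8, 1)).ravel())  # [1. 3. 5. 7.]
pyramid = build_pyramid(np.random.default_rng(0).normal(size=(16, 4)), C=4)
print([s.shape[0] for s in pyramid])  # [16, 4, 1]
```

Stacking several `distil_halve` steps gives Informer-style block-wise resolutions, whereas the pyramid keeps every resolution at once so that attention can connect nodes across scales.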

Application Domains

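The survey above gives a thorough review of work on forecasting, anomaly detection, and classification; here we only summarize the parts related to anomaly detection. The main contribution of the Transformer architecture to anomaly detection is still to "improve the ability of modeling temporal dependency". Beyond that, common ways of combining the Transformer with other models for anomaly detection are: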

  1. Combine the Transformer with neural generative models: VAE (MT-RVAE, TransAnomaly); GAN (TranAD).
  2. Combine the Transformer with a graph-based learning architecture for multivariate time series anomaly detection: GTA
  3. Combine the Transformer with a Gaussian prior-association: AnomalyTrans
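The AnomalyTrans idea of comparing a Gaussian prior-association with the learned attention (the "series-association") can be sketched in a few lines of NumPy. The sketch rests on simplifying assumptions: one layer, a fixed per-position scale sigma instead of a learned one, a random softmax matrix standing in for the attention map, and random values standing in for reconstruction errors. It computes a row-normalized Gaussian prior over |i - j|, the symmetric KL discrepancy between the two associations, and a score that weights reconstruction error by a softmax of the negative discrepancy. Function names are hypothetical, and the original model's minimax training strategy is omitted.

```python
import numpy as np


def prior_association(L: int, sigma: np.ndarray) -> np.ndarray:
    """Row-normalized Gaussian kernel over |i - j| with a per-position scale sigma[i]."""
    idx = np.arange(L)
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
    g = np.exp(-dist ** 2 / (2 * sigma[:, None] ** 2)) / (np.sqrt(2 * np.pi) * sigma[:, None])
    return g / g.sum(axis=1, keepdims=True)


def softmax_rows(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def association_discrepancy(P: np.ndarray, S: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Symmetric KL between prior (P) and series (S) association, one value per time step."""
    P = np.clip(P, eps, None)
    S = np.clip(S, eps, None)
    return np.sum(P * np.log(P / S), axis=1) + np.sum(S * np.log(S / P), axis=1)


def anomaly_score(P: np.ndarray, S: np.ndarray, recon_err: np.ndarray) -> np.ndarray:
    """softmax(-discrepancy) over time, multiplied by per-step reconstruction error."""
    neg = -association_discrepancy(P, S)
    w = np.exp(neg - neg.max())
    w = w / w.sum()
    return w * recon_err


rng = np.random.default_rng(0)
L = 32
P = prior_association(L, sigma=np.full(L, 2.0))
S = softmax_rows(rng.normal(size=(L, L)))  # stand-in for a learned attention map
recon_err = rng.random(L)                  # stand-in for per-step reconstruction error
scores = anomaly_score(P, S, recon_err)
```

The softmax of the negative discrepancy gives more weight to time steps whose learned attention stays close to the narrow local prior, which is the behavior AnomalyTrans associates with anomalies.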