Related Papers in ICML 2020 (2020.07.12)
Anomaly
[Skimmed; not particularly recommended] Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure
John Sipple
CONTRIBUTIONS:
- a scalable approach to detecting multidimensional outliers, robust under multimodal conditions,
- an alternative to one-class anomaly detection using negative sampling, and
- a novel application of Integrated Gradients for anomaly interpretation.
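A minimal sketch of my reading of the negative-sampling idea (not the paper's exact recipe): draw "negative" points uniformly from a slightly inflated bounding box around the normal data, train an ordinary binary classifier, and use its negative-class probability as the anomaly score. The random-forest classifier and the `margin` value below are arbitrary assumptions of mine.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# "Normal" multimodal device telemetry: two clusters in a 4-D feature space.
normal = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(500, 4)),
    rng.normal(loc=3.0, scale=0.3, size=(500, 4)),
])

# Negative sampling: draw points uniformly from an inflated bounding box
# of the normal data and label them as "anomalous".
margin = 0.5                                   # assumption, not from the paper
lo, hi = normal.min(0) - margin, normal.max(0) + margin
negatives = rng.uniform(lo, hi, size=(1000, 4))

X = np.vstack([normal, negatives])
y = np.r_[np.zeros(len(normal)), np.ones(len(negatives))]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Anomaly score = predicted probability of the "negative" (anomalous) class.
test = np.array([[0.1, 0.0, -0.2, 0.1],        # near cluster 1 -> low score
                 [1.5, 1.5, 1.5, 1.5]])        # between the modes -> high score
print(clf.predict_proba(test)[:, 1])
```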
Robust Outlier Arm Identification
Yinglun Zhu · Sumeet Katariya · Robert Nowak
Sequence
Population-Based Black-Box Optimization for Biological Sequence Design
Christof Angermueller · David Belanger · Andreea Gane · Zelda Mariet · David Dohan · Kevin Murphy · Lucy Colwell · D. Sculley
A Chance-Constrained Generative Framework for Sequence Optimization
Xianggen Liu · Jian Peng · Qiang Liu · Sen Song
An EM Approach to Non-autoregressive Conditional Sequence Generation
Zhiqing Sun · Yiming Yang
CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods
Wei Zhang · Thomas Panum · Somesh Jha · Prasad Chalasani · David Page
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan · Chitwan Saharia · Geoffrey Hinton · Mohammad Norouzi · Navdeep Jaitly
Incremental Sampling Without Replacement for Sequence Models
Kensen Shi · David Bieber · Charles Sutton
Sequence Generation with Mixed Representations
Lijun Wu · Shufang Xie · Yingce Xia · Yang Fan · Jian-Huang Lai · Tao Qin · Tie-Yan Liu
Time Series
[Read] Learning From Irregularly-Sampled Time Series: A Missing Data Perspective
Steven Cheng-Xian Li · Benjamin M Marlin
[Read] Set Functions for Time Series
Max Horn · Michael Moor · Christian Bock · Bastian Rieck · Karsten Borgwardt
Problem addressed: classification of irregularly sampled time series; the paper notes that other downstream tasks are possible as well, not only classification (just swap in the corresponding loss function). Datasets: MIMIC-III Mortality Prediction, Physionet 2012 Mortality Prediction Challenge, Physionet 2019 Sepsis Early Prediction Challenge.
An unconventional approach whose theoretical basis is differentiable set function learning; interpretability is its main strength.
Concretely, the pipeline is: *multivariate time series -> encoding (not a seq2seq-style deep learning method, but a fixed encoding resembling a frequency-domain decomposition) -> embedding + aggregation + attention (this step uses a deep neural network; the attention is scaled dot-product attention with multiple heads) -> classification (neural network).* In terms of performance, the proposed method does not reach state-of-the-art classification accuracy and is merely competitive with the baselines, but it does very well on interpretability, memory footprint, and runtime.
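A minimal, hedged sketch of the encode -> aggregate-with-attention -> classify flow described above. All weights are random stand-ins for trained parameters, and `mlp`, `Wk`, `q`, and `w_out` are my own names; the real SeFT additionally uses time encodings and multiple attention heads, which I omit.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Tiny two-layer MLP applied row-wise to each observation."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

# An irregularly sampled series as a *set* of (time, value, modality-id) tuples.
obs = np.array([[0.0, 1.2, 0], [0.7, 0.9, 1], [2.3, 1.5, 0], [5.1, 0.2, 2]])

d = 8  # embedding width (arbitrary)
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, d)), np.zeros(d)

h = mlp(obs, W1, b1, W2, b2)           # per-observation embedding, shape (n, d)

# Scaled dot-product attention of a single learned query over the set, so the
# aggregation is permutation-invariant and the weights are inspectable.
q  = rng.normal(size=(d,))
Wk = rng.normal(size=(d, d))
scores = (h @ Wk) @ q / np.sqrt(d)
attn = np.exp(scores - scores.max()); attn /= attn.sum()

pooled = attn @ h                      # attention-weighted mean over the set

# Final classifier head (here: a single binary logit).
w_out = rng.normal(size=(d,))
logit = pooled @ w_out
print("attention weights:", np.round(attn, 3), " logit:", round(float(logit), 3))
```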
BTW, baselines: GRU-D, GRU-SIMPLE, IP-NETS, PHASED-LSTM, TRANSFORMER, SEFT-ATTN
Spectral Subsampling MCMC for Stationary Time Series
Robert Salomone · Matias Quiroz · Robert Kohn · Mattias Villani · Minh-Ngoc Tran
Temporal Logic Point Processes
Shuang Li · Lu Wang · Ruizhi Zhang · Xiaofu Chang · Xuqin Liu · Yao Xie · Yuan Qi · Le Song
Gradient Temporal-Difference Learning with Regularized Corrections
Sina Ghiassian · Andrew Patterson · Shivam Garg · Dhawal Gupta · Adam White · Martha White
Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders
Ioana Bica · Ahmed Alaa · Mihaela van der Schaar
Missing value
[Read] Learning from Irregularly-Sampled Time Series: A Missing Data Perspective
Steven Cheng-Xian Li · Benjamin M Marlin
Problem addressed: modelling irregularly sampled time series with missing values, and classifying such time series.
Novelty and distinctiveness: proposes P-VAE and P-GAN (Partial VAE/GAN); recasts missing-value completion as an interpolation problem.
Method:
- In P-VAE, the set of observed time points t is treated as a random variable drawn from some distribution, so the VAE's distribution over x becomes a joint distribution over (x, t); only the reconstruction and generation errors on the observed parts are supervised (a minimal sketch of this masked-reconstruction idea appears after the dataset list below).
- In P-GAN, (x, t, z) tuples are fed to the discriminator so that real data with missing entries and generated complete data end up with similar latent representations and similar observed values.
Why chosen / how to apply: the theoretical derivation behind the proposed P-VAE is worth borrowing in follow-up work.
Datasets
- Image datasets: MNIST, CelebA
- Time-series dataset: MIMIC-III (classification task)
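A minimal sketch of the "supervise only the observed entries" idea from the P-VAE bullet above: a standard Gaussian ELBO whose reconstruction term is summed only over observed positions. This is my own toy, not the paper's full formulation (which also models the index set t and has a GAN variant); the linear decoder and unit-variance likelihood are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_elbo(x, mask, enc_mu, enc_logvar, decode):
    """ELBO whose reconstruction term counts only the observed entries.

    x        : data vector, arbitrary values at missing positions
    mask     : 1 where x is observed, 0 where it is missing
    enc_mu, enc_logvar : encoder outputs for q(z | observed part of x)
    decode   : function z -> reconstruction mean
    """
    # Reparameterised sample from q(z | x_observed).
    z = enc_mu + np.exp(0.5 * enc_logvar) * rng.normal(size=enc_mu.shape)
    x_hat = decode(z)
    # Gaussian log-likelihood (unit variance), counted only where mask == 1.
    recon = -0.5 * np.sum(mask * (x - x_hat) ** 2)
    # KL(q(z|x) || N(0, I)) in closed form.
    kl = -0.5 * np.sum(1 + enc_logvar - enc_mu ** 2 - np.exp(enc_logvar))
    return recon - kl

# Toy check: a 5-D sample with two missing entries and a linear "decoder".
x    = np.array([0.5, 1.0, 0.0, -0.3, 2.0])
mask = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
W    = rng.normal(size=(2, 5))
print(masked_elbo(x, mask, enc_mu=np.zeros(2), enc_logvar=np.zeros(2),
                  decode=lambda z: z @ W))
```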
Missing Data Imputation using Optimal Transport
Boris Muzellec · Julie Josse · Claire Boyer · Marco Cuturi
Learning with Missing Values [workshop]
Julie Josse · Jes Frellsen · Pierre-Alexandre Mattei · Gael Varoquaux
Full Law Identification in Graphical Models of Missing Data: Completeness Results
Razieh Nabi · Rohit Bhattacharya · Ilya Shpitser
Recurrent
A general recurrent state space framework for modeling neural dynamics during decision-making
David Zoltowski · Jonathan Pillow · Scott Linderman
Improving the Gating Mechanism of Recurrent Neural Networks
Albert Gu · Caglar Gulcehre · Thomas Paine · Matthew Hoffman · Razvan Pascanu
Approximating Stacked and Bidirectional Recurrent Architectures with the Delayed Recurrent Neural Network
Javier Turek · Shailee Jain · Vy Vo · Mihai Capotă · Alexander Huth · Theodore Willke
Transformation of ReLU-based recurrent neural networks from discrete-time to continuous-time
Zahra Monfared · Daniel Durstewitz
Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules
Sarthak Mittal · Alex Lamb · Anirudh Goyal · Vikram Voleti · Murray Shanahan · Guillaume Lajoie · Michael Mozer · Yoshua Bengio
VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing
Zoltán Milacski · Barnabás Póczos · Andras Lorincz
Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions
Ahmed Alaa · Mihaela van der Schaar
Interpretable
[Read] Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure
John Sipple
[mark-interpre-2] The Many Shapley Values for Model Explanation
Mukund Sundararajan · Amir Najmi
- Shapley values are a game-theoretic attribution method for machine learning: when several players jointly consume a service, a cost is incurred, and a player's Shapley value is the share of that cost assigned to them. The analogy: the model is treated as the cost function, the features of the input sample are the players, and the cost apportioned to each player (its Shapley value) is the attribution weight.
- Shapley values can yield non-unique explanations, which makes them hard to apply; this paper addresses how to do input attribution with Shapley values while keeping the explanation unique.
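A small worked example of exact Shapley attribution for a toy three-feature model, using one common choice of value function (absent features replaced by a baseline). The choice of value function is precisely where the non-uniqueness discussed above comes from; the model `f` and baseline below are made up.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for f at x, simulating 'absent' features
    by replacing them with baseline values (one possible value function)."""
    n = len(x)
    def v(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (v(S + (i,)) - v(S))
    return phi

# Toy model with an interaction term, so the attributions are non-trivial.
f = lambda z: 2 * z[0] + z[1] * z[2]
x, baseline = np.array([1.0, 1.0, 2.0]), np.zeros(3)
phi = shapley_values(f, x, baseline)
print(phi, phi.sum(), f(x) - f(baseline))   # attributions sum to f(x) - f(baseline)
```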
[mark-interpre-1] Dispersed Exponential Family Mixture VAEs for Interpretable Text Generation
Wenxian Shi · Hao Zhou · Ning Miao · Lei Li
- The paper addresses the mode-collapse problem that can arise when an exponential-family mixture model is used as the prior over the latent space; training with the standard VAE objective can lead to mode collapse.
- The proposed fix is to add a dispersion term to the loss function so that the clusters in the latent space spread further apart (see the sketch after the two definitions below).
Two concepts involved:
- Posterior collapse: the approximate posterior over the latent variables collapses onto the given prior regardless of the input, so the latent code stops carrying information about the data.
- Mode collapse: the prior has multiple clusters in the latent space, but after training the clusters differ only marginally from one another, so they effectively collapse into a single cluster.
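As a stand-in illustration only (not the paper's exponential-family dispersion term): a generic penalty that, when added to the training loss, rewards spreading the mixture-component means apart in latent space.

```python
import numpy as np

def dispersion_penalty(means):
    """Stand-in 'dispersion' regulariser: negative mean pairwise squared
    distance between mixture-component means. Adding it to the training loss
    rewards spreading the components apart (my illustrative substitute for
    the paper's exponential-family dispersion term)."""
    k = len(means)
    diffs = means[:, None, :] - means[None, :, :]
    sq = (diffs ** 2).sum(-1)
    return -sq.sum() / (k * (k - 1))

collapsed = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])   # near mode collapse
dispersed = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
print(dispersion_penalty(collapsed), dispersion_penalty(dispersed))
# total_loss = vae_loss + lam * dispersion_penalty(component_means)
```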
[mark-interpre-1: worth a look] Explaining Groups of Points in Low-Dimensional Representations
Gregory Plumb · Jonathan Terhorst · Sriram Sankararaman · Ameet Talwalkar
This paper investigates one main question: in a low-dimensional representation of a dataset, what actually differs between the clusters? Concretely, it studies two questions: 1) What distinguishes the latent representations of group A from those of group B? That is, can we find a transformation in the input space that maps samples from group A onto group B? 2) For all points in group A, can we find a single description that answers "what must a sample satisfy to count as group A", such that the description holds consistently across every group (in the authors' words: they want an explanation that works for all of the points in Group A, and they want the complete set of explanations to be consistent, i.e., symmetrical and transitive, among all the groups).
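A toy, hedged sketch of question 1): with a linear embedding standing in for the learned representation, a single translation `delta` in the input space can move group A onto group B in the embedding space. The paper optimises such transformations through the actual embedding; here the mean difference works by construction, and `R`, `r`, and the groups are all my own stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a learned embedding r(x): a fixed linear map to 2-D.
R = rng.normal(size=(5, 2))
r = lambda X: X @ R

group_a = rng.normal(loc=0.0, size=(200, 5))
group_b = rng.normal(loc=1.0, size=(200, 5))          # shifted version of A

# Candidate counterfactual explanation: one translation delta in *input* space
# meant to move group A onto group B in the representation space.
delta = group_b.mean(0) - group_a.mean(0)

before = np.linalg.norm(r(group_a).mean(0) - r(group_b).mean(0))
after  = np.linalg.norm(r(group_a + delta).mean(0) - r(group_b).mean(0))
print(f"embedding-space gap before: {before:.3f}, after applying delta: {after:.3f}")
```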
[mark-interpre-3] DeepCoDA: personalized interpretability for compositional health data
Thomas Quinn · Dang Nguyen · Santu Rana · Sunil Gupta · Svetha Venkatesh
I did not fully follow the details, but roughly: the paper builds a biologically interpretable model on top of the self-explaining neural network framework, and the model has several desirable properties:
- A model should select the best log-ratio transformation automatically.
- A model should have linear interpretability.
- A model should have personalized interpretability.
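A minimal illustration (mine, not the paper's architecture) of what a log-ratio feature is for compositional data and why a linear model on such features stays interpretable; the weights below are made up.

```python
import numpy as np
from itertools import combinations

def pairwise_log_ratios(x):
    """All pairwise log-ratios log(x_i / x_j), i < j, for one compositional
    sample x (components are relative abundances summing to 1)."""
    idx = list(combinations(range(len(x)), 2))
    return np.array([np.log(x[i] / x[j]) for i, j in idx]), idx

# A toy microbiome-style composition over 4 taxa.
x = np.array([0.50, 0.25, 0.15, 0.10])
feats, idx = pairwise_log_ratios(x)

# A linear model on log-ratio features stays interpretable: each weight says
# how strongly a specific ratio of two components pushes the prediction.
w = np.array([0.8, 0.0, 0.0, -0.5, 0.0, 0.3])     # illustrative weights only
print(list(zip(idx, np.round(feats, 3))), " score:", round(float(feats @ w), 3))
```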
Proper Network Interpretability Helps Adversarial Robustness in Classification
Akhilan Boopathy · Sijia Liu · Gaoyuan Zhang · Cynthia Liu · Pin-Yu Chen · Shiyu Chang · Luca Daniel
The paper develops an interpretation method that is robust to small perturbations of the input: even under perturbation, the generated explanation should not change much, and the network's classification should also remain robust to the perturbation.
[Skimmed; of limited relevance] Robust and Stable Black Box Explanations
Himabindu Lakkaraju · Nino Arsov · Osbert Bastani
What the paper does: generate explanations that stay robust under distribution shift of the data. Intuitively, the shifts considered amount to small perturbations of the inputs, and an explanation should not change drastically because of such perturbations (yet existing methods are not robust to them). The stated contributions:
- we propose a novel minimax objective that can be used to construct robust black box explanations for a given family of interpretable models. This objective encodes the goal of returning the highest fidelity explanation with respect to the worst-case over a set of distribution shifts.
- we propose a set of distribution shifts that captures our intuition about the kinds of shifts to which interpretations should be robust. In particular, this set includes shifts that contain perturbations to a small number of covariates.
- we propose algorithms for optimizing this objective in two settings: (i) explanations such as linear models with continuous parameters that can be optimized using gradient descent, in which case we use adversarial training (Goodfellow et al., 2015), and (ii) explanations such as decision sets with discrete parameters, in which case we use a sampling-based approximation in conjunction with submodular optimization (Lakkaraju et al., 2016).
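A crude sketch of the first and third bullets: fit a linear explanation of a black box by (sub)gradient descent on the worst-case fidelity loss over a small, hand-picked set of covariate shifts. The black box `f`, the shift set, and the step size are all my assumptions, and this is far simpler than the paper's adversarial-training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Black box to be explained and points in the neighbourhood of interest.
f = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1]
X = rng.normal(scale=0.3, size=(200, 2))

# A small family of distribution shifts: nudge single covariates.
shifts = [np.zeros(2), np.array([0.3, 0.0]), np.array([0.0, 0.3]),
          np.array([-0.3, 0.0]), np.array([0.0, -0.3])]

def worst_case_loss(w):
    """Fidelity loss of the linear explanation w under the worst shift."""
    losses = [np.mean((f(X + s) - (X + s) @ w) ** 2) for s in shifts]
    k = int(np.argmax(losses))
    return losses[k], shifts[k]

# Subgradient descent on the max: at each step, follow the gradient of the
# currently-worst shift's squared-error loss.
w = np.zeros(2)
for _ in range(500):
    _, s = worst_case_loss(w)
    Xs = X + s
    grad = -2 * Xs.T @ (f(Xs) - Xs @ w) / len(Xs)
    w -= 0.1 * grad

print("robust linear explanation:", np.round(w, 3),
      "worst-case MSE:", round(worst_case_loss(w)[0], 4))
```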
[Skimmed; of limited relevance] Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
Andrey Voynov · Artem Babenko
The latent space of a GAN typically contains directions that carry semantic meaning; manipulating the latent representation along these directions in a human-understandable way yields controlled image transformations (movement along a particular direction may correspond to zooming or recoloring the image, so finding these directions and generating samples along them makes controlled sample transformation convenient). So far, however, such directions have generally been discovered "in a supervised manner, requiring human labels, pretrained models, or some form of self-supervision".
In this paper, we introduce an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN model. By a simple model-agnostic procedure, we find directions corresponding to sensible semantic manipulations without any form of (self-)supervision. Furthermore, we reveal several non-trivial findings, which would be difficult to obtain by existing methods, e.g., a direction corresponding to background removal.
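For reference, a toy sketch of what "moving along a latent direction" means, for generation only; this is not the paper's unsupervised discovery procedure (which jointly trains a direction matrix and a reconstructor). The generator `G`, direction `d`, and all weights below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained generator G: latent code (dim 8) -> "image" (dim 64).
W = rng.normal(size=(8, 64))
G = lambda z: np.tanh(z @ W)

# A candidate interpretable direction in latent space (here just a random unit
# vector; the paper *learns* a basis of such directions without supervision).
d = rng.normal(size=8)
d /= np.linalg.norm(d)

z = rng.normal(size=8)
for alpha in (-3.0, 0.0, 3.0):
    img = G(z + alpha * d)             # traverse the latent space along d
    print(f"alpha={alpha:+.1f}  mean pixel={img.mean():+.3f}")
```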
Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge
Laura Rieger · Chandan Singh · William Murdoch · Bin Yu
An explanation of a deep learning model should serve two purposes: 1) provide insight into the model, and 2) suggest a corresponding action in order to achieve an objective. Most interpretability methods stop after the first step, i.e., the explanation is never used to improve the network. This paper proposes Contextual Decomposition Explanation Penalization (CDEP), a method that lets practitioners use existing explanation techniques to improve the predictive accuracy of deep models. In particular, when the model is shown to assign importance to the wrong features, CDEP lets practitioners correct these errors by injecting domain knowledge into the model through the explanations.
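A toy sketch of the general idea of penalizing explanations during training. CDEP itself uses contextual decomposition scores; here I substitute a much simpler gradient-times-input attribution on a logistic regression, with the "irrelevant" feature assumed to be known from domain knowledge.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 is the true signal, feature 1 is a spurious correlate.
n = 500
signal   = rng.normal(size=n)
spurious = signal + 0.1 * rng.normal(size=n)       # correlated nuisance feature
X = np.c_[signal, spurious]
y = (signal > 0).astype(float)

irrelevant = np.array([1])                          # domain knowledge: ignore feature 1
sigmoid = lambda t: 1 / (1 + np.exp(-t))

def fit(lam):
    """Logistic regression with an explanation penalty on gradient-x-input
    attributions of features declared irrelevant (stand-in for CDEP's
    contextual-decomposition penalty)."""
    w = np.zeros(2)
    for _ in range(2000):
        p = sigmoid(X @ w)
        grad_ce = X.T @ (p - y) / n
        # penalty = lam * mean_i (x_ij * w_j)^2 over the irrelevant features j
        grad_pen = np.zeros(2)
        grad_pen[irrelevant] = 2 * w[irrelevant] * np.mean(X[:, irrelevant] ** 2, axis=0)
        w -= 0.5 * (grad_ce + lam * grad_pen)
    return w

print("no penalty:   w =", np.round(fit(0.0), 3))
print("with penalty: w =", np.round(fit(5.0), 3))   # weight on the spurious feature shrinks
```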
[Skimmed; of limited relevance] Interpreting Robust Optimization via Adversarial Influence Functions
Zhun Deng · Cynthia Dwork · Jialiang Wang · Linjun Zhang
Mainly uses adversarial influence functions to understand the process of robust optimization; it is not about neural networks in general.
[On the drawbacks of Shapley-value-based explanations; of limited relevance] Problems with Shapley-value-based explanations as feature importance measures
Indra Kumar · Suresh Venkatasubramanian · Carlos Scheidegger · Sorelle Friedler
Abstract (paraphrased): game-theoretic formulations that attribute a model's output to its features have become a popular way to "explain" machine learning models. These methods define a cooperative game between the features of a model and distribute influence among these input elements using some form of that game's unique Shapley values. The justification for these methods rests on two pillars: their desirable mathematical properties and their applicability to specific motivations for explanations. The authors show that mathematical problems arise when Shapley values are used for feature importance, and that the solutions which mitigate them necessarily introduce further complexity, such as the need for causal reasoning. They also draw on other literature to argue that Shapley values do not provide explanations which suit human-centric goals of explainability.
Explainable k-Means and k-Medians Clustering
Michal Moshkovitz · Sanjoy Dasgupta · Cyrus Rashtchian · Nave Frost
Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms produce cluster assignments that are hard to explain, partly because they depend on all of the features of the data in complicated ways. To improve interpretability, the paper considers using a small decision tree to partition the dataset into clusters, so that each cluster can be characterized in a simple way. It studies this question theoretically for the k-means and k-medians objectives: must there exist a tree-induced clustering whose cost is comparable to that of the best unconstrained clustering, and if so, how can it be found? On the negative side, the paper shows first that popular top-down decision-tree algorithms may lead to clusterings with arbitrarily large cost, and second, that any tree-induced clustering must in general incur an Ω(log k) approximation factor relative to the optimal clustering. On the positive side, it designs an efficient algorithm that produces explainable clusters using a tree with k leaves. For two means/medians, a single threshold cut suffices for a constant-factor approximation, and a nearly matching lower bound is given. For general k ≥ 2, the algorithm is an O(k) approximation to the optimal k-medians and an O(k²) approximation to the optimal k-means. Prior to this work, no algorithm had guarantees independent of the dimension and the input size.
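A brute-force toy version of the "single threshold cut suffices for two means" statement: compare the cost of the best depth-1 (single-threshold) tree-induced clustering against unconstrained 2-means on separable data. The data and the exhaustive search below are mine; the paper's algorithm handles general k with a k-leaf tree.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=(0, 0), scale=0.5, size=(150, 2)),
               rng.normal(loc=(4, 1), scale=0.5, size=(150, 2))])

def kmeans_cost(X, labels):
    """Sum of squared distances to each cluster's own mean."""
    return sum(((X[labels == c] - X[labels == c].mean(0)) ** 2).sum()
               for c in np.unique(labels))

# Unconstrained 2-means reference.
ref = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
ref_cost = kmeans_cost(X, ref.labels_)

# Explainable alternative: a single axis-aligned threshold cut (a depth-1 tree),
# chosen by brute force to minimise the same 2-means cost.
best = (np.inf, None)
for dim in range(X.shape[1]):
    for thr in np.unique(X[:, dim])[:-1]:
        labels = (X[:, dim] > thr).astype(int)
        best = min(best, (kmeans_cost(X, labels), (dim, thr)), key=lambda t: t[0])

print(f"unconstrained cost {ref_cost:.1f} vs threshold-cut cost {best[0]:.1f}, cut = {best[1]}")
```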
Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis
Jung Yeon Park · Kenneth Carr · Stephan Zheng · Yisong Yue · Rose Yu
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Omer Gottesman · Joseph Futoma · Yao Liu · Sonali Parbhoo · Leo Celi · Emma Brunskill · Finale Doshi-Velez
Explainable and Discourse Topic-aware Neural Language Understanding
Yatin Chaudhary · Hinrich Schuetze · Pankaj Gupta
When Explanations Lie: Why Many Modified BP Attributions Fail
Leon Sixt · Maximilian Granz · Tim Landgraf
Fairwashing explanations with off-manifold detergent
Christopher Anders · Plamen Pasliev · Ann-Kathrin Dombrowski · Klaus-robert Mueller · Pan Kessel
XXAI: Extending Explainable AI Beyond Deep Models and Classifiers [workshop]
Wojciech Samek · Andreas Holzinger · Ruth Fong · Taesup Moon · Klaus-robert Mueller
Autoencoder
Encoding Musical Style with Transformer Autoencoders
Kristy Choi · Curtis Hawthorne · Ian Simon · Monica Dinculescu · Jesse Engel
Forecasting sequential data using Consistent Koopman Autoencoders
Omri Azencot · N. Benjamin Erichson · Vanessa Lin · Michael Mahoney
Latent Bernoulli Autoencoder
Jiri Fajtl · Vasileios Argyriou · Dorothy Monekosso · Paolo Remagnino
Perceptual Generative Autoencoders
Zijun Zhang · Ruixiang ZHANG · Zongpeng Li · Yoshua Bengio · Liam Paull
Rate-distortion optimization guided autoencoder for isometric embedding in Euclidean latent space
Keizo Kato · Jing Zhou · Tomotake Sasaki · Akira Nakagawa
Educating Text Autoencoders: Latent Representation Guidance via Denoising
Tianxiao Shen · Jonas Mueller · Regina Barzilay · Tommi Jaakkola
Topological Autoencoders
Michael Moor · Max Horn · Bastian Rieck · Karsten Borgwardt
ControlVAE: Controllable Variational Autoencoder
Huajie Shao · Shuochao Yao · Dachun Sun · Aston Zhang · Shengzhong Liu · Dongxin Liu · Jun Wang · Tarek Abdelzaher
Learning Autoencoders with Relational Regularization
Hongteng Xu · Dixin Luo · Ricardo Henao · Svati Shah · Lawrence Carin
LSTM
Do RNN and LSTM have Long Memory?
Jingyu Zhao · Feiqing Huang · Jia Lv · Yanjie Duan · Zhen Qin · Guodong Li · Guangjian Tian
Data augmentation
- On the Generalization Effects of Linear Transformations in Data Augmentation
Sen Wu · Hongyang Zhang · Gregory Valiant · Christopher Re
- Improving Molecular Design by Stochastic Iterative Target Augmentation
Kevin Yang · Wengong Jin · Kyle Swanson · Regina Barzilay · Tommi Jaakkola
- VFlow: More Expressive Generative Flows with Variational Data Augmentation
Jianfei Chen · Cheng Lu · Biqi Chenli · Jun Zhu · Tian Tian
[Read] Self-supervised Label Augmentation via Input Transformations
Hankook Lee · Sung Ju Hwang · Jinwoo Shin
A data-augmentation scheme for the fully supervised setting. The proposed method:
- Unlike the conventional self-supervised setup (multi-task learning: an original task plus an auxiliary task, implemented with two sub-networks and trained with the sum of the two losses), the proposed method handles the original task and the auxiliary task jointly within a single network.
That is, for an image classification task, the traditional approach uses sub-network 1 for classification and sub-network 2 to distinguish which augmentation was applied. This paper instead trains a single classifier over $N \times M$ joint labels, where $N$ is the number of data classes and $M$ the number of augmentations; the network's outputs are then aggregated to learn a conditional distribution, and the corresponding loss is added to the total loss.
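A small sketch of the joint-label bookkeeping and the aggregated inference step as I understand it (random weights, no training; the class-major layout of the $N \times M$ outputs and the `augment` stand-in are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 10, 4      # N object classes, M augmentations (e.g. rotations 0/90/180/270)

def augment(x, j):
    """Stand-in for input transformation j (here: a fixed permutation of pixels)."""
    perm = np.random.default_rng(j).permutation(x.size)
    return x[perm]

# Stand-in for the trained network: one linear layer with N*M joint outputs,
# i.e. one logit per (class, transformation) pair.
D = 64
W = rng.normal(scale=0.1, size=(D, N * M))

x = rng.normal(size=D)

# Aggregated inference: feed all M transformed copies; for copy j, read off the
# logits of the joint labels (class c, transformation j), then average over j.
logits = np.zeros(N)
for j in range(M):
    joint = augment(x, j) @ W                 # shape (N*M,)
    logits += joint.reshape(N, M)[:, j]       # column j = labels paired with copy j
logits /= M
print("predicted class:", int(np.argmax(logits)))
```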
Distribution Augmentation for Generative Modeling
Heewoo Jun · Rewon Child · Mark Chen · John Schulman · Aditya Ramesh · Alec Radford · Ilya Sutskever