Post-hoc interpretability
Research problem description
Post-hoc interpretability for deep learning models: given a trained deep neural network, how can its outputs be explained?
State of the field
Superimposition-based explanation
These methods attribute the network's output to its input, explicitly indicating how much each input dimension contributes to the output.
Pros: intuitive
Cons: can produce misleading explanations, e.g., when the network treats inputs known to be irrelevant as important evidence for its decision
[1] LIME: applies small perturbations to the input to probe how the black-box model's output changes, then fits an interpretable model (a linear or tree-based model) that locally approximates the black-box model's predictions (see the local-surrogate sketch after this list).
[2] SHAP: based on game theory (Shapley values), it computes the contribution of each input dimension to the network's output (it likewise builds a local, interpretable approximation of the black-box model).
[3] Saliency map: trains a masking model to identify the input features that most influence the classifier's decision.
[4] Integrated Gradients: scores each input feature's importance by the integral of the gradients with respect to that feature along a path from a baseline to the input (see the Integrated Gradients sketch after this list).
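As a concrete illustration of the local-surrogate idea shared by [1] and [2], the minimal sketch below perturbs an input around a point of interest, queries a black-box model, and fits a proximity-weighted linear model whose coefficients serve as per-feature contribution scores. The black-box function, Gaussian perturbation scale, and kernel width are assumptions made for illustration, not details taken from either paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_attribution(predict_fn, x, n_samples=1000, kernel_width=0.75, seed=0):
    """LIME-style sketch: explain predict_fn's output at x with a locally
    weighted linear surrogate. predict_fn maps an (n, d) array to (n,) scores."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Perturb the input with Gaussian noise around x (tabular-data assumption).
    perturbed = x + rng.normal(scale=0.1, size=(n_samples, d))
    y = predict_fn(perturbed)
    # Weight perturbed samples by their proximity to x (exponential kernel).
    dist = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Fit an interpretable (linear) model that locally approximates the black box.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, y, sample_weight=weights)
    return surrogate.coef_  # per-feature contribution scores

# Toy usage: a "black box" whose score depends only on the first two features.
if __name__ == "__main__":
    black_box = lambda X: 3.0 * X[:, 0] - 2.0 * X[:, 1]
    x0 = np.array([0.5, -0.2, 0.8])
    print(local_surrogate_attribution(black_box, x0))  # roughly [3, -2, 0]
```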
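The path integral in [4] can be approximated directly with a Riemann sum, which the sketch below does for a toy differentiable model whose gradient is known in closed form; the model, the zero baseline, and the step count are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients along the straight-line path from baseline
    to x: IG_i = (x_i - b_i) * mean_k grad_i(b + (k + 0.5)/steps * (x - b))."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoint Riemann sum
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, d) points on the path
    grads = np.stack([grad_fn(p) for p in path])         # (steps, d) gradients
    return (x - baseline) * grads.mean(axis=0)           # per-feature importance scores

# Toy model f(x) = sigmoid(w . x) with an analytic gradient.
if __name__ == "__main__":
    w = np.array([1.0, -2.0, 0.5])
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    grad_f = lambda x: sigmoid(w @ x) * (1.0 - sigmoid(w @ x)) * w
    x, baseline = np.array([1.0, 1.0, 1.0]), np.zeros(3)
    ig = integrated_gradients(grad_f, x, baseline)
    print(ig, ig.sum())  # attributions approximately sum to f(x) - f(baseline)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model's output at the input and at the baseline.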
Example-based explanation
For a case to be explained, these methods follow some strategy to produce a set of examples, which are generally the main evidence supporting the network's decision on that case. A human then compares these examples with the case to be explained, and the differences and commonalities drawn from this comparison serve as a direct explanation.
Pros: easy to understand
Cons: depends on the quality and quantity of the training set
[10]: applies a tiny perturbation to a training sample and observes its effect on the learned model weights, thereby identifying which training samples play a decisive role in the network's prediction for a given sample.
[11]: searches for the nearest neighbors of a given sample in the feature space of every layer of the deep model; the union of these neighbors is the set of training samples used to explain the given sample (what the sample has in common with this set is the reason they receive the same label); see the layer-wise neighbor sketch after this list.
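To make the layer-wise neighbor search of [11] concrete, the sketch below retrieves the k nearest training samples in each layer's representation space and pools them into the example set returned as the explanation. The two hand-rolled feature "layers" and the choice of k are assumptions for illustration only, not the DkNN implementation from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def explanation_set(layer_fns, X_train, x_query, k=5):
    """DkNN-flavoured sketch: for each layer's feature map, find the k training
    samples nearest to the query; the union of their indices is returned as the
    example-based explanation for the query."""
    neighbor_ids = set()
    for layer_fn in layer_fns:
        feats = layer_fn(X_train)                        # (n, d_layer) training features
        nn = NearestNeighbors(n_neighbors=k).fit(feats)
        _, idx = nn.kneighbors(layer_fn(x_query[None, :]))
        neighbor_ids.update(idx[0].tolist())
    return sorted(neighbor_ids)                          # indices of explanatory training samples

# Toy usage with two random ReLU "layers" standing in for a trained network.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 8))
    W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))
    layer1 = lambda X: np.maximum(X @ W1, 0.0)
    layer2 = lambda X: np.maximum(layer1(X) @ W2, 0.0)
    print(explanation_set([layer1, layer2], X_train, rng.normal(size=8)))
```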
10 representative papers
Superimposition-based methods
Classic methods
[1]: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). (LIME)
[2]: Lundberg, S. M., & Lee, S. I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems (pp. 4765-4774). (SHAP)
[3]: Dabkowski, P., & Gal, Y. (2017). Real Time Image Saliency for Black Box Classifiers. In Advances in Neural Information Processing Systems (pp. 6967-6976). (saliency map)
[4]: Sundararajan, M., Taly, A., & Yan, Q. (2017, August). Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (pp. 3319-3328). (Integrated Gradients)
Other works
[5]: Ismail, A., Gunady, M., Bravo, H., & Feizi, S. (2020). Benchmarking Deep Learning Interpretability in Time Series Predictions. Advances in Neural Information Processing Systems (NeurIPS).
[6]: Giurgiu, I., & Schumann, A. (2019). Explainable failure predictions with rnn classifiers based on time series data. arXiv preprint arXiv:1901.08554.
[7]: Mujkanovic, F., Doskoč, V., Schirneck, M., Schäfer, P., & Friedrich, T. (2020). timeXplain–A Framework for Explaining the Predictions of Time Series Classifiers. arXiv preprint arXiv:2007.07606.
[8]: Nguyen, T. T., Le Nguyen, T., & Ifrim, G. (2020, September). A Model-Agnostic Approach to Quantifying the Informativeness of Explanation Methods for Time Series Classification. In International Workshop on Advanced Analytics and Learning on Temporal Data (pp. 77-94). Springer, Cham.
[9]: Shankaranarayana, S. M., & Runje, D. (2019, November). ALIME: Autoencoder based approach for local interpretability. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 454-463). Springer, Cham.
Example-based methods
Classic methods
[10]: Koh, P. W., & Liang, P. (2017, July). Understanding Black-box Predictions via Influence Functions. In International Conference on Machine Learning (pp. 1885-1894).
[11]: Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.
Other works
[12]: Kim, B., Rudin, C., & Shah, J. A. (2014). The bayesian case model: A generative approach for case-based reasoning and prototype classification. Advances in neural information processing systems, 27, 1952-1960.
[13]: Jeyakumar, J. V., Noor, J., Cheng, Y. H., Garcia, L., & Srivastava, M. (2020). How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods. Advances in Neural Information Processing Systems, 33.
[14]: Papernot, N., & McDaniel, P. (2018). Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765.
[15]: Ming, Y., Xu, P., Qu, H., & Ren, L. (2019, July). Interpretable and steerable sequence learning via prototypes. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 903-913).
[16]: Ma, D., Wang, Z., Xie, J., Guo, B., & Yu, Z. (2020, November). Interpretable Multivariate Time Series Classification Based on Prototype Learning. In International Conference on Green, Pervasive, and Cloud Computing (pp. 205-216). Springer, Cham.
[17]: Keane, M. T., & Kenny, E. M. (2019, September). How case-based reasoning explains neural networks: A theoretical analysis of XAI using post-hoc explanation-by-example from a survey of ANN-CBR twin-systems. In International Conference on Case-Based Reasoning (pp. 155-171). Springer, Cham.
Classic or strongly related papers
[11]: Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.
[13]: Jeyakumar, J. V., Noor, J., Cheng, Y. H., Garcia, L., & Srivastava, M. (2020). How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods. Advances in Neural Information Processing Systems, 33.
Similarities and differences
[11, 13]: study case-based interpretability for supervised classification models, but they can only explain why a sample is judged to belong to a certain class; they cannot give counterfactual explanations, i.e., why the sample is not of some other class, and no semantic information usable for classification can be extracted from them.
[11, 13]: like our work, both start from the latent space and analyze the neighbors of the sample under test; we go further and analyze the differences between the sample under test, its neighbors, and the cluster-center samples, which helps us summarize semantic information useful for distinguishing normal from anomalous samples.