Explainable AI (XAI) Methods for Convolutional Neural Networks
Introduction
This is the page of the undergraduate thesis of Antonio Fernando Silva e Cruz Filho and João Gabriel Andrade de Araujo Josephik, advised by Professor Nina Hirata.
The goal of this project is to study, explore, and compare different explainability methods for Convolutional Neural Networks.
Proposal
Modern neural networks are extremely complex, formed by billions of parameters spread across multiple layers. As a result, these systems are often treated as “black box” models: it is possible to observe the input and the output, but not the internal mechanisms of the network.
In this context, multiple methods have been developed over the years to shed light on the “reasoning” behind a model’s decisions. We aim to study methods developed specifically for Convolutional Neural Networks.
Throughout this research project, several important topics were (or will be) studied. These topics may include:
- Feature visualization methods, such as DeepDream
- Pixel attribution methods, such as Grad-CAM (a minimal sketch is given after this list)
- Interpretable surrogate models used to explain Convolutional Neural Networks, with techniques such as LIME
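As an illustration of the pixel attribution direction, the sketch below outlines the core of Grad-CAM on top of a pretrained torchvision ResNet-50. The backbone, the choice of `layer4` as the target layer, and the hook-based bookkeeping are illustrative assumptions rather than the project’s actual implementation: the gradients of the class score with respect to the last convolutional feature maps are pooled into per-channel weights, and the weighted, ReLU-ed sum of those maps gives the localization heatmap.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained ResNet-50 (He et al., 2015) used purely as an illustrative backbone.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Buffers for the activations of the last convolutional stage and their gradients.
activations, gradients = {}, {}

def forward_hook(module, inputs, output):
    activations["value"] = output.detach()
    # Hook on the output tensor itself to capture its gradient during backprop.
    output.register_hook(lambda grad: gradients.update(value=grad.detach()))

model.layer4.register_forward_hook(forward_hook)  # last conv stage of ResNet-50

def grad_cam(image, target_class=None):
    """Return a Grad-CAM heatmap in [0, 1] for a normalized (3, H, W) image tensor."""
    logits = model(image.unsqueeze(0))              # (1, 1000) class scores
    if target_class is None:
        target_class = logits.argmax(dim=1).item()  # explain the predicted class
    model.zero_grad()
    logits[0, target_class].backward()              # gradient of the class score

    # Global-average-pool the gradients: one importance weight per feature map,
    # then combine the maps and keep only positive evidence (ReLU).
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # (1, C, 1, 1)
    cam = F.relu((weights * activations["value"]).sum(dim=1))     # (1, h, w)

    # Upsample to the input resolution and normalize to [0, 1] for display.
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[1:],
                        mode="bilinear", align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam, target_class
```

The returned heatmap can then be overlaid on the original image (for example, with matplotlib) to highlight the regions that most influenced the chosen class.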
Using the tools studied in this research, the project aims to contribute to the existing knowledge about explainability and, possibly, to create new tools and techniques for explaining deep learning models.
Important Links
References
- Selvaraju, Ramprasaath R., Cogswell, Michael, Das, Abhishek, Vedantam, Ramakrishna, Parikh, Devi, & Batra, Dhruv. (2019). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision, 128(2), 336-359. DOI: 10.1007/s11263-019-01228-7.
- He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, & Sun, Jian. (2015). Deep Residual Learning for Image Recognition. CoRR, abs/1512.03385.
- Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang, Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, Berg, Alexander C., & Fei-Fei, Li. (2015). ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575.
- Erhan, Dumitru, Bengio, Yoshua, Courville, Aaron, & Vincent, Pascal. (2009). Visualizing Higher-Layer Features of a Deep Network. Technical Report, Université de Montréal.
- Hornik, Kurt, Stinchcombe, Maxwell, & White, Halbert. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366. DOI: 10.1016/0893-6080(89)90020-8.
- Le, Hung, & Borji, Ali. (2017). What are the Receptive, Effective Receptive, and Projective Fields of Neurons in Convolutional Neural Networks?. CoRR, abs/1705.07049.
- O’Shea, Keiron, & Nash, Ryan. (2015). An Introduction to Convolutional Neural Networks. CoRR, abs/1511.08458.
- Ribeiro, Marco Tulio, Singh, Sameer, & Guestrin, Carlos. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv:1602.04938.
- Molnar, Christoph. (2024). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. [Online; accessed 27-January-2025].
- Olah, Chris, Mordvintsev, Alexander, & Schubert, Ludwig. (2017). Feature Visualization: How neural networks build up their understanding of images. Distill. [Online; accessed 27-January-2025].
- Mordvintsev, Alexander, Olah, Christopher, & Tyka, Mike. (2015). Inceptionism: Going Deeper into Neural Networks. [Online; accessed 27-January-2025].
- Simonyan, Karen, & Zisserman, Andrew. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.
- Miller, Tim. (2018). Explanation in Artificial Intelligence: Insights from the Social Sciences. arXiv:1706.07269.