Deep neural network (DNN)-powered systems and services hold great promise to fundamentally transform the way people live, work and play. Yet, to fully unleash this potential, it is critical to improve their interpretability, making them more trustworthy and easier to use. The transformative nature of this project is to completely rethink how to define and implement the interpretation of DNNs and how to exploit this interpretability as a bridge to understand and control DNN behaviors. The success of this project will not only improve the reliability, interactivity and operability of DNN-powered systems, but also promote more principled practice in building and using machine learning systems in general. The research products will be applicable to fields including machine learning, cyber-security and human-computer interaction.
This project aims to develop RIDDLE, a new interpretable deep learning framework that is reliable, because it deploys built-in defenses against adversarial manipulations; interactive, because it provides interfaces and mechanisms for in-depth, interactive analysis of DNN dynamics; and debuggable, because it employs interpretability as the lens through which users can effectively control DNN behaviors. Along these three directions, the specific tasks of this project are: (i) exploring the vulnerabilities of existing interpretation models to adversarial manipulations, uncovering their root causes, and developing practical defense mechanisms; (ii) designing an expressive interpretation-algebra framework that allows users to flexibly construct interactive analysis tools for a variety of DNNs and tasks, thereby circumventing the "one-size-fits-all" challenge; and (iii) building interpretation-based model debugging techniques that allow users to effectively localize and fix model defects.
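The vulnerability targeted by the first task can be illustrated with a toy example (a minimal sketch, not the project's actual method or models): for a linear model with gradient-times-input attributions, a perturbation chosen in the null space of the weight vector leaves the prediction unchanged while redistributing the attribution mass across features, so the "explanation" flips even though the model's output does not. All names and numbers below are illustrative.

```python
import numpy as np

# Hypothetical linear "model": logit = w . x
w = np.array([1.0, -1.0, 0.5])
x = np.array([2.0, 1.0, 1.0])

def logit(x):
    return float(w @ x)

def saliency(x):
    # Gradient-times-input attribution: d(logit)/dx_i * x_i = w_i * x_i
    return w * x

# Perturbation in the null space of w (w @ delta == 0): it cannot change
# the prediction, but it shifts attribution mass between features.
delta = 3.0 * np.array([-1.0, -1.0, 0.0])
x_adv = x + delta

print(logit(x), logit(x_adv))              # identical logits: 1.5 1.5
print(np.argmax(np.abs(saliency(x))))      # top-attributed feature: 0
print(np.argmax(np.abs(saliency(x_adv))))  # top-attributed feature: 1
```

Even in this three-feature case, the most "important" feature reported to the user changes while the model's decision stays fixed, which is the kind of mismatch between interpretation and behavior that the proposed defenses would need to detect or prevent.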
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.