Trustworthy Machine Learning from Untrusted Models

Sponsoring Agency
National Science Foundation


Many of today's machine learning (ML) systems are not built from scratch but are "composed" from an array of pre-trained, third-party models. As with other forms of software reuse, reusing models can both speed up and simplify the development of ML-based systems. However, the lack of standardization, regulation, and verification of third-party ML models raises security concerns. In particular, ML models are subject to adversarial attacks in which third-party attackers, or the model providers themselves, might embed hidden behaviors that are triggered by pre-specified inputs. This project aims to understand the security threats incurred by reusing third-party models as building blocks of ML systems and to develop tools that help developers mitigate such threats throughout the lifecycle of ML systems. Outcomes from the project will improve ML security in applications ranging from self-driving cars to authentication in the short term, while promoting more principled practices for building and operating ML systems in the long run.

One major type of threat incurred by reusing third-party models is the model reuse attack, in which maliciously crafted models ("adversarial models") force host ML systems to malfunction on targeted inputs ("triggers") in a highly predictable manner. This project develops rigorous yet practical methods to proactively detect and remediate such backdoor vulnerabilities. First, it will empirically and analytically investigate the necessary conditions and invariant patterns of model reuse attacks. Second, leveraging these insights, it will develop a chain of mitigation tools that detect potential backdoors, pinpoint triggers, and provide mechanisms to fortify ML systems against these attacks. Third, it will establish a unified theory of adversarial models and adversarial inputs to deepen the general understanding of adversarial ML. Finally, it will implement all the proposed techniques and system designs in a prototype testbed, which will provide a unique research facility for investigating a range of attack and defense techniques. New theories and techniques developed in this project will be integrated into undergraduate and graduate education and used to raise public awareness of the importance of ML security.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.