Advanced Trace-oriented Binary Code Analysis


Sponsoring Agency
National Science Foundation


Binary code analysis is very attractive from a security viewpoint. First, in many tasks such as malware analysis, software plagiarism detection, and vulnerability exploration, the source code of the program under examination is often unavailable, so the analysis must start from the binary code. Second, even when the source code is available, binary analysis lets us reason about the actual instructions executed on the hardware and avoids the well-known “What You See Is Not What You Execute” problem, in which the compiled binary behaves differently from what the source code suggests. Third, some program behaviors, such as cache accesses, are visible only in the low-level code.

Binary code analysis faces a growing challenge from emerging, readily available code obfuscation techniques. Traditional signature-based malware detection is often unreliable because it relies on file hashes and byte (or instruction) signatures, which are not resilient to obfuscation. This project tackles the challenge by proposing several advanced methods that combine techniques from both behavioral and semantic perspectives. The proposed methods leverage formal program semantics, symbolic execution, automated constraint solving, and algorithmic memorization of code semantics, which together form a solid foundation with rigorous resilience properties against the latest attacks.