Scientists use figures---such as bar charts, pie charts, or line charts---to explain concepts and present results. In scholarly articles, figure captions are essential to getting the message across effectively. Captions that are too generic or poorly written are missed opportunities to communicate science to readers, yet low-quality captions still appear in published articles. Readers can only work with the captions they are given, and writers receive no in-situ support for composing clear, compelling figure captions.

More fundamentally, writing captions for \sfigs is a challenging task: it requires making sense of the figure's imagery (which comes in various forms), deciding on the intended message the figure aims to convey, and composing a caption that fits the surrounding paper. All of these steps take place in a highly specialized scientific domain, such as physics or computer science, where it is hard, if not impossible, to provide meaningful, intelligent support for such advanced writing tasks.

This proposal tackles the grand challenge of automatically captioning scientific figures in scholarly articles. We aim to (i) help readers better understand scientific figures and papers, (ii) support writers in composing better captions, and (iii) push the boundaries of AI's ability to generate well-written captions automatically. We will develop neural vision-to-language models that produce high-quality captions for scientific figures at scale. These captioning models will be trained on large-scale real-world data, such as captions and figures extracted from CiteSeerX and arXiv.