While researchers continue to study the effects of disproportionate minority contact with law enforcement on a range of health-related outcomes, a recent review of this work questions the methodological validity of most studies on the topic. Many of these concerns center on (a) unrealistic assumptions about police behavior and (b) poor-quality data. This project addresses both by introducing a human-development-based model of law enforcement officer (LEO) behavior and applying it to study how LEOs identify with male minority youth (MMY) using a novel, publicly available data source: broadcast police communications (BPC). Our long-term goals are (1) to assess the viability of BPC for understanding how LEOs identify with MMY, as reflected in the procedural language used before and during LEO-MMY interactions, and (2) to determine whether BPC may be used in lieu of non-public data sources to study police behavior, by developing a novel computational strategy for extracting meaningful information from BPC.
The Penn State team will investigate two research directions at the intersection of privacy and natural language processing. (1) Identification and mitigation of privacy concerns in police radio communication transcripts: Transcripts collected by team members at the University of Chicago will contain personally identifiable information (PII) for officers, the individuals they interact with, and bystanders. The transcripts also indirectly reveal personal identities when combined with other public information sources. The dataset poses a challenge for privacy research: its source is publicly monitorable radio frequencies, yet its scope and searchability mean that releasing the transcripts could harm individuals who are identifiable in them. Our work will determine the appropriate level of sensitivity with which to treat the data, balancing privacy risks against the goals of open, reproducible science. (2) Processing of police transcripts: Our work will develop methods for detecting cues in vocabulary and discourse that predict outcomes in policing, with particular attention to adverse outcomes of police interactions with male minority youth. This will be a text classification task involving word embeddings, discourse parsing, and sequence prediction, along with an exploration of supervised and unsupervised machine learning methods to determine the most suitable models for the data. Although prior efforts have applied natural language processing to the interpretation of speech transcripts, none to our knowledge have explored a topic with acute ramifications for policing and law enforcement policy.
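As a rough illustration of what the mitigation step in direction (1) could involve, the sketch below replaces a few kinds of direct identifiers in a transcript line with category tags. It is a minimal sketch under stated assumptions, not the project's method: the pattern set, the category names, and the `redact` function are all hypothetical, and simple patterns of this kind would not address the indirect identification risk described above.

```python
import re

# Hypothetical patterns chosen only for illustration. Real transcripts would
# need patterns (or a trained tagger) validated against the actual data.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "PLATE": re.compile(r"\b[A-Z]{3}[-\s]?\d{4}\b"),
    "ADDRESS": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+"
        r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\b"
    ),
}


def redact(text):
    """Replace each pattern match with a bracketed category tag.

    Returns the redacted text and the number of replacements per category,
    so that the amount of direct PII in a line can be tallied.
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Calling `redact` on an invented line containing a phone number, a license plate, and a street address would return the line with `[PHONE]`, `[PLATE]`, and `[ADDRESS]` in their place, together with a per-category count. Keeping the counts alongside the text is one way to gauge how much direct PII a release would expose.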