Cities Should Look At Los Angeles’ History of Big Data Policing and Avoid Its Mistakes

By guest author Jonathan Hofer
November 4, 2019

Nearly ten years ago, the Los Angeles Police Department was at the forefront of what was called a “data revolution” in law enforcement techniques. The strategy was ambitious—combine machine learning, data mining, and predictive analytics to predict certain crimes, “much like scientists forecast earthquake aftershocks”.

At a time when bulk data collection and analysis was experiencing rapid growth, “predictive policing” offered police departments a new tool to help better direct their resources. Predictive programs would display “hotspots” around the city where the algorithm predicted a crime was likely to occur. Prior to Los Angeles’ formal adoption of the program, the Santa Cruz Police Department did a trial run for the PredPol predictive policing program in 2011. It was seen as a revolutionary step forward in the evolution of law enforcement and was even named one of the best inventions of the year by Time Magazine.

Los Angeles quickly became the national poster child for predictive policing, which made the police department inspector general’s recent admission that the program may have had no effect on crime all the more stunning. Los Angeles is far from alone in that regard. A few California cities have already dropped predictive techniques after weighing the costs and benefits and finding little reason to believe the programs were worthwhile.

Inefficacy and Bias In Predictive Policing

This admission by the police department emboldened privacy activists who were critical of the program and the possible racial biases inherent in the system. Critics of big data policing have long pointed out that while the algorithms themselves are digitally generated, the data for the algorithms relies on human collection. This makes the predictions subject to a number of potential problems.

Of particular concern for predictive policing is a vulnerability, identified by the Electronic Frontier Foundation, to “feedback loops”. That is, when a predictive model is fed historic data, it can amplify any biases present in that data set. This can happen when suspects are not treated equally. For example, black drivers are generally stopped by police at a higher rate than white drivers, and some police departments disproportionately dispatch license plate cameras to places of worship or to low-income areas. Because these populations are subject to police interactions at a higher frequency, a self-fulfilling prophecy of criminal suspicion can occur.

To the extent that some communities are patrolled at a higher rate than others, those communities will be overrepresented in the data. The algorithm is then liable to direct police officers to patrol such areas more heavily, and the cycle continues. For instance, if the predictive model is trained on the data of minority male criminal suspects, the output “will inevitably determine a 19 year old black male from a poor neighborhood is a criminal”, irrespective of that individual’s true propensity for crime.

When this happens, the predictive output of the algorithm effectively becomes a new input for the algorithm. This could lead to a higher prediction of criminality in low-income communities, ethnic enclaves, or religious or political minorities, even if their actual crime rate is low. A ProPublica report found that offender-based modeling, which is used to predict recidivism, frequently misidentifies low-risk black defendants as high-risk for reoffending.
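The feedback loop described above can be made concrete with a toy simulation. This is purely illustrative and assumes a deliberately simplified model, not any real department’s algorithm or data: two neighborhoods have identical true crime rates, but one starts out overrepresented in the historical records, and patrols are allocated in proportion to recorded incidents.

```python
# Toy model of a predictive-policing feedback loop (illustrative only;
# not any department's actual algorithm or data).
# Neighborhoods A and B have the SAME underlying crime rate, but A starts
# with more recorded incidents due to historically heavier patrols.

def simulate(rounds=10, true_rate=0.05, total_patrols=100):
    # Initial recorded-incident counts: A is overrepresented in the data.
    recorded = {"A": 60.0, "B": 40.0}
    for _ in range(rounds):
        total = recorded["A"] + recorded["B"]
        for hood in recorded:
            # Patrols are allocated in proportion to past recorded incidents...
            patrols = total_patrols * recorded[hood] / total
            # ...and more patrols mean more incidents observed and recorded,
            # even though the underlying crime rate is identical.
            recorded[hood] += patrols * true_rate
    return recorded

result = simulate()
print(result)
```

Even after many rounds, neighborhood A is still patrolled 1.5 times as heavily as B, and the absolute gap in recorded incidents keeps widening: the initial disparity in the data never corrects itself, despite the two neighborhoods being identical in fact.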

The predictive policing civilian oversight committee in Los Angeles specifically raised this concern, questioning whether the program disproportionately targeted black and Latino residents. The sentiment was echoed by members of the police commission, who also questioned why more information on the program was unavailable. In response, Los Angeles Police Chief Moore said he disagreed that the program targeted certain racial groups, but that the police department would adopt reforms when needed.

Despite these concerns, the trend of using machine learning and analytics for law enforcement purposes shows no signs of slowing down. A recent report found that law enforcement agencies adopted artificial intelligence-equipped surveillance technology at a rate exceeding expert projections. A 2014 survey of police department representatives reported that 70 percent of departments expected to implement the PredPol program within the next five years. With the wide-scale adoption of predictive policing, safeguards are necessary, and other cities should take note of Los Angeles’ reforms as a guide in what not to do.

Proposals for Reform

In response to the inspector general’s report and concerns over potential biases, the Los Angeles Police Department announced it would implement reforms in the coming years and maintain program oversight.

Yet, the proposed changes lack teeth. In addition to creating a new unit to seek input from community groups before considering new data programs, the only other change of note is that officers will no longer log onto computers to record the time they spend in the “hotspots” of the predictive crime “heatmap”.

The “changes” proposed by Los Angeles Police Department leaders do not adequately address the issue of efficacy or potential biases. Advocates of predictive policing downplay the risk of systematic bias by insisting that demographic information is not fed to the algorithm. However, this objection is of little relevance when predictive policing is specifically concerned with directing police resources to specific parts of the city. People who share the same ethnicity, income, religion, and political affiliation tend to live in proximity to one another. Not entering the timestamp on an officer’s computer when they are in a hotspot does nothing to address this issue.

Instead, the best practices that govern other types of municipal data collection and surveillance should also be applied to predictive policing. First, there needs to be transparency. The public should be confident that the methodology used does not rely on pseudoscience. Without regular disclosure of internal audits and independent oversight, there can be no assurance that the program is effective.

Second, collected data should be limited and its retention capped. Other types of data, such as biometrics from facial recognition technology or license plate reader records, should not be stored in the same database. The combination of different datasets enables invasive surveillance and should be limited to specific investigations. Furthermore, data should not be kept in perpetuity. This helps prevent possibly tainted historical data from skewing the algorithm. Basically, data needs to be regularly “cleared”.

Herein lies the ultimate problem: Los Angeles adopted a big data policing program without having a proper framework in place first. Cities should instead be proactive in crafting policies that govern the use of these technologies prior to adoption. Policies that protect people’s rights and require government accountability should be in place before implementation. Ineffective and liberty-threatening programs should not be beta tested on residents. Cities should also note how municipalities without these policies put taxpayers at risk by exposing the city to costly lawsuits.

Jonathan Hofer is an Editorial and Acquisitions Intern at the Independent Institute.