AI will be the judge of that - predictive algorithms in the public sector

Algorithms have been choosing our fate for years. From automated mortgage decisions to dating apps assessing our compatibility with a potential mate, machines are running the show. In spite of this, the controversy over UK exam grades being predicted by an algorithm has led many to question our reliance on computers over human judgement. Many feel troubled by the concept of algorithms assessing whether a criminal is likely to reoffend. However, by helping tackle the deep-seated problem of unconscious bias in the judicial system, could they be working towards the greater good?

Racial bias in the justice system is well documented. In England and Wales, black people are 40 times more likely to be subject to a ‘random’ stop and search than white people. Many have been calling for more attention to be given to addressing unconscious bias within the justice system, from policing through to the judiciary. The seminal book on understanding human decision-making is ‘Thinking, Fast and Slow’ by Daniel Kahneman. ‘Thinking fast’ is akin to gut instinct and is highly susceptible to unconscious bias. If this type of thinking is what police tend to use when identifying a target to stop and search, it may partly explain the propensity for racial bias in stop and search procedures.

Judges have the luxury of ‘thinking slow’ when weighing up finely balanced decisions, which may mean they can take more measures to identify and counteract bias. However, even slow, deliberate decision-making is fraught with issues. In ‘Thinking, Fast and Slow’, Kahneman looks at a study of parole judges in Israel. The default decision is denial of parole; overall, only about 35% of requests are approved. Immediately after each meal break, around 65% of requests are granted, and the approval rate then drops steadily, falling to nearly zero just before the next break. When the human decision-making process is so frail, propping it up with computerised assistance is an attractive proposition.

Can algorithms make better decisions than judges?

One of the interesting things about using algorithms to determine whether an offender is likely to reoffend is that humans are poor predictors of this. Studies suggest that a trained judge predicts recidivism accurately only 54% of the time, barely better than a coin toss. By contrast, the COMPAS system, first developed in 1998, assesses the likelihood of reoffending with around 65% accuracy.

Whilst COMPAS is slightly better at predicting recidivism than a judge, it has been subject to serious criticism. The algorithm is complex, requiring 137 data points, which makes it susceptible to data-entry errors that have led to dangerous criminals being freed. Several academic studies have built models that match its roughly 65% success rate at assessing the likelihood of reoffending while considering only a handful of data points, such as a defendant’s age and number of previous convictions. This suggests a commercial incentive to build complexity into decision-support systems in order to make them difficult to replicate or to give an appearance of sophistication.
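To illustrate how small such a model can be, here is a minimal sketch in Python, assuming a hypothetical dataset (defendants.csv) containing a defendant’s age, number of previous convictions and a binary reoffending label. The file name, column names and any accuracy figure it produces are illustrative assumptions, not taken from COMPAS or the studies above.

```python
# A minimal sketch of a two-feature recidivism predictor (illustrative only;
# not the COMPAS model and not the code from the published studies).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical dataset: one row per defendant, with a binary 'reoffended' label.
df = pd.read_csv("defendants.csv")

X = df[["age", "prior_convictions"]]
y = df["reoffended"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# The studies cited above report roughly 65% accuracy for models of this size;
# the figure printed here depends entirely on the data used.
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")

# With only two inputs, the fitted coefficients can be read directly,
# which makes each prediction easy to explain and audit.
print(dict(zip(X.columns, model.coef_[0])))
```

The point is not that this particular model is the right one, but that a transparent model with two inputs can apparently perform in the same range as a proprietary one with 137.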

The company that owns COMPAS has hidden its inner workings to protect its intellectual property. In 2016, ProPublica analysed COMPAS assessments for more than 7,000 arrestees in Broward County, Florida, and published an investigation claiming that the algorithm was biased against African Americans. Specifically, while the algorithm predicted an individual’s risk of recidivism with equal accuracy across all groups, it was more likely to misclassify a black defendant as being in a higher risk band than it was a white defendant. Many papers have since been written speculating on the internals of the COMPAS algorithm, the most recent of which refutes the original study outright, arguing that ProPublica made incorrect assumptions about the method COMPAS uses to predict risk.
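Much of the disagreement comes down to which fairness measure you check. Below is a hedged sketch of the kind of analysis ProPublica ran, assuming a hypothetical table of risk assessments (compas_scores.csv) with a defendant’s race, a binary high-risk prediction and an observed reoffending outcome; the file and column names are illustrative, not ProPublica’s actual data schema.

```python
# A sketch of comparing overall accuracy and false positive rates by group:
# how often defendants who did NOT reoffend were nevertheless labelled high risk.
import pandas as pd

# Hypothetical export of risk assessments and observed outcomes.
scores = pd.read_csv("compas_scores.csv")

for group, rows in scores.groupby("race"):
    did_not_reoffend = rows[rows["reoffended"] == 0]
    false_positive_rate = did_not_reoffend["high_risk"].mean()
    accuracy = (rows["high_risk"] == rows["reoffended"]).mean()
    print(f"{group}: accuracy={accuracy:.2f}, "
          f"false positive rate={false_positive_rate:.2f}")

# Two groups can show similar overall accuracy while one has a much higher
# false positive rate: that tension is at the heart of the COMPAS debate.
```

A system can satisfy one definition of fairness (equal accuracy, or equal calibration) while failing another (equal false positive rates), which is why the two sides of the COMPAS dispute can both point to the numbers.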

Regardless of the ground truth, why are we spending so much effort on reverse-engineering and debating the internals of COMPAS when we could be creating better recidivism prediction systems that are fully open for analysis? Lack of transparency is also one of the reasons the A-level prediction algorithm has infuriated so many. The Royal Statistical Society wrote to Ofqual, the exam regulator, in April with a warning about some of the challenges involved in estimating student grades. They called for an advisory panel with independent statisticians but were told by Ofqual that they would need to sign an NDA, which they were not prepared to do.

What next for predictive algorithms?

If predictive algorithms are to regain public trust, transparency is vital. Systems that decide the fate of our citizens should never be purchased as black boxes from private companies. They should be public goods: developed with public money, open sourced and subject to continuous scrutiny and re-evaluation. We should favour simpler models with fewer inputs, so that we can better understand what has been factored into a given prediction and know when to override it. We must also be explicit about the trade-offs inherent in any predictive process, so that informed decisions can be made about whether and how to use it.

There is no silver bullet for assessing recidivism or for accurate grade prediction. However, we must recognise the opportunity presented by formalising predictive processes into algorithms: a chance to challenge the bias in human processes, to define as a society how we want these important decisions to be made, and to ensure that the rules are applied consistently. We can only do this if there is transparency in the development of these systems and broad engagement in the process from a diverse range of perspectives.