Recently, significant effort has gone into determining the appropriate space for algorithmic decision-making in domestic law.[1] From discussions about the constitutionality of police officers’ use of algorithms to justify probable cause, to debates about recidivism algorithms in parole hearings and the use of machine learning to help judges identify relevant precedents, domestic legal scholars have been actively discussing the consequences of a changing world for our domestic norms.[2] International legal scholars and international humanitarian law (IHL) need to follow suit.[3]
Key issues that emerge for domestic law with respect to algorithmic decision-making include transparency, explainability, bias, fairness, and effectiveness. Similar issues can also present an obstacle to our ability to apply key principles of IHL, principles like discrimination, proportionality, and precaution. As reliance on ever more sophisticated machine learning algorithms makes actions in war more effective, the space for human decision-making shrinks, and so does our understanding of why someone is assessed as a ‘civilian’ or how a potential course of action was generated by an algorithm. IHL must keep up, and it must do so in two ways. First, there need to be international legal norms that impose limits on the types of algorithms used, and on the way they are trained, whenever they augment military decision-making. Second, we need to reassess how we assign responsibility for actions in war. One way to start on these important tasks is to look to domestic law and the ways in which it has dealt with algorithmic decision-making.
For example, in a recent paper on probable cause jurisprudence, Kiel Brennan-Marquez asks us to imagine a Contraband Detector (CD): an algorithm on a police officer’s phone that would allow them to point it at a person and identify whether that person is likely to have contraband on them.[4] The algorithm running behind such technology would likely be a machine learning model trained on big data sets over variables like neighborhood crime statistics, changes in posture or the fall of clothing when hiding contraband, gaze detection that flags “shifty” behavior, and so on. The central question for our purposes is whether a judge should accept CD output as grounds for a probable cause search by the officer. If the officer’s decision relied on a CD or something like it, the judge would have no way to assess whether the variables, and the way they were used to flag someone as potentially carrying contraband, violate the Constitution, and thus no way to balance the various juridical goods at stake. This example illustrates why, and to what extent, a certain level of transparency and explainability is needed when law is applied to algorithmic decision-making.
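To make the opacity point concrete, consider what the only reviewable artifact of a CD-aided stop might look like. This is a purely hypothetical sketch, not a description of Brennan-Marquez’s example or of any real tool; every field name below is invented.

```python
# Hypothetical record of what a CD-style tool hands the officer, and hence
# what ends up in front of the reviewing judge. All field names are invented.
cd_output = {
    "subject": "unidentified pedestrian",
    "contraband_likelihood": 0.83,  # the bare score the officer acted on
    "model_version": "cd-v4",       # an opaque identifier, not an explanation
}

# What a judge would need in order to weigh constitutionality, but which the
# tool does not surface: the variables relied on (neighborhood statistics?
# posture? gaze?), how they were weighted and combined, the training data's
# provenance, and the model's error rates.
missing_for_review = [
    "input variables",
    "feature weights and interactions",
    "training data provenance",
    "error rates, overall and by group",
]
```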
Scholars have also recently written on algorithmic pre-trial risk assessment, tools used to predict the likelihood that a defendant will have some specific pre-trial outcome (such as failure to appear or re-arrest) if released.[5] Audits of such algorithms have repeatedly shown that they produce more false positives for Black men. In addition to such statistical bias, some of these algorithms have also been shown to exhibit societal bias. Both statistical and societal bias can affect whether we are justified in trusting and using an algorithm for decision-making.
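The core of the audits referred to here can be stated very simply: compare error rates across groups. Below is a minimal sketch of such a check, assuming we have, for each defendant, a group label, the tool’s high-risk flag, and the observed pre-trial outcome; the record keys are invented for illustration.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Compute per-group false positive and false negative rates.

    Each record is a dict with keys 'group' (e.g. race), 'flagged_high_risk'
    (the tool's output), and 'bad_outcome' (the observed pre-trial outcome).
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for r in records:
        c = counts[r["group"]]
        if r["bad_outcome"]:
            c["pos"] += 1
            if not r["flagged_high_risk"]:
                c["fn"] += 1          # flagged safe, had the bad outcome
        else:
            c["neg"] += 1
            if r["flagged_high_risk"]:
                c["fp"] += 1          # flagged risky, did not have it
    return {
        group: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }

# Tiny demonstration on made-up records:
demo = [
    {"group": "A", "flagged_high_risk": True,  "bad_outcome": False},
    {"group": "A", "flagged_high_risk": False, "bad_outcome": False},
    {"group": "B", "flagged_high_risk": False, "bad_outcome": False},
    {"group": "B", "flagged_high_risk": False, "bad_outcome": True},
]
print(error_rates_by_group(demo))
# {'A': {'false_positive_rate': 0.5, 'false_negative_rate': None},
#  'B': {'false_positive_rate': 0.0, 'false_negative_rate': 1.0}}
```

A table like this surfaces the statistical bias the audits found; the societal bias mentioned above, for instance an outcome variable that itself reflects unequal policing, will not show up in it, which is why the two need to be assessed separately.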
In addition to these two examples from within the law of how machine learning algorithms have been affecting legal decision-making, there are also numerous examples of domestic law trying to catch up with regulating algorithmic decision-making in health care, HR hiring decisions, banking, surveillance technology, and so on. For example, we have seen significant problems with racially biased algorithms in health care, where skewed and unrepresentative training data sets produced well-known cases of racially biased care and bad outcomes.[6] Similarly, cases abound of biased HR hiring algorithms, which often replicate social injustices and exacerbate inequalities.[7] Domestic law has been trying to build auditing frameworks and regulation to minimize the harms of bias and to assure appropriate levels of transparency and explainability. IHL needs to follow suit and try to anticipate these key issues as well. IHL’s key principles, like discrimination, proportionality, and precaution (necessity), might all be affected by the presence of algorithmic decision-making. More precisely, our ability to adjudicate post hoc whether an action was discriminate, proportionate, and necessary might be affected by the presence of algorithms in the decision-making chain. This is because the application of IHL at times depends on our ability to understand why a certain strategic or tactical decision was made.
To make sense of this, consider how the so-called OODA (observe, orient, decide, act) loop is affected by the presence of algorithms. A pilot’s (or a drone operator’s) observations might be augmented in a number of ways through algorithmic processes and big data. The information they receive might draw on infrared or audio data processed in a way that is presented to the pilot as a claim that there are X people on the ground, with or without weapons, or that specifies who is or is not a civilian given a range of factors “known” only to the algorithm. In the orientation step, the algorithm might aid the pilot in assessing the likelihood of killing one of those people, or the likelihood that the considered action will produce the desired outcome (e.g. that a weapons cache will be destroyed and that enemy combatants will be killed). More advanced algorithms might further present alternative courses of action to the decision-maker and/or recommend one of them. At each step of the OODA loop that uses a machine learning algorithm, that algorithm narrows the range of options presented to the end user/decision-maker. The narrowing of choices is starkest in cases where the algorithm is actually capable of creating alternative courses of action, but even simple algorithmic augmentation of how data is gathered and processed in the orientation step filters down the range of alternative courses of action.
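A schematic sketch of that narrowing: at each augmented step, a model maps a wide set of candidates (sensor tracks, scene interpretations, courses of action) onto a short ranked list, and the human only ever sees the short list. The function and the scoring below are placeholders for illustration, not a description of any fielded system.

```python
from typing import Callable, Dict, List

def augmented_step(candidates: List[Dict],
                   score: Callable[[Dict], float],
                   keep_top: int = 3) -> List[Dict]:
    """Generic augmentation step: rank all candidates by the model's score
    and pass only the top few on to the human decision-maker."""
    return sorted(candidates, key=score, reverse=True)[:keep_top]

# Toy demonstration: five candidate courses of action, only three survive.
candidate_coas = [
    {"name": f"COA-{i}", "expected_effect": e}
    for i, e in enumerate([0.2, 0.9, 0.4, 0.7, 0.1])
]
shortlist = augmented_step(candidate_coas, lambda c: c["expected_effect"])
print([c["name"] for c in shortlist])  # ['COA-1', 'COA-3', 'COA-2']

# The discarded alternatives, and the reasons for discarding them, live only
# inside `score`; the decision-maker never sees them.
```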
When the filtering of alternatives, and the way the remaining choices are presented to the end user, is sufficiently narrow and opaque, it becomes hard to know how to assign responsibility for actions, or to understand why and whether something was in fact proportionate, necessary, intentional, or appropriately discriminate. This can be for a number of reasons, but primarily it is because increasingly sophisticated machine learning algorithms have lower transparency and therefore lower explainability. And often, much as in domestic law, explainability is necessary to judge the legality of an action. For example, if I am an officer trying to decide whether I am willing to risk the life of person X, who is 60% likely to be a civilian, it would be useful to know why I should think she is a civilian. Such knowledge might help identify the best ways to mitigate the harm to that potential civilian, or to remedy unintended harms, as well as to adjudicate whether I was justified in pursuing a course of action that risked the life of that putative civilian.
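One modest form the needed explainability could take is a per-decision attribution of which inputs drove the ‘60% civilian’ estimate. The toy model below is linear, so the attribution is trivial to produce; the deep models actually in use require dedicated attribution or surrogate tooling, but the shape of the answer the officer needs is the same. The features, weights, and numbers are all invented for illustration.

```python
import math

# Invented weights for a toy 'probability that this person is a civilian'
# model; a real system would be far more complex, which is exactly why it
# would need explainability tooling.
WEIGHTS = {
    "accompanied_by_children": 2.3,
    "farm_tool_signature": 1.5,
    "carrying_long_object": -1.8,
    "near_known_fighting_position": -1.1,
}
BIAS = 0.7

def civilian_probability(features: dict) -> float:
    z = BIAS + sum(WEIGHTS[f] * v for f, v in features.items())
    return 1 / (1 + math.exp(-z))

def explanation(features: dict) -> list:
    """Per-feature contribution to the score, largest magnitude first:
    the 'why' an officer (or a post hoc reviewer) could actually be given."""
    contribs = [(f, round(WEIGHTS[f] * v, 2)) for f, v in features.items()]
    return sorted(contribs, key=lambda fv: abs(fv[1]), reverse=True)

person = {
    "accompanied_by_children": 0,
    "farm_tool_signature": 1,
    "carrying_long_object": 1,
    "near_known_fighting_position": 0,
}
print(round(civilian_probability(person), 2))  # ~0.6: the '60% civilian' case
print(explanation(person))
# [('carrying_long_object', -1.8), ('farm_tool_signature', 1.5), ...]
```

Even this crude breakdown tells the officer, and later an investigator, what drove the assessment and therefore what to double-check or mitigate. There are a number of ways that IHL can change in light of these worries, including: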
- Require Relative Effectiveness: Only use algorithmic augmentation of decision-making when it significantly increases operational effectiveness. The international community ought to specify concrete (numerical) ways of assessing increased effectiveness as compared to decision-making that is not augmented by machine learning or AI.
- Require Transparency and Explainability: Only use algorithms that come with sufficient meta-tools, i.e. tools that meet well-specified and varied conditions of explainability. When civilians are wrongly killed in an action where assessments of proportionality or necessity were augmented by AI, it might be that the machine wrongly identified a civilian as a combatant, so that the human in the loop executed an action on flawed facts; or it might be that the machine rightly identified the person as a civilian, but the machine (or the person aided by the machine in a substantial way) decided wrongly that this civilian could be killed (e.g. a flawed assessment of proportionality or necessity/precaution). An explainability condition on all AI weaponry would allow us to decide who is responsible for the mistake in each case and whether it is a violation of the discrimination, proportionality, or precaution principle.
- Require Fairness: Only use algorithms that use variables compatible with basic tenets of IHL and have comparable false positive/false negative rates across protected categories.
- Require Clear Responsibility Chains: Only use algorithms in settings where there are clear responsibility chains for each decision. Militaries’ increased reliance on AI to make decisions requires that, for each new place where a human decision is replaced or meaningfully augmented by an algorithm, the rules for assigning responsibility are updated accordingly. IHL should make explicit responsibility chains a requirement for the use of decision-making algorithms; a minimal sketch of what a per-decision responsibility record might look like follows the list.
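In practice, an explicit responsibility chain presupposes, at minimum, a per-decision record of which OODA steps were algorithmically augmented, by which model, and which human accepted or overrode the output. The structure below is a hypothetical sketch of such a record, not a description of any existing system or doctrine; all names are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class AugmentedStep:
    """One algorithmically augmented step in the OODA loop."""
    step: str                        # "observe", "orient", or "decide"
    model_id: str                    # which model/version produced the output
    model_output: str                # what was actually shown to the human
    explanation_ref: Optional[str]   # pointer to the stored explanation, if any
    accepted_by: str                 # the human who accepted or overrode it
    overridden: bool = False

@dataclass
class EngagementDecisionRecord:
    """One record per use of force, assembled at decision time so that post hoc
    review can attribute each contribution to a person or to a model."""
    decision_id: str
    commander: str                   # person responsible for the final decision
    steps: List[AugmentedStep] = field(default_factory=list)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def responsible_parties(self) -> List[str]:
        """Everyone whose acceptance of a model output shaped the outcome."""
        return [self.commander] + [s.accepted_by for s in self.steps]
```

The hard legal work, of course, is deciding what accepting a model’s output does to the accepting human’s responsibility; a record like this only makes the chain reviewable.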
As we delegate more and more life-or-death decision-making to machines, we ought to make sure our ways of working, policies, rules of engagement (ROE), and laws are changed to reflect such shifts in decision-making. The above is only a sketch of a proposal for how to begin making those changes.
Notes:
[1] Kiel Brennan-Marquez, “Plausible Cause: Explanatory Standard in the Age of Powerful Machines,” Vanderbilt Law Review, vol. 70, 2017; Alexandra Chouldechova and Kristian Lum, “The Present and Future of AI in Pre-Trial Risk Assessment Instruments,” June 2020, http://www.safetyandjusticechallenge.org/wp-content/uploads/2020/06/AI-in-Pre-Trial-Risk-Assessment-Brief-June-2020-R2.pdf; Mihailis Diamantis, “Algorithms Acting Badly,” 89 Geo. Wash. L. Rev., 2020; Brookings Institution, “Fairness in Algorithmic Decision-Making,” https://www.brookings.edu/research/fairness-in-algorithmic-decision-making/; Andrew Selbst, “Disparate Impact in Big Data Policing,” 52 Ga. L. Rev. 109, 196, 2017.
[2] Brennan-Marquez, “Plausible Cause”; Chouldechova and Lum, “The Present and Future of AI in Pre-Trial Risk Assessment Instruments”; Julia Angwin et al., “Machine Bias,” ProPublica, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing; L. Eckhouse, K. Lum, C. Conti-Cook, and J. Ciccolini, “Layers of Bias: A Unified Approach for Understanding Problems With Risk Assessment,” Criminal Justice and Behavior 46(2), 2018, 185–209, https://doi.org/10.1177/0093854818811379; A. M. Carlson, “The Need for Transparency in the Age of Predictive Sentencing Algorithms,” Iowa Law Review, 2017; J. Avery, “An Uneasy Dance with Data: Racial Bias in Criminal Law,” Southern California Law Review, 2019; C. McKay, “Predicting Risk in Criminal Procedure: Actuarial Tools, Algorithms, AI and Judicial Decision-Making,” Current Issues in Criminal Justice 32(1), 2019, 22–39, https://doi.org/10.1080/10345329.2019.1658694.
[3] I use the terms LOAC (law of armed conflict) and IHL interchangeably here.
[4] Brennan-Marquez, “Plausible Cause.”
[5] Chouldechova and Lum, “The Present and Future of AI in Pre-Trial Risk Assessment Instruments,” p. 2.
[6] Grote and Berens, “Algorithmic Decision-Making in Health Care”; Morley, “Debate on the Ethics of AI in Medicine,” https://philpapers.org/archive/MORTDO-58.pdf.
[7] Miranda Bogen, “All the Ways Hiring Algorithms Can Introduce Bias,” Harvard Business Review, 2019, https://hbr.org/2019/05/all-the-ways-hiring-algorithms-can-introduce-bias.