Recent advances in Artificial Intelligence (AI) have brought groundbreaking change to many sectors, particularly science and medicine. Alongside these innovations, however, AI’s creators have publicly recognized that poorly conceived AI can have real downsides, particularly with respect to discrimination. Discrimination can be caused by bias built into AI at any stage: bad or misinterpreted data, algorithmic errors, false assumptions, or incorrect conclusions.
Jim DeMarco, Director of Industry Digital Strategy, Insurance & Worldwide Financial Services, Microsoft
By design, AI evolves with every new use as the AI algorithm gains data to train for the next use. What happens if the AI algorithm initially operates as intended, but evolves into something inappropriate? Does a company’s responsibility for preventing discrimination in AI end at release, or even at intended use? Simply put, it does not: companies have an obligation, and should be held accountable, to continue monitoring their AI after deployment “in the wild” to ensure the AI performs not only to purpose but also responsibly.
From the start, AI must be designed to serve a reasonable business purpose AND to provide a human benefit.
All technologies are developed to solve a particular problem. If the technology solves that problem, then it is suited to purpose; whether that purpose is reasonable depends on the problem. If a business designs technology to meet sales objectives, but that technology ultimately denies service to customers based on race or gender, then its reasonableness must be called into question.
AI should also serve a second purpose: providing a positive human (including societal/economic) benefit. Human benefits of AI vary widely and can be slippery to define. Broadly speaking, seeking human benefits means that AI should not simply be developed to replace human judgment. AI can serve a human purpose by freeing people from unengaging tasks (automating data entry), reducing danger (self-driving cars), or working with more data or delicacy than a human could alone (fraud detection or cancer screening). Importantly, AI also fails the human benefit test if it introduces unnecessary bias in pursuit of the business purpose. From initial design through algorithm creation, applying Responsible AI principles means ensuring the AI passes the business purpose and human benefit tests. For instance, when facial recognition AI is applied to determine whether a truck driver is distracted or drowsy, it serves both the business purpose of ensuring safe transport of goods and the human benefit of protecting the driver’s and others’ safety.
The same AI may be responsible in one context and irresponsible in another.
Once an AI algorithm is released, by design the algorithm learns and changes with each new experience, possibly impacting fitness for purpose. This evolution takes two forms: (i) increasing utilization of AI in the same use case yielding different results (e.g., Microsoft’s experimental chatbot Tay, which in early 2016 was manipulated by outside users into issuing racist tweets); and (ii) applying otherwise acceptable AI to new use cases with unintended consequences. In both cases, the AI algorithm could continue to suit a reasonable business purpose but start to learn how to discriminate unfairly, failing human benefit.
Of particular concern is the extension of an AI algorithm to a new use case, because such use cases are often hard to anticipate and detect. If the driver attention-monitoring AI noted above is applied to detect attention span for a fighter pilot in the cockpit, the algorithm could reasonably be expected to provide similar business and human benefits over time, since the scenarios are highly similar.
If that same algorithm is then applied to detect distraction and attention span in an online job interview, however, the AI may generate a result that seems to fulfill a reasonable business purpose (identifying whether candidates can remain engaged) while failing the human benefit test through a variety of unintended consequences. Neuro-diverse job candidates who cannot keep looking at the screen, people with small children actively playing in the background, people with poor internet connections, or even introverts could all be evaluated as not paying sufficient attention. The ultimate result would be a blanket preference for hiring neurotypical, extroverted candidates in quiet locations with excellent internet connections: an unnecessary bias with clearly negative human impact. Extending AI to new use cases therefore requires reconsidering its fitness for business and human benefit.
Responsible AI doesn’t stop at the front door.
Discrimination by AI is challenging to detect and prevent because a bias is typically not known to exist until it surfaces in the AI’s output. Even then, a simple audit of the AI in use may not turn up discriminatory outputs, since audits are frequently conducted to determine only fitness for business purpose. AI should be continually and critically examined to determine whether it is generating unfair results for the people with whom it interacts. Companies should strive to create structures that ensure reasonable concerns about unintended consequences, whether raised by developers, users (in the case above, the job candidate), or even on social media, are seriously considered.
Ethics-by-design is an approach that builds ethical protections into the early stages of developing a product or service and continues them throughout its use in the field. Risk assessments, such as Ethics Data Impact Assessments (EDIAs) as proposed by the Information Accountability Foundation, can be used to evaluate the potential ethical impacts of an AI algorithm before it is released, helping avoid negative unwanted consequences like racial discrimination. Establishing a governance structure through a Responsible AI committee would also create accountability for ethical risks and provide an appropriate forum to redress them.
Since discrimination can appear at various stages of an AI project, however, companies should consider conducting periodic risk assessments and leveraging AI monitoring tools post-production as well, asking whether fitness for purpose has yielded discrimination or other harmful human impacts. In the initial stages of the AI project, questions could address the source and population of the input data to assess its integrity; once the AI is in the field, risk assessments and tools could investigate the outputs to identify unexpected and unintended bias. AI practitioners can leverage current best practices such as glass-box optimization to ensure choices made by AI are explainable and traceable to intent.
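The post-production monitoring described above can be sketched in code. The example below is a minimal, illustrative check only (all names and data are hypothetical, and the 0.8 threshold borrows the rough “four-fifths rule” heuristic); a real monitoring program would run against production logs with a dedicated fairness toolkit.

```python
# Hypothetical sketch of a post-deployment bias check on AI outputs.
# Records are (group, approved) pairs drawn from the system's decisions.

def selection_rates(records):
    """Approval rate per group from (group, approved) output records."""
    totals, approvals = {}, {}
    for group, approved in records:
        totals[group] = totals.get(group, 0) + 1
        approvals[group] = approvals.get(group, 0) + int(approved)
    return {g: approvals[g] / totals[g] for g in totals}

def disparate_impact_ratio(records, reference_group):
    """Ratio of each group's selection rate to the reference group's.
    A common rough flag is any ratio below 0.8 (the 'four-fifths rule')."""
    rates = selection_rates(records)
    ref = rates[reference_group]
    return {g: rates[g] / ref for g in rates}

# Illustrative decision log: group A approved 80%, group B approved 50%.
outcomes = [("A", True)] * 80 + [("A", False)] * 20 \
         + [("B", True)] * 50 + [("B", False)] * 50

ratios = disparate_impact_ratio(outcomes, reference_group="A")
flagged = [g for g, r in ratios.items() if r < 0.8]
print(ratios)   # {'A': 1.0, 'B': 0.625}
print(flagged)  # ['B']
```

Run periodically over fresh output logs, a check like this can surface the kind of drift the essay warns about: an algorithm that was fair at release but has learned to discriminate in the field.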
If AI is found to be discriminatory in practice, it should be repaired; if repair is not possible, it should be removed.
Responsible AI must have a bright line: if the AI serves a reasonable business purpose but has unintended, discriminatory human consequences, it must be repaired or at least mitigated. If business and human purpose come into conflict, humans win, no matter what: AI that fails to support a human purpose must be repaired, remediated, or retracted.
One example comes from a 2019 study published by the Haas School of Business at the University of California, Berkeley showing that while AI-driven decision-making can reduce implicit and explicit bias in mortgage underwriting, it can also produce disparate impacts against Latinx and African-American borrowers. The Berkeley study speculates that, while AI reduces rate disparities by fully a third for these borrowers and completely eliminates discrimination in overall loan application acceptance, it must also rely on market factors that perpetuate some impermissible rate disparity, costing minorities more than $750M in extra interest fees annually. In this case, removing the AI would cause a worse result than mitigating its risk. Mitigation can take two forms: addressing the weakness in the model as applied to the data, or if that is not possible, providing “post-AI” remediation such as a rate improvement program for customers meeting the criteria that cause the model weakness.
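The “post-AI” remediation idea can be made concrete with a small sketch. Everything below is illustrative (the group labels, rates, and the flat rate-improvement rule are invented for this example, not taken from the Berkeley study): it measures the residual average rate gap between groups in model output and offers the affected group a matching rate improvement.

```python
# Hypothetical sketch of post-AI remediation via a rate improvement program.
# Quotes are (group, quoted interest rate in percent) pairs; all figures
# are invented for illustration.

def average_rate(quotes, group):
    """Mean quoted rate for one group."""
    rates = [r for g, r in quotes if g == group]
    return sum(rates) / len(rates)

def remediate(quotes, reference_group, affected_group):
    """Offer the affected group a rate improvement equal to the measured
    average disparity, leaving other quotes untouched."""
    gap = average_rate(quotes, affected_group) - average_rate(quotes, reference_group)
    return [(g, r - gap if g == affected_group else r) for g, r in quotes]

quotes = [("ref", 4.00), ("ref", 4.10), ("other", 4.35), ("other", 4.25)]

print(round(average_rate(quotes, "other") - average_rate(quotes, "ref"), 2))  # 0.25
adjusted = remediate(quotes, "ref", "other")
```

A flat adjustment like this is crude; a real program would target the specific criteria causing the model weakness, as the essay suggests. The sketch only shows the shape of the fix: leave the model in place, measure the residual harm, and compensate for it downstream.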
Once AI is released into the wild, its fitness for business purpose may make it hard to replace. But if there is a failure of fitness for human purpose, even if that occurs gradually over time, AI owners must nevertheless fix, mitigate, or ultimately replace the AI.
Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily represent the official policy or opinion of Microsoft.
About the Expert:
Jim DeMarco is Director of Industry Digital Strategy, Insurance, Worldwide Financial Services at Microsoft. As a leader in the financial services digital strategy team, Jim works with the executive leadership of Microsoft’s top insurance customers to develop deep, enduring digital partnerships that can transform the sector. Jim’s work focuses on real-time insurance, cyber risk mitigation, data science ethics, reinventing customer experiences, and digitally transforming underwriting and claims. Collaborating with customers on their core business direction, Jim has also led strategic engagements supporting intentional cultural change in the digital age. Jim has been with Microsoft since 2015, building on a career in highly regulated industries. Prior roles include serving as CTO of multiple businesses and running strategic business units, product management, and marketing organizations.