Identifying and Measuring Artificial Intelligence (AI) Bias for Enhancing Health Equity

As part of the Artificial Intelligence (AI) Program in the FDA’s Center for Devices and Radiological Health (CDRH), the goal of this regulatory science research is to understand and measure bias and improve assessment of AI model generalizability.

Overview

Bias has a range of definitions in technical literature, law, and everyday usage. In this Artificial Intelligence Program, we define bias as a systematic difference in treatment of certain objects, people, or groups in comparison to others, where treatment is any kind of action, including perception, observation, representation, prediction, or decision (ISO/IEC TR 24027:2021). Health equity is a priority for CDRH, and we advance it by developing knowledge and safe and effective technologies that meet the needs of all patients and consumers.

There is considerable concern in the AI community that AI models may (typically inadvertently) worsen inequalities in health care delivery. A major regulatory science gap in the regulation of AI-enabled medical devices is the lack of fundamental methods that analyze training and testing to understand, measure, and minimize bias, and that characterize performance for subpopulations; a subgroup-level performance check is sketched below. This gap is closely related to the generalizability and robustness of AI models, where the aim is to preserve model performance under naturally induced variations, including variations between subpopulations. There is a need to understand the conditions under which AI-enabled medical devices provide generalizable and robust output in order to reasonably assure their safety and effectiveness.
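
To make the subpopulation idea concrete, the following is a minimal sketch of a subgroup-level performance check. The metric choice (ROC AUC), the scikit-learn dependency, and all variable and function names are illustrative assumptions, not an FDA-specified method.

```python
# Minimal sketch: per-subgroup performance check for a binary classifier.
# All names (scores, labels, subgroup) are illustrative assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(scores: np.ndarray, labels: np.ndarray, subgroup: np.ndarray) -> dict:
    """Compute ROC AUC separately for each subpopulation label."""
    results = {}
    for group in np.unique(subgroup):
        mask = subgroup == group
        # AUC is undefined if a subgroup contains only one class.
        if len(np.unique(labels[mask])) < 2:
            results[group] = float("nan")
        else:
            results[group] = roc_auc_score(labels[mask], scores[mask])
    return results

# Toy example: model scores, ground truth, and a sex attribute per case.
rng = np.random.default_rng(0)
scores = rng.uniform(size=200)
labels = (scores + rng.normal(0, 0.3, size=200) > 0.5).astype(int)
sex = rng.choice(["F", "M"], size=200)
print(subgroup_auc(scores, labels, sex))  # e.g. {'F': 0.93, 'M': 0.95}
```

A large gap between per-subgroup values in such a check is one simple signal that a model may not generalize equally well across subpopulations.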

Projects

  • Tackling Sex Bias in AI for Severity Assessment of COVID-19
  • Visual Feature Auditing of Imaging Classification Models to Identify Subgroups with Poor Performance
  • Unsupervised Deep Clustering for Subgroup Identification within Medical Image Datasets
Figure: Overview of the proposed approach to characterize a model’s decision regions. A vicinal distribution of virtual samples is created using linear interpolation within the plane spanned by a triplet of samples in the input space. Model classification of the virtual samples allows the decision space to be mapped to the input space and a region of the decision space to be visualized. Aggregating the composition of decision regions from a multitude of triplets provides insight into the model’s behavior on samples beyond the available data set.
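
As a rough illustration of the triplet-interpolation idea in the figure, the sketch below generates virtual samples as convex (barycentric) combinations of three input samples and tallies how a model classifies them. The toy classifier, data, and the function name triplet_region_composition are assumptions for illustration; this does not reproduce the DRAGen tool itself.

```python
# Sketch of decision-region probing via a triplet of samples: classify
# virtual samples on the plane spanned by three inputs and report the
# class composition of that region. Toy stand-in, not the DRAGen tool.
import numpy as np
from sklearn.linear_model import LogisticRegression

def triplet_region_composition(model, triplet: np.ndarray, steps: int = 20) -> dict:
    """Fraction of each predicted class among virtual samples in the triangle."""
    grid = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            a, b = i / steps, j / steps
            # Barycentric weights (a, b, 1-a-b) keep virtual samples
            # inside the triangle defined by the three real samples.
            grid.append(a * triplet[0] + b * triplet[1] + (1.0 - a - b) * triplet[2])
    preds = model.predict(np.asarray(grid))
    classes, counts = np.unique(preds, return_counts=True)
    return dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))

# Toy 2-D example: fit a linear classifier, then probe one triplet.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)
print(triplet_region_composition(model, X[:3]))  # e.g. {0: 0.4, 1: 0.6}
```

Repeating this over many triplets and aggregating the compositions is what gives the approach its view of model behavior beyond the available data points.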

Resources

  • “DRAGen: Decision Region Analysis for Generalizability,” Catalog of Regulatory Science Tools, 2024.
  • Burgon A, Petrick N, Sahiner B, Pennello G, Cha K, and Samala RK. “Decision Region Analysis for Generalizability (DRAGen) of AI Models: Estimating Model Generalizability in the Case of Cross-Reactivity and Population Shift.” Journal of Medical Imaging, 11(1), 014501, 2024.
  • Sidulova M, Sun X, and Gossmann A. “Deep Unsupervised Clustering for Conditional Identification of Subgroups Within a Digital Pathology Image Set.” In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023), pp. 666-675. 2023.

For more information, email OSEL_AI@fda.hhs.gov.

