Trust in Artificial Intelligence: Clinicians Are Essential

Umang Bhatt and Zohreh Shams, University of Cambridge, Cambridge, UK

10.1  Introduction

Artificial intelligence (AI), or the use of computational technologies that are inspired by human cognitive functions, is changing the fabric of daily life [1]. From natural language processing for computer-readable electronic health records to medical image processing for clinical decision support systems, advances in AI are poised to change the method by which healthcare is delivered [2]. As in other safety- and security-critical domains, a lack of transparency in AI systems prevents their widespread use in day-to-day clinical practice. For AI systems to be fully integrated into healthcare practice, they need to be transparent so that healthcare practitioners can judge when to trust an AI system's recommendation [3]. In this chapter, we begin with a brief overview of AI, then discuss mechanisms by which AI systems can display trustworthiness to external stakeholders, and examine the role of cardiac practitioners in the development and deployment of AI systems.

10.2  Overview of Artificial Intelligence

In the summer of 1956, a handful of scientists convened for the Dartmouth Summer Research Project on Artificial Intelligence. Most agree that this was when the term "artificial intelligence" was coined. To this day, the goal of AI remains the same: to build machines that simulate human intelligence [4, 5]. While simulating human intelligence is an ambitious goal, AI systems in their current form are best suited to augment, not automate, human work [6].

For healthcare, the utility of AI does not lie in the ability to replace healthcare practitioners, but rather in the ability to augment a healthcare practitioner's expertise. The ideal AI system for a healthcare practitioner is one that works alongside the clinician to learn from their behavior and decisions. Several generations of complementary AI systems have been developed; they are broadly divided into rule-based expert systems, machine learning (ML)-driven systems, and hybrid systems that combine expert and ML-driven approaches.

10.2.1  Expert Systems

The role of an expert system is to mirror the reasoning used by an expert when making a decision. Its main components are a knowledge base and a reasoning engine, which applies a set of if-then rules to facts in the knowledge base in order to infer new facts [7, 8].

Example: One of the earliest expert systems, MYCIN [9], was a clinical system developed to identify bacteria causing severe infections, such as meningitis. Based on the diagnosis, MYCIN recommended antibiotics tailored to patients (i.e., adjusted for the patient's body weight).

10.2.2  Machine Learning (ML)-Driven Systems

Machine learning applies a hybrid of statistics, computer science, and electrical engineering to solve complex problems using large datasets [10]. Unlike expert systems, which rely on experts to provide the desired reasoning, ML-driven systems infer reasoning from data: they extract patterns, identify interactions among variables, and build models that extrapolate to unseen data. This mechanism does not replace expert reasoning, but it can prove more valuable than user-generated rubric systems. In clinical practice, ML can therefore automate decision systems to help physicians make predictions with increased accuracy.

Example: The data used for ML applications in healthcare could include patient information from an electronic health record, medical images, or a clinician's notes. Sometimes labels (or targets) are provided alongside the training data, indicating the true outcome corresponding to a given input. For example, suppose our training data is a set of medical images, say chest X-rays taken in a hospital. Each chest X-ray is labeled to indicate whether the patient has pneumonia (1) or does not have pneumonia (0). Given these input images and corresponding binary output labels, an ML model can be trained to predict whether a new chest X-ray image contains evidence of pneumonia [11].

The majority of current ML research focuses on creating algorithms that learn an accurate model that performs well on the training data and generalizes to unseen data.
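As a rough illustration of this supervised pipeline, the following Python sketch trains a simple classifier on labeled images and checks how well it generalizes to held-out data. It assumes NumPy and scikit-learn are available; the randomly generated arrays and the logistic regression model are stand-ins for real chest X-rays and the deep networks used in practice [11], not the method of any particular system.

# Minimal, illustrative sketch of supervised learning from image labels.
# Random arrays stand in for real chest X-rays; labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_images, height, width = 200, 32, 32          # tiny stand-in "X-rays"
images = rng.normal(size=(n_images, height, width))
labels = rng.integers(0, 2, size=n_images)      # 1 = pneumonia, 0 = no pneumonia

# Flatten each image into a feature vector (a real system would use a CNN).
X = images.reshape(n_images, -1)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                     # learn the image -> label mapping

# Generalization is judged on images the model has never seen.
probs = model.predict_proba(X_test)[:, 1]
print("Held-out AUROC:", round(roc_auc_score(y_test, probs), 3))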

ML-driven systems in cardiology have followed the same pipeline as described above and have proven able to extract patterns from the data that generalize well, for example in the diagnosis of heart failure [12].

10.2.3  Hybrid Systems

The most recent generation of healthcare AI systems attempts to rely on both expert knowledge and ML to make recommendations that exploit the known knowledge of practitioners while learning the unknown knowledge emerging from the data.

Example: In [12], a hybrid system for the diagnosis of heart failure is proposed that first formalizes the decision-making of expert clinicians as a set of rules and then augments those rules with rules derived from ML algorithms modeling cohorts of patients with and without heart failure. The hybrid system has proven useful, particularly in the absence of access to a heart failure specialist. The use of formalized expert knowledge in the form of ontologies (e.g., the Gene Ontology [13]) and biological networks is also very common in hybrid approaches [14–17].

10.3  Machine Learning in Healthcare

There are three popular paradigms of machine learning: unsupervised, supervised, and reinforcement learning. In supervised learning, algorithms use a dataset that has been labeled by clinicians to predict a known outcome. Although supervised learning is ideal for classification and regression problems, it requires a lot of data and is time-consuming because the data have to be labeled by humans. The chest X-ray task is a classic example of supervised learning: the ML model is trained on previous chest X-ray data and learns a relationship between the images and the pneumonia labels, so that it generalizes when new chest X-rays are presented.

Unsupervised learning involves training models from data without labels. It is used to parse out insight from the training data itself [18, 19]. Unsupervised learning seeks to identify novel disease mechanisms, genotypes, or phenotypes from patterns in the data, independent of human interpretation. Shah et al. [20] developed an unsupervised learning model to predict the survival of patients with heart failure with preserved ejection fraction (HFpEF). Forty-six distinct variables were analyzed, which led to three distinct groups. Supervised learning with human input was then utilized to predict the difference in desired outcomes (mortality and hospitalization) among the groups. A significant limitation of unsupervised learning is that the initial cluster pattern must be validated against other cohorts. Hedman et al. [21] also used machine learning, analyzing 32 echocardiographic and 11 clinical and laboratory variables collected from 320 HFpEF outpatients in the Karolinska-Rennes cohort study (56% female; median age 78 years; IQR 71–83) and identifying six phenogroups.
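A minimal sketch of the kind of unsupervised clustering described above, assuming scikit-learn is available: the synthetic values stand in for real echocardiographic and laboratory measurements, and the variable names are hypothetical rather than those analyzed in [20] or [21].

# Illustrative unsupervised clustering of clinical variables into phenogroups.
# Synthetic values stand in for real echocardiographic/laboratory measurements.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_patients = 320
# Hypothetical feature columns: age, BMI, ejection fraction, NT-proBNP, E/e' ratio
features = np.column_stack([
    rng.normal(78, 8, n_patients),      # age (years)
    rng.normal(29, 5, n_patients),      # body mass index
    rng.normal(60, 6, n_patients),      # ejection fraction (%)
    rng.lognormal(6, 1, n_patients),    # NT-proBNP (pg/mL)
    rng.normal(14, 4, n_patients),      # E/e' ratio
])

# Standardize so no single variable dominates the distance metric.
X = StandardScaler().fit_transform(features)

# Group patients into a chosen number of phenogroups (three here, as in [20]).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for group in range(3):
    print(f"Phenogroup {group}: {np.sum(kmeans.labels_ == group)} patients")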

Reinforcement learning can be crudely seen as a hybrid of supervised and unsupervised learning in which an algorithm improves its decisions through trial and error. It is well suited to sequential decision-making problems [22, 23]. A full survey of reinforcement learning in healthcare, ranging from systems that decide treatments for chronic diseases to those that allocate resources within hospitals, can be found in [24].

10.3.1  Clinical Data Interpretation Powers AI

Although ML quickly gained traction through the production of large datasets, it has taken more time to be adopted by the healthcare sector [25]. Access to electronic medical records, remote patient data, and digital patient-driven data streams can assist in clinical decision making, exploration, and discovery when assessed with powerful analytics. As a result of continued collaborations between clinicians and ML researchers, much progress has been made in developing robust methods that adequately consider the variety of data contained in medical records [26, 27]. While these efforts have increased the use of AI in healthcare, the availability of healthcare data is the keystone to unlocking the power of AI in healthcare.

Medical data vary per case and are inherently complex. They can include anything from time series to discrete measures, consisting of signal frequencies, medical images, or text descriptions. Each type of data requires a different kind of preprocessing before being encoded as input features for an ML model. Since clinicians take all of these variables into consideration when deciding diagnosis, prognosis, and treatment, AI systems intended to help clinicians come to these decisions must learn parameters that reflect these considerations. Electronic health records contain rich data that are readily available; however, they may be too complex to process [28]. At the same time, neglecting potentially valuable data can lead to false diagnoses, which can incur unnecessary treatment expenses or cost patients their lives. Nevertheless, researcher-clinician collaborations have proved able to combat these challenges by informing data collection and processing with clinician expertise. Using a single data structure constructed from the entirety of each patient's chart, Rajkomar et al. [28] were able to predict important clinical outcomes and measure readmission probability.
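As a rough sketch of how heterogeneous clinical data might each receive their own preprocessing before modeling, the snippet below uses scikit-learn's ColumnTransformer; the column names and values are hypothetical and do not come from any particular electronic health record system.

# Illustrative preprocessing of mixed clinical data types before modeling.
# Column names and values are hypothetical, not from any specific EHR system.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer

records = pd.DataFrame({
    "age": [64, 71, 58],                          # numeric measurement
    "systolic_bp": [142, 118, 155],               # numeric measurement
    "smoking_status": ["former", "never", "current"],  # categorical code
    "note": ["chest pain on exertion",            # free-text clinician note
             "routine follow-up, stable",
             "dyspnea and palpitations"],
})

# Each data type gets its own transformation before being stacked together.
preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["age", "systolic_bp"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["smoking_status"]),
    ("text", TfidfVectorizer(), "note"),
])

features = preprocess.fit_transform(records)
print("Encoded feature matrix shape:", features.shape)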

10.3.1.1  Decision Support

In the healthcare sector, we can augment clinical decision-making by using automated AI systems. Decision support AI systems suggest courses of action but do not implement any actions; the decision-making power therefore remains with the healthcare practitioner [23]. Augmenting healthcare intelligence takes many forms, and clinicians can use AI system outputs to diagnose more efficiently and accurately at a lower cost.

To build a productive healthcare practitioner-AI partnership, the healthcare practitioner needs to be aware of how the AI system works, including its input, model, and output. Historically, decision support systems for clinical decision making have not demonstrated improvements in patient outcomes during randomized controlled trials [29]. However, the ability of computers to efficiently handle multi-modal data and the predictive power of AI systems have improved significantly. Modern decision support systems can now provide personalized healthcare [30] in various domains [31–33]. AI-enabled support for clinical decision-making based on imaging (e.g., X-ray, mammogram) is particularly advanced. De Fauw et al. [34] propose a referral recommendation system whose performance reaches or exceeds that of experts across a range of visually impairing retinal diseases. In cardiology, the interpretation of echocardiograms using ML has recently shown considerable potential [35]. ML has also been employed to identify genotypes associated with common symptoms of heart disease [36, 37]. While these support systems have been promising, personalized cardiovascular medicine delivered by AI can go beyond simple image interpretation and genotype association.

10.3.1.2  Exploration

Exploration refers to the use of AI systems to explore new biological processes and aid scientific understanding of medicine [38, 39]. In exploration, healthcare practitioners need to guide AI systems to answer noteworthy questions: Is there a relationship that a practitioner wants to test? Is there a pattern in phenotypic data that one can mine from genomic data? Poplin et al. [40] were able to generate hypotheses about risk factors for cardiovascular disease using retinal fundus photographs. The implemented ML model extracted unforeseen features of importance when predicting cardiovascular risk factors; this augments the ability of a healthcare practitioner by directing future research and guiding diagnostic practices.

Exploration aims to verify conjectures that healthcare practitioners have formed based on experience. The main learning paradigms for exploration are supervised and reinforcement learning. Exploration takes advantage of the massive amount of patient data recorded in health records and collected during clinical trials. Supervised learning can extract behavioral insights from sensor-collected data, e.g., heart monitors.

Example: A study of participant-reported physical activity and sleep duration from a wrist activity monitor was used [41] to train an ML model that identified the activity a participant was engaged in. Ground-truth data were acquired from a camera, whose recordings were annotated with the activity of interest, providing labels for supervised learning. The model identifies high-level trends in lifestyle health behaviors, which the authors suggest can influence future public health guidelines.
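A minimal, illustrative sketch of supervised activity recognition from wrist-sensor windows, assuming scikit-learn; the synthetic accelerometer windows and activity labels are placeholders rather than the cohort data analyzed in [41].

# Illustrative activity recognition from wrist-sensor windows.
# Synthetic accelerometer windows stand in for real wearable recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_windows, samples_per_window = 300, 100
raw_windows = rng.normal(size=(n_windows, samples_per_window))   # acceleration magnitude
activities = rng.choice(["sleep", "walking", "sedentary"], size=n_windows)

# Summarize each raw window with a few hand-crafted features.
features = np.column_stack([
    raw_windows.mean(axis=1),                            # average movement intensity
    raw_windows.std(axis=1),                             # variability within the window
    np.abs(np.diff(raw_windows, axis=1)).mean(axis=1),   # jerkiness
])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, features, activities, cv=5)
print("Cross-validated accuracy:", round(float(scores.mean()), 3))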

10.3.1.3  Discovery

In discovery, AI can help reveal unknown patterns and motivate healthcare practitioners to develop randomized controlled trials to verify patterns observed in the data. Healthcare practitioners can provide guidelines that dictate what data are fed into the AI system and how the discoveries are made. Some relevant questions might be: Is there a region of a large search space of potential drugs (i.e., the space of all possible chemical compounds) wherein one may find a cure? Is there a pattern one can learn by clustering similar data together? Discovery aims to reveal unknown patterns within large datasets [42]. Differing characteristics across subgroups enable AI systems to model their underlying distinctions. Sometimes the patterns mined from the data can lead to new scientific discoveries.

Example: A genomic score was created [43] to stratify individuals based on their risk trajectories for coronary artery disease. Based on an ML model, the authors suggested early-life genomic screening as an additional risk assessment tool for coronary artery disease. The ability to develop a genomic score with practical utility lies in advances in genome sequencing and in AI for genomics. Recent work has also found that AI can be used for discovering novel "genotypes and phenotypes in heterogeneous cardiovascular diseases, such as Brugada syndrome, HFpEF, Takotsubo cardiomyopathy, HTN, pulmonary hypertension, familial atrial fibrillation, and metabolic syndrome" [44].
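A toy sketch of the general idea behind a genomic risk score: each variant contributes a weighted amount to an overall score used to stratify individuals. The variant identifiers and effect sizes below are invented for illustration and are far simpler than the ML-derived score in [43].

# Illustrative genomic risk score: a weighted sum of allele counts, where each
# weight reflects a variant's estimated effect on disease risk. Variant IDs and
# effect sizes are hypothetical placeholders, not values from [43].
effect_sizes = {           # per-allele risk contributions (made up)
    "rs0000001": 0.12,
    "rs0000002": -0.05,
    "rs0000003": 0.30,
}

def genomic_risk_score(genotype):
    """genotype maps variant ID -> number of risk alleles carried (0, 1, or 2)."""
    return sum(effect_sizes[v] * genotype.get(v, 0) for v in effect_sizes)

patients = {
    "patient_A": {"rs0000001": 2, "rs0000002": 0, "rs0000003": 1},
    "patient_B": {"rs0000001": 0, "rs0000002": 2, "rs0000003": 0},
}

# Rank patients by score; higher scores indicate higher estimated risk.
for name, genotype in sorted(patients.items(),
                             key=lambda kv: genomic_risk_score(kv[1]),
                             reverse=True):
    print(name, round(genomic_risk_score(genotype), 2))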

10.4  Trustworthiness Mechanisms

AI systems can either deliver or fail on the task at hand. In successful cases, people expect AI to act in a verifiably correct and predictable manner; such behavior highlights the trustworthiness of AI. In the case of failure, people should hold an AI accountable for its actions: either the AI should transparently explain why it did what it did, or it should convey its limitations a priori. Trustworthiness ensures systems are predictable, transparent, and robust; it comprises competence, reliability, and honesty [45]. Transparency is a mechanism via which AI systems can display their trustworthiness to stakeholders. It allows stakeholders to audit systems to see whether the system behaves as desired. Transparency into the training procedure and into an AI system's internals are equally important.

10.4.1  Predictability

Imagine a diagnostic AI system that leverages electronic health records to make suggested diagnoses to healthcare practitioners [46]. If the system performs well over time (that is, provides the correct diagnosis most of the time), the healthcare practitioners will begin to view the system's behavior as predictably correct and trustworthy. As long as an AI system can convey why it has failed, healthcare practitioners are likely to accept an under-performing AI system [47]. Predictability captures the ability of the AI system to correctly complete the task it was trained to do. Predictable AI systems behave in line with a stakeholder's mental model. Repeat interactions with AI systems help foster trust between the system and the practitioner [48].

The AI system may be uncertain upon seeing a new patient who is unlike any patient in the training set. When faced with this new example, it is important that the system communicates the predictive uncertainty of its diagnosis to the healthcare practitioner [49]. In order to build a trustworthy relationship with healthcare practitioners, AI systems ought to account for real-world uncertainties in deployment. In the face of uncertainty, the healthcare practitioner making a decision can intervene and revert to their clinical judgement. The AI community has developed algorithms that may be suitable for conveying this predictive uncertainty to non-ML-expert stakeholders [50–53]. Whereas a traditional AI system would be tasked with predicting whether a chest X-ray shows pneumonia or not (two options), these newer systems include a third option: a reject option (also called an abstain option or, colloquially, an "I don't know" option) [54]. Selective prediction (also known as reject option classification or learning with abstention) could hold great promise in clinical decision making [55]. When an AI system is uncertain, it is important that it defers to healthcare practitioners, who can leverage their expertise to make a decision accordingly.
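A minimal sketch of selective prediction, assuming scikit-learn: the classifier returns a label only when its predicted probability clears a confidence threshold and otherwise defers to the clinician. The data, model, and threshold are illustrative choices, not a prescribed implementation.

# Illustrative reject option: the model abstains when its confidence is low,
# deferring the case to a clinician. Data and threshold are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_train = rng.normal(size=(500, 5))
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

def predict_or_defer(x, threshold=0.8):
    """Return a label only when the predicted probability is confident enough."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    if probs.max() < threshold:
        return "defer to clinician"
    return int(probs.argmax())

new_patients = rng.normal(size=(5, 5))
for i, x in enumerate(new_patients):
    print(f"patient {i}:", predict_or_defer(x))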

10.4.2  Procedural Transparency

Procedural transparency entails conveying information about how an AI system is trained [56]. It may expose proprietary information; however, disclosures like these can help healthcare practitioners understand the functionalities and limitations of the AI systems in question. Procedural transparency is paramount to building practitioner-AI trust and to ensuring the safe adoption of AI in healthcare. Procedural transparency includes properties of the model used (developers, version, licensing, etc.), intended use cases for the model (primary use, out-of-scope use cases, intended users), details about the training data used (diversity, preprocessing, feature selection), performance metrics (decision thresholds, qualitative results, unitary/intersectional analyses), and ethical considerations [57–60].

10.4.3  Algorithmic Transparency

Algorithmic transparency generally refers to explainability, but it also encompasses other concepts such as uncertainty [61]. Algorithmic transparency can provide information on a global level that summarizes model behavior over multiple data points or the entire training dataset. It can also provide information on a local level, explaining an individual prediction [62] via multiple methods. Feature importance asks which features are important to the model when making a prediction [63–65]. Sample importance asks which training points were most important to a particular prediction [66–68]. Counterfactual explanations note what needs to change in an input in order to change the outcome [69–71].

Explainability research develops models that provide information about how a model came to its decision [72]. One study interviewed healthcare practitioners to explore their clinical decision-making in practice. The participants were given a case study [47] in which an ML model was embedded in the electronic health record system and monitored patients in the ICU to assess the likelihood that they would experience cardiac arrest. The practitioners prioritized identifying any discrepancies between the subset of input features responsible for the model's outcome (feature importance) and their clinical judgement. Explainability can also be applied to augment a healthcare practitioner's decision-making and analysis, such as echocardiogram interpretation. Alsharqi et al. [73] were able to use ML to efficiently, accurately, and reliably interpret echocardiograms, with performance comparable to that of a clinician alone. Zhang et al. [74] were able to support serial patient tracking and analyze echocardiograms on a large scale using a novel analysis pipeline, including identification, image segmentation, structure and function quantification, and disease detection. Their model successfully segmented cardiac chambers using a convolutional neural network trained for the segmentation task.

Example: Research in deep learning for medical diagnostics has also developed explanations that reason like clinicians [75]. Similar to how an ML model uses input features to reach an output, medical professionals learn to proactively search for risk predictors upon seeing a patient [76]. Research in AI is now trying to mirror how medical professionals use current data as well as past experiences with patients to inform decision making. For example, if a doctor treated a rare disease over a decade ago, then that patient can be crucial when attributes alone are uninformative about how a doctor should proceed [77]. This is equivalent to using "close" training points (past patients) to explain an unseen test point (current patient) [78]. Thus, algorithmic transparency holds much promise in healthcare.
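One simple way to obtain a global feature-importance estimate of the kind described above is permutation importance, sketched below with scikit-learn; the feature names and synthetic outcome are hypothetical.

# Illustrative global feature importance via permutation: shuffle one feature
# at a time and measure how much held-out performance degrades. Synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
feature_names = ["age", "systolic_bp", "ldl", "troponin"]   # hypothetical inputs
X = rng.normal(size=(600, len(feature_names)))
# Synthetic outcome driven mostly by the last feature ("troponin").
y = (2.0 * X[:, 3] + 0.5 * X[:, 0] + rng.normal(size=600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(feature_names, result.importances_mean),
                          key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")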

10.4.4  Robustness

In addition to transparency and predictability, AI systems must be robust to shifts in input distributions [79] and to adversarial attacks [80]. Robustness is a system's ability to withstand outliers and other variation during deployment. Distribution shift refers to when the inputs received in deployment differ from the inputs the AI system was trained on. Covariate shift captures what happens when the input features differ from training to testing [79]. Label shift refers to when the output distributions vary from training to test [79]. For example, if an ML model is trained on geriatric patients but then deployed in the neonatal ward, the model may not perform as expected on younger patients. In some cases, the effects of age may be negligible to the model; however, it is important for the healthcare practitioner to clarify whether such a shift is acceptable. While it is possible to create a model for a specific task or problem, recent work has shown that a hospital-specific approach with tailored models for each institution is the most efficient way to robustly augment clinician performance [81]. Since hospitals usually have unique systems for electronic health records, one model may not generalize from one hospital to another.
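A crude sketch of one way to flag covariate shift, assuming SciPy is available: compare the distribution of a single input feature (here, a synthetic patient age) between the training cohort and what the deployed system is actually receiving.

# Crude covariate shift check on one feature (patient age). Synthetic cohorts.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
training_ages = rng.normal(loc=74, scale=8, size=2000)     # geriatric training cohort
deployment_ages = rng.normal(loc=35, scale=12, size=500)   # much younger deployment cohort

result = ks_2samp(training_ages, deployment_ages)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.2e}")

# A tiny p-value suggests the deployment population differs from training,
# so the practitioner should decide whether the model is still appropriate.
if result.pvalue < 0.01:
    print("Warning: possible covariate shift in patient age.")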

10.5  Artificial Intelligence Alongside Healthcare Practitioners

Healthcare practitioners play a pivotal role in building and deploying trustworthy AI systems [82]. They can intervene in input engineering, model development, clinical deployment, and model correction.

10.5.1  Input Engineering

Healthcare professionals are essential in ensuring that the inputs and features learned by an AI system are biologically relevant and feasible. They are most familiar with context-specific information and can help fill in the gaps where the training data might be lacking. The combination of data analysis and clinical intuition plays a fundamental role in cardiovascular disease management. Input engineering can incorporate human domain expertise into the learning process through collaboration with cardiovascular clinicians.

10.5.2  Model Development

As ML engineers develop models for use in medical contexts, healthcare professionals have the opportunity to assist in model development. Most importantly, healthcare professionals can help verify that what the model has learnt is biologically relevant by inspecting the explanations generated for its predictions. In healthcare, these techniques can pave the way for clinicians to actively participate in model selection and to ensure that the model is ultimately not only quantitatively accurate but also qualitatively relevant. Feature engineering, in which the expertise of domain experts informs what the model pays the most attention to, is another way of involving healthcare professionals in model development [83].

10.5.3  Clinical Deployment

Having clinical experts integrally involved in developing models and engineering the input that feeds the model plays an important role in the trustworthiness of the developed system. Upon AI deployment, healthcare practitioners should use their clinical judgment about the system's recommendation and whether it should be taken, rejected, or adjusted. Conveying the degree of predictive certainty of the clinical decision support system when making a prediction can guide this judgment, especially if the system is equipped with explanation facilities. Such explanation techniques facilitate the users' interaction with the system, so that they gain more insight into the proposed recommendation and can subsequently trust it. Such interactions can also help the system self-correct and learn from the adjustments or overrides administered by experts [84].

10.5.4  Model Correction

If a deployed AI system contains errors, there is an opportunity for a healthcare practitioner to intervene and correct the system's behavior. The learned models may have high accuracy but can accidentally learn spurious correlations between features (that is, the true signal might be masked by noise in the data) [85]. Interactive ML addresses this by involving healthcare practitioners directly in the model correction phase. Enabling healthcare practitioners to identify how to correct the errors (or how to change the model's reasoning on specific inputs) is crucial to the successful deployment of AI systems in healthcare.
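A minimal sketch of clinician-driven model correction, assuming scikit-learn: a practitioner flags a feature the model relies on as clinically spurious, and the model is retrained without it. The feature names, data, and the decision to drop the feature are all illustrative.

# Illustrative model correction: a clinician flags a feature the model relies on
# as clinically spurious, and the model is retrained without it. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
feature_names = ["heart_rate", "creatinine", "admission_ward_id"]  # hypothetical
X = rng.normal(size=(800, 3))
# The synthetic outcome partly leaks through "admission_ward_id", a spurious shortcut.
y = ((X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=800)) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
print("Coefficients:", dict(zip(feature_names, model.coef_[0].round(2))))

# Clinician review: ward ID should not drive a diagnosis, so drop it and retrain.
keep = [0, 1]
corrected = LogisticRegression().fit(X_tr[:, keep], y_tr)
print("Accuracy before correction:", round(model.score(X_te, y_te), 3))
print("Accuracy after correction: ", round(corrected.score(X_te[:, keep], y_te), 3))
# The corrected model scores lower here only because the synthetic label was built
# to leak through the flagged feature; in practice the goal is clinical validity.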

10.6  Conclusion

The potential of AI in healthcare is limitless. By encouraging interaction between healthcare practitioners and AI systems, society unlocks more potential, since AI extends a healthcare practitioner's acumen with the predictive power of machines. When aligned with core human values, an AI system earns a healthcare practitioner's trust and gives them the agency to do more. While current applications of AI systems in clinical settings are limited to ML-driven systems, AI systems can be used to explore more than just datasets for insights. They can be used to discover new drugs or treatments, or to provide decision support to practitioners. For an AI system to show its trustworthiness to healthcare practitioners, the AI system must display predictability, procedural transparency, algorithmic transparency, and robustness. Healthcare practitioners are essential to the successful deployment of AI systems: they can engineer inputs to feed into models, can guide model selection, can assess models in deployment (overriding as needed), and can correct model behavior. The merit of AI in healthcare comes from AI deployed responsibly with the healthcare practitioner in mind.

Acknowledgments  UB acknowledges support from DeepMind and the Leverhulme Trust via the Leverhulme Centre for the Future of Intelligence and from the Partnership on AI.

References

1. Grace K, Salvatier J, Dafoe A, Zhang B, Evans O. When will AI exceed human performance? Evidence from AI experts. J Artif Intell Res. 2018;62:729–54.
2. Yu K-H, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2(10):719–31.
3. LaRosa E, Danks D. Impacts on trust of healthcare AI. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. ACM; 2018. p. 210–5.
4. McCarthy J, Minsky ML, Rochester N, Shannon CE. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Mag. 2006;27(4):12.
5. Engelbart DC. Augmenting human intellect: a conceptual framework. Menlo Park (CA); 1962.
6. Pasquinelli M. Augmented intelligence. Critical keywords for the digital humanities. 2014.
7. Lucas P, van der Gaag L. Principles of expert systems. Boston (MA): Addison-Wesley Longman Publishing Co., Inc.; 1991.
8. Ledley RS, Lusted LB. Reasoning foundations of medical diagnosis: symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science. 1959;130(3366):9–21.
9. Shortliffe EH, Buchanan BG. A model of inexact reasoning in medicine. Math Biosci. 1975;23:351–79.
10. Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.
11. Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Shpanskaya K, et al. CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv [Preprint] arXiv:1711.05225. 2017.
12. Choi D-J, Park JJ, Taqdir A, Lee S. Artificial intelligence for the diagnosis of heart failure. NPJ Digit Med. 2020;3:54.
13. The Gene Ontology Consortium. The Gene Ontology resource: 20 years and still going strong. Nucleic Acids Res. 2018;47(D1):D330–8.
14. Jaber MI, Song B, Taylor C, et al. A deep learning image-based intrinsic molecular subtype classifier of breast tumors reveals tumor heterogeneity that may affect survival. Breast Cancer Res. 2020;22:12.
15. Ma T, Zhang A. Incorporating biological knowledge with factor graph neural network for interpretable deep learning. arXiv [Preprint] arXiv:1906.00537. 2019.
16. Crawford J, Greene CS. Incorporating biological structure into machine learning models in biomedicine. Curr Opin Biotechnol. 2020;63:126–34.
17. Rhee S, Seo S, Kim S. Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI'18. AAAI Press; 2018. p. 3527–34.
18. Raza K, Singh NK. A tour of unsupervised deep learning for medical image analysis. arXiv [Preprint] arXiv:1812.07715. 2018.

19. Alashwal H, El Halaby M, Crouse JJ, Abdalla A, Moustafa AA. The application of unsupervised clustering methods to Alzheimer's disease. Front Comput Neurosci. 2019;13:31.
20. Shah SJ, Katz DH, Deo RC. Phenotypic spectrum of heart failure with preserved ejection fraction. Heart Fail Clin. 2014;10(3):407–18.
21. Hedman ÅK, et al. Identification of novel pheno-groups in heart failure with preserved ejection fraction using machine learning. Heart. 2020;106(5):342–9.
22. Yauney G, Shah P. Reinforcement learning with action-derived rewards for chemotherapy and clinical trial dosing regimen selection. In: Proceedings of the 3rd Machine Learning for Healthcare Conference, volume 85 of Proceedings of Machine Learning Research. PMLR; 2018. p. 161–226.
23. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. 2020;3(1):1–10.
24. Yu C, Liu J, Nemati S. Reinforcement learning in healthcare: a survey. arXiv [Preprint] arXiv:1908.08796. 2019.
25. Kuan R. Adopting AI in health care will be slow and difficult. 2019. https://hbr.org/2019/10/adopting-ai-in-health-care-will-be-slow-and-difficult
26. Oh J, Wang J, Tang S, Sjoding M, Wiens J. Relaxed parameter sharing: effectively modeling time-varying relationships in clinical time-series. arXiv [Preprint] arXiv:1906.02898. 2019.
27. Goyal D, Syed Z, Wiens J. Clinically meaningful comparisons over time: an approach to measuring patient similarity based on subsequence alignment. arXiv [Preprint] arXiv:1803.00744. 2018.
28. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1(1):18.
29. Anchala R, Pinto MP, Shroufi A, Chowdhury R, Sanderson J, Johnson L, Blanco P, Prabhakaran D, Franco OH. The role of Decision Support System (DSS) in prevention of cardiovascular disease: a systematic review and meta-analysis. PLoS One. 2012;7(10):e47064.
30. Yoon J, Davtyan C, van der Schaar M. Discovery and clinical decision support for personalized healthcare. IEEE J Biomed Health Inform. 2016;21(4):1133–45.
31. Epstein AS, Zauderer MG, Gucalp A, Seidman AD, Caroline A, Fu J, Keesing J, Hsiao F, Megerian M, Eggebraaten T, et al. Next steps for IBM Watson oncology: scalability to additional malignancies. 2014.
32. Gilbert FJ, Astley SM, McGee MA, Gillan MGC, Boggis CRM, Griffiths PM, Duffy SW. Single reading with computer-aided detection and double reading of screening mammograms in the United Kingdom National Breast Screening Program. Radiology. 2006;241(1):47–53.
33. Baek J-H, Ahn S-M, Urman A, Kim YS, Ahn HK, Won PS, Lee W-S, Sym SJ, Park HK, Chun Y-S, et al. Use of a cognitive computing system for treatment of colon and gastric cancer in South Korea. J Clin Oncol. 2017;35.
34. De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, Askham H, Glorot X, O'Donoghue B, Visentin D, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24(9):1342–50.
35. Ghorbani A, Ouyang D, Abid A, et al. Deep learning interpretation of echocardiograms. NPJ Digit Med. 2020;3:10.
36. Oguz C, Sen SK, Davis AR, Fu Y-P, O'Donnell CJ, Gibbons GH. Genotype-driven identification of a molecular network predictive of advanced coronary calcium in ClinSeq® and Framingham Heart Study cohorts. BMC Syst Biol. 2017;11(1):99.
37. Burghardt TP, Ajtai K. Neural/Bayes network predictor for inheritable cardiac disease pathogenicity and phenotype. J Mol Cell Cardiol. 2018;119:19–27.
38. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2017;22(5):1589–604.
39. Gil Y, Greaves M, Hendler J, Hirsh H. Amplify scientific discovery with artificial intelligence. Science. 2014;346(6206):171–2.

40. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng LH, Webster DR. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2:158–64.
41. Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci Rep. 2018;8(1):1–10.
42. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug Discov Today. 2018;23(6):1241–50.
43. Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, Lai FY, Kaptoge S, Brozynska M, Wang T, Ye S, Webb TR, Rutter MK, Tzoulaki I, Patel RS, Loos RJF, Keavney B, Hemingway H, Thompson J, Watkins H, Deloukas P, Di Angelantonio E, Butterworth AS, Danesh J, Samani NJ, et al. Genomic risk prediction of coronary artery disease in 480,000 adults. J Am Coll Cardiol. 2018;72(16):1883–93. https://doi.org/10.1016/j.jacc.2018.07.079
44. Krittanawong C, Zhang H, Wang Z, Aydar M, Kitai T. Artificial intelligence in precision cardiovascular medicine. J Am Coll Cardiol. 2017;69(21):2657–64.
45. O'Neill O. Linking trust to trustworthiness. Int J Philos Stud. 2018;26(2):293–300.
46. Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: predicting clinical events via recurrent neural networks. In: Machine Learning for Healthcare Conference; 2016. p. 301–18.
47. Tonekaboni S, Joshi S, McCradden MD, Goldenberg A. What clinicians want: contextualizing explainable machine learning for clinical end use. In: Machine Learning for Healthcare Conference; 2019. p. 359–80.
48. Ferrario A, Loi M, Vigano E. In AI we trust incrementally: a multi-layer model of trust to analyze human-artificial intelligence interactions. Philos Technol. 2019:1–17.
49. Kale A, Kay M, Hullman J. Decision-making under uncertainty in research synthesis: designing for the garden of forking paths. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems; 2019. p. 1–14.
50. Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning; 2016. p. 1050–9.
51. Subbaswamy A, Saria S. Counterfactual normalization: proactively addressing dataset shift using causal mechanisms. In: 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI. Association for Uncertainty in Artificial Intelligence (AUAI); 2018. p. 947–57.
52. Zhang Y, Liao QV, Bellamy RKE. Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* '20. New York (NY): Association for Computing Machinery; 2020. p. 295–305. https://doi.org/10.1145/3351095.3372852
53. Antoran J, Bhatt U, Adel T, Weller A, Hernandez-Lobato JM. Getting a CLUE: a method for explaining uncertainty estimates. arXiv [Preprint] arXiv:2006.06848. 2020.
54. Wiener Y, El-Yaniv R. Agnostic selective classification. In: Advances in Neural Information Processing Systems; 2011. p. 1665–73.
55. Hanczar B, Dougherty ER. Classification with reject option in gene expression data. Bioinformatics. 2008;24(17):1889–95.
56. Selbst AD, Boyd D, Friedler SA, Venkatasubramanian S, Vertesi J. Fairness and abstraction in sociotechnical systems. In: Proceedings of the Conference on Fairness, Accountability, and Transparency; 2019. p. 59–68.
57. Gebru T, Morgenstern J, Vecchione B, Wortman Vaughan J, Wallach H, Daumé H III, Crawford K. Datasheets for datasets. arXiv [Preprint] arXiv:1803.09010. 2018.
58. Raji ID, Yang J. ABOUT ML: annotation and benchmarking on understanding and transparency of machine learning lifecycles. arXiv [Preprint] arXiv:1912.06166. 2019.
59. Arnold M, Bellamy RKE, Hind M, Houde S, Mehta S, Mojsilovic A, Nair R, Natesan Ramamurthy K, Olteanu A, Piorkowski D, et al. FactSheets: increasing trust in AI services through supplier's declarations of conformity. IBM J Res Dev. 2019;63(4/5):6:1–6:13.

60. Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji ID, Gebru T. Model cards for model reporting. In: Proceedings of the Conference on Fairness, Accountability, and Transparency; 2019. p. 220–9.
61. Bhatt U, Xiang A, Sharma S, Weller A, Taly A, Jia Y, Ghosh J, Puri R, Moura JMF, Eckersley P. Explainable machine learning in deployment. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency; 2020. p. 648–57.
62. Brundage M, Avin S, Wang J, Belfield H, Krueger G, Hadfield G, Khlaaf H, Yang J, Toner H, Fong R, et al. Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv [Preprint] arXiv:2004.07213. 2020.
63. Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 1135–44.
64. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems; 2017. p. 4765–74.
65. Davis B, Bhatt U, Bhardwaj K, Marculescu R, Moura JMF. On network science and mutual information for explaining deep neural networks. In: ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2020. p. 8399–403.
66. Koh PW, Liang P. Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org; 2017. p. 1885–94.
67. Yeh C-K, Kim JK, Yen IEH, Ravikumar PK. Representer point selection for explaining deep neural networks. In: Advances in Neural Information Processing Systems; 2018. p. 9291–301.
68. Khanna R, Kim B, Ghosh J, Koyejo S. Interpreting black box predictions using Fisher kernels. In: The 22nd International Conference on Artificial Intelligence and Statistics; 2019. p. 3382–90.
69. Wachter S, Mittelstadt B, Russell C. Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv J Law Technol. 2018;31(2).
70. Dhurandhar A, Chen P-Y, Luss R, Tu C-C, Ting P, Shanmugam K, Das P. Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Advances in Neural Information Processing Systems; 2018. p. 592–603.
71. Ustun B, Spangher A, Liu Y. Actionable recourse in linear classification. In: Proceedings of the Conference on Fairness, Accountability, and Transparency; 2019. p. 10–19.
72. Kwon BC, Choi M-J, Kim JT, Choi E, Kim YB, Kwon S, Sun J, Choo J. RetainVis: visual analytics with interpretable and interactive recurrent neural networks on electronic medical records. IEEE Trans Vis Comput Graph. 2018;25(1):299–309.
73. Alsharqi M, Woodward WJ, Mumith J-A, Markham D, Upton R, Leeson PT. Artificial intelligence and echocardiography. Echo Res Pract. 2018;5:R115–25.
74. Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink-Nelson L, Lassen MH, Fan E, Aras MA, Jordan CR, Fleischmann KE, Melisko M, Qasim A, Shah SJ, Bajcsy R, Deo RC. Fully automated echocardiogram interpretation in clinical practice. Circulation. 2018;138:1623–35.
75. Bhatt U, Davis B, Moura JMF. Diagnostic model explanations: a medical narrative. In: AAAI Spring Symposium: Interpretable AI for Well-being; 2019.
76. Evangelista A, Gallego P, Calvo-Iglesias F, Bermejo J, Robledo-Carmona J, Sanchez V, Saura D, Arnold R, Carro A, Maldonado G, et al. Anatomical and clinical predictors of valve dysfunction and aortic dilation in bicuspid aortic valve disease. Heart. 2018;104(7):566–73.
77. Dorr Goold S, Lipkin M Jr. The doctor–patient relationship: challenges, opportunities, and strategies. J Gen Intern Med. 1999;14(Suppl 1):S26.
78. Bhatt U, Weller A, Moura JMF. Evaluating and aggregating feature-based model explanations. arXiv [Preprint] arXiv:2005.00631. 2020.
79. Quiñonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND. Dataset shift in machine learning. Cambridge (MA): The MIT Press; 2009.
80. Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS. Adversarial attacks on medical machine learning. Science. 2019;363(6433):1287–9.

81. Oh J, Makar M, Fusco C, McCaffrey R, Rao K, Ryan EE, Washer L, West LR, Young VB, Guttag J, et al. A generalizable, data-driven approach to predict daily risk of Clostridium difficile infection at two large academic health centers. Infect Control Hosp Epidemiol. 2018;39(4):425–33.
82. Ghassemi M, Pushkarna M, Wexler J, Johnson J, Varghese P. ClinicalVis: supporting clinical task-focused design evaluation. arXiv [Preprint] arXiv:1810.05798. 2018.
83. Roe KD, Jawa V, Zhang X, Chute CG, Epstein JA, Matelsky J, Shpitser I, Overby Taylor C. Feature engineering with clinical expert knowledge: a case study assessment of machine learning model complexity and performance. PLoS One. 2020;15(4):e0231300.
84. Raghu M, Blumer K, Corrado G, Kleinberg J, Obermeyer Z, Mullainathan S. The algorithmic automation problem: prediction, triage, and human effort. arXiv [Preprint] arXiv:1903.12220. 2019.
85. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195.
