Can AI Mitigate Bias in Writing Letters of Recommendation?

doi:10.2196/51494

Editorial

¹Department of Internal Medicine (adjunct), Southern Illinois University School of Medicine, Springfield, IL, United States

²JMIR Publications, Toronto, ON, Canada

³CommonSpirit Health, Chicago, IL, United States

⁴Creighton University School of Medicine, Omaha, NE, United States

⁵Division of Internal Medicine, Thomas Jefferson University, Philadelphia, PA, United States

⁶Department of Medicine, Emory University School of Medicine, Atlanta, GA, United States

Corresponding Author:

Tiffany I Leung, MPH, MD

JMIR Publications

130 Queens Quay East

Unit 1100

Toronto, ON, M5A 0P6

Canada

Phone: 1 416 583 2040

Email: tiffany.leung@jmir.org

Letters of recommendation play a significant role in higher education and career progression, particularly for women and underrepresented groups in medicine and science. Already, there is evidence to suggest that written letters of recommendation contain language that expresses implicit biases, or unconscious biases, and that these biases occur for all recommenders regardless of the recommender’s sex. Given that all individuals have implicit biases that may influence language use, there may be opportunities to apply contemporary technologies, such as large language models or other forms of generative artificial intelligence (AI), to augment and potentially reduce implicit biases in the written language of letters of recommendation. In this editorial, we provide a brief overview of existing literature on the manifestations of implicit bias in letters of recommendation, with a focus on academia and medical education. We then highlight potential opportunities and drawbacks of applying this emerging technology in augmenting the focused, professional task of writing letters of recommendation. We also offer best practices for integrating their use into the routine writing of letters of recommendation and conclude with our outlook for the future of generative AI applications in supporting this task.

JMIR Med Educ 2023;9:e51494

Keywords

sponsorship; implicit bias; gender bias; bias; letters of recommendation; artificial intelligence; large language models; medical education; career advancement; tenure and promotion; promotion; leadership

Letters of recommendation play a significant role in higher education and career progression, particularly for women and underrepresented groups in medicine and science. Letters of recommendation include any letter written to support or sponsor an individual for a job [Schmader T, Whitehead J, Wysocki VH. A linguistic comparison of letters of recommendation for male and female chemistry and biochemistry job applicants. Sex Roles. 2007;57(7-8):509-514. [FREE Full text] [CrossRef] [Medline]1,Bernstein RH, Macy MW, Williams WM, Cameron CJ, Williams-Ceci SC, Ceci SJ. Assessing gender bias in particle physics and social science recommendations for academic jobs. Soc Sci. Feb 14, 2022;11(2):74. [CrossRef]2], internship [Houser C, Lemmons K. Implicit bias in letters of recommendation for an undergraduate research internship. J Furth High Educ. Apr 24, 2017;42(5):585-595. [CrossRef]3], or training position [Grimm LJ, Redmond RA, Campbell JC, Rosette AS. Gender and racial bias in radiology residency letters of recommendation. J Am Coll Radiol. Jan 2020;17(1 Pt A):64-71. [CrossRef] [Medline]4]; a scholarship or grant; an award or recognition; a promotion; or other important professional milestones. For example, letters of support for a job application may be used in so-called round 1 selection stages, even before a candidate interviews for a position. This means that such letters and evaluations, as well as the language used to describe a candidate, can significantly, even if unintentionally, influence a hiring committee's consideration of an individual’s candidacy. Already, there is evidence to suggest that written letters of recommendation contain language that expresses implicit biases, or unconscious biases [Madera JM, Hebl MR, Martin RC. Gender and letters of recommendation for academia: agentic and communal differences. J Appl Psychol. Nov 2009;94(6):1591-1599. [CrossRef] [Medline]5,Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6], and that these biases occur for all recommenders regardless of the recommender’s sex [Madera JM, Hebl MR, Dial H, Martin R, Valian V. Raising doubt in letters of recommendation for academia: Gender differences and their impact. J Bus Psychol. Apr 26, 2018;34:287-303. [CrossRef]7]. Given that all individuals have implicit biases that may influence language use, there may be opportunities to apply contemporary technologies, such as large language models (LLMs) or other forms of generative artificial intelligence (AI), to augment and potentially reduce implicit biases in the written language of letters of recommendation. Although AI has been used to analyze recommendation letter content for bias via, for example, natural language processing and sentiment analysis [Sarraf D, Vasiliu V, Imberman B, Lindeman B. Use of artificial intelligence for gender bias analysis in letters of recommendation for general surgery residency candidates. Am J Surg. Dec 2021;222(6):1051-1059. [CrossRef] [Medline]8] or automated text mining [Heath JK, Weissman GE, Clancy CB, Shou H, Farrar JT, Dine CJ. Assessment of gender-based linguistic differences in physician trainee evaluations of medical faculty using automated text mining. JAMA Netw Open. May 03, 2019;2(5):e193520. [FREE Full text] [CrossRef] [Medline]9,Alexander CS. Text mining for bias: A recommendation letter experiment. American Business Law Journal. Apr 06, 2022;59(1):5-59. [CrossRef]10], there remains an unexplored potential opportunity to apply AI to generate letters, especially with the aim of reducing bias.

As of May 2023, some of the authors had one-on-one conversations with medical faculty peers or leaders and even heard conference plenary speakers explicitly endorse subscribing to generative AI services, such as ChatGPT Plus [Introducing ChatGPT Plus. OpenAI. URL: https://openai.com/blog/chatgpt-plus [accessed 2023-06-11] 11], to help them specifically with writing letters of recommendation. It is very likely that there are many professionals who apply such services, yet little to no exploration of the potential opportunities and pitfalls has been reported on this application of generative AI. In this editorial, we provide a brief overview of existing literature on the manifestations of implicit bias in letters of recommendation, with a focus on academia and medical education. We then highlight potential opportunities and drawbacks of applying this emerging technology in augmenting the focused, professional task of writing letters of recommendation. We also offer best practices for integrating their use into the routine writing of letters of recommendation and conclude with our outlook for the future of generative AI applications in supporting this task. For the purposes of this editorial, we focus on letters of recommendation, although the presence of bias in performance evaluations and assessments [Klein R, Julian KA, Snyder ED, Koch J, Ufere NN, Volerman A, et al. Gender Equity in Medicine (GEM) workgroup. Gender bias in resident assessment in graduate medical education: Review of the literature. J Gen Intern Med. May 2019;34(5):712-719. [FREE Full text] [CrossRef] [Medline]12-Dayal A, O'Connor DM, Qadri U, Arora VM. Comparison of male vs female resident milestone evaluations by faculty during emergency medicine residency training. JAMA Intern Med. May 01, 2017;177(5):651-657. [FREE Full text] [CrossRef] [Medline]15], especially in medical training, is also a well-recognized phenomenon. 
It may be possible to apply some of the key points raised in this editorial similarly to writing performance evaluations.

Implicit bias is a type of bias that arises from unconscious associations and stereotypes about members of a social group. Often, bias is based on gender, race, ethnicity, ability, language proficiency, or any other aspect of one’s identity. Gendered language usage occurs in medicine and health care, as well as in professions and fields beyond our own as physicians; the World Bank noted in a 2019 report that “[a]ttitudes toward women are also influenced by gendered languages…gendered languages could translate into outcomes like lower female labor force participation” [Gendered languages may play a role in limiting women’s opportunities, new research finds. The World Bank. Jan 24, 2019. URL: https://www.worldbank.org/en/news/feature/2019/01/24/gendered-languages-may-play-a-role-in-limiting-womens-opportunities-new-research-finds [accessed 2023-06-11] 16].

Gendered terms are words that are associated with a specific gender. Various studies have noted that gendered language appears in letters of recommendation for academic faculty, science, and medicine [Madera JM, Hebl MR, Martin RC. Gender and letters of recommendation for academia: agentic and communal differences. J Appl Psychol. Nov 2009;94(6):1591-1599. [CrossRef] [Medline]5]. Specifically, categories of terms include communal terms (eg, “caring,” “nurturing,” “attentive,” or “kind”), which occur more frequently in recommendation letters for women, and agentic terms (eg, “confident,” “assertive,” “outspoken,” or “ambitious”), which occur more frequently in recommendation letters for men [Madera JM, Hebl MR, Martin RC. Gender and letters of recommendation for academia: agentic and communal differences. J Appl Psychol. Nov 2009;94(6):1591-1599. [CrossRef] [Medline]5]. In a study by Trix and Psenka [Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6], the adjective “successful” occurred in 7% and 3% of letters for men and women, respectively, while the nouns “accomplishment” and “achievement” occurred in 13% and 3% of letters for men and women, respectively. For women applicants, “compassionate” and “relates well to patients and staff at all levels” stood out (16% vs 4% in letters for women and men, respectively) [Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6].
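As a rough illustration of the dictionary-based counting that underlies such findings, the following Python sketch tallies communal and agentic terms in a single letter. The term lists contain only the examples quoted above, the function name `count_gendered_terms` is hypothetical, and the sample sentence was written for illustration; published studies rely on far larger, validated dictionaries.

```python
import re
from collections import Counter

# Term lists limited to the examples quoted in the text above;
# they are not a validated dictionary.
COMMUNAL = {"caring", "nurturing", "attentive", "kind"}
AGENTIC = {"confident", "assertive", "outspoken", "ambitious"}


def count_gendered_terms(letter: str) -> dict:
    """Count communal and agentic terms in one letter (case-insensitive)."""
    words = Counter(re.findall(r"[a-z]+", letter.lower()))
    return {
        "communal": sum(words[w] for w in COMMUNAL),
        "agentic": sum(words[w] for w in AGENTIC),
    }


# Hypothetical sentence written for illustration only.
sample = "She is a caring and kind colleague who is also confident."
print(count_gendered_terms(sample))
```

Counts like these only become meaningful when compared across many letters, for example as rates per letter for different groups of applicants.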

Less recognized categories of descriptors include hedging language, doubt-raisers, and grindstone language [Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6]. Such language is more often applied to women in recommendation letters than to men. Doubt-raising language includes negative, potentially negative, hedging, unexplained, or irrelevant comments and faint praise [Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6,Madera JM, Hebl MR, Dial H, Martin R, Valian V. Raising doubt in letters of recommendation for academia: Gender differences and their impact. J Bus Psychol. Apr 26, 2018;34:287-303. [CrossRef]7]. Examples of doubt-raising language include “while she has not done”; “while not the best student I have had”; and “bright, enthusiastic, he responds well to a minimum amount of supervision.” Examples of hedging include “it appears that” or “now that she has chosen,” and an example of faint praise is “she worked hard on projects that she enjoys.” Grindstone language implies that an individual is hardworking because of a need to compensate for a shortcoming in their ability (eg, “hardworking,” “conscientious,” or “dedicated”) [Valian V. Why So Slow?: The Advancement of Women. Cambridge, MA. The MIT Press; 1999.17]. For example, “She is a superb experimentalist – very well organized, thorough and careful in her approach to research” [Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6].
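A simple phrase matcher can sketch how these categories might be flagged in a draft. The Python example below is a minimal illustration that assumes exact phrase matching is sufficient; the phrase lists are copied from the examples quoted above, and the names `PATTERNS` and `flag_language` are hypothetical. Real doubt-raisers vary widely in wording, so a fixed list like this would miss most of them.

```python
import re

# Phrases and words copied from the examples quoted in the text above;
# this is an illustrative list, not a validated dictionary.
PATTERNS = {
    "doubt_raiser": ["while she has not done", "while not the best student"],
    "hedging": ["it appears that", "now that she has chosen"],
    "grindstone": ["hardworking", "conscientious", "dedicated"],
}


def flag_language(letter: str) -> list:
    """Return (category, phrase) pairs found in the letter, case-insensitively."""
    text = letter.lower()
    hits = []
    for category, phrases in PATTERNS.items():
        for phrase in phrases:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                hits.append((category, phrase))
    return hits


# Hypothetical sentence written for illustration only.
sample = "It appears that she is a dedicated and conscientious resident."
for category, phrase in flag_language(sample):
    print(f"{category}: {phrase}")
```

A flag is a prompt for the writer to reconsider the wording, not a verdict that the sentence is biased.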

Out-of-the-box tools to help with identifying commonly used categories of words are readily available for research purposes. One commonly used tool in text analysis is Linguistic Inquiry and Word Count (LIWC) [Pennebaker JW, Booth RJ, Boyd RL, Francis ME. Linguistic Inquiry and Word Count: LIWC2015. LIWC. 2015. URL: http://downloads.liwc.net.s3.amazonaws.com/LIWC2015_OperatorManual.pdf [accessed 2023-08-15] 18,Hovy D. Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge, United Kingdom. Cambridge University Press; Jan 2021.19]. LIWC offers text analysis tools based upon established LIWC dictionary categories [Welcome to LIWC-22. LIWC. URL: https://www.liwc.app [accessed 2023-07-03] 20] that can be augmented with user-defined dictionaries; Madera et al [Madera JM, Hebl MR, Martin RC. Gender and letters of recommendation for academia: agentic and communal differences. J Appl Psychol. Nov 2009;94(6):1591-1599. [CrossRef] [Medline]5] validated added dictionaries of communal and agentic terms in their study of gendered language in recommendation letters [Miller DT, McCarthy DM, Fant AL, Li-Sauerwine S, Ali A, Kontrick AV. The standardized letter of evaluation narrative: Differences in language use by gender. West J Emerg Med. Oct 17, 2019;20(6):948-956. [FREE Full text] [CrossRef] [Medline]21]. Other researchers have also created, although not yet validated, 5 additional user-defined dictionaries, including grindstone traits, ability traits, standout adjectives, research terms, and teaching terms [Schmader T, Whitehead J, Wysocki VH. A linguistic comparison of letters of recommendation for male and female chemistry and biochemistry job applicants. Sex Roles. 2007;57(7-8):509-514. [FREE Full text] [CrossRef] [Medline]1,Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220. [CrossRef]6,Miller DT, McCarthy DM, Fant AL, Li-Sauerwine S, Ali A, Kontrick AV. The standardized letter of evaluation narrative: Differences in language use by gender. West J Emerg Med. Oct 17, 2019;20(6):948-956. [FREE Full text] [CrossRef] [Medline]21-Friedman R, Fang CH, Hasbun J, Han H, Mady LJ, Eloy JA, et al. Use of standardized letters of recommendation for otolaryngology head and neck surgery residency and the impact of gender. Laryngoscope. Dec 2017;127(12):2738-2745. [CrossRef] [Medline]23]. LIWC usage typically requires a paid license for users, and LIWC offers its dictionaries in more than 15 languages.

Additional text analysis and processing techniques also can be applied in various ways to recommendation letters to identify biased language. Such approaches can involve using pre-established dictionaries of terms (eg, from LIWC), performing text mining [Heath JK, Weissman GE, Clancy CB, Shou H, Farrar JT, Dine CJ. Assessment of gender-based linguistic differences in physician trainee evaluations of medical faculty using automated text mining. JAMA Netw Open. May 03, 2019;2(5):e193520. [FREE Full text] [CrossRef] [Medline]9] or topic modeling [Turrentine FE, Dreisbach CN, St Ivany AR, Hanks JB, Schroen AT. Influence of gender on surgical residency applicants' recommendation letters. J Am Coll Surg. Apr 2019;228(4):356-365.e3. [CrossRef] [Medline]24], or applying natural language processing packages [Sarraf D, Vasiliu V, Imberman B, Lindeman B. Use of artificial intelligence for gender bias analysis in letters of recommendation for general surgery residency candidates. Am J Surg. Dec 2021;222(6):1051-1059. [CrossRef] [Medline]8].
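To give a sense of what a very basic text mining comparison looks like, the Python sketch below ranks words by how much more often they appear in one group of letters than in another. It is a toy under stated assumptions: the function names `word_rates` and `rate_gaps` are hypothetical, the example letters are invented for illustration, and the cited studies use considerably more rigorous statistical methods than a raw difference in rates.

```python
import re
from collections import Counter


def word_rates(letters):
    """Return each word's share of all words across a list of letters."""
    counts = Counter()
    for letter in letters:
        counts.update(re.findall(r"[a-z]+", letter.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def rate_gaps(group_a, group_b):
    """Sort words by (rate in group_a minus rate in group_b), largest first."""
    a, b = word_rates(group_a), word_rates(group_b)
    vocab = set(a) | set(b)
    gaps = {w: a.get(w, 0.0) - b.get(w, 0.0) for w in vocab}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Hypothetical letter fragments written for illustration only.
letters_a = ["kind and caring", "kind"]
letters_b = ["confident and ambitious"]
top_word, gap = rate_gaps(letters_a, letters_b)[0]
print(top_word, round(gap, 2))
```

Words at the top of the ranking are overrepresented in the first group and words at the bottom in the second; with real corpora, such a ranking would need significance testing before drawing any conclusion.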

Real-time integrated tools to identify biased language are available in productivity platforms. For example, the #BiasCorrect plug-in in Slack works “like spell check but for gender bias, this plug-in will flag your unconscious bias to you in real-time and offer up bias-free alternatives for you to consider instead” [#BiasCorrect install. Catalyst. URL: https://www.catalyst.org/biascorrect-install/ [accessed 2023-08-02] 25]. Integrated tools, extensions, or plug-ins are appealing; however, no such real-time tool exists yet in a text processing program. There are also several websites where users can copy and paste individual words or short chunks of text into a web-based form to identify which words are used more often for women or men and, perhaps, even in certain disciplines [Schmidt B. Gendered language in teaching evaluations. Ben Schmidt blog. URL: https://benschmidt.org/profGender/ [accessed 2023-08-02] 26,Forth T. Gender bias calculator. Tom Forth blog. URL: https://www.tomforth.co.uk/genderbias/ [accessed 2023-08-02] 27]. However, these are stand-alone tools that may serve as more of a curiosity rather than a routinely usable support in the recommendation letter writing workflow. Additionally, all of these existing tools share the same feature of first depending on the human generation of language and then reactively providing feedback if the writer is aware of the tool and uses it with a specific intention.

Overview of LLMs

The concept of AI augmentation of human tasks is not new; augmentation “is where employers create workplaces that combine smart machines with humans in close partnerships—symbiotically taking advantage of both human intelligence and machine intelligence. In other words, the AI system is used to complement the capabilities of a human worker (or vice versa)” [Miller SM, Davenport T. AI and the future of work: What we know today. Tom Davenport. 2022. URL: https://www.tomdavenport.com/ai-and-the-future-of-work-what-we-know-today/ [accessed 2023-06-11] 28]. Similarly, AI augmentation of writing letters of recommendation can offer a pathway to improve letter writing while keeping the human in the loop. Briefly, LLMs are based on a transformer model, a neural network architecture that initially involves a pretraining stage of self-supervised learning from a large amount of unannotated data. Subsequently, in a fine-tuning stage, further training on a smaller, task-specific data set can be done to facilitate specific tasks [Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, et al. ChatGPT and other large language models are double-edged swords. Radiology. Apr 2023;307(2):e230163. [CrossRef] [Medline]29]. Since the initial general popularity of LLMs during late 2022, with OpenAI’s ChatGPT [Introducing ChatGPT. OpenAI. URL: https://openai.com/blog/chatgpt [accessed 2023-08-02] 30], countless additional LLMs have been developed and launched. Notably, there are also free, open-source models available for research or commercial use, like Meta’s Llama 2 [Meta and Microsoft introduce the next generation of Llama. Meta AI. Jul 18, 2023. URL: https://ai.meta.com/blog/llama-2/ [accessed 2023-08-02] 31].

Training an LLM

Any algorithm or AI is only as good as the training data with which the model is trained. LLMs have already been shown to, for example, generate statements that have certain political leanings [Rozado D. The political biases of ChatGPT. Soc Sci. Mar 02, 2023;12(3):148. [CrossRef]32,Hartmann J, Schwenzow J, Witte M. The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation. arXiv.. Preprint posted online on January 5, 2023. [FREE Full text]33] or have cultural biases [Cao Y, Zhou L, Lee S, Cabello L, Chen M, Hershcovich D. Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. arXiv.. Preprint posted online on March 31, 2023. [FREE Full text]34,Ferrara E. Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv.. Preprint posted online on April 18, 2023. [FREE Full text]35]. If the training data are biased, because of the probabilistic nature of the language generated in an LLM, that bias can be perpetuated or amplified in prompted outputs. Nevertheless, the potential of LLMs to support the task of recommendation letter writing is still a major opportunity that cannot be ignored.

Using open-source LLMs to train one's own generative AI on a set of one’s own recommendation letters is a possibility, but this is perhaps limited by the size of the training set and the potential of unintentionally amplifying one's own implicit biases. During a workshop at the American Medical Informatics Association’s Annual Symposium in 2020, on the topic of bias in recommendation letters, one advanced-career academic faculty member with 3 decades of experience in their field reflected on their writing of over 200 recommendation letters [Leung TI, Ancker JS, Cimino JJ, Ross H, Wu H. S104: panel - an unseen art: Writing letters of support and nomination to promote diversity, equity, and inclusion in informatics. Presented at: 2020 American Medical Informatics Association (AMIA) Annual Symposium; November 18, 2020; Virtual Conference.36]. At that time, a named entity recognition approach to identifying key words offered a preliminary glimpse at one individual’s writing patterns.

Increasing Efficiency

Improving the efficiency of recommendation letter writing can be especially valuable in easing the burden of this task for the small proportion of underrepresented groups who are in top leadership positions in medicine and scientific fields. For example, in medicine, although the proportion of women department chairs has increased over the last decade, still only 18% are women; the proportion of women medical school deans has barely shifted since 2012, increasing from 16% to 18% in 2018 [The state of women in academic medicine. Association of American Medical Colleges. URL: https://www.aamc.org/data-reports/data/2018-2019-state-women-academic-medicine-exploring-pathways-equity [accessed 2023-08-02] 37]. In academia, when promotion from associate professor to full professor requires letters of recommendation from individuals with a rank identical to that being sought, this burden can be especially amplified for women faculty among the highest academic ranks. Fortunately, the gender gap at the full-time professor level has narrowed over the past decade, yet still only 25% of full professors are women as of 2018 [Joseph MM, Ahasic AM, Clark J, Templeton K. State of women in medicine: History, challenges, and the benefits of a diverse workforce. Pediatrics. Sep 01, 2021;148(Suppl 2):e2021051440C. [CrossRef] [Medline]38,Richter KP, Clark L, Wick JA, Cruvinel E, Durham D, Shaw P, et al. Women physicians and promotion in academic medicine. N Engl J Med. Nov 26, 2020;383(22):2148-2157. [CrossRef] [Medline]39].

Although no biased language checker plug-ins are available in word processing software, some LLMs can ingest one or more files in various formats. Conceivably, a curriculum vitae in PDF format could be provided as part of a prompt. Afterward, with thoughtful prompts, the LLM could generate relevant portions of a recommendation letter for a writer to use. Putting the energy of generation on the AI, with the human in a position of editing, could be a time-saver. Alternatively, a human writing a rough draft can also prompt AI to refine and polish the language of the recommendation letter. There are more ways that AI can augment the recommendation letter writing process, and in all cases, these would help with the efficiency of generating the letters for busy faculty or those who may need extra support to write professionally and clearly in the language required for the letter. Moreover, as efficiency improves, the pool of letter writers can broaden across the gender spectrum, thus alleviating burdens and fostering a culture of thoughtful language that emphasizes the merits and potential of candidates for promotion or leadership.
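The idea of supplying a curriculum vitae with a thoughtful prompt can be sketched in Python as a small prompt-assembly helper. This is only an illustration under assumptions: the function name `build_letter_prompt`, the instruction wording, and the CV excerpt are all hypothetical, the curriculum vitae is assumed to be available as plain text already, and the sketch deliberately stops short of calling any particular LLM service.

```python
def build_letter_prompt(cv_text: str, target_role: str) -> str:
    """Assemble a plain-text prompt asking an LLM to draft letter content."""
    # Hypothetical instruction wording; adjust to the letter writer's needs.
    instructions = (
        "Draft two paragraphs of a letter of recommendation based only on "
        "the curriculum vitae below. Describe accomplishments with specific "
        "evidence, and avoid hedging, faint praise, and gender-stereotyped "
        "descriptors. Do not invent facts that are not in the curriculum vitae."
    )
    return (
        f"{instructions}\n\n"
        f"Target role: {target_role}\n\n"
        f"Curriculum vitae:\n{cv_text.strip()}"
    )


# Hypothetical CV excerpt for illustration only.
prompt = build_letter_prompt(
    "Dr. A. Example\n- 12 peer-reviewed publications",
    "associate professor",
)
print(prompt)
```

Whatever text a model returns for such a prompt remains a draft for the letter writer to verify and edit before signing.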

Cautionary Notes

Some additional notes of caution are warranted for anyone considering using generative AI to help them with writing recommendation letters. In scientific publishing, there is almost no remaining controversy as to whether generative AI can coauthor a manuscript (it should not [Jackson J, Landis G, Baskin PK, Hadsell KA, English M, CSE Editorial Policy Committee. CSE guidance on machine learning and artificial intelligence tools. Science Editor. May 1, 2023;46(2):se-d-4602-07. [CrossRef]40-Stokel-Walker C. ChatGPT listed as author on research papers: many scientists disapprove. Nature. Jan 2023;613(7945):620-621. [CrossRef] [Medline]42]). The arguments for no generative AI coauthorship center on accountability. The sense of accountability for the factual content of a written document is self-evident. Publishers either ban generative AI use by authors in generating portions of a manuscript or permit it to a limited extent and with required disclosure and transparency. No analogous guidelines exist for writing recommendation letters, especially since it is a common practice that recommendation letter writers can recycle their letters as templates for another similar letter, or some letter writers ask the candidate to draft a first version of the letter. Although we do not expect letter writers to disclose generative AI use, accountability for the outputs used in an official final recommendation letter lies solely with the signer of the letter.

Additionally, the focus here has been on recommendation letter writing. The other half of this process is recommendation letter reading and interpretation. Regardless of self-generated text or AI-assisted generation of text, there is a history of bias in AI-supported hiring [Drage E, Mackereth K. Does AI debias recruitment? Race, gender, and AI's "Eradication of Difference". Philos Technol. 2022;35(4):89. [FREE Full text] [CrossRef] [Medline]43]. Even human screeners are not immune to this bias, tending to carry biases when they, for example, perceive a name to be identifying a person's gender or race [Steinpreis RE, Anders KA, Ritzke D. The impact of gender on the review of the curricula vitae of job applicants and tenure candidates: A national empirical study. Sex Roles. Oct 1999;41:509-528. [CrossRef]44,Wenneras C, Wold A. Nepotism and sexism in peer-review. Nature. May 22, 1997;387(6631):341-343. [CrossRef] [Medline]45]. This half of the issue on recommendation letter interpretation and, more generally, on AI-supported hiring processes has been the focus of recent regulation in New York City [Automated employment decision tools. NYC311. URL: https://portal.311.nyc.gov/article/?kanumber=KA-03552 [accessed 2023-08-02] 46].

Finally, we cannot emphasize enough that the aim is to reduce bias in language, not to reduce how often women candidates are written about as being “caring” or “nurturing.” In medicine, all physician candidates would ideally embody these traits, among others, in comparable ways that are needed for them to be successful in the target roles they are being recommended for.

Overall, we are optimistic about the potential of generative AI in augmenting recommendation letter writing. Naturally, the opportunities we raise in this editorial are not without their potential limitations. One major counterargument is that the application of any technology to this specific task does not (or cannot) address the underlying problem that racism, stereotyping, and various forms of bias and discrimination are deeply rooted in systemic and organizational structures. As a result, gender bias in AI remains possible [Thakur V. Unveiling gender bias in terms of profession across LLMs: Analyzing and addressing sociological implications. arXiv.. Preprint posted online on July 18, 2023. [FREE Full text]47]. We agree with this position and see the application of technology, in the ways described in this editorial, as a supplementary tool or option for existing programs and initiatives around implicit bias recognition and management [Rodriguez N, Kintzer E, List J, Lypson M, Grochowalski JH, Marantz PR, et al. Implicit bias recognition and management: Tailored instruction for faculty. J Natl Med Assoc. Oct 2021;113(5):566-575. [FREE Full text] [CrossRef] [Medline]48], rather than as a replacement or substitution. Additionally, although this editorial does not address other professional documents that may benefit from technological augmentation, there is evidence to suggest that biased language appears in evaluations of trainees [Hemmer PA, Karani R. Let's face it: We are biased, and it should not be that way. J Gen Intern Med. May 2019;34(5):649-651. [FREE Full text] [CrossRef] [Medline]49], including subjective evaluations for students applying to residency programs [Turrentine FE, Dreisbach CN, St Ivany AR, Hanks JB, Schroen AT. Influence of gender on surgical residency applicants' recommendation letters. J Am Coll Surg. Apr 2019;228(4):356-365.e3. [CrossRef] [Medline]24]; qualitative evaluations of residents and students [Klein R, Julian KA, Snyder ED, Koch J, Ufere NN, Volerman A, et al. Gender Equity in Medicine (GEM) workgroup. Gender bias in resident assessment in graduate medical education: Review of the literature. J Gen Intern Med. May 2019;34(5):712-719. [FREE Full text] [CrossRef] [Medline]12,Gerull KM, Loe M, Seiler K, McAllister J, Salles A. Assessing gender bias in qualitative evaluations of surgical residents. Am J Surg. Feb 2019;217(2):306-313. [FREE Full text] [CrossRef] [Medline]50]; student, resident, and fellow evaluations of faculty physicians [Heath JK, Weissman GE, Clancy CB, Shou H, Farrar JT, Dine CJ. Assessment of gender-based linguistic differences in physician trainee evaluations of medical faculty using automated text mining. JAMA Netw Open. May 03, 2019;2(5):e193520. [FREE Full text] [CrossRef] [Medline]9]; and more [Smith DG, Rosenstein JE, Nikolov MC, Chaney DA. The power of language: Gender, status, and agency in performance evaluations. Sex Roles. May 3, 2018;80:159-171. [CrossRef]51,Sheffield V, Hartley S, Stansfield RB, Mack M, Blackburn S, Vaughn VM, et al. Gendered expectations: the impact of gender, evaluation language, and clinical setting on resident trainee assessment of faculty performance. J Gen Intern Med. Mar 2022;37(4):714-722. [FREE Full text] [CrossRef] [Medline]52]. Racial bias in evaluations is also problematic [Ross DA, Boatright D, Nunez-Smith M, Jordan A, Chekroud A, Moore EZ. Differences in words used to describe racial and gender groups in medical student performance evaluations. PLoS One. Aug 09, 2017;12(8):e0181659. [FREE Full text] [CrossRef] [Medline]53-Stack TJ, Berk GA, Ho TD, Zeatoun A, Kong KA, Chaskes MB, et al. Racial and ethnic bias in letters of recommendation and personal statements for application to otolaryngology residency. ORL J Otorhinolaryngol Relat Spec. 2023;85(3):141-149. [CrossRef] [Medline]55].

In a future investigation, we aim to further determine what practices current faculty and physicians are using in the AI augmentation of their writing of letters of recommendation. There may also be opportunities to computationally determine prompts that best facilitate recommendation letter writing with minimal implicit bias [Jiang Z, Xu FF, Araki J, Neubig G. How can we know what language models know? Trans Assoc Comput Linguist. 2020;8:423-438. [CrossRef]56] or to fine-tune an LLM based on a large corpus of recommendation letters. We look forward to the advancements that medical and scientific education and career advancement processes can benefit from, including new technological tools, like generative AI, to overcome systemic biases for women and underrepresented groups in their respective disciplines. AI augmentation can be a tool when utilized mindfully and with caution, improving one letter of recommendation at a time. This has the potential to address and mitigate systemic biases, especially when equity in medical and scientific careers is at stake [Bates C, Gordon L, Travis E, Chatterjee A, Chaudron L, Fivush B, et al. Striving for gender equity in academic medicine careers: A call to action. Acad Med. Aug 2016;91(8):1050-1052. [FREE Full text] [CrossRef] [Medline]57,Leung TI, Barrett E, Lin TL, Moyer DV. Advancing from perception to reality: How to accelerate and achieve gender equity now. Perspect Med Educ. Dec 2019;8(6):317-319. [FREE Full text] [CrossRef] [Medline]58].

Acknowledgments

This article is inspired by previous related work published by the authors in the official newsletter of the Society of General Internal Medicine, SGIM Forum [59], and a workshop presentation by the authors at the 2022 Annual Meeting of the Society of General Internal Medicine [60].

Authors' Contributions

TIL was responsible for conceptualization, writing and preparing the original draft, and reviewing and editing this paper. AS, SS, and TLH were responsible for conceptualization and reviewing and editing this paper.

Conflicts of Interest

TIL is the scientific editorial director for JMIR Publications.

References

1. Schmader T, Whitehead J, Wysocki VH. A linguistic comparison of letters of recommendation for male and female chemistry and biochemistry job applicants. Sex Roles. 2007;57(7-8):509-514.
2. Bernstein RH, Macy MW, Williams WM, Cameron CJ, Williams-Ceci SC, Ceci SJ. Assessing gender bias in particle physics and social science recommendations for academic jobs. Soc Sci. Feb 14, 2022;11(2):74.
3. Houser C, Lemmons K. Implicit bias in letters of recommendation for an undergraduate research internship. J Furth High Educ. Apr 24, 2017;42(5):585-595.
4. Grimm LJ, Redmond RA, Campbell JC, Rosette AS. Gender and racial bias in radiology residency letters of recommendation. J Am Coll Radiol. Jan 2020;17(1 Pt A):64-71.
5. Madera JM, Hebl MR, Martin RC. Gender and letters of recommendation for academia: agentic and communal differences. J Appl Psychol. Nov 2009;94(6):1591-1599.
6. Trix F, Psenka C. Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse & Society. Mar 2003;14(2):191-220.
7. Madera JM, Hebl MR, Dial H, Martin R, Valian V. Raising doubt in letters of recommendation for academia: Gender differences and their impact. J Bus Psychol. Apr 26, 2018;34:287-303.
8. Sarraf D, Vasiliu V, Imberman B, Lindeman B. Use of artificial intelligence for gender bias analysis in letters of recommendation for general surgery residency candidates. Am J Surg. Dec 2021;222(6):1051-1059.
9. Heath JK, Weissman GE, Clancy CB, Shou H, Farrar JT, Dine CJ. Assessment of gender-based linguistic differences in physician trainee evaluations of medical faculty using automated text mining. JAMA Netw Open. May 03, 2019;2(5):e193520.
10. Alexander CS. Text mining for bias: A recommendation letter experiment. American Business Law Journal. Apr 06, 2022;59(1):5-59.
11. Introducing ChatGPT Plus. OpenAI. URL: https://openai.com/blog/chatgpt-plus [accessed 2023-06-11]
12. Klein R, Julian KA, Snyder ED, Koch J, Ufere NN, Volerman A, et al. Gender Equity in Medicine (GEM) workgroup. Gender bias in resident assessment in graduate medical education: Review of the literature. J Gen Intern Med. May 2019;34(5):712-719.
13. Arora VM, Carter K, Babcock C. Bias in assessment needs urgent attention-no rest for the "Wicked". JAMA Netw Open. Nov 01, 2022;5(11):e2243143.
14. Mamtani M, Shofer F, Scott K, Kaminstein D, Eriksen W, Takacs M, et al. Gender differences in emergency medicine attending physician comments to residents: A qualitative analysis. JAMA Netw Open. Nov 01, 2022;5(11):e2243134.
15. Dayal A, O'Connor DM, Qadri U, Arora VM. Comparison of male vs female resident milestone evaluations by faculty during emergency medicine residency training. JAMA Intern Med. May 01, 2017;177(5):651-657.
16. Gendered languages may play a role in limiting women’s opportunities, new research finds. The World Bank. Jan 24, 2019. URL: https://www.worldbank.org/en/news/feature/2019/01/24/gendered-languages-may-play-a-role-in-limiting-womens-opportunities-new-research-finds [accessed 2023-06-11]
17. Valian V. Why So Slow?: The Advancement of Women. Cambridge, MA. The MIT Press; 1999.
18. Pennebaker JW, Booth RJ, Boyd RL, Francis ME. Linguistic Inquiry and Word Count: LIWC2015. LIWC. 2015. URL: http://downloads.liwc.net.s3.amazonaws.com/LIWC2015_OperatorManual.pdf [accessed 2023-08-15]
19. Hovy D. Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge, United Kingdom. Cambridge University Press; Jan 2021.
20. Welcome to LIWC-22. LIWC. URL: https://www.liwc.app [accessed 2023-07-03]
21. Miller DT, McCarthy DM, Fant AL, Li-Sauerwine S, Ali A, Kontrick AV. The standardized letter of evaluation narrative: Differences in language use by gender. West J Emerg Med. Oct 17, 2019;20(6):948-956.
22. Dutt K, Pfaff DL, Bernstein AF, Dillard JS, Block CJ. Gender differences in recommendation letters for postdoctoral fellowships in geoscience. Nat Geosci. Oct 3, 2016;9:805-808.
23. Friedman R, Fang CH, Hasbun J, Han H, Mady LJ, Eloy JA, et al. Use of standardized letters of recommendation for otolaryngology head and neck surgery residency and the impact of gender. Laryngoscope. Dec 2017;127(12):2738-2745.
24. Turrentine FE, Dreisbach CN, St Ivany AR, Hanks JB, Schroen AT. Influence of gender on surgical residency applicants' recommendation letters. J Am Coll Surg. Apr 2019;228(4):356-365.e3.
25. #BiasCorrect install. Catalyst. URL: https://www.catalyst.org/biascorrect-install/ [accessed 2023-08-02]
26. Schmidt B. Gendered language in teaching evaluations. Ben Schmidt blog. URL: https://benschmidt.org/profGender/ [accessed 2023-08-02]
27. Forth T. Gender bias calculator. Tom Forth blog. URL: https://www.tomforth.co.uk/genderbias/ [accessed 2023-08-02]
28. Miller SM, Davenport T. AI and the future of work: What we know today. Tom Davenport. 2022. URL: https://www.tomdavenport.com/ai-and-the-future-of-work-what-we-know-today/ [accessed 2023-06-11]
29. Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, et al. ChatGPT and other large language models are double-edged swords. Radiology. Apr 2023;307(2):e230163.
30. Introducing ChatGPT. OpenAI. URL: https://openai.com/blog/chatgpt [accessed 2023-08-02]
31. Meta and Microsoft introduce the next generation of Llama. Meta AI. Jul 18, 2023. URL: https://ai.meta.com/blog/llama-2/ [accessed 2023-08-02]
32. Rozado D. The political biases of ChatGPT. Soc Sci. Mar 02, 2023;12(3):148.
33. Hartmann J, Schwenzow J, Witte M. The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation. arXiv. Preprint posted online on January 5, 2023.
34. Cao Y, Zhou L, Lee S, Cabello L, Chen M, Hershcovich D. Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. arXiv. Preprint posted online on March 31, 2023.
35. Ferrara E. Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv. Preprint posted online on April 18, 2023.
36. Leung TI, Ancker JS, Cimino JJ, Ross H, Wu H. S104: panel - an unseen art: Writing letters of support and nomination to promote diversity, equity, and inclusion in informatics. Presented at: 2020 American Medical Informatics Association (AMIA) Annual Symposium; November 18, 2020; Virtual Conference.
37. The state of women in academic medicine. Association of American Medical Colleges. URL: https://www.aamc.org/data-reports/data/2018-2019-state-women-academic-medicine-exploring-pathways-equity [accessed 2023-08-02]
38. Joseph MM, Ahasic AM, Clark J, Templeton K. State of women in medicine: History, challenges, and the benefits of a diverse workforce. Pediatrics. Sep 01, 2021;148(Suppl 2):e2021051440C.
39. Richter KP, Clark L, Wick JA, Cruvinel E, Durham D, Shaw P, et al. Women physicians and promotion in academic medicine. N Engl J Med. Nov 26, 2020;383(22):2148-2157.
40. Jackson J, Landis G, Baskin PK, Hadsell KA, English M, CSE Editorial Policy Committee. CSE guidance on machine learning and artificial intelligence tools. Science Editor. May 1, 2023;46(2):se-d-4602-07.
41. Zielinski C, Winker MA, Aggarwal R, Ferris LE, Heinemann M, Lapeña JFJ, et al. WAME Board. Chatbots, generative AI, and scholarly manuscripts. World Association of Medical Editors. 2023. URL: https://wame.org/page3.php?id=106 [accessed 2023-08-08]
42. Stokel-Walker C. ChatGPT listed as author on research papers: many scientists disapprove. Nature. Jan 2023;613(7945):620-621.
43. Drage E, Mackereth K. Does AI debias recruitment? Race, gender, and AI's "Eradication of Difference". Philos Technol. 2022;35(4):89.
44. Steinpreis RE, Anders KA, Ritzke D. The impact of gender on the review of the curricula vitae of job applicants and tenure candidates: A national empirical study. Sex Roles. Oct 1999;41:509-528.
45. Wenneras C, Wold A. Nepotism and sexism in peer-review. Nature. May 22, 1997;387(6631):341-343.
46. Automated employment decision tools. NYC311. URL: https://portal.311.nyc.gov/article/?kanumber=KA-03552 [accessed 2023-08-02]
47. Thakur V. Unveiling gender bias in terms of profession across LLMs: Analyzing and addressing sociological implications. arXiv. Preprint posted online on July 18, 2023.
48. Rodriguez N, Kintzer E, List J, Lypson M, Grochowalski JH, Marantz PR, et al. Implicit bias recognition and management: Tailored instruction for faculty. J Natl Med Assoc. Oct 2021;113(5):566-575.
49. Hemmer PA, Karani R. Let's face it: We are biased, and it should not be that way. J Gen Intern Med. May 2019;34(5):649-651.
50. Gerull KM, Loe M, Seiler K, McAllister J, Salles A. Assessing gender bias in qualitative evaluations of surgical residents. Am J Surg. Feb 2019;217(2):306-313.
51. Smith DG, Rosenstein JE, Nikolov MC, Chaney DA. The power of language: Gender, status, and agency in performance evaluations. Sex Roles. May 3, 2018;80:159-171.
52. Sheffield V, Hartley S, Stansfield RB, Mack M, Blackburn S, Vaughn VM, et al. Gendered expectations: the impact of gender, evaluation language, and clinical setting on resident trainee assessment of faculty performance. J Gen Intern Med. Mar 2022;37(4):714-722.
53. Ross DA, Boatright D, Nunez-Smith M, Jordan A, Chekroud A, Moore EZ. Differences in words used to describe racial and gender groups in medical student performance evaluations. PLoS One. Aug 09, 2017;12(8):e0181659.
54. Rojek AE, Khanna R, Yim JWL, Gardner R, Lisker S, Hauer KE, et al. Differences in narrative language in evaluations of medical students by gender and under-represented minority status. J Gen Intern Med. May 2019;34(5):684-691.
55. Stack TJ, Berk GA, Ho TD, Zeatoun A, Kong KA, Chaskes MB, et al. Racial and ethnic bias in letters of recommendation and personal statements for application to otolaryngology residency. ORL J Otorhinolaryngol Relat Spec. 2023;85(3):141-149.
56. Jiang Z, Xu FF, Araki J, Neubig G. How can we know what language models know? Trans Assoc Comput Linguist. 2020;8:423-438.
57. Bates C, Gordon L, Travis E, Chatterjee A, Chaudron L, Fivush B, et al. Striving for gender equity in academic medicine careers: A call to action. Acad Med. Aug 2016;91(8):1050-1052.
58. Leung TI, Barrett E, Lin TL, Moyer DV. Advancing from perception to reality: How to accelerate and achieve gender equity now. Perspect Med Educ. Dec 2019;8(6):317-319.
59. Sagar A, Henry T, Shroff S, Leung TI. Best practices: Reading between the lines to promote diversity, equity, and inclusion. SGIM Forum. URL: https://connect.sgim.org/sgimforum/viewdocument/reading-between-the-lines-to-promo [accessed 2023-06-11]
60. Leung T, Sagar A, Henry TL, Shroff S. SGIM2022: Recognizing and reducing bias in letters of support and performance evaluations in 360 degrees. Presented at: 2022 Annual Meeting of the Society of General Internal Medicine; April 9, 2022; Orlando, FL.

Abbreviations

AI: artificial intelligence

LIWC: Linguistic Inquiry and Word Count

LLM: large language model

Edited by T de Azevedo Cardoso. This is a non–peer-reviewed article. Submitted 02.08.23; accepted 08.08.23; published 23.08.23.

©Tiffany I Leung, Ankita Sagar, Swati Shroff, Tracey L Henry. Originally published in JMIR Medical Education (https://mededu.jmir.org), 23.08.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.
