Translating Clinical Questions by Physicians Into Searchable Queries: Analytical Survey Study

Background: Staying up to date and answering clinical questions with current best evidence from health research is challenging. Evidence-based clinical texts, databases, and tools can help, but clinicians first need to translate their clinical questions into searchable queries. MacPLUS FS (McMaster Premium LiteratUre Service Federated Search) is an online search engine that allows clinicians to explore multiple resources simultaneously and retrieves one single output that includes the following: (1) evidence from summaries (eg, UpToDate and DynaMed), (2) preappraised research (eg, EvidenceAlerts), and (3) non-preappraised research (eg, PubMed), with and without validated bibliographic search filters. MacPLUS FS can also be used as a laboratory to explore clinical questions and evidence retrieval.

Objective: Our primary objective was to examine how clinicians formulate their queries on a federated search engine, according to the population, intervention, comparison, and outcome (PICO) framework. Our secondary objective was to assess which resources clinicians accessed to answer their questions.

Methods: We performed an analytical survey among 908 clinicians who used MacPLUS FS in the context of a randomized controlled trial on search retrieval. Recording account log-ins and usage, we extracted all 1085 queries performed during a 6-month period and classified each search term according to the PICO framework. We further categorized queries into background (eg, “What is porphyria?”) and foreground questions (eg, “Does treatment A work better than B?”). We then analyzed the types of resources that clinicians accessed.

Results: There were 695 structured queries, after exclusion of meaningless queries and iterations of similar searches. We classified 56.5% (393/695) of these queries as background questions and 43.5% (302/695) as foreground questions, the majority of which were related to therapy (213/695, 30.6%), followed by diagnosis (48/695, 6.9%), etiology (24/695, 3.5%), and prognosis (17/695, 2.4%). This distribution did not differ significantly between postgraduate residents and medical faculty physicians (P=.51). Queries included a median of 3 search terms (IQR 2-4), most often related to the population and intervention or test, rarely related to the outcome, and never related to the comparator. About half of the resources accessed (314/610, 51.5%) were summaries, 24.4% (149/610) were preappraised research, and 24.1% (147/610) were non-preappraised research.

Conclusions: Our results, from a large sample of real-life queries, could guide the development of educational interventions to improve clinicians’ retrieval skills, as well as inform the design of more useful evidence-based resources for clinical practice.

Trial Registration: ClinicalTrials.gov NCT02038439; https://www.clinicaltrials.gov/ct2/show/NCT02038439

(JMIR Med Educ 2020;6(1):e16777) doi: 10.2196/16777


Introduction
Web-based searches have become the norm when looking for information and answers to most of our questions in daily life. This has also become true in the practice of medicine; online medical resources to access evidence are increasingly considered "as essential as the stethoscope" [1]. While famous search engines, such as Google, or information sources, such as Wikipedia, are used in both medical and nonmedical worlds, answering clinical questions to inform point-of-care decisions carries additional challenges and implications [2]. Driven by more than 20 years of evidence-based medicine (EBM) [3,4], the unit of information in medicine comes mostly in the form of research evidence, published across thousands of medical journals and indexed in numerous databases (eg, MEDLINE, Embase, and the Cumulative Index to Nursing and Allied Health Literature [CINAHL]). The volume of new evidence flowing through these channels is increasing rapidly, at a pace of 3000-4000 new publications per day, compiled or processed in hundreds of EBM summaries and resources [5][6][7].
Physicians are typically familiar with only a few of these resources, likely those to which they were exposed in training or by peers, and are often unaware of most of the ecosystem and architecture of published evidence. Yet, their daily practice triggers, on average, five to eight questions for every 10 patients [8][9][10]. Clinical questions can be classified as background and foreground questions (see Figure 1). Background questions (eg, "What is porphyria?") are typically about the nature of a disorder, a measure, a treatment, or a test. They are easily answered through online textbooks. Foreground questions are more directly related to the diagnosis, prognosis, and treatment of a given patient population (eg, "How effective would levonorgestrel be as emergency contraception for an obese patient?") [11]. The teaching of EBM recommends that foreground questions be formulated according to the population, intervention, comparison, and outcome (PICO) framework, or the population, exposure, comparison, and outcome (PECO) framework, and answered by research evidence [12]. How physicians translate their clinical questions into searchable queries remains poorly understood. How many search terms do they use? How often do their queries fit the PICO framework [12,13]? Do experienced and fully trained clinicians differ from residents in training? Do queries differ according to medical specialty? We aimed to examine these questions in a large sample of practicing clinicians of various levels of training and specialty types.
The type of search engine or evidence resource may also influence the way we conduct queries. Google and Wikipedia tend to retrieve relevant, albeit selective, answers from intuitive, less-structured search strategies [14][15][16]. Some EBM online textbooks and evidence summaries may provide a similar user experience to clinicians. By contrast, searching PubMed or other databases requires more training and structure, is less intuitive, and tends to produce large and diluted outputs for similar clinical questions [12].
We, therefore, explored how clinicians formulate their queries in a federated online search engine, namely MacPLUS FS (McMaster Premium LiteratUre Service Federated Search). MacPLUS FS allows clinicians to explore multiple resources simultaneously, retrieving one single output that includes the following: (1) evidence from evidence-based summaries (eg, UpToDate and DynaMed), (2) preappraised research (eg, EvidenceAlerts), and (3) non-preappraised research (eg, PubMed), with and without validated search filters (see Figure 2). In this study, we outline how we used MacPLUS FS, which functions as a laboratory, to explore clinical questions, the taxonomy of queries, and evidence retrieval (ie, what resources clinicians access to answer their questions when provided with a wide array of EBM resources) (see Multimedia Appendix 1) [5]. While MacPLUS FS functions as a laboratory for evidence retrieval research, its exact twin, the ACCESSSS search engine, is freely available online [17].

Study Design and Clinician Sample
We conducted an analytical survey of clinical search queries among 431 postgraduate medical trainees and 477 medical faculty members registered to a federated search engine, MacPLUS FS. The service was freely available to registered users from any computer with an internet browser, whether in the clinical setting or elsewhere.
Participating clinicians consented to be enrolled in the 6-month MacPLUS FS randomized controlled trials [5], which tested three interventions to enhance the quantity and quality of searching for current best evidence in order to answer clinical questions. As described in more detail in the published protocol of the trials [5], the following three interventions were embedded in MacPLUS FS: (1) a Web-based clinical question recorder, (2) an evidence retrieval coach composed of eight short educational videos, and (3) an audit, feedback, and gamification approach to evidence retrieval, based on the allocation of badges and reputation scores. Participating clinicians were randomized to each of the three interventions in a factorial design (A × B × C).
For each clinician, utilization of MacPLUS FS was recorded through accounts tracking log-ins and usage, including their detailed search queries. Registration to the service was free, and access to each evidence resource was through clinicians' academic institutions, mostly McMaster University, Hamilton, Canada. Clinicians were categorized according to their baseline search levels and specialty types [5].

Sample of Search Queries
We extracted all 1085 search queries performed by clinicians during the conduct of the MacPLUS FS trials. Two authors (AS and TA) assessed each query individually, counting the number of search terms (counting all words; eg, the query "porphyria" contains 1 term) and documenting all abbreviations and Boolean terms (ie, logical operators such as "AND," "OR," or "NOT"). Search queries were then classified into (1) structured searches; (2) searches for specific articles (eg, when clinicians typed in the title of a given study); (3) iterations of structured searches, namely a group of related structured queries addressing a similar PICO question within the same log-in session; and (4) undetermined searches (eg, "Scimitar").
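As a rough illustration, the term counting and Boolean detection described above can be sketched in a few lines of Python. The function name and output format are our own assumptions for illustration, not the study's actual tooling:

```python
import re

# Explicit Boolean operators, as documented in the assessment above.
BOOLEANS = {"AND", "OR", "NOT"}

def profile_query(query: str) -> dict:
    """Count search terms (every word counts, as in the 'porphyria' example)
    and collect any explicit Boolean operators found in a raw query."""
    words = re.findall(r"\S+", query.strip())
    booleans = [w for w in words if w in BOOLEANS]
    return {"n_terms": len(words), "booleans": booleans}

print(profile_query("porphyria"))                   # a 1-term query
print(profile_query("levonorgestrel AND obesity"))  # explicit Boolean "AND"
```

Real query logs would need extra normalization (punctuation, quoted phrases), which this sketch deliberately omits.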

Assessment of Search Queries and Evidence Resources Access
The same two authors (AS and TA) classified structured queries into background or foreground questions (see Figure 1), according to the PICO framework, while blinded to the participants' characteristics other than the log-in session. Queries that included terms related only to the population or only to the intervention were classified as background questions. Those combining several terms related to the population with terms related to the intervention, outcome, and/or comparator were categorized as foreground questions, and further subdivided into therapy, diagnosis, etiology, and prognosis. For each query, we examined the distribution of access to each evidence resource from the federated search: summaries, preappraised research, and non-preappraised research (see Figure 2).
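The classification rule above can be sketched in code, assuming each query's terms have already been tagged with the PICO element they cover (the tag names and the helper function are illustrative assumptions; the actual classification was done manually by two authors):

```python
def classify_question(pico_tags: set) -> str:
    """Background vs foreground, per the rule described in the text:
    terms covering only the population, or only the intervention,
    make a background question; combinations of population with
    intervention, outcome, and/or comparator make a foreground question."""
    if pico_tags <= {"population"} or pico_tags <= {"intervention"}:
        return "background"
    return "foreground"

print(classify_question({"population"}))                  # background
print(classify_question({"population", "intervention"}))  # foreground
print(classify_question({"population", "outcome"}))       # foreground
```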

Statistical Analysis
We examined the types of questions (ie, background, foreground, and type of foreground) according to the level of training as well as clinicians' specialties and baseline frequencies of search (ie, in the prior months since their registration to MacPLUS FS). We then examined the number and type of search terms across each type of question. We compared distributions using chi-square tests when relevant and Kruskal-Wallis tests for nonnormal distributions. Data abstraction was done using Microsoft Excel 2016, version 15.29, and data analysis was performed using SPSS Statistics for Windows, version 23.0 (IBM Corp).
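A minimal sketch of these two kinds of comparisons, using scipy rather than SPSS; the counts and samples below are invented for illustration and are not study data:

```python
from scipy.stats import chi2_contingency, kruskal

# Chi-square test on a 2x2 contingency table of question type by group
# (rows: background/foreground; columns: eg, residents/faculty).
table = [[200, 193],
         [150, 152]]
chi2, p_chi, dof, expected = chi2_contingency(table)

# Kruskal-Wallis test for a nonnormal outcome (eg, number of search
# terms per query) compared across independent groups.
stat, p_kw = kruskal([2, 3, 3, 4], [3, 4, 5, 5], [1, 2, 2, 3])

print(f"chi-square P={p_chi:.3f}, Kruskal-Wallis P={p_kw:.3f}")
```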

Clinicians
Participants were postgraduate residents and medical faculty members who had registered with MacPLUS FS prior to the trial. Of the 678 postgraduate residents and 753 medical faculty members, 431 (63.6%) and 477 (63.3%), respectively, were deemed eligible after the exclusion of 247 postgraduate residents and 266 medical faculty members who had either never logged in to MacPLUS FS during the year prior to the study or had left the institutions served by MacPLUS FS [5]. Searchers were further classified, according to their baseline average search frequencies during the 6 months prior to the trial [5], as regular searchers (≥1 search per month), occasional searchers (<1 search per month), or alert-only users (no searches).

From Clinicians to Queries
The 908 clinicians made 1085 search queries, of which 235 (21.66%) were subsequent iterations of the same search, 124 (11.43%) were searches for a specific article, and 31 (2.86%) could not be classified and remained undetermined. A total of 695 out of 1085 queries (64.06%) were structured queries following the PICO format: 480 out of 695 (69.1%) were single queries, whereas 215 (30.9%) belonged to a group of related queries, corresponding to an average of 2.1 attempts per group query.

Table 1 summarizes the distributions of the 695 structured queries. We classified 56.5% (393/695) as background and 43.5% (302/695) as foreground questions, the majority of which were related to therapy (213/695, 30.6%), followed by diagnosis (48/695, 6.9%), etiology (24/695, 3.5%), and prognosis (17/695, 2.4%). Distributions did not differ according to level of training (P=.51) (see Table 1).

Table 2 shows the distributions of queries related to background and foreground clinical questions, with respect to the clinicians' levels of training, specialty types (ie, family medicine, internal medicine, internal medicine specialties, pediatrics, psychiatry, surgery, anesthesiology, and others detailed in Multimedia Appendix 2), and categories of search frequency. Internal and family medicine physicians made 48.5% (337/695) of structured queries, 55.2% (186/337) of which were related to background content (see Table 2). There were, however, differences by frequency of searching, with regular searchers looking up significantly more background questions (P=.009). There were no differences between specialty types (P=.67).

Overall, 72.5% (504/695) of structured queries (see Table 3) contained at least 1 term related to the population, and 45.9% (319/695) contained at least 1 term related to an intervention. Few queries contained terms about etiology, diagnostic tests, or outcomes. No query included the comparator. Background queries included a median of 2 search terms (IQR 1-3): 71.2% (280/393) included a population term, 24.7% (97/393) an intervention term, 1.0% (4/393) an etiology term, 6.1% (24/393) a diagnostic term, and 2.5% (10/393) an outcome term. Foreground queries included a median of 4 search terms (IQR 3-5): 74.2% (224/302) included a population term, 73.5% (222/302) an intervention term, 21.5% (65/302) an outcome term, 16.2% (49/302) a diagnostic term, and 7.6% (23/302) an etiology term. Clinicians made no use of explicit Boolean search terms to link the various PICO elements.

The number of evidence-based resources that clinicians accessed for each type of query (ie, by clicking the available links in the search output) is displayed in Table 4. The distribution of accessed resources differed significantly across categories (P<.001). Although 35.7% (248/695) of structured queries did not result in any resource access, 39.9% (277/695) led to one resource accessed, 11.8% (82/695) to two, and 12.7% (88/695) to three or more resources. Across all 1085 queries, the average number of resources accessed was 0.88 (SD 1.42). When users attempted a second search on the same clinical question (ie, similar PICO concepts but revised search terms), 7.2% (17/235) of these attempts resulted in one or more resources accessed, while 92.8% (218/235) ended the search with no additional resource accessed. When searching for a specific article, 37.9% (47/124) of queries led to one resource accessed and 12.9% (16/124) to two or more.

Table 5 shows the types of accessed resources with respect to level of training, type of query, and specialty. Across the 695 structured queries, there were 610 accessed resources, about half of which (314/610, 51.5%) were summaries, 24.4% (149/610) preappraised research, and 24.1% (147/610) non-preappraised research.
When comparing the distribution of resources that were accessed across the federated search output, medical faculty members looked at significantly more summaries than did postgraduate trainees (P<.001), and family physicians looked at significantly more resources than did internists and specialized physicians (P<.001).

Principal Findings
Among 1085 queries made by 908 clinicians, 695 were structured queries. A small majority were related to background questions, and most foreground questions were questions about therapy, rather than diagnostic or prognostic questions. Structured queries included a median of 3 terms, most often related to the population and intervention or test, rarely related to the outcome, and never related to the comparator. Explicit Boolean terms were rarely used; of note, the search engine assumed by default a Boolean "AND" between search terms. About half of the resources accessed were summaries, while the rest were equally divided between preappraised and non-preappraised resources.
We found no difference between searches made by postgraduate resident trainees and medical faculty members. As they are in training, postgraduate residents might have been expected to ask more background questions, whereas faculty members were expected to ask more foreground questions, for example, comparing the effectiveness or risks of management strategies. Our results did not confirm this assumption, as faculty members also devoted more than half of their searches to background questions. This may be due to the complexity of patient care: a given faculty member may be an expert in a given field but adopt a learning strategy to rapidly get the big picture and understand uncommon situations. Their high level of access to summary resources, such as UpToDate or DynaMed, likely supports this explanation. Similarly, family doctors also accessed more summary resources, not only because of their need for quick, clear answers to questions arising within short appointments with patients, but also, perhaps, because they provide care for patients across the entire age spectrum.
Another issue relates to the frequency of searches clinicians are able to perform in daily life. In our study, 908 clinicians performed only 1085 queries in 6 months. Other studies have shown that clinicians tend not to search in order to answer the questions that arise in their daily clinical practice [10,[18][19][20]. In our study, a third of the structured searches led to no resource access through the platform, for which we have no explanation. More than 20 years ago, Ely et al [19,20] showed that clinicians spend less than 2 minutes looking to answer a question, a finding probably even more accurate nowadays with increased access to information online, and suggested that searching for evidence may not fit with clinicians' multiple tasks and training [21]. It is also possible that clinicians looked for answers in other resources (eg, PubMed or UpToDate), or even in Google, Google Scholar, or Wikipedia. Alternatively, clinicians may often not conduct searches online but, rather, directly ask their peers or use local guidelines [22][23][24][25][26]. Reasons include convenience and the time constraints of accessing ready-to-use information that conforms with local knowledge rather than challenging it. Although looking for answers on a general search engine or via colleagues or guidelines is easier, it does not guarantee or promote a fully evidence-based approach to health care [27,28]. Clinicians could, therefore, benefit from information specialists available to help at the point of care [29] and from the design of more intuitive tools to navigate the complexity of the evidence ecosystem.
Another observation from our study is that clinicians' queries tend to remain relatively simple: few search terms, covering few PICO concepts, mostly the population and intervention. While simple strategies work well for high-level summaries, they are much less efficient in large databases like PubMed. Our daily habits for searching the Web may explain clinicians' tendency toward simple queries. Strictly from a user's perspective, we have all become very efficient at finding information through Google and Wikipedia, just by typing a few intuitive keywords into the free-text bar at the top of a webpage. Medical search engines may mislead users into assuming they will work similarly to Google [30].
One area for improvement of search engines could be to invite users to structure their queries according to the PICO framework. Schardt et al [31] found that searchers using the PICO format obtained more precise results than users searching with the standard PubMed interface; in that study, precision was defined as the number of relevant or gold-standard articles retrieved in a result set divided by the total number of articles retrieved in that set. Unfortunately, and possibly due to the small sample, the difference between the search groups was not statistically significant [31]. An alternative could be to improve search engine functionalities, with the remaining challenge, however, of avoiding any cherry-picking of the evidence and, thus, potentially biased conclusions for clinical practice. A potential solution lies with federated search engines like MacPLUS FS, which complement summary-level evidence with other preappraised and non-preappraised research. Indeed, we have shown that physicians access all types of resources, reflecting an interest in the different layers of the EBM ecosystem when these layers are displayed together on one page (see Figure 2). The use of a federated search engine may thus help clinicians navigate across EBM resources, allowing them to look at and compare different resources simultaneously and to identify the current best evidence most adapted to their information needs.
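The precision score used in the Schardt et al study cited above can be written out as a small helper; the numbers in the usage example are invented for illustration:

```python
def precision(n_relevant_retrieved: int, n_retrieved: int) -> float:
    """Precision as defined in that study: gold-standard (relevant)
    articles retrieved divided by total articles retrieved."""
    return n_relevant_retrieved / n_retrieved if n_retrieved else 0.0

# eg, 6 of 40 retrieved articles match the gold standard:
print(precision(6, 40))  # 0.15
```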

Limitations and Strengths of the Study
The main limitation of our study was that clinicians likely used means other than MacPLUS FS to answer some of their daily questions. Our design also did not assess the clinical impact of the answers retrieved; this would have required mixed methods approaches to estimate the number of patients needed to benefit from information (ie, the number needed to benefit from information [NNBI]), as described by Pluye et al [32].
Finally, our sample of searches was recorded in the context of the MacPLUS FS randomized controlled trials [5], and it remains unclear how search queries may have differed without the possible influence of the interventions tested. The second intervention, the evidence retrieval coach, included eight short educational videos, only one of which provided advice on the PICO formulation of clinical queries. However, only a small group of participants would have been exposed to that short video, and none of the other interventions were specifically aimed at improving the formulation of queries.
Strengths include the direct recording of queries in one of the largest samples of physicians across different specialties and levels of practice. Ours is also the first study of a federated search engine, which allowed us to show that clinicians access all types of resources, not only summary-level evidence.

Conclusions
A constant flow of new articles overwhelms the clinicians who are continuously exposed to them. To keep up and to answer clinical questions, it is essential to clarify those questions and translate them into searchable queries. Our results could inform the development of educational and clinical interventions to improve searching skills [2]. These could include workshops and tools for translating clinical questions into queries and for structuring and adapting them to each type of resource.
Our findings also highlight the potential role of federated search engines over the use of single resources to meet clinicians' needs [23]. A federated search engine retrieves evidence and may help clinicians get answers to their questions with current best evidence, even with short time frames and limited experience and skills for searching.
Other avenues of research include the improvement of search functionalities and clinical interventions to meet users' expectations in navigating through the evidence, in order to rapidly find the most relevant and least-biased answers for better clinical practice and patient care.