This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.

Single-choice items (eg, best-answer items, alternate-choice items, single true-false items) are 1 type of multiple-choice items and have been used in examinations for over 100 years. At the end of every examination, the examinees’ responses have to be analyzed and scored to derive information about examinees’ knowledge.

The aim of this paper is to compile scoring methods for individual single-choice items described in the literature. Furthermore, the metric expected chance score and the expected scoring results as a function of examinees’ knowledge level are compared between the different scoring methods.

Scoring methods for individual single-choice items were extracted from various databases (ERIC, PsycInfo, Embase via Ovid, MEDLINE via PubMed) in September 2020. Eligible sources reported on scoring methods for individual single-choice items in written examinations including but not limited to medical education. Separately for items with n=2 answer options (eg, alternate-choice items, single true-false items) and best-answer items with n=5 answer options (eg, Type A items) and for each identified scoring method, the metric expected chance score and the expected scoring results as a function of examinees’ knowledge level were calculated and compared.

A total of 21 different scoring methods were identified from the 258 included sources, with varying consideration of correctly marked, omitted, and incorrectly marked items. Resulting credit varied between –3 and +1 credit points per item. For items with n=2 answer options, expected chance scores from random guessing ranged between –1 and +0.75 credit points. For items with n=5 answer options, expected chance scores ranged between –2.2 and +0.84 credit points. All scoring methods showed a linear relation between examinees’ knowledge level and the expected scoring results.

In examinations with single-choice items, the scoring result is not always equivalent to examinees’ knowledge.

Multiple-choice items in single-response item formats (ie, single-choice items) require examinees to mark only 1 answer option or to make only 1 decision per item. The most frequently used item type among the group of single-choice items is the so-called best-answer items. Here, examinees must select exactly 1 (ie, the correct or most likely) answer option from the given answer options [

Single-choice items have been used for more than 100 years to test examinees’ knowledge. The use of these items began among US school pupils, who were given alternate‑choice or best-answer items [

The use of multiple-choice items did not remain exclusive to the setting of high schools but also extended to examinations in university contexts [

Examinations aim at assessing examinees’ ability (ie, examinees’ knowledge).

To grade examinees or to decide about passing or failing a summative examination based on a minimum required level of knowledge, the examinees’ responses must be translated into a score by an appropriate scoring method.

Since the introduction of multiple-choice items, numerous scoring methods have been described in the literature and (medical) educators are advised to choose an appropriate scoring method based on an informed decision. Therefore, the aim of this scoping review is (1) to map an overview of different scoring methods for individual single-choice items described in the literature, (2) to compare different scoring methods based on the metric expected chance score, and (3) to compare the expected scoring results as a function of examinees’ knowledge level.

Examples of 3 different multiple-choice items in single-choice format and alternative designations used in the literature (no claim to completeness).

The literature search was performed according to the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) checklist [

Potentially eligible sources were scientific articles, books, book chapters, dissertations, and congress abstracts reporting scoring methods for individual single-choice items in written examinations including but not limited to medical examinations. Scoring methods for item groups and scoring on examination level (eg, with different weighting of individual items, with mixed item types, or considering the total number of items per examination) were not assessed. Further, scoring methods that deviate from the usual marking procedure (ie, a single choice of marking exactly 1 answer option per item) were not considered. These include, for example, procedures that assess the confidence of examinees in their marking (eg, confidence weighting), let examinees select the incorrect answer options (eg, elimination scoring), let examinees narrow down the correct answer option (eg, subset selection), or allow for the correction of initially incorrectly marked items (eg, answer-until-correct). No further specifications were made regarding language, quality (eg, minimum impact factor), or time of publication.

Four databases (ERIC, PsycInfo, Embase via Ovid, and MEDLINE via PubMed) were searched in September 2020. The search term was composed of various designations for single-choice items as well as keywords with regard to examinations. It was slightly adapted according to the specifications of the individual databases. The respective search terms for each database can be found in

Search terms used for each of the 4 databases.

Database | Search term |

ERIC | (“single choice” OR “alternate choice” OR “single response” OR “one-best-answer” OR “single best response” OR “true-false” OR “Typ A”) AND (item OR items OR test OR tests OR testing OR score OR scoring OR examination OR examinations) |

PsycInfo | (“single choice” OR “alternate choice” OR “single response” OR “one-best-answer” OR “single best response” OR “true-false” OR “Typ A”) AND (item OR items OR test OR tests OR testing OR score OR scoring OR examination OR examinations) |

Embase via Ovid | ((“single choice” or “alternate choice” or “single response” or “one-best-answer” or “single best response” or “true-false” or “Typ A”) and (item OR items or test or tests or testing or score or scoring or examination or examinations)).af. |

MEDLINE via PubMed | (“single choice”[All Fields] OR “alternate choice”[All Fields] OR “single response”[All Fields] OR “one-best-answer” OR “single best response” OR “true-false”[All Fields] OR “Typ A”[All Fields]) AND (“item”[All Fields] OR “items”[All Fields] OR “test”[All Fields] OR “tests”[All Fields] OR “testing”[All Fields] OR “score”[All Fields] OR “scoring”[All Fields] OR “examination”[All Fields] OR “examinations”[All Fields]) |

Literature screening, inclusion of sources, and data extraction were independently performed by 2 authors (AFK and PK). First, the titles and abstracts of the database results were screened. Duplicates as well as records irrelevant to the research question were excluded. For books and book chapters, however, different editions were included separately. In a second step, full-text sources were screened, and eligible records were included as sources. In addition, the references of included sources were searched by hand for further, potentially relevant sources. After each step, the results were compared, and any discrepancies were discussed until a consensus was reached. Information with regard to the described scoring methods was extracted using a piloted checklist.

The following data were extracted from included sources using a piloted spreadsheet if reported: (1) name of the scoring method, (2) associated item type, and (3) algorithm for calculating scores per item. The mathematical equations of each scoring method were adjusted to achieve normalization of scores up to a maximum of +1 point per item if necessary.

For all identified scoring methods, the expected scoring results in case of pure guessing were calculated for single-choice items with n=2 and n=5 answer options, respectively [

In addition, expected scoring results for varying levels of k (0≤k≤1) were calculated. For examinees with partial knowledge (0<k<1), a correct response can be attributed to both partial knowledge and guessing, with the proportion of guessing decreasing as knowledge increases. By contrast, examinees with perfect knowledge (k=1) always select the correct answer option without the need for guessing [

Examinees were expected to answer all items; that is, it was assumed that examinees were either unable to omit individual items or did not use an omit option. Furthermore, all items and answer options were assumed to be of equal difficulty and to not contain any cues. The calculation of the expected scoring result is shown in the following equation:

E(S) = Σ_{i=0}^{1} f(i) · p^{i} · (1 – p)^{1–i}, with p = k + (1 – k)/n

where f(i) are the credit points awarded for a correctly marked item (i=1) or an incorrectly marked item (i=0) depending on the scoring method used; k is the examinees’ knowledge level (0≤k≤1); p is the probability of a correct response; and 0^{0} is defined as 1.
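The expected scoring result can be sketched in a few lines of Python (a minimal illustration under the same assumptions — all items answered, uniform random guessing; the function name is ours, not from the source):

```python
def expected_score(f_correct, f_incorrect, k, n):
    """Expected per-item score for knowledge level k (0 <= k <= 1) on
    single-choice items with n answer options, assuming every item is
    answered and unknown items are guessed uniformly at random."""
    p = k + (1 - k) / n               # probability of a correct response
    return f_correct * p + f_incorrect * (1 - p)

# Number-right scoring (method 1): pure guessing (k=0) on n=5 items
print(expected_score(1, 0, k=0, n=5))  # 0.2
```

For scoring methods that penalize incorrect marks, the same function applies with a negative `f_incorrect`.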

MATLAB software (version R2019b; The MathWorks) was used to calculate the relation between examinees’ knowledge level (k) and the expected scoring results.

Within the literature search, a total of 3892 records were found through database search. Of these, 129 sources could be included. A further 129 sources were identified from the references of the included sources by hand search. The entire process of screening and including sources is shown in

The included sources describe 21 different scoring methods for single-choice items. In the following subsections, all scoring methods are described with their corresponding scoring formulas for calculating examination results as absolute scores (S). In addition, an overview with the respective scoring results for individual items as well as alternative names used in the literature is presented in

Flow diagram of systematic literature search.

Identified scoring methods and algorithms for single-choice items.

Method number and sources | Scoring method | Algorithm^{a-e} |

1 [ |
0-1 score [ Zero-one scoring [ Binary scoring [ Dichotomous scoring [ All-or-none scoring [ Number-right (NR) scoring [ Number of right (NR) rule [ No. right score (No Rt) [ NC Rights score [ R method [ Number correct scoring [ Percentage-correct scoring [ Raw score [ Score=rights [ Uncorrected score [ Conventional scoring [ Rights-only score [ 3 right minus 0 wrong [ |
f=1 (if i=1); f=0 (if i=0 or o=1) |

2 [ |
Formula scoring [ Omission-formula scoring [ Omit-correction [ Positive scoring rule [ Adjusted score [ |
f=1 (if i=1); f = 1/n (if o=1); f=0 (if i=0) |

3 [ |
Fair penalty [ |
f=1 (if i=1); f = 1 – 1/n (if i=0); f=0 (if o=1) |

4 [ |
N/A^{g} |
f = 1/(n – 1) (if i=1); f=0 (if i=0 or o=1) |

5 [ |
N/A | f=1 (if i=1); f = –1/[2 (n – 1)] (if i=0); f=0 (if o=1) |

6 [ |
Formula scoring [ Conventional-formula scoring [ Conventional correction-for-guessing formula [ Conventional correction formula [ “Neutral” counter-marking [ CG Negative marking [ Logical marking [ Correction for blind guessing (CFBG) [ Correction for guessing (CFG) formula [ Correction for chance formula [ Discouraging guessing [ Rights minus wrongs correction [ Corrected score [ Classical score [ Mixed rule [ |
f=1 (if i=1); f = –1/(n – 1) (if i=0); f=0 (if o=1) |

7 [ |
N/A | f = 1/(n – 1) (if i=1); f = –1/(n – 1) (if i=0); f=0 (if o=1) |

8 [ |
N/A | f = (n – 1)/n (if i=1); f = –1/n (if i=0); f=0 (if o=1) |

9 [ |
3 right-wrong [ Negative marking [ |
f=1 (if i=1); f = –1/3 (if i=0); f=0 (if o=1) |

10^{i} [ |
N/A | f=1 (if i=1); f = –0.48 (if i=0); f=0 (if o=1) |

11 [ |
N/A | f=1 (if i=1); f = –0.5 (if i=0); f=0 (if o=1) |

12^{i} [ |
N/A | f=1 (if i=1); f = –0.6 (if i=0); f=0 (if o=1) |

13 [ |
Formula scoring [ Correct-minus-incorrect score [ C-I score [ R-W method [ Number right minus number wrong method [ Right-minus-wrong method [ Rights minus wrongs method [ Right-wrong [ T-F formula [ Guessing penalty [ Correction-for-guessing [ Negative marking [ Logical marking [ 1 right minus 1 wrong [ Penal guessing formula [ Corrected score [ |
f=1 (if i=1); f = –1 (if i=0); f=0 (if o=1) |

14^{i} [ |
N/A | f=1 (if i=1); f = 0.7 (if o=1); f = –1 (if i=0) |

15^{i} [ |
N/A | f=1 (if i=1); f = 0.7 (if o=1); f = –1.1 (if i=0) |

16 [ |
N/A | f=1 (if i=1); f = –n/(n – 1) (if i=0); f=0 (if o=1) |

17^{i} [ |
N/A | f=1 (if i=1); f = –1.5 (if i=0); f=0 (if o=1) |

18^{i} [ |
N/A | f=1 (if i=1); f = –1.8 (if i=0); f=0 (if o=1) |

19 [ |
Right – 2 wrong [ 1 right minus 2 wrong [ Rights minus two times wrongs [ r-2w [ |
f=1 (if i=1); f = –2/(n – 1) (if i=0); f=0 (if o=1) |

20^{i} [ |
1 right minus 3 wrong [ |
f=1 (if i=1); f = –3 (if i=0); f=0 (if o=1) |

21^{j} [ |
N/A | f=1 (if i=1); f = –62/38 (if i=0 and t_{m}=1); f = –38/62 (if i=0 and t_{m}=0) |

^{a}f: resulting score per item.

^{b}i=1 if the item was marked correctly; otherwise i=0.

^{c}n: number of answer options per item (n≥2).

^{d}o=1 if the item was omitted; otherwise o=0.

^{e}t_{m}=1 if the statement is true; otherwise t_{m}=0.

^{f}NC: number correct.

^{g}N/A: not applicable (ie, no explicit name was previously introduced in literature).

^{h}CG: correct for guessing.

^{i}Only described for n=2.

^{j}Only described for single true-false items.

One credit point is awarded for a correct response. Therefore, the examination result as absolute score (S) corresponds to the number of correct responses (R). No points are deducted for incorrect responses (W). The formula is S = R.

One credit point is awarded for a correct response. In addition, 1/n credit points per item are awarded for each omitted item (O). No points are deducted for incorrect responses. The formula is S = R + O/n. This scoring method was first described by Lindquist [

One credit point is awarded for a correct response. For incorrect responses, 1 – 1/n credit points are awarded. The formula is S = R + (1 – 1/n)W. This scoring method was first described by Costagliola et al [

For each correct response, 1/(n – 1) credit points are awarded. Omitted items and incorrect responses do not affect the score. The formula is S = R/(n – 1). For example, 1 credit point is awarded for a correct response on single-choice items with n=2 (ie, alternate-choice items, single true-false items) but only 0.25 credit points are awarded for a correct response on best-answer items with n=5. This scoring method was first described by Foster and Ruch [

One credit point is awarded for a correct response. For incorrect responses, 1/[2 (n – 1)] points are deducted. The formula is S = R – W/[2 (n – 1)]. This scoring method was first described by Little [

One credit point is awarded for a correct response. For incorrect responses, 1/(n – 1) points are deducted. The formula is S = R – W/(n – 1). This scoring method was first described by Holzinger [

For each correct response, 1/(n – 1) credit points are awarded. For an incorrect response, 1/(n – 1) points are deducted. The formula is S = (R – W)/(n – 1). This scoring method was first described by Petz [

For each correct response, (n – 1)/n credit points are awarded. For an incorrect response, 1/n points are deducted. Omissions do not affect the score. The formula is S = [(n – 1)/n]R – W/n. As a result, examinees achieve only 0.5 credit points for each correct response on single-choice items with n=2 and 0.8 credit points for each correct response on best-answer items with n=5. This scoring method was first described by Guilford [

One credit point is awarded for a correct response. For incorrect responses, 1/3 points are deducted. The formula is S = R – (1/3)W. Originally, this scoring method was described by Paterson and Langlie [

One credit point is awarded for a correct response. For incorrect responses, 0.48 points are deducted. The formula is S = R – 0.48W. This scoring method was first described by Gupta and Penfold [

One credit point is awarded for a correct response. Half a point is deducted for incorrect responses. The formula is S = R – 0.5 W. This scoring method was first described in 1924 by Brinkley [

One credit point is awarded for a correct response. For incorrect responses, 0.6 points are deducted. The formula is S = R – 0.6W. This scoring method was first described by Gupta [

One credit point is awarded for a correct response. One point is deducted for incorrect responses. The formula is S = R – W. For items with n=2, methods 6 and 13 result in identical scores. This scoring method was first described by McCall [
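The equivalence of methods 6 and 13 for n=2 follows directly from the penalty term, as a quick numerical check illustrates (a sketch; the function names are ours):

```python
# Method 6: classical correction for guessing, S = R - W/(n - 1)
def score_method_6(R, W, n):
    return R - W / (n - 1)

# Method 13: right-minus-wrong, S = R - W
def score_method_13(R, W):
    return R - W

# For n=2 the penalty 1/(n - 1) equals 1, so both methods coincide
for R, W in [(10, 0), (7, 3), (0, 10)]:
    assert score_method_6(R, W, n=2) == score_method_13(R, W)
```

For n>2 the two methods diverge: method 6 deducts only a fraction 1/(n – 1) per incorrect response.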

This scoring method results in 1 credit point for a correct response, 0.7 credit points for an omitted item, and –1 point for an incorrect response. The formula is S = R + 0.7O – W. This scoring method was first described by Staffelbach [

This scoring method results in 1 credit point for a correct response, 0.7 credit points for an omitted item, and –1.1 points for an incorrect response. The formula is S = R + 0.7O – 1.1W. This scoring method was first described by Kinney and Eurich [

One credit point is awarded for a correct response. For an incorrect response, n/(n – 1) points are deducted. The formula is S = R – nW/(n – 1). This scoring method was first described by Miller [

For an incorrect response, 1.5 times as many points are deducted as credit points are awarded for a correct response. The original scoring formula is S = 2R – 3W. If a maximum of 1 credit point is awarded per item, 1 credit point is awarded for a correct response and 1.5 points are deducted for an incorrect response. This results in the following scoring formula: S = R – 1.5W. This scoring method was first described by Cronbach [
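The normalization applied here (and described in the Methods section) simply rescales both credits so that a correct response yields +1 while the reward-to-penalty ratio is preserved — a minimal sketch with a name of our choosing:

```python
def normalized_credits(f_correct, f_incorrect):
    """Rescale per-item credits so that a correct response yields
    exactly +1 point while the reward-to-penalty ratio is preserved."""
    return 1.0, f_incorrect / f_correct

# Cronbach's original S = 2R - 3W becomes S = R - 1.5W
assert normalized_credits(2, -3) == (1.0, -1.5)
```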

One credit point is awarded for a correct response. For an incorrect response, 1.8 points are deducted. The scoring formula is S = R – 1.8W. This scoring method was first described by Lennox [

One credit point is awarded for a correct response. For an incorrect response, 2/(n – 1) points are deducted. The formula is S = R – 2W/(n – 1). This scoring method was first described by Gates [

One credit point is awarded for a correct response. Three points are deducted for an incorrect response. The formula is S = R – 3W. This method was first described by Wood [

One credit point is awarded for correctly identifying the statement of true-false single items as true or false. If the statement presented is marked incorrectly, 62/38 points are deducted on true statements (W_{t}, incorrectly marked as false), but only 38/62 points are deducted on false statements (W_{f}, incorrectly marked as true). The scoring formula is S = R – (62/38)W_{t} – (38/62)W_{f}. This scoring method was first described by Cronbach [

The expected chance scores of examinees without any knowledge (k=0) vary between –1 and +0.75 credit points per item for single-choice items with n=2. For single-choice items with n=5, expected chance scores show a larger variability. Here, the expected chance scores vary between –2.2 and +0.84 credit points per item, depending on the selected scoring method. A detailed list is shown in
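Under the stated assumptions (all items answered, uniform random guessing), the tabulated chance scores follow from E = f1/n + f0·(n – 1)/n. A short Python check reproduces several of the values (function name ours; f1/f0 denote the credits for correct/incorrect responses):

```python
# Expected chance score (k = 0): E = f1/n + f0*(n - 1)/n
def chance_score(f1, f0, n):
    return f1 / n + f0 * (n - 1) / n

assert abs(chance_score(1, 0, 2) - 0.50) < 1e-9          # method 1, n=2
assert abs(chance_score(1, 1 - 1/5, 5) - 0.84) < 1e-9    # method 3, n=5
assert abs(chance_score(1, -1/4, 5) - 0.00) < 1e-9       # method 6, n=5
assert abs(chance_score(1, -1, 5) - (-0.60)) < 1e-9      # method 13, n=5
assert abs(chance_score(1, -3, 5) - (-2.20)) < 1e-9      # method 20, n=5
```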

Overview of scoring results for single-choice items with either n=2 or n=5 answer options.

Method number | Scoring formula^{a-f} | n^{g}=2: credit for incorrect responses^{h} | n^{g}=2: credit for correct responses^{i} | n^{g}=2: expected chance score | n=5: credit for incorrect responses^{h} | n=5: credit for correct responses^{i} | n=5: expected chance score |

1 | S = R | 0 | 1 | 0.50 | 0 | 1 | 0.20 | |

2 | S = R + O/n | 0 | 1 | 0.50 | 0 | 1 | 0.20 | |

3 | S = R + (1 – 1/n)W | 0.50 | 1 | 0.75 | 0.80 | 1 | 0.84 | |

4 | S = R/(n – 1) | 0 | 1 | 0.50 | 0 | 0.25 | 0.05 | |

5 | S = R – W/[2 (n – 1)] | –0.50 | 1 | 0.25 | –1/8 | 1 | 0.10 | |

6 | S = R – W/(n – 1) | –1 | 1 | 0.00 | –0.25 | 1 | 0.00 | |

7 | S = (R – W)/(n – 1) | –1 | 1 | 0.00 | –0.25 | 0.25 | –0.15 | |

8 | S = [(n – 1)/n]R – W/n | –0.50 | 0.50 | 0.00 | –0.20 | 0.80 | 0.00 | |

9 | S = R – (1/3)W | –1/3 | 1 | 1/3 | –1/3 | 1 | –2/30 | |

10 | S = R – 0.48W | –0.48 | 1 | 0.26 | –0.48 | 1 | –23/125 | |

11 | S = R – 0.5W | –0.50 | 1 | 0.25 | –0.5 | 1 | –0.20 | |

12 | S = R – 0.6W | –0.60 | 1 | 0.20 | –0.6 | 1 | –0.28 | |

13 | S = R – W | –1 | 1 | 0.00 | –1 | 1 | –0.60 | |

14 | S = R + 0.7O – W | –1 | 1 | 0.00 | –1 | 1 | –0.60 | |

15 | S = R + 0.7O – 1.1W | –1.10 | 1 | –0.05 | –1.10 | 1 | –0.68 | |

16 | S = R – nW/(n – 1) | –2 | 1 | –0.50 | –1.25 | 1 | –0.80 | |

17 | S = R – 1.5W | –1.50 | 1 | –0.25 | –1.5 | 1 | –1.00 | |

18 | S = R – 1.8W | –1.80 | 1 | –0.40 | –1.8 | 1 | –1.24 | |

19 | S = R – 2 W/(n – 1) | –2 | 1 | –0.50 | –0.5 | 1 | –0.20 | |

20 | S = R – 3W | –3 | 1 | –1.00 | –3 | 1 | –2.20 | |

21 | S = R – (62/38)W_{t} – (38/62)W_{f} | –62/38 or –38/62 | 1 | N/A^{j} | –62/38 or –38/62 | 1 | N/A^{j} |

^{a}S: examination result as absolute score.

^{b}R: number of correct responses.

^{c}O: number of omitted items.

^{d}W: number of incorrect responses.

^{e}W_{t}: number of true statements incorrectly marked as false.

^{f}W_{f}: number of false statements incorrectly marked as true.

^{g}n: number of answer options per item.

^{h}R=0, O=0, W=1.

^{i}R=1, O=0, W=0.

^{j}Expected chance scores were not calculated for method 21, because these depend on the proportion of true-false items with correct or incorrect statements.

The relation between examinees’ knowledge level and the expected scoring results for single-choice items with n=2 and n=5 answer options is shown in

Relation between examinees’ knowledge level and expected scoring results.

In this review, a total of 21 scoring methods for single-choice items were identified. The majority of the identified scoring methods are based on theoretical considerations or empirical findings, while others have been arbitrarily determined. Although some methods were only described for certain item types (ie, single-choice items with n=2), most of them might also be used for scoring items with more answer options. However, 1 method is suitable for scoring single true-false items only.

All scoring methods have in common that omitted items do not result in any credit deduction. Some scoring methods even award credit for omitted items: either a fixed amount of 0.7 points (methods 14 and 15), which is still lower than the full credit for a correct response, or the score that would be achieved on average by random guessing (1/n, method 2).

For the identified scoring methods, the possible scores range from –3 to +1 points per item. A correctly marked item is usually scored with 1 full point (1 credit point). Exceptions are 3 scoring methods that award 1 credit point only for single-choice items with n=2 (methods 4 and 7) or that never award a full credit point (method 8). These scoring methods are questionable because the guessing probability decreases as the number of answer options increases, so correct responses on items with more answer options would receive less credit although they are harder to guess. Further, a differentiation between examinees’ marking on true and false statements (method 21) is not justified, because correctly identifying true statements (ie, correctly marking the statement as true) and false statements (ie, correctly marking the statement as false) is likely to be considered equally important in the context of many examinations.

With the exception of method 6, the expected scoring result does not directly correspond to examinees’ knowledge level; only for method 6 (S = R – W/(n – 1)) does the expected scoring result equal k, regardless of the number of answer options.
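For method 6, the expected per-item score reduces algebraically to k itself: with p = k + (1 – k)/n, one obtains p – (1 – p)/(n – 1) = k for any n. A quick numerical check (a sketch; the function name is ours):

```python
def expected_score_method_6(k, n):
    # P(correct) = k + (1 - k)/n; penalty for an incorrect mark: 1/(n - 1)
    p = k + (1 - k) / n
    return p - (1 - p) / (n - 1)

# The expected score equals k itself, for any number of answer options
for n in (2, 5):
    for k in (0.0, 0.25, 0.5, 1.0):
        assert abs(expected_score_method_6(k, n) - k) < 1e-12
```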

Examinations are designed to determine examinees’ knowledge as well as to decide whether the examinees pass or fail in summative examinations. It can be generally assumed that examinees must perform at least 50% of the expected performance to receive at least a passing grade [
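Under number-right scoring (method 1), the knowledge level needed to reach a given expected score can be obtained by solving E = k + (1 – k)/n for k — a small illustration of how guessing lowers the bar (function name ours):

```python
def knowledge_for_expected_score(target, n):
    """Knowledge level k at which number-right scoring (method 1) yields
    an expected per-item score of `target`: solve target = k + (1 - k)/n."""
    return (target - 1 / n) / (1 - 1 / n)

# Reaching 50% on n=5 items requires only 37.5% knowledge on average;
# on n=2 items, pure guessing already yields 50%
assert abs(knowledge_for_expected_score(0.5, 5) - 0.375) < 1e-9
assert abs(knowledge_for_expected_score(0.5, 2) - 0.0) < 1e-9
```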

To account for guessing in case of single true-false items, the scoring formula R – W (method 13) was originally proposed in the literature, where the number of incorrect responses is subtracted from the number of correct responses [

Therefore, alternative scoring methods and scoring formulas emerged in addition to the already discussed scoring formula R – W. In this context, the literature often refers to formula scoring. However, the term formula scoring is not used consistently and has referred to several different scoring methods (eg, methods 2, 6, and 13 in this review).

So far, the relation between examinees’ knowledge level and the expected scoring results of different scoring methods has received little systematic attention in the literature.

Although some of the identified scoring methods might also be applied to other item formats (eg,

In practice, the evaluation of a multiple-choice examination should be based on an easy-to-calculate scoring method that allows for transparent credit awarding and is accepted by the relevant jurisdiction. In this regard, scoring methods with malus points (ie, methods 5-21) may not be accepted by national jurisdiction in certain countries (eg, Germany) [

The scoring of examinations with different item types, item formats, or items containing a varying number of answer options within a single examination is more complicated. Here, the individual examination sections would have to be evaluated separately or the credit resulting from the respective item type or item format would have to be corrected to enable a uniform pass mark. For example, in the single-choice format, credit points resulting from items with n=2 would have to be reduced to compensate for the higher guessing probability compared with items with n=5 (ie, 50% vs 20% guessing probability).
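One possible harmonization (our illustration, not a method prescribed by the source) is to linearly rescale each section’s mean number-right score so that the chance level 1/n maps to 0 and a perfect score stays 1 — which is algebraically equivalent to applying method 6 within each section:

```python
def rescaled_score(mean_raw, n):
    """Map a mean number-right score so that the chance level 1/n becomes
    0 and a perfect score stays 1, equalizing sections with different n."""
    return (mean_raw - 1 / n) / (1 - 1 / n)

# A pure guesser scores 0 in both sections after rescaling
assert abs(rescaled_score(1 / 2, 2)) < 1e-9
assert abs(rescaled_score(1 / 5, 5)) < 1e-9
# A perfect score stays 1 regardless of n
assert abs(rescaled_score(1, 2) - 1) < 1e-9
assert abs(rescaled_score(1, 5) - 1) < 1e-9
```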

Single-response items only allow clearly correct or incorrect responses from examinees. Consequently, the scoring should also be dichotomous and result in either 0 points (incorrect response) or 1 credit point (correct response) per item. Because of the possibility of guessing, scoring results cannot be equated with examinees’ knowledge.

PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) checklist.

Excluded sources after screening of full texts.

CG: correct for guessing

f: resulting score per item

k: examinees’ knowledge level

n: number of answer options per item

NC: number correct

O: number of omitted items

PRISMA-ScR: Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews

PROSPERO: International Prospective Register of Systematic Reviews

R: number of correct responses

S: examination result as absolute score

W: number of incorrect responses

W_{f}: number of false statements incorrectly marked as true

W_{t}: number of true statements incorrectly marked as false

The authors acknowledge support by the Open Access Publication Funds of Göttingen University. The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

All data generated during or analyzed during this study are included in this published article and its supplementary information files.

AFK and PK contributed to the study’s conception and design, performed the literature search and data extraction, and drafted the manuscript. PK performed statistical analyses. All authors interpreted the data, critically revised the manuscript, and approved the final version of the manuscript.

None declared.