
The Predictive Nature of Criterion Scores on Impact Score and Funding Outcomes

Re-posted from the Office of Extramural Research, National Institutes of Health

In order to develop and implement data-driven policy, we need to carefully analyze our data to understand the “stories behind our metrics.” Without analyzing our data to know what’s going on, we’re essentially flying blind! A group of authors from the NIH Office of Extramural Research sought to investigate the stories behind peer review scoring and why some grant applications are more likely to be funded than others. They extended analyses previously reported by NIH’s Office of Extramural Research and the National Institute of General Medical Sciences. Last month, they published their analysis of over 123,000 competing R01 applications, describing how the individual component peer review scores – significance, investigator(s), innovation, approach, and environment – correlate with the subsequent overall impact score and with funding outcome.


Figure 1: Box Plot Distributions of Criterion and Overall Impact Scores for R01 Applications, FY 2010–2013. Figure 1 shows the box plot distributions of the five research criterion scores (scale: 1–9) and the Overall Impact score (scale: 10–90). Box plot whiskers extend to the most extreme data point no more than 1.5 times the interquartile range from the box. Each criterion score: N = 123,707 applications; Overall Impact score: N = 71,651 applications.

From “How Criterion Scores Predict the Overall Impact Score and Funding Outcomes for National Institutes of Health Peer-Reviewed Applications” by Eblen et al.: Box Plot Distributions of Criterion and Overall Impact Scores for R01 Applications, FY 2010–2013.

NIH’s use of these criterion scores began in 2009, as part of the Enhancing Peer Review initiative. The authors analyzed data on R01 applications submitted in fiscal years 2010–2013 and constructed multivariable regression models to account for variations in application and applicant characteristics. They found that an application’s approach score was by far the most important predictor of the overall impact score and of whether a given application was funded, followed, to a lesser extent, by the significance score.
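To make the modeling idea concrete, here is a minimal sketch (not the authors’ actual code or data) of how one might regress an overall impact score on the five criterion scores with a multivariable linear model. The column names and synthetic data below are illustrative assumptions only.

```python
# Minimal illustrative sketch (NOT the authors' code): regress a synthetic
# overall impact score (10-90) on the five criterion scores (1-9 each).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "approach":     rng.integers(1, 10, n),   # hypothetical criterion scores, 1-9
    "significance": rng.integers(1, 10, n),
    "innovation":   rng.integers(1, 10, n),
    "investigator": rng.integers(1, 10, n),
    "environment":  rng.integers(1, 10, n),
})
# Synthetic impact score loosely driven by approach and significance
df["impact"] = np.clip(
    10 * (0.6 * df["approach"] + 0.25 * df["significance"]
          + 0.05 * (df["innovation"] + df["investigator"] + df["environment"]))
    + rng.normal(0, 5, n), 10, 90)

# Multivariable model: which criteria predict the overall impact score?
model = smf.ols(
    "impact ~ approach + significance + innovation + investigator + environment",
    data=df).fit()
print(model.summary())
```

In a model like this, the relative size of the coefficients indicates how much weight each criterion carries; the published analysis found that approach dominates, with significance a distant second.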

What does this mean for you as applicants? We think it’s helpful for R01 applicants to know that the description of the experimental approach is the most important predictor of funding, followed by the significance of the study. As an applicant, familiarizing yourself with the guidance peer reviewers receive and the questions they are asked about approach and significance may be helpful as you put together your application.

The authors leveraged their data to examine a number of other potential correlates of funding. For example, they found that New Investigator status on an R01 application is positively associated with funding outcomes. This lends even more support to our recommendation that early-career applicants and investigators familiarize themselves with new and early investigator policies when considering submission of multiple-PI applications and when applying to NIH.

The authors also report some interesting data consistent with previously reported findings on funding outcomes by race and gender: women were slightly less likely to be funded than men (rate ratio 0.9, P<0.001), while black applicants were substantially less likely to be funded than white applicants (rate ratio 0.7, P<0.001). However, when the criterion scores were factored into the regression models, the demographic differences disappeared (for women, adjusted rate ratio 1.0, P=0.22; for black applicants, adjusted rate ratio 1.0, P=0.73). In an upcoming Open Mike blog post we’ll discuss additional data related to this topic and steps NIH is taking to address these issues.
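As a rough illustration of how rate ratios like those above can be estimated (the paper’s exact specification may differ), a binary funding outcome can be modeled with a log-link Poisson regression with robust standard errors; the exponentiated coefficient on a group indicator is the rate ratio, and adding the criterion scores as covariates yields the adjusted rate ratio. The variable names and synthetic data here are hypothetical.

```python
# Illustrative sketch only: unadjusted vs. adjusted rate ratios for a
# binary funding outcome, using a modified Poisson (log-link, robust SE) model.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "female":       rng.integers(0, 2, n),    # hypothetical indicator variable
    "approach":     rng.integers(1, 10, n),   # criterion scores, 1-9
    "significance": rng.integers(1, 10, n),
    "innovation":   rng.integers(1, 10, n),
    "investigator": rng.integers(1, 10, n),
    "environment":  rng.integers(1, 10, n),
})
# Synthetic funding indicator: better (lower) approach scores funded more often
p = 1 / (1 + np.exp(df["approach"] - 4))
df["funded"] = rng.binomial(1, p)

unadjusted = smf.glm("funded ~ female", data=df,
                     family=sm.families.Poisson()).fit(cov_type="HC1")
adjusted = smf.glm("funded ~ female + approach + significance + innovation"
                   " + investigator + environment", data=df,
                   family=sm.families.Poisson()).fit(cov_type="HC1")

# Exponentiated coefficients are rate ratios
print("unadjusted RR:", np.exp(unadjusted.params["female"]))
print("adjusted RR:  ", np.exp(adjusted.params["female"]))
```

An adjusted rate ratio near 1.0, as the authors report, means that once the criterion scores are held constant, the demographic indicator adds little additional predictive information about funding.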

I’d like to thank the authors of this manuscript for this analysis and congratulate them on their publication.