We publish R tutorials from scientists at academic and scientific institutions with the goal of giving everyone in the world access to free knowledge. Our tutorials cover topics including statistics, data manipulation and visualization!

Introduction

Scholarly indices are intended to measure the contributions of authors to their fields of research. Jorge E. Hirsch suggested the h-index in 2005 as an author-level metric intended to measure both the productivity and citation impact of the publications of an author. An author has index h if h of his or her N papers have at least h citations each, and the other (N-h) papers have no more than h citations each.
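Written in terms of sorted citation counts, the definition becomes a simple search over ranks, which is the form the R code in this post relies on. Assuming the N papers are ranked so that $c_1 \ge c_2 \ge \dots \ge c_N$, where $c_i$ is the number of citations of the i-th most cited paper:

$$
h = \max\{\, i \in \{1, \dots, N\} : c_i \ge i \,\}, \qquad h = 0 \text{ if } c_1 = 0.
$$

Because $c_i$ never increases while $i$ always does, $c_i \ge i$ holds for every rank up to $h$ and fails for every rank after it; in particular $c_{h+1} < h + 1$, so the remaining papers have no more than $h$ citations each.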

In response to a comment, we will use our trusty RISmed package and the PubMed database to develop a script for calculating an h-index, as well as two similar metrics, the m-quotient and the g-index. Here is the code to conduct the search; the citation counts are then extracted from the `EUtilsSummary()` result with `Cited()`.

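```r
library(RISmed)

# Search PubMed for the author's name, restricted to publication dates 1900-2015
x <- "Yi-Kuo Yu"
res <- EUtilsSummary(x, type = "esearch", db = "pubmed", datetype = 'pdat',
                     mindate = 1900, maxdate = 2015, retmax = 500)

# Extract the citation counts and store them in a one-column data frame
citations <- Cited(res)
citations <- as.data.frame(citations)
```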

Calculating the h-index is just a matter of cleverly arranging the data. Above, we created a data frame with one column containing all the values of `Cited()` from our search. We will sort them in descending order, then add a column with each paper's rank. The author's h-index is the highest rank that is less than or equal to the number of citations of the paper at that rank. The following code returns that number.

```r
# Sort the citation counts from most to least cited
citations <- citations[order(citations$citations, decreasing = TRUE), , drop = FALSE]
# Add a rank column: 1 = most cited paper
citations <- cbind(id = seq_len(nrow(citations)), citations)
# h-index: the highest rank whose paper has at least that many citations
hindex <- max(which(citations$id <= citations$citations))
hindex
## [1] 12
```

Here is the data frame we created above. It shows that Dr. Yi-Kuo Yu has an h-index of 12: his 12 most cited publications have at least 12 citations each, while the 13th has only 10.

```
 id citations
  1       181
  2        62
  3        34
  4        31
  5        23
  6        19
  7        19
  8        18
  9        14
 10        14
 11        13
 12        13
 13        10
 14         8
```
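The data-frame bookkeeping is not strictly necessary. Here is a minimal vectorized sketch, assuming only a numeric vector of citation counts; `h_index()` is a hypothetical helper name, and `cites` simply retypes the 14 counts printed above. Any remaining papers have at most 8 citations, so they cannot change the result.

```r
# Hypothetical helper: h-index from a numeric vector of citation counts
h_index <- function(x) {
  x <- sort(x, decreasing = TRUE)
  # After sorting, x[i] >= i holds for ranks 1..h and fails afterwards,
  # so counting the TRUE values gives h
  sum(x >= seq_along(x))
}

# The 14 citation counts printed above
cites <- c(181, 62, 34, 31, 23, 19, 19, 18, 14, 14, 13, 13, 10, 8)
h_index(cites)
## [1] 12
```

Unlike `max(which(...))`, which returns `-Inf` with a warning when no paper qualifies, counting the matches returns 0 for an author whose papers are all uncited.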

Although the h-index is a useful metric to measure an author’s impact, it has some disadvantages. For instance, a long, less impactful career will typically outscore a superstar junior scientist. For these cases, the m-quotient divides the h-index by the number of years since the author’s first publication. In this sense it is a way to normalize by career span.

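```r
# Publication years of all records returned by the search
y <- YearPubmed(EUtilsGet(res))
low <- min(y)
high <- max(y)
# Career span: years between the earliest and the latest publication in the results
den <- high - low
mquotient <- hindex / den
round(mquotient, 2)
## [1] 0.92
```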
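Note that `den` above is the span between the earliest and the latest publication in the search results, which matches "years since the first publication" only if the author published in the last year of the search window. Below is a minimal sketch that makes the end year explicit and avoids dividing by zero for a one-year publication record; `m_quotient()` and its `ref_year` argument are hypothetical names, and the inputs in the example are illustrative, not Dr. Yu's data.

```r
# Hypothetical helper: m-quotient = h-index / years since first publication
#   h         : the author's h-index
#   pub_years : publication years of the author's papers
#   ref_year  : year to measure the career up to (defaults to the latest publication)
m_quotient <- function(h, pub_years, ref_year = max(pub_years)) {
  career_years <- ref_year - min(pub_years)
  if (career_years <= 0) {
    return(NA_real_)  # all papers in one year: career length would be zero
  }
  h / career_years
}

# Illustrative inputs: first paper in 2010, latest in 2020
m_quotient(10, c(2010, 2014, 2020))
## [1] 1
# Same record measured up to 2025 instead of the latest publication
round(m_quotient(10, c(2010, 2014, 2020), ref_year = 2025), 2)
## [1] 0.67
```

With the default `ref_year`, the helper reproduces the calculation above; passing the current year instead follows the "years since first publication" wording more closely.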

Another weakness of the h-index is that it ignores how many citations an author's top papers receive beyond the threshold h. An author with a few very highly cited publications can end up with the same h-index as a relatively obscure author. The g-index was developed to address this situation. The g-index is "the largest rank (where papers are arranged in decreasing order of the number of citations they received) such that the first g papers have (together) at least g² citations." Here is code to calculate the g-index.
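To see the difference concretely, here is a minimal sketch comparing two hypothetical authors with six papers each: one has a single paper with 20 citations, the other has no standout paper. The helper names `h_index()` and `g_index()` and the citation counts are made up for illustration. Like the code in this post, `g_index()` never reports a g larger than the number of papers.

```r
# Hypothetical helper: h-index from a vector of citation counts
h_index <- function(x) {
  x <- sort(x, decreasing = TRUE)
  sum(x >= seq_along(x))
}

# Hypothetical helper: largest rank g whose top-g papers have >= g^2 citations in total
g_index <- function(x) {
  x <- sort(x, decreasing = TRUE)
  ok <- which(cumsum(x) >= seq_along(x)^2)
  if (length(ok) == 0) 0L else max(ok)
}

# Made-up citation records, six papers each
superstar <- c(20, 1, 1, 1, 1, 0)  # one highly cited paper
steady    <- c(2, 1, 1, 1, 1, 0)   # no standout paper

c(h = h_index(superstar), g = g_index(superstar))  # h = 1, g = 4
c(h = h_index(steady),    g = g_index(steady))     # h = 1, g = 1
```

Both authors share an h-index of 1, but the single 20-citation paper lifts the first author's g-index to 4, which is exactly the situation the g-index was designed to capture.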

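```r
# Square of each paper's rank
citations$square <- citations$id^2
# Running total of citations, from the most cited paper down
citations$sums <- cumsum(citations$citations)
# g-index: the highest rank whose square does not exceed the cumulative citations
gindex <- max(which(citations$square <= citations$sums))
gindex
## [1] 22
```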

We made two new columns: one with the square of each rank and one with the cumulative sum of the citations in descending order. Similar to the h-index, the g-index is the highest rank whose square is less than or equal to the cumulative sum at that rank. The output below, with the two new columns, shows that Dr. Yu has a g-index of 22, largely because his two most cited publications (181 and 62 citations) keep the cumulative sum well ahead of the squared ranks.

```
 id citations square sums
  1       181      1  181
  2        62      4  243
  3        34      9  277
  4        31     16  308
  5        23     25  331
  6        19     36  350
  7        19     49  369
  8        18     64  387
  9        14     81  401
 10        14    100  415
 11        13    121  428
 12        13    144  441
 13        10    169  451
 14         8    196  459
 15         7    225  466
 16         7    256  473
 17         7    289  480
 18         7    324  487
 19         7    361  494
 20         7    400  501
 21         6    441  507
 22         5    484  512
 23         4    529  516
 24         4    576  520
```
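The same rule can be written as a formula, again assuming counts sorted so that $c_1 \ge c_2 \ge \dots \ge c_N$, with $g$ capped at the number of papers $N$ as in the code above:

$$
g = \max\Big\{\, k \in \{1, \dots, N\} : \sum_{i=1}^{k} c_i \ge k^2 \,\Big\}
$$

Plugging in the table: $\sum_{i=1}^{22} c_i = 512 \ge 22^2 = 484$, but $\sum_{i=1}^{23} c_i = 516 < 23^2 = 529$ and $\sum_{i=1}^{24} c_i = 520 < 24^2 = 576$. Any later papers add at most 4 citations each while $k^2$ grows faster, so $g = 22$.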

Check out the updated Shiny App to let the App do the work for you.