Alexander I. Pudovkin*, Eugene Garfield**
*Institute of Marine Biology, Far East Branch, Russian Academy
of Sciences, Vladivostok 690041, Russia
Telephone: 7-4232-311-173; Fax: 7-4232-310-900; email: email@example.com
**Chairman Emeritus, Institute for Scientific Information®
3501 Market Street, Philadelphia, PA 19104-3389, USA
Telephone: 215-243-2205; Fax 215-387-1266; email: firstname.lastname@example.org
Abstract. Using citations, papers and references as parameters, a relatedness factor (RF) is computed for a series of journals. Sorting these journals by the RF produces a list of journals most closely related to a specified starting journal. The method appears to select a set of journals that are semantically most similar to the target journal. The algorithmic procedure is illustrated for the journal Genetics. Inter-journal citation data needed to calculate the RF were obtained from the 1996 ISI Journal Citation Reports on CD-ROM©. Out of the thousands of candidate journals in JCR©, thirty have been selected. Some of them are different from the journals in the JCR category for genetics and heredity. The new procedure is unique in that it takes varying journal sizes into account.
The classification of scientific and scholarly journals is a problem
well known to scientists and librarians for decades. Traditional classification
relies on subjective analysis which for one reason or another proves inadequate
and is subject to the vagaries of time. Quantitative methods have been
proposed for overcoming these problems. Such methods were greatly facilitated by
the introduction of citation indexes in the 1960s and the later introduction
of the ISI Journal Citation Reports (JCR). JCRs for science and
social science are produced annually. In the seventies, the print JCRs
were issued as the last volume of the Science Citation Index©
or Social Sciences Citation Index©. Later, microform
and CD-ROM editions were introduced, and more recently JCR appeared on the web.
JCR reports inter-journal citation frequencies for thousands of journals. In addition to an alphabetic listing, journals are grouped by categories. Journals are assigned to categories by subjective, heuristic methods. In many fields these categories are sufficient, but in many areas of research these “classifications” are crude and do not permit the user to quickly learn which journals are most closely related.
JCR provides, for each journal, a set of its most closely related journals based on citation relationships. These are the journals it cites most heavily (cited journals) and also the journals which cite it most often (citing journals). These are extremely useful and provide a crude classification, but unfortunately due to the variations in the sizes of journals one only obtains a superficial perception of the relatedness between two or more specific journals.
Various authors have studied journal-to-journal citation rates, mostly for the purposes of hierarchical clustering of the journals and delineation of specialty fields (Narin et al., 1972; Narin et al., 1973; Leydesdorff, 1994; Narin et al., 2000). However, they do not deal with the key problem of varying journal sizes. In this paper we have described a method which takes size into account. The method has its origins in earlier works by Pudovkin and Elizabeth Fuseler (Pudovkin, 1992, 1993; Pudovkin and Fuseler, 1995). They attempted to visualize citation relationships of core marine and freshwater biology journals. For that purpose indexes of citation relatedness were used. This enabled journals to be clustered and then displayed in a two-dimensional diagram. The resultant “map” of journal relatedness was quite meaningful: a tight group of multi-disciplinary marine biology journals occurred in the center of the diagram, journals more narrow in scope were situated on the periphery, topically similar journals being grouped close to each other. The more meaningful visualization of marine journals was the result of using the indexes of citation relatedness, which took into account the variation in journal sizes.
Recently, Egghe and Rousseau have developed a theory for quantifying language preferences in journal citations (Egghe, et al., 1999; Rousseau, 1999; Egghe, Rousseau, 2000). The measures suggested by them are similar to the indexes of citation relatedness suggested by Pudovkin (1992, 1993): the measures developed by them also take into account the number of citations from one journal to another and the sizes of the journals. However, our approach is more pragmatic than theoretical. We wished to develop a procedure that would, through quantitative evaluation of citation relatedness, allow one to automatically find topically similar journals, that is, without considering the titles of papers or journal content.
The algorithm described here uses the indexes of citation relatedness, suggested by Pudovkin (1992, 1993). The process appears to approximate the subjective, that is, semantic judgment of experts. We have illustrated the procedure using one core journal in the field of genetics and heredity, the well known Genetics, published by the Genetics Society of America.
Journal Relationship Measures
Let journal relatedness of two journals, “i” and “j”, be symbolized by $R_{i>j} = H_{i>j} \times 10^{6} / (Pap_j \times Ref_i)$, where $H_{i>j}$ is the number of citations in the current year from journal “i” to journal “j” (to papers published in “j” in all years of “j”), and $Pap_j$ and $Ref_i$ are the number of papers published and references cited in the j-th and i-th journals, respectively, in the current year. An arbitrary multiplier of $10^{6}$ makes the values of the relatedness index more easily perceived and handled. For example, the 1996 issues of Genetics cited all years of Heredity 351 times. The number of references cited in Genetics was 21,060, and the number of papers published in Heredity was 146. Substituting these numbers in the formula we get $R_{G>H} = 351 \times 10^{6} / (146 \times 21{,}060) = 114.2$ (where G stands for Genetics and H stands for Heredity). Figure 1 visualizes these calculations.
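The calculation can be checked with a few lines of code. The following is a minimal sketch, not part of the original study; the function and parameter names are chosen here for illustration only, and the input numbers are those quoted above for Genetics citing Heredity.

```python
def relatedness_index(citations: int, papers_cited: int, refs_citing: int,
                      scale: float = 1e6) -> float:
    """Directional relatedness R(i>j) = H(i>j) * 10^6 / (Pap_j * Ref_i).

    citations    -- H(i>j): citations from journal i to all years of journal j
    papers_cited -- Pap_j: papers published in journal j in the current year
    refs_citing  -- Ref_i: references cited in journal i in the current year
    """
    if papers_cited <= 0 or refs_citing <= 0:
        raise ValueError("journal sizes must be positive")
    return citations * scale / (papers_cited * refs_citing)


# Worked example from the text: Genetics (i) citing Heredity (j), JCR 1996.
r_g_h = relatedness_index(citations=351, papers_cited=146, refs_citing=21_060)
print(round(r_g_h, 1))  # 114.2
```

The multiplier is kept as a parameter only because the text describes it as arbitrary; changing it rescales every index equally and does not affect any ranking.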
The rationale for the formulation of the indexes follows. The number of citations from one journal to another journal should be (on average) proportional to the number of papers published in the cited journal and to the number of cited references in the citing journal. Thus, a journal publishing 1,000 papers a year will tend to receive 10 times as many citations as a journal publishing only 100 papers, all other conditions being the same. Similarly, a journal which has cumulatively cited 10,000 references will tend to cite another journal ten times more often than a topically similar journal that cumulatively cites 1,000 references. Thus, these numbers, which reflect the sizes of the citing and cited journals, are placed in the denominator of the formula. Strictly speaking, the number of citations a journal receives depends on the cumulative number of papers published in the journal during all the years of its existence. Since an annual JCR does not provide this historical information, it was decided to use the number of papers published in the current year. It was understood, of course, that this convention introduces a fortuitous error in the estimation of citation relatedness, as journal sizes change differently from year to year; for the majority of journals, however, sizes are relatively stable over the years (Garfield, 1996). It was considered unwise to use the number of citations to the papers of the current year because of the time lag in getting citations, which is quite significant in slower-moving research fields. Besides, yearly citation scores are rather low for many journals, hence they would be too subject to chance fluctuations.
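The proportionality argument can be restated compactly. This is only a rewording of the reasoning above; the affinity constant $k_{ij}$ is a symbol introduced here for illustration and does not appear in the original formulation.

```latex
\[
  \mathrm{E}\!\left[H_{i>j}\right] \approx k_{ij}\, Pap_j\, Ref_i
  \quad\Longrightarrow\quad
  R_{i>j} = \frac{H_{i>j} \times 10^{6}}{Pap_j\, Ref_i} \approx 10^{6}\, k_{ij}
\]
```

Under this assumption the index estimates the topical affinity of the pair and does not grow merely because either journal is large.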
If we consider a pair of journals, A and B, there may be two indexes: RA>B and RB>A. These can be very different. Consider the above-mentioned journals, Genetics and Heredity: RG>H = 114.2 and RH>G = 68.3. It is noteworthy that the citation relatedness of a journal to itself (that is, “self-relatedness”) may be lower than its relatedness to some other journals. For instance, Journal of Genetics has both citing and cited relatedness indexes with Genetics that are higher than the self-relatedness of Genetics: the self-relatedness is RG>G = 301.9, while RJG>G = 338.3 and RG>JG = 503.7. The same is true for the relationship between Genetics and Genetical Research: RGR>G = 393.0 and RG>GR = 306.0. It is interesting to note the very high self-relatedness of the Journal of Genetics, RJG>JG = 961.5, and of Genetical Research, RGR>GR = 1693.0.
As was mentioned above, each pair of journals may be characterized by a pair of indexes that quantify their reciprocal citation levels: “A” citing “B”, and “B” citing “A”. How should one characterize the citation relatedness of a pair of journals with a single number? Previously, Pudovkin (1993) and Pudovkin and Fuseler (1995) used the arithmetic average of the two indexes, RA&B = (RA>B + RB>A)/2. We now suggest using the larger of them, RA&Bmax = max(RA>B, RB>A), which we shall call the relatedness factor (RF). A similar-sounding term, Relationship Factor, was recently introduced by Shama et al. (2000), though it refers to the relationship between disciplines rather than journals. It takes into account the impact factors of journals and the number of citations from journals of one discipline to the journals of another.
Consider the pair of journals Genetics and Genetika (Russian Journal of Genetics). The latter is the title of the low-circulation cover-to-cover English translation that is published simultaneously with the original. Genetics and Genetika are very similar in content, both publishing papers on all aspects of genetics. But, being a Russian-language journal, Genetika receives few citations from Genetics, while it cites Genetics quite often. The citation relatedness indexes for them are RA>B = 49.7 and RB>A = 1.6 (where A stands for Genetika and B stands for Genetics). Similar situations are observed with other national journals, even those published in English, e.g. the French English-language journal Genetics Selection Evolution and Genetics: RGSE>G = 124.2, RG>GSE = 25.8. Two other examples: the Scandinavian Genetica and Genetics, R1>2 = 97.5, R2>1 = 42.9; and the British journal Heredity and Genetics, R1>2 = 160.9, R2>1 = 48.5. The analogous situation applies when one journal of the pair is older and established and the other is recently launched, e.g. Genome and Genetics, R1>2 = 122.7 and R2>1 = 29.9. Another example: Molecular Ecology and Genetics, R1>2 = 127.7 and R2>1 = 16.7. Thus, the maximal value of the two indexes seems to better reflect the topical similarity of the journals.
The asymmetry of citation relationships in some journal pairs discussed above has some similarity to the language preferences studied by Egghe and Rousseau (2000), though the asymmetry revealed by us is certainly a different phenomenon, as it is often seen in journal pairs in the same language.
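The effect of choosing the maximum rather than the arithmetic average can be seen by applying both to index pairs quoted in this section. The sketch below is illustrative only; the function names are hypothetical, and the numbers are the directional indexes reported in the text for each journal paired with Genetics.

```python
def mean_relatedness(r_ab: float, r_ba: float) -> float:
    """Earlier symmetric measure: arithmetic average of the two indexes."""
    return (r_ab + r_ba) / 2


def relatedness_factor(r_ab: float, r_ba: float) -> float:
    """Relatedness factor (RF): the larger of the two directional indexes."""
    return max(r_ab, r_ba)


# Directional index pairs with Genetics, as quoted in the text (JCR 1996).
pairs = {
    "Genetika": (49.7, 1.6),
    "Genetics Selection Evolution": (124.2, 25.8),
    "Genome": (122.7, 29.9),
    "Molecular Ecology": (127.7, 16.7),
}

for name, (r_one_way, r_other_way) in pairs.items():
    avg = mean_relatedness(r_one_way, r_other_way)
    rf = relatedness_factor(r_one_way, r_other_way)
    print(f"{name}: average = {avg:.2f}, RF = {rf:.1f}")
```

For strongly asymmetric pairs the average is pulled toward the weak direction, whereas the RF keeps the stronger signal.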
Illustration: Finding the Journals Most Related to Genetics
For each journal JCR provides two lists: citing and cited journals. The cited and citing citation scores were retrieved for those journals that cited Genetics or were cited by it 7 or more times. Also retrieved were journals with lesser citation scores (2 and more), which seemed “genetical” judging from their titles. There were 271 such journals. Thirty journals with the highest RF (with Genetics) are given in Table 1.
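The retrieval and ranking steps just described can be sketched in code. This is a hedged illustration, not the procedure actually run for the study: the `Candidate` record, the `rank_by_rf` function and every number in the toy data are invented for the example, and the manual step of adding lower-scoring journals that look “genetical” from their titles is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    papers: int             # Pap_j: papers published in the current year
    references: int         # Ref_j: references cited in the current year
    cites_to_target: int    # citations from this journal to the target
    cites_from_target: int  # citations from the target to this journal


def rank_by_rf(target_papers, target_refs, candidates,
               min_citations=7, top_n=30):
    """Return (name, RF, raw score) tuples sorted by RF, highest first."""
    rows = []
    for c in candidates:
        raw = max(c.cites_to_target, c.cites_from_target)
        if raw < min_citations:  # threshold of 7 citations, as in the text
            continue
        # Candidate citing target: divide by target papers, candidate refs.
        r_to = c.cites_to_target * 1e6 / (target_papers * c.references)
        # Target citing candidate: divide by candidate papers, target refs.
        r_from = c.cites_from_target * 1e6 / (c.papers * target_refs)
        rows.append((c.name, max(r_to, r_from), raw))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_n]


# Invented toy data (NOT taken from JCR): one small and one very large journal.
toy = [
    Candidate("Small specialist", 10, 400, 80, 90),
    Candidate("Large generalist", 5000, 200_000, 1500, 2000),
    Candidate("Rarely linked", 100, 3000, 3, 5),
]
ranked = rank_by_rf(target_papers=500, target_refs=20_000, candidates=toy)
for name, rf, raw in ranked:
    print(f"{name}: RF = {rf:.1f}, raw citations = {raw}")
```

In this toy case the small journal ranks first by RF although the large one has far more raw citations, and the third journal is dropped by the citation threshold.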
Table 2 lists 30 journals which give to or receive from Genetics the highest number of citations (raw citation scores). Journal titles in bold face are included in the “Genetics & Heredity” (“G & H”) category of JCR.
Table 2. Thirty journals giving or receiving the highest number of citations to or from Genetics
Data based on JCR, 1996. A: Impact Factor; B: number of 1996 papers;
C: number of cited references; D: maximal number of citations (to or from
the journal); E: rank by “D”; F: relatedness factor to Genetics,
RG&imax; G: rank by “F”. Journals in JCR “Genetics &
Heredity” category in bold.
It is evident that the new algorithmic approach selected journals that are similar in content to Genetics: twenty-one of the journals listed in Table 1 are in the “G & H” category, while only 13 of the journals in Table 2 are in this category. This difference is due to the weighting (or filtering) property of the citation relatedness indexes and the RF, which will be discussed below. The algorithm located some other journals that are not now included in the “G & H” category but should be (or for which genetics should be indicated as a subcategory): 1) Molecular Biology and Evolution, 2) Molecular Ecology, 3) Maydica. The first journal is categorized by JCR as “biochemistry & molecular biology”, though it mostly covers population and evolutionary genetics. The second journal publishes population and evolutionary genetics papers, touching on ecology. JCR's category for it is “ecology” without any mention of “genetics”. The third journal is characterized by JCR as “agriculture; plant science”. Consideration of the journal's paper titles shows that 20 of the 42 papers published in Maydica in 1996 dealt with genetics or genetic improvement in cultivated plants. Also noteworthy, the subcategory of “genetics” is not indicated in the JCR category for Annual Review of Ecology and Systematics, which publishes many highly cited papers on population and evolutionary genetics. It ranks 26th in Table 1.
It is interesting to note the high citation relatedness to Genetics of journals dealing with developmental and cell biology. These disciplines are much “geneticized” now. This is reflected in Table 1. The journals Cell, Molecular and Cellular Biology, and Roux's Archives of Developmental Biology are among the 30 journals most related to Genetics.
An important feature of the suggested approach is the calculation of SPECIFIC citation relatedness, that is, the new indexes take into consideration the sizes of citing (through the number of references) and cited (through the number of published papers) journals. The word SPECIFIC is used as are terms in physics such as “specific weight”, “specific density”, etc. If one ignores journal size in considering citation scores, the pattern of relatedness is quite different. Table 2 includes 30 journals that give or receive the highest number of citations to or from Genetics. It is important to note the high ranks of multidisciplinary journals such as Proceedings of the National Academy of Sciences of the USA, Nature, Science and of very large non-genetics journals such as Journal of Biological Chemistry, Journal of Bacteriology, Journal of Molecular Biology. Among the journals in Table 2 one does not find smaller journals that are highly related to Genetics and included in the JCR “G & H” category such as Journal of Genetics, Journal of Neurogenetics, Genetics Selection Evolution, Evolutionary Biology, Genes & Genetic Systems. The proposed method is further illustrated when one compares the data for a few other journals included in JCR's “G & H” category, by raw citation scores and by the RF (Table 3).
Table 3. Some core genetics journals ranked by relatedness factor to Genetics, RG&imax, or by raw citation scores
Data based on JCR, 1996. A: Impact Factor; B: number of 1996 papers;
C: number of cited references; D: relatedness factor to Genetics,
RG&imax; E: raw citation score (maximal of “to” or “from”);
F: rank by D; G: rank by E. Journals in JCR “Genetics & Heredity” category
in bold.
It can be seen that all the journals have much higher ranks when sorted
by RF rather than by raw citation scores. The differences in ranks of three
“genetical” journals are noteworthy. These are not included in the “G &
H” category. They are Fungal Genetics and Biology – 133 and 41 (the
first number is the rank by raw citation score, the second is the rank
by RF), Development Genes and Evolution – 138 and 70,
Genetica – 163 and 72. The RF ranks these genetics journals closer
to Genetics than raw citation scores do. To illustrate the low information
content of the latter, compare the data for two journals that are very
different in size: Journal of Biological Chemistry and Molecular
Biology and Evolution (Table 4).
Table 4. Citation relatedness of two journals of different sizes, which cite or are cited by Genetics with similar numbers of citations
Data based on JCR, 1996. Rj>G and RG>j are indexes
of citing and cited relatedness of a journal “j” and Genetics.
Though they give to and receive from Genetics similar numbers of citations, they are very different in relevance to Genetics, which is clearly reflected in the relatedness indexes: RG>j and Rj>G are 3.8 & 2.6 and 109.2 & 142.1, respectively.
The small Journal of Genetics, published in India, is an interesting case. It is a journal with a low impact factor of 0.278. In 1996 it published only 8 papers containing 390 cited references. It ranks 1st in Table 1, but when sorted by raw citation score it ranks 49th. Of its 390 cited references, 88 are to Genetics (that is, 22.6%, while self-citation of Genetics is only 13.5%). It probably means that Indian scientists publishing in the Journal of Genetics frequently publish in Genetics as well, and in their papers in Genetics frequently cite the papers they publish in the Journal of Genetics. Evidently, this is not true for authors in other national journals such as the French Genetics Selection Evolution, the Scandinavian Genetica, the British Heredity and the Russian Genetika.
It seems unexpected that Genetics is so weakly related to journals on human and medical genetics (see Table 5).
Table 5. Citation relatedness of journals on human and medical genetics to Genetics
Data based on JCR, 1996. A: Impact Factor; B: number of 1996 papers;
C: number of cited references; D: relatedness factor to Genetics,
RG&imax; E: raw citation score (maximal of “to” or “from”);
F: rank by D; G: rank by E. Journals in JCR “Genetics & Heredity” category
in bold.
Here we summarize the results of our study.
1. The new algorithmic approach enables one to find thematically related journals out of a multitude of journals.
2. Weighting citation data by journal size allows identifying journals that are similar in content better than unweighted raw citation data.
3. In the case of the starting journal Genetics the method identified those journals which are significantly genetic in content, but were not included in the “Genetics & Heredity” category of the JCR.
4. Journals included in the “G & H” category are rather heterogeneous in content. Some are highly related to Genetics, while others, for example journals on medical genetics, are poorly related to its content. There is a significant difference between subjects such as plant, animal, human and other aspects of genetics.
5. JCR has become an established world wide resource but after two or more decades it needs to reexamine its methodology for categorizing journals so as to better serve the needs of the research and library community.
6. Using the methods described JCR could provide additional options for its web version. JCR's listings for cited and citing journals could provide a column with relatedness indexes (RA>j and Rj>A) and provide the option to sort by raw citation scores, relatedness indexes and relatedness factor just as it does now for the impact factor.
One might speculate on further usage of the suggested procedure, when it is computerized. Three applications come easily to mind.
1) Searching for relevant journals to form a small laboratory library.
Specify a small set of journals (say, 3 to 5), which are undoubtedly relevant
to the Lab's research profile. Pool up all the references contained in
these journals and sum up numbers of papers in each of them, thus forming
a pooled-up “macrojournal”. (The idea was used by Cozzens & Leydesdorff, 1993,
but had been used earlier by Garfield, 1986.) Earlier
journal citation studies too numerous to mention had used terms such as
core, unit, group, or category, and coincided with the appearance of the
first JCR in 1975 (Garfield, 1975). Count the
number of citations given to the macrojournal by all other available journals
and received from the macrojournal by each of them. Calculate the RF of
the macrojournal with all other journals.
Sort the journals by the RF. Select a reasonable number of the journals with the highest ranks. These will constitute the desired set of journals most relevant to the Lab's research profile.
2) Determining the subject category for a journal, when it is not evident from the journal title. Perform the procedure for the journal under categorization, identical to that described above for the journal Genetics. The journals with the highest ranks (after sorting by RF) will characterize the semantic category of the journal under categorization.
3) Algorithmic categorization of journals according to a pre-specified set of subject categories. Set up the desired set of categories. Select for each category a small set of undoubtedly relevant (diagnostic) journals. Form a macrojournal for each category (as described above in item 1). For each journal to be categorized calculate RF. Sort the diagnostic macrojournals by RF (in relation to the journal under categorization). If the value difference of RF for the 1st and 2nd ranks is substantial, ascribe to the journal under categorization the category of the macrojournal ranked 1st. If the difference in RF values is not substantial, ascribe to the journal the categories of macrojournals ranked 1st and 2nd.
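The macrojournal idea in applications 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not a tested implementation: the data layout, the function names `macro_rf` and `categorize`, and all journal names and counts are invented, and a ratio threshold stands in for the “substantial” difference in RF, which the text leaves undefined.

```python
from typing import Dict, Iterable, List, Tuple

Sizes = Dict[str, Tuple[int, int]]  # journal -> (papers, references)
Cites = Dict[Tuple[str, str], int]  # (citing, cited) -> citation count


def macro_rf(core: Iterable[str], other: str,
             sizes: Sizes, cites: Cites) -> float:
    """RF between a pooled macrojournal (the core set) and another journal."""
    core = list(core)
    pap_m = sum(sizes[j][0] for j in core)  # pooled papers
    ref_m = sum(sizes[j][1] for j in core)  # pooled references
    pap_o, ref_o = sizes[other]
    h_to = sum(cites.get((other, j), 0) for j in core)    # other -> macro
    h_from = sum(cites.get((j, other), 0) for j in core)  # macro -> other
    r_to = h_to * 1e6 / (pap_m * ref_o)
    r_from = h_from * 1e6 / (pap_o * ref_m)
    return max(r_to, r_from)


def categorize(journal: str, categories: Dict[str, List[str]],
               sizes: Sizes, cites: Cites, ratio: float = 2.0) -> List[str]:
    """Assign one category, or the top two when the lead is not substantial.

    Assumes at least two categories; `ratio` is a hypothetical cut-off.
    """
    scored = sorted(((macro_rf(core, journal, sizes, cites), name)
                     for name, core in categories.items()), reverse=True)
    (best_rf, best), (second_rf, second) = scored[0], scored[1]
    if best_rf == 0:
        return []  # no citation links to any diagnostic macrojournal
    if second_rf == 0 or best_rf / second_rf >= ratio:
        return [best]
    return [best, second]


# Invented toy data: two categories with two diagnostic journals each.
sizes = {"A1": (100, 3000), "A2": (50, 2000),
         "B1": (200, 8000), "B2": (100, 4000), "X": (40, 1000)}
cites = {("X", "A1"): 60, ("X", "A2"): 30, ("A1", "X"): 20,
         ("X", "B1"): 5, ("B2", "X"): 4}
categories = {"alpha": ["A1", "A2"], "beta": ["B1", "B2"]}

print(macro_rf(categories["alpha"], "X", sizes, cites))
print(categorize("X", categories, sizes, cites))
```

For application 1, the same `macro_rf` function would be evaluated for every available journal against the laboratory's core set and the results sorted in descending order. Whether citations among the core journals themselves should be excluded from the pooled counts is not addressed in the text and is left open here.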
We wish to thank ISI® for permission to use the JCR data for this study, and three anonymous referees for useful comments.
Cozzens, S.E., & Leydesdorff, L. (1993). Journal systems as macro-indicators of structural change in the sciences, in: A.F.J. Van Raan, R.E. de Bruin, H.F. Moed, A. J. Nederhof, & R.W.J. Tijssen (eds), Science and Technology in a Policy Context (Leiden: DSWO Press), 219-233.
Egghe, L., Rousseau, R., & Yitzhaki, M. (1999). The "own-language preference": Measures of "relative language self-citation." Scientometrics, 45, 217-232.
Egghe, L., & Rousseau R. (2000). Partial orders and measures for language preferences. Journal of the American Society for Information Science, 51 (12), 1123-1130.
Garfield, E. (1975). No-growth libraries and citation analysis; or, pulling
weeds with ISI's Journal Citation Reports. Current Contents No. 26, 5-8.
Reprinted in Essays of an Information Scientist, Volume 2, pp. 300-303
(1975). Philadelphia: ISI Press.
Garfield, E. (1986). Journal Citation Studies. 46. physical chemistry and
chemical physics journals. Part 2. Core journals and most-cited papers.
Current Contents No. 2, 3-10 (January 13, 1986). Reprinted in Essays of
an Information Scientist, Volume 9, pp. 9-16 (1988).
Garfield, E. (1996). The significant scientific literature appears in a
small core of journals. Scientist, 10 (17), 13-16.
ISI Journal Citation Reports. http://www.isinet.com/isi/products/citation/jcr/index.html
Leydesdorff, L. (1994). The generation of aggregated journal-journal citation maps on the basis of the CD-ROM version of the Science Citation Index. Scientometrics, 31: 59-84.
Narin, F., Carpenter, M.P., & Berlt N. (1972). Interrelationships of scientific journals. Journal of the American Society for Information Science and Technology, 23: 323-331.
Narin, F., Carpenter, M.P. (1973). Clustering of scientific journals. Journal of the American Society for Information Science and Technology, 24: 425-435.
Narin, F., Hamilton, K.S., & Olivastro, D. (2000). The development of science indicators in the United States. In B. Cronin & H. B. Atkins (Eds). The Web of Knowledge: A Festschrift in Honor of Eugene Garfield (pp. 337-360). Medford, NJ: Information Today.
Pudovkin, A.I. (1992). Citation links of the journal Biologiya Morya (Soviet Journal of Marine Biology). Biologiya Morya, No. 5-6, 83-92. (in Russian)
Pudovkin, A.I. (1993). Citation relationships among marine biology journals and those in related fields. Marine Ecology Progress Series, 100, 207-209.
Pudovkin, A.I., Fuseler, E.A. (1995). Indices of journal citation relatedness and citation relationships among aquatic biology journals. Scientometrics, 32, 227-236.
Rousseau, R. (1999). Temporal differences in self-citation rates of scientific journals. Scientometrics, 44 (3), 521-531.
Shama, G., Hellgardt K., & Oppenheim C. (2000). Citation footprint
analysis. Part I: UK and US chemical engineering academics. Scientometrics,
49 (2), 289-305.