Science, Vol:122, No:3159, p.108-111, July 15, 1955
Citation Indexes for Science:
A New Dimension in Documentation through Association of Ideas
Eugene Garfield, Ph.D.
"The uncritical citation of disputed data by a writer, whether it be deliberate or not, is a serious matter. Of course, knowingly propagandizing unsubstantiated claims is particularly abhorrent, but just as many naive students may be swayed by unfounded assertions presented by a writer who is unaware of the criticisms. Buried in scholarly journals, critical notes are increasingly likely to be overlooked with the passage of time, while the studies to which they pertain, having been reported more widely, are apt to be rediscovered." (1)
In this paper I propose a bibliographic system for science literature that can eliminate the uncritical citation of fraudulent, incomplete, or obsolete data by making it possible for the conscientious scholar to be aware of criticisms of earlier papers. It is too much to expect a research worker to spend an inordinate amount of time searching for the bibliographic descendants of antecedent papers. It would not be excessive to demand that the thorough scholar check all papers that have cited or criticized such papers, if they could be located quickly. The citation index makes this check practicable. Even if there were no other use for a citation index than that of minimizing the citation of poor data, the index would be well worth the effort required to compile it.
This paper considers the possible utility of a citation index that offers a new approach to subject control of the literature of science. By virtue of its different construction, it tends to bring together material that would never be collated by the usual subject indexing. It is best described as an association-of-ideas index, and it gives the reader as much leeway as he requires. Suggestiveness through association-of-ideas is offered by conventional subject indexes but only within the limits of a particular subject heading.
If one considers the book as the macro unit of thought and the periodical article the micro unit of thought, then the citation index in some respects deals in the submicro or molecular unit of thought. It is here that most indexes are inadequate, because the scientist is quite often concerned with a particular idea rather than with a complete concept. "Thought" indexes can be extremely useful if they are properly conceived and developed.
In the literature-searching process, indexes play only a small, although significant, part. Those who seek comprehensive indexes to the literature of science fail to point out that such indexes, although they may be desirable, will provide only a better starting point than the one provided in the selective indexes at present available. One of the basic difficulties is to build subject indexes that can anticipate the infinite number of possible approaches the scientist may require. Proponents of classified indexes may suggest that classification is the solution to this problem, but this is by no means the case. Classified indexes are also dependent upon a subject analysis of individual articles and, at best, offer us better consistency of indexing rather than greater specificity or multiplicity in the subject approach. Similarly, terminology is important, but even an ideal standardization of terminology and nomenclature will not solve the problem of subject analysis.
What seems to be needed, then, in addition to better and more comprehensive indexes, alphabetical and classified, are new types of bibliographic tools that can help to span the gap between the subject approach of those who create documents — that is, authors — and the subject approach of the scientist who seeks information.
Since 1873 the legal profession has been provided with an invaluable research tool known as Shepard’s Citations, published by Shepard’s Citations, Inc., Colorado Springs, Colo. (2). A citation index is published for court cases in the 48 states as well as for cases in Federal courts. Briefly, the Shepard citation system is a listing of individual American court cases, each case being followed by a complete history, written in a simple code. Under each case is given a record of the publications that have referred to the case, the other court decisions that have affected the case, and any other references that may be of value to the lawyer. This type of listing is particularly important to the lawyer, because, in law, much is based on precedent.
Citation indexes depend on a simple system of coding entries, one that requires minimum space and facilitates the gathering together of a great volume of material. However, a code is not absolutely necessary if one chooses to compile a systematic listing of individual cases or reports, with a complete bibliographic history of each of them. Thus, it would be possible to list all pertinent references under each case with sufficient completeness to give the index more of the appearance of a bibliography. However, this would result in an extremely bulky volume.
There are analogies in bibliographic operations. For example, in cataloging books for booksellers’ or library catalogs, an attempt is made to find references to each book in one or more authoritative bibliographic sources, such as the catalogs of the British Museum (BM), Bibliothèque Nationale (BN), or the Library of Congress (LC). The "authority" card used in cataloging sometimes looks like a Shepard entry.
Another example is a book-review digest, in which one finds for each book title a series of references and selections from published reviews, critical and otherwise. Certain indexing publications perform a similar function.
Some time ago I became concerned with the problem of developing a citation code for science. This was necessary for the efficient manipulation by mechanical devices of entries to scientific indexes. In the course of this research I developed a very simple system for identifying an individual scientific article that had appeared in the periodical press. The resulting numerical code consisted of two parts. The first part was a serial number, used instead of an abbreviation, to identify each periodical; it was similar to the serial numbers employed in the World List of Scientific Periodicals, by no means a new idea. For example, Die Bibliographie der fremdsprachigen Zeitschriftenliteratur has for many years used such a system to save space.
The second part of the code number was also a serial number, assigned to each article in a particular publication, starting with 1 and continuing throughout all volumes. The code thus gives no indication of year or volume number, a serious shortcoming. The idea of an article number is also not new, having been used by the Proceedings of the Society for Experimental Biology and Medicine since its inception. These two serial numbers taken together, it can be seen, can identify any published periodical article. It soon became apparent, after such codes had been utilized on an experimental basis, that the use of the codes would facilitate the compilation of a citation index. (Other coding systems would be equally applicable.)
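The two-part code can be sketched in a few lines of Python. This is only an illustration: the hyphenated layout follows the sample code number 3001-6789 used in this paper, while the names ArticleCode, periodical, article, and parse_code are assumptions introduced here and are not part of the proposed system.

```python
from typing import NamedTuple


class ArticleCode(NamedTuple):
    """Two-part code: periodical serial number, then article serial number."""

    periodical: int  # serial number used in place of a journal abbreviation
    article: int     # running number of the article within that periodical

    def __str__(self) -> str:
        # Hyphenated layout assumed from the sample code 3001-6789.
        return f"{self.periodical}-{self.article}"


def parse_code(text: str) -> ArticleCode:
    """Split a string such as '3001-6789' into its two serial numbers."""
    periodical, article = text.split("-")
    return ArticleCode(int(periodical), int(article))


code = parse_code("3001-6789")
print(code.periodical, code.article, str(code))
```

As the text notes, nothing in such a code records the year or volume; that information would have to be looked up from the periodical listing.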
A citation index to science would have the following main features. First, there would be a complete alphabetic listing of all periodicals covered, in addition to the code number for each periodical. This list would be similar to the World List, but without the library holdings. The main portion of the citation index would list in straight numerical order the code numbers for all the articles covered. Under each code number, for example, 3001-6789, there would be listed other code numbers representing articles that had referred to the article in question, together with an indication of whether the citing source was an original article, review, abstract, review article, patent, or translation, and so forth. In effect, the system would provide a complete listing, for the publications covered, of all the original articles that had referred to the article in question. This would clearly be particularly useful in historical research, when one is trying to evaluate the significance of a particular work and its impact on the literature and thinking of the period. Such an "impact factor" may be much more indicative than an absolute count of the number of a scientist’s publications, which was used by Lehman (3).
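The layout of the main portion of the index can be sketched as a small lookup table in Python. Everything here is illustrative: the sample entries are invented (only the code 3001-6789 comes from the text), the names build_index and citation_count are assumptions, and the raw count of citing items is merely one simple way to approximate the influence discussed above, since no formula is given in this paper.

```python
from collections import defaultdict

# Hypothetical entries: (cited code, citing code, kind of citing source).
# Codes are zero-padded to a fixed width so that string order matches
# numerical order; that padding is an assumption of this sketch.
ENTRIES = [
    ("3001-6789", "3001-7012", "original article"),
    ("3001-6789", "0412-0155", "review article"),
    ("3001-6789", "2207-0931", "patent"),
    ("0412-0155", "3001-7012", "original article"),
]


def build_index(entries):
    """Group citing items under each cited code, both in ascending order."""
    grouped = defaultdict(list)
    for cited, citing, kind in entries:
        grouped[cited].append((citing, kind))
    return {cited: sorted(citers) for cited, citers in sorted(grouped.items())}


def citation_count(index, code):
    """Number of covered items that refer to the given article."""
    return len(index.get(code, []))


index = build_index(ENTRIES)
for cited, citers in index.items():
    print(cited, citers)
print(citation_count(index, "3001-6789"))
```

A reader looking up 3001-6789 would find the three citing codes listed beneath it, each tagged with the kind of source that made the reference.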
Other advantages would also obtain. In a way such listings would provide each scientist with an individual clipping service. By referring to the listings for his article, an author could readily determine which other scientists were making reference to his work, thus increasing communication possibilities between scientists. It is also possible that the individual scientist thus might become aware of implications in his studies that he was not aware of before.
Most authors like to see how their works are received. Bringing together all book reviews and abstracts is very important, for it is not possible for an author to keep up with the thousands of publications in which his contribution might be reviewed. This applies equally to publishers. It would not be impossible to include books in the citation index. Indeed, as a first suggestion, the use of Library of Congress card numbers as the identifying code for books would seem appropriate.
It is necessary next to discuss some realistic questions concerned with the realization of such an index. Bitner (8) has estimated that 30,000 cases are covered by Shepard’s Citations in 1 year, the cases and articles appearing in not more than a few hundred publications. In 1953 about 1 million citations were added, close to 40 citations per case.
What is the prospect in scientific literature? The last published edition of the World List of Scientific Periodicals contained more than 50,000 titles in science and technology. It is variously estimated that between 1 and 3 million new scientific articles are published each year. The Journal of the American Chemical Society alone publishes more than 3000 per year, including approximately 2000 original articles. The order of magnitude is therefore potentially from 50 to 100 times as great as it is for Shepard’s Citations.
However, not all of these 50,000 publications are being covered in our present indexing activities, and yet this has not prevented us from continuing indexes of standard type or from starting new ones. Lack of complete coverage is not necessarily an argument against a citation index. It is in fact an argument in its favor. Coverage could perhaps be limited to the list of periodicals covered by one of the leading indexing services. This approach would, of course, have an immediate disadvantage. Such a subject selection would mean that less directly related subjects of interest would be excluded, and these are the publications that the individual is least likely to cover in his own research. It would be necessary to consider all the pros and cons in a selective approach and then to determine the possible utility of such a tool. For example, would a citation index to the 1500 periodicals covered by the Current List of Medical Literature be of real value, or, similarly, a citation index to the 5000 periodicals covered by Chemical Abstracts? The Current List would, in fact, offer a good starting point, since it already provides a unique code for the 100,000 items indexed by it each year. Presumably these are the most significant contributions in the covered fields for the year. If 10 is the number of references in the average article, then about 1 million citations would be involved. The preparation of that number annually is not unreasonable. Shepard’s has already used well over 50 million citations in its publishing activities.
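The workload estimate for the Current List can be written out as a short calculation. The figures of 100,000 items and 10 references per article are those given in the text; the function name and the alternative averages of 5 and 20 are assumptions added only to show how sensitive the total is to the average bibliography length.

```python
def estimate_annual_citations(items_per_year: int, refs_per_article: int) -> int:
    """Citations to be coded per year = items indexed x average references."""
    return items_per_year * refs_per_article


CURRENT_LIST_ITEMS = 100_000  # items indexed per year, as stated in the text

# 10 references per article is the text's figure; 5 and 20 are hypothetical.
for refs in (5, 10, 20):
    total = estimate_annual_citations(CURRENT_LIST_ITEMS, refs)
    print(f"{refs} references per article: {total:,} citations per year")
```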
The ultimate success of a citation index would depend on many factors. For example, if each periodical would assign unique code numbers to the articles published, it would be possible for authors to list these numbers in their bibliographies and, thus, to save the work of coding on the part of the citation index staff. It is unlikely that such a development could take place in less than 5 or 10 years, but it is comparable to the problem of getting publishers to include Library of Congress card numbers in their publications.
When such a large volume of data is to be handled, mechanical devices of high speed and versatility could be used to great advantage and would probably determine success or failure. Once the coding is done, compilation itself is quite mechanical. This could be done by means of conventional filing slips; the Shepard organization itself has used them successfully for 80 years. However, it would be facilitated by a mechanical approach using punched cards.
The utility of a citation index in any field must also be considered from the point of view of the transmission of ideas. A thorough scientist cannot be satisfied merely with searching the literature through indexes and bibliographies if he is going to establish the history of an idea. He must obviously do a great deal of organized, as well as eclectic, reading. The latter is necessary because it is impossible for any one person (the indexer) to anticipate all the thought processes of a user. Conventional subject indexes are thereby limited in their attempt to provide an ideal key to the literature. The same may be said of classification schemes. In tracking down the origins of an idea, the citation index can be of real help. This is well illustrated by an example from my own experience.
Many years ago the Radio Corporation of America developed a reading-aid for the blind (9). This device had an electronic system for converting printed letters into recognizable sound patterns. Using the device, a blind man could scan a printed page; in a set of headphones he could hear a series of sound patterns, each letter having its own recognizable sound pattern. In effect, the words were spelled out, letter by letter, in code. I was particularly interested in this device because I had been independently working on a device that would copy print, letter by letter, and reproduce it for bibliographic and other purposes. The two devices had something in common in that they both employed scanning devices. I then wanted to learn whether anyone had ever suggested that the RCA reading-aid could be used for this purpose. It will be apparent that if anyone had known of the RCA device and had thought of adapting it for copying purposes, a reference to the article might have been made. This reference could easily have been included in an article or patent that was not at all related to the problem of reading devices. A citation index would have given me just what I was after. Nothing could substitute for extensive reading, but a great deal of time could have been saved by bringing the appropriate works to my attention.
In the course of my reading I did find a few references to this device, one in a book (10), and several others in periodical articles, one of which was a German article on the mechanization of philological analyses and concordance building. The latter article (11) did not discuss my own special interest in copying devices, but it did show the similarity between the author’s and my own thinking from the point of view of letter-recognition devices, which is what the RCA device attempts to be. In other words, both of us were interested in this device as a letter-recognition device for the analysis of text.
In another instance the RCA article was unexpectedly cited in the journal Electronic Engineering in an article on information theory (12) that I was reading because of an entirely different interest. No subject indexer could have anticipated this crossbreeding of interests. Perhaps there are many other articles and books unknown to me that have made similar references to this device. How can they be located when the main subject matter of the article is, on the surface, so unrelated in nature?
One might say that it would be possible to index articles more thoroughly to achieve the same results. For example, the article on information theory, if thoroughly indexed, might have included an entry under reading devices for the blind. Yet if this were done, our periodical indexing services would clearly become hopelessly overloaded with material that is not necessary to lead us to the micro unit—the entire article or one of its major sections. Although it might be said that no scientist interested in the greater comprehensiveness to be found in a citation index would object to having such a great mass of references in a subject index, this is impracticable. It would require an army of indexers to read the articles and identify the exact subject matter of every paragraph or sentence. Yet this would be necessary. To illustrate, it is only in the very last paragraph of the article on information theory that one would find a reference to reading devices for the blind.
Were an army of indexers available, it is still doubtful that the proper subject indexing could be made. Over the years changes in terminology take place that vitiate the usefulness of a standard subject index. To a certain extent, this is overcome through the citation approach, for the author who has made reference to a paper 40 or 50 years old has interpreted the terminology for us. By using authors’ references in compiling the citation index, we are in reality utilizing an army of indexers, for every time an author makes a reference he is in effect indexing that work from his point of view. This is especially true of review articles where each statement, with the following reference, resembles an index entry, superimposed upon which is the function of critical appraisal and interpretation. To the indexer this has its advantages as well as its disadvantages (13).
To determine in a practical way what the citation index could offer, it was decided to track down the citations made in one journal to a single significant article, in order to compile a sample entry for the citation index. At the suggestion of Erich Meyerhoff, I selected Hans Selye’s famous article on the general adaptation syndrome (14). A systematic search was then made of all papers that were published in the Journal of Clinical Endocrinology subsequent to Selye’s paper up to 1951, a period of 5 years, including well over 500 articles. Every bibliography in each of the 500 articles was checked for a reference to Selye’s article. Twenty-three articles were found to make such reference; each of them was then checked for the character of the information provided.
Examination of the citation list (Table 1) shows the great variety of subject matter included. One thing became quite clear, even to the uninitiated—that is, the influence of Selye’s article has been quite pronounced. Such evidence is extremely valuable to the historian.
It is interesting to note that, although all the articles cited were indexed in Quarterly Cumulative Index Medicus, not one is to be found there under the heading "Adaptation." In fact, it is surprising not to find any articles from this journal under this subject heading.
It also becomes quite obvious that many references to Selye’s work were general and contribute little or nothing to the readers’ enlightenment, since exact page references are not provided. In several cases the article is even cited but not referred to in the text. Selye’s influence on all of these authors is quite apparent. In particular instances the citations are of value in locating confirmatory evidence of some of Selye’s claims.
Thus, in the case of a highly significant article, the citation index has a quantitative value, for it may help the historian to measure the influence of the article — that is, its "impact factor." With regard to a less significant work, one would suspect that the bibliographic advantages might be increased, because the scientist or librarian would be provided with references not to be found in conventional indexes. The preliminary evidence presented indicates that the citation index offers interesting possibilities for another approach to bibliographic control.
The next step in compiling the index for the Selye article would be to seek out additional references to it in more peripheral journals, but obviously the farther away you get from the immediate subject area of the main article, the fewer the references to it you will locate. Yet these may well be the most useful references of all, for the cross-fertilization of subject fields is one of our most important problems in science literature.
It will be well to close with a brief description of how the citation index might be compiled. The first step would be the selection of the particular group of periodicals to be covered; next, the period to be covered, say, only that since 1900.
The problem actually has two facets: the selection of periodicals to be covered in order to obtain citations, and the selection of those articles for which we want a citation record. For example, all articles in journals in the Current List of Medical Literature that have remained in continuous publication since 1900 might be coded, in which case the Journal of Clinical Endocrinology would not be included. However, we might include as citation sources all journals covered by the Current List. Thus, the bibliographies appearing in articles in the Journal of Clinical Endocrinology would supply references to the basic group of articles.
Each coder would be assigned a group of articles in a particular journal. The first step would be to number each article in the journal in ascending order, by utilizing a complete table of contents of that journal from its inception.
Once a code number has been assigned to each article, the proper codes may then be assigned to each periodical. This might be the number given in the World List, with new numbers for any periodicals not to be found there.
Actual coding starts with the first article in a particular periodical. The coder prepares a 3- by 5-in. card for each citation made in the article. Each card should give (i) the code number for the citing article, (ii) the code number for the article cited, and (iii) a classification of the citing article as an original contribution, review article, abstract, and so forth.
Many references will be excluded by the limits of coverage set up. Thus all references to articles not in the prescribed list of journals would be excluded.
All books would be excluded unless otherwise specified, in which case the reference card would carry the code for the citing article and the code for the book (its LC card number).
After all the articles had been coded, it would next be necessary to sort the cards by the code numbers for the items cited. This would yield a group of cards for each cited article. These would then be sorted by code numbers for the citing articles. This completes the coding and sorting. The next step would be preparation for the printer.
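The card procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the Card fields mirror the three items recorded on each card, but the names Card and compile_index, the sample codes, and the rule that coverage is decided from the periodical part of the cited code are all introduced here. A single sort on the pair (cited, citing) stands in for the two successive card sorts and yields the same final order.

```python
from itertools import groupby
from typing import NamedTuple


class Card(NamedTuple):
    """One 3- by 5-in. card: a single citation made by one article."""

    citing: str  # code number of the citing article
    cited: str   # code number of the article cited
    kind: str    # original contribution, review article, abstract, ...


def compile_index(cards, covered_periodicals):
    """Drop references outside the coverage, then sort and group the cards."""
    kept = [c for c in cards if c.cited.split("-")[0] in covered_periodicals]
    # Sort by cited code, then by citing code within each cited article.
    kept.sort(key=lambda c: (c.cited, c.citing))
    return {
        cited: [(c.citing, c.kind) for c in group]
        for cited, group in groupby(kept, key=lambda c: c.cited)
    }


# Hypothetical cards; codes are zero-padded so string order is numeric order.
CARDS = [
    Card("3001-7012", "3001-6789", "original contribution"),
    Card("0412-0155", "3001-6789", "review article"),
    Card("3001-7012", "9999-0001", "original contribution"),  # not covered
    Card("0412-0200", "0412-0155", "abstract"),
]

for cited, citers in compile_index(CARDS, {"3001", "0412"}).items():
    print(cited, citers)
```

The resulting groups, one per cited article with the citing codes in ascending order, correspond to the stacks of cards that would be handed on for preparation for the printer.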
From this description it will be apparent that, although a great volume of material is to be covered, relatively unskilled persons can perform the necessary coding and filing. Professional supervision would still be required, because certain decisions require skilled judgment, for example, when ibid. or loc. cit. must be carefully interpreted. Footnotes tend to make coding somewhat cumbersome. The code I have described is merely an example used to illustrate the method in principle. If the system were adopted, then in the future every author ought to be required to include the serial number of each item he referred to, so as to facilitate not only the compilation of citation indexes but also other operations such as requests for reprints (15, 16).
In a certain sense a citation index is not very different from a compendium like Beilstein, which gives a rather complete record of a compound, compiled by a similar method. A citation index for the literature of chemistry would undoubtedly make the preparation of such works as Beilstein much easier than it is at present. The new bibliographic tool, like others that already exist, is just a starting point in literature research. It will help in many ways, but one should not expect it to solve all our problems.
References and Notes