ASSOCIATION-OF-IDEAS TECHNIQUES IN DOCUMENTATION
SHEPARDIZING THE LITERATURE OF SCIENCE
Smith, Kline & French Laboratories
Submitted as course work to:
Research Information Center, National Bureau of Standards.
In the literature searching process indexes play only a small though significant part. Those who seek comprehensive indexes to the literature of science fail to point out that such indexes, though they may be desirable, will only provide a better starting point than the one provided in selective indexes presently available. One of the basic difficulties is building subject indexes that can anticipate the infinite number of possible approaches the scientist may require. The proponents of classified indexes would suggest that classification is the solution to this problem, but it is by no means the case. Classified indexes are also dependent upon subject analysis of individual articles and at best offer us better consistency of indexing rather than greater specificity or greater multiplicity in subject approach. Similarly, terminology is important, but even ideal standardization of terminology and nomenclature will not solve the problem of subject analysis.
What may be needed then, in addition to better and more comprehensive indexes, alphabetical and classified, are new types of bibliographical tools that may help to span the gap between the subject approach of those who create documents, i.e. authors, and the subject approach of the scientist seeking information.
This paper considers the possible utility of a Citation Index which does offer a new approach to subject control of the literature of science. By virtue of its different construction it tends to bring together material that would never have been so collated by the usual subject indexing. It is best described as an association-of-ideas index, giving the reader as much leeway as he requires. Suggestiveness through association-of-ideas is offered by conventional subject indexes, but only within the limits of a particular subject heading. If one considers the book as a macro unit of thought and the periodical article the micro unit of thought, as does Ranganathan, then the Citation Index in some respects deals in the sub-micro unit or molecular unit of thought. It is here that most indexes are inadequate because the scientist is quite often concerned with a particular idea rather than a complete concept. “Thought” indexes can be extremely useful, if properly conceived and developed.
Since 1873 the legal profession has been provided with an invaluable research tool known as Shepard’s Citations, published by Shepard’s Citations, Inc. (1), Colorado Springs, Colorado. A citation index is published for court cases in the forty-eight states as well as for cases in Federal courts. Briefly, the Shepard Citation system is a listing of individual American court cases, each case being followed by a complete history, written in a simple code. Under each case is given a record of the publications which have referred to the case, the other court decisions which have affected the case and any other references which may be of value to the lawyer. This type of listing is particularly important to the lawyer, since so much in law is based upon precedent. (16)
Shepardizing, as this system of recording information has been termed, is dependent upon a simple system of coding entries, which requires a minimum amount of published pages, and facilitates the gathering together of the great volume of material covered. However, it is also possible to Shepardize without a code, if one means by the term the systematic listing of individual cases or reports, with a complete history of those cases or reports from a bibliographical point of view. Thus, it would be possible to list under each case, completely spelled out, all pertinent references, giving it more of the appearance of a bibliography. However, this would result in an extremely bulky volume.
There are many parallels to this type of activity in bibliographical operations. For example, authority work in the cataloging of books for booksellers’ catalogs or library card catalogs involves the attempt to find references to the book being cataloged in one or more well known bibliographical sources such as the British Museum (BM), Bibliotheque Nationale (BN), or the Library of Congress (LC). An authority card sometimes looks like a Shepard entry. However, this is not Shepardizing since there is neither a systematic collating technique employed in gathering the data, nor is the information listed as systematically as in a Citation Index. It is not possible for any librarian to locate the title of any book in a single source and find there all of the above information. If he could, he would then have an international union catalog.
Another example is a book review digest where one finds for each title included a series of references and selections from published reviews, critical and otherwise, appearing in the literature. Certain indexing publications perform a similar function.
Some time ago I became concerned with the problem of developing a citation code for science (not Shepardizing), which was necessary for the efficient manipulation by mechanical devices of entries to scientific indexes. In the course of this research I developed a very simple system for identifying an individual scientific article appearing in the periodical press. The resulting numerical code consisted of two parts. The first was a serial number uniquely identifying the periodical publication involved, similar to the serial numbers employed in the World List of Scientific Periodicals, by no means a new idea. For example, Der Bibliographie der Fremdsprachigen Zeitschriften Literatur has used such a system for many years to reduce space requirements. Instead of giving an abbreviation to the title of a journal, a serial number is used.
The second part of the code number was also a serial number which was assigned to each article in a particular publication starting with number 1 for the very first article published, continuing through all published volumes. Thus, the code gives no indication of year or volume number, a serious shortcoming. The article number is also not an original idea, having been used by the Proceedings of the Society for Experimental Biology and Medicine since its inception. These two serial numbers taken together, it will be seen, could identify any published periodical article. (It is not the purpose of this paper to discuss the merits or disadvantages of this simple code. I am trying to trace the background of the main theme of this paper, i.e. the Citation Index.) It soon became apparent, after utilizing such codes on an experimental basis, that the use of this code would facilitate the compilation of a Shepardized system of listings for scientific articles. (Other coding systems would be equally applicable.)
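As an illustration only, the two-part code described above can be sketched in Python. The class name ArticleCode, its field names, and the hyphenated text form are assumptions made for this sketch; the sample values are code numbers that appear as examples in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleCode:
    """Hypothetical two-part code: periodical serial number, then article serial number."""
    periodical: str  # serial number of the periodical (kept as text, since it may carry a letter)
    article: int     # running number of the article within that periodical

    def __str__(self) -> str:
        return f"{self.periodical}-{self.article}"

    @classmethod
    def parse(cls, text: str) -> "ArticleCode":
        # Split on the last hyphen: everything before it is the periodical number.
        periodical, _, article = text.rpartition("-")
        if not periodical or not article.isdigit():
            raise ValueError(f"not a two-part code: {text!r}")
        return cls(periodical, int(article))


code = ArticleCode.parse("3001-6789")
print(code.periodical, code.article)
```

As the text notes, a code of this kind carries no year or volume number, so any date restriction would have to be looked up separately.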
A CITATION INDEX TO SCIENCE would have the following main characteristics. There would first be a complete alphabetic listing of all periodicals covered, in addition to the code number for each periodical. This would be similar to the World List without the library holdings information. In the main portion of the Citation Index there would be a listing in straight numerical order of the code numbers for all articles covered. Under each code number, e.g. 3001-6789, there would appear a list of other code numbers representing articles that had referred to the article in question. Next to each one of these code numbers there would be an indication as to whether the citing source was an original article, review, abstract, review article, patent or translation, etc. The first of these, the original article, is particularly important. In effect, we would have a complete listing, for the publications covered, of all the original articles that had made reference to the article in question. It should be apparent that this would be particularly useful in historical research, where, e.g. one is trying to evaluate the significance of a particular work and its impact on the literature and thinking of the period. Such an “Impact Factor” may be much more indicative than an absolute count of the number of a scientist’s publications, used by Lehman (2) and Dennis (3). The “Impact Factor” is similar to the quantitative measure obtained by Gross (4) in evaluating the relative importance of scientific journals, a method later criticized by Brodman (5) but used again by Fussler (6).
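The main listing described above is, in effect, a mapping from each cited code to the codes of the articles citing it. The following Python sketch shows one way such a listing could be assembled. All code numbers other than 3001-6789 are invented, the function names are hypothetical, and counting the citing original articles is only one crude reading of the “Impact Factor”, for which this paper gives no formula.

```python
from collections import defaultdict

# (citing code, cited code, kind of citing source); values invented for illustration.
references = [
    ("3002-15", "3001-6789", "O"),
    ("3050-200", "3001-6789", "R"),
    ("3002-40", "3001-6789", "O"),
    ("3002-40", "3001-12", "O"),
]

KINDS = {"O": "original article", "R": "review", "A": "abstract"}


def build_citation_index(refs):
    """Group the citing codes under the code of the article they cite."""
    index = defaultdict(list)
    for citing, cited, kind in refs:
        index[cited].append((citing, kind))
    # Codes are sorted as text here; a printed index would need true numerical order.
    return {cited: sorted(entries) for cited, entries in sorted(index.items())}


def original_citation_count(index, cited):
    """Count the citing sources that are marked as original articles."""
    return sum(1 for _, kind in index.get(cited, []) if kind == "O")


index = build_citation_index(references)
for citing, kind in index["3001-6789"]:
    print(f"3001-6789 <- {citing} ({KINDS[kind]})")
print(original_citation_count(index, "3001-6789"))
```

An author looking up his own code in such a listing would obtain the “individual clipping service” effect: every covered article that has referred to his work.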
Other advantages would also be obtained. In a certain sense such listings would provide each scientist with an individual clipping service. By referring to the listings for his article an author could readily determine what other scientists were making reference to his work, thus increasing communication possibilities between scientists. On the other hand, it is possible that the individual scientist may become aware of implications in his studies that he was not aware of before.
Most authors like to see how their works are received. Bringing together all book reviews and abstracts is very important, since it is not possible for an author to keep up with the thousands of publications in which his contribution may be reviewed. This applies equally to publishers. It would not be impossible to include books in the Citation Index. Indeed, as a first suggestion the use of Library of Congress card numbers as the identifying code for books would seem most appropriate.
It is now necessary to discuss some realistic questions concerned with the materializing of such an index. Bitner (7) has estimated 30,000 cases covered by Shepard's Citations in one year, the cases and articles appearing in not more than a few hundred publications. In 1953 about 1,100,000 citations were added or close to forty citations per case.
What is the prospect in scientific literature? The last published edition of the World List of Scientific Periodicals contained over 50,000 titles in science and technology. It is variously estimated that between one and three million new scientific articles a year are published. The Journal of the American Chemical Society publishes over 3,000 per year alone, including approximately 2,000 original articles. The order of magnitude then is, potentially, from fifty to one hundred to one as compared with the figures for Shepard’s Citations.
However, we are not covering all of these 50,000 publications in our present indexing activities and this has not prevented us from continuing standard type indexes and starting new ones. Lack of complete coverage is not necessarily an argument against a citation index. It is in fact an argument in its favor. Coverage could be limited perhaps to the list of periodicals covered by one of the leading indexing services. This approach would of course have an immediate disadvantage. Such a subject selection would mean that less directly related subjects of interest would be excluded and these are the publications that the individual is least likely to cover in his own research. It would be necessary to consider all the pros and cons in selective approach and then determine the possible utility of such a tool. For example, would a Citation Index to the 1,500 periodicals covered by the Current List of Medical Literature be of value, or similarly a Citation Index to the 5,000 periodicals covered by Chemical Abstracts? From an experimental point of view the Current List would offer a good starting point since one is already provided with unique codes for the 100,000 items indexed each year. Presumably these articles represent the most significant contributions for the year. If one takes ten as the number of references in the average article then about one million citations would be involved. This is not an unreasonable figure. In its publishing activities Shepard’s has already used well over fifty million citations appearing in over fifty separate sections. In addition, this has been done on a cumulative basis, resulting in permanently bound volumes for 15 year periods.
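The volume estimates quoted above can be checked with a few lines of arithmetic. This is only a restatement of the figures given in the text; the variable names are invented for the sketch, and the ten references per article is the paper's own working assumption.

```python
# Figures quoted in the text above.
shepard_cases_per_year = 30_000
shepard_citations_1953 = 1_100_000
current_list_items_per_year = 100_000
assumed_references_per_article = 10  # the paper's working assumption

# Citations added per case by Shepard's in 1953 ("close to forty").
citations_per_case = shepard_citations_1953 / shepard_cases_per_year

# Citations to be coded per year for the Current List ("about one million").
estimated_citations = current_list_items_per_year * assumed_references_per_article

print(round(citations_per_case, 1))
print(estimated_citations)
```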
The ultimate success of a Citation Index would depend upon many factors. For example, if it were possible to have each periodical assign unique code numbers to the articles published it would be possible for authors to list these numbers in their bibliographies, thus saving the work of coding on the part of the Citation Index staff. It is unlikely that such a development could take place in less than five or ten years, but it is comparable to the problem of getting publishers to include LC card numbers in their publications.
Where such a large volume of data is to be handled it must be expected that mechanical devices of high speed and versatility would be of great advantage and would probably be a determining factor in the system's success. Once the coding is completed the work of compilation is quite mechanical. As will be explained later this work can be done on conventional filing slips, but would be facilitated by a mechanical approach. The Shepard organization has used the former method successfully for 80 years.
The utility of a citation index in any field must also be considered from the point of view of the transmission of ideas. A thorough scientist cannot merely be satisfied with searching the literature through indexes and bibliographies, if he is going to establish the history of an idea. He must obviously do a great deal of organized, as well as eclectic, reading. The latter is necessary because it is impossible for one person (the indexer) to anticipate all the thought processes of the user. Conventional subject indexes are thereby limited in their attempt to provide an ideal key to the literature. The same may be said of classification schemes. In tracking down the origins of an idea the Citation Index can be of help. This is well illustrated by an example from my own experience.
Many years ago RCA developed a reading-aid for the blind (8). This device had an electronic system for converting printed letters into a recognizable sound pattern. In effect, the words were spelled out, letter by letter, in code. I was particularly interested in this device because I had been independently working on a device which would copy print, letter by letter, and reproduce it for bibliographical and other purposes. The two devices had something in common in that they both employed scanning devices. I then wanted to learn if anyone had ever suggested that the RCA reading aid could be used for this purpose. It will be apparent that if anyone had known of the RCA device and had thought of adapting it for copying purposes, a reference to the article might have been made. This reference could easily have been written in an article which was not at all related to the problem of reading devices. A citation index would have given me just what I was after. Nothing could substitute for extensive reading, but a great deal of time could have been saved by bringing the appropriate works to my attention.
In the course of my readings I did find a few references to this device, one in a book (9), and several others in periodical articles, one of which was a German article on the mechanization of philological analyses and concordance building. The latter article (10) did not discuss my own special interest in copying devices, but it did show the similarity in Busa’s thinking to my own from the point of view of letter recognition devices, which is what the RCA device attempts to be. In other words, I was also interested in this device as a letter recognition device and found another author interested in letter recognition devices for the analysis of text. It is not a coincidence that different scientists should have such associative interests. Indeed, Busa is a philosopher -- not a chemist.
In another instance the RCA article was cited in the journal Electronic Engineering in an article on Information Theory (11). I mention this not because it had any bearing on the device itself, but because I was definitely interested in the subject matter of this article, even though it was remotely related to the problem of reading devices. No subject indexer could have anticipated this cross-breeding of interests. One can only conclude that there are undoubtedly many other articles and books in existence that have made similar references to this device. It is not unlikely that these articles would be of interest, even though the main subject matter of the article might be, on the surface, unrelated.
One might say that it would be possible to index articles more thoroughly to achieve the same result. The article on Information Theory, if thoroughly indexed, might have included an entry under reading devices for the blind. It should be quite apparent that if this were done, our periodical indexing services would become hopelessly overloaded with unnecessary material, since the function of the periodical index is presumably that of leading us to the micro unit -- the entire article or at least a significant portion of one.
Yet, it might be said that the scientist who is interested in the greater depth to be found in a Citation Index cannot object to such a great mass of references in a subject index. The final answer here is that it would require an army of indexers to read the articles and determine the exact subject matter involved in every published paragraph or sentence. Thus, in the very last paragraph of the last mentioned article one finds the following: “Sensory Prosthesis. There have been various attempts to replace one of the body senses by another. Devices have been made which enable a blind person to read or find his way about (20)(21), by arranging that information which would normally be received visually is transformed so that it can be received by one of the other senses, usually by ear.” (Electronic Engineering, Nov. 1953)
Even if we had this army of indexers available it is doubtful that the proper subject approach would be taken in all cases. One might also mention that over the years changes in terminology take place, which are to a certain extent overcome through the citation approach, since the author who makes a reference to a paper that is forty or fifty years old is making the jump in terminology for us. By using authors' references for the Citation Index we are in effect utilizing an actual army of indexers, in that every time an author makes a reference he is essentially indexing that work from his own point of view. It can be shown that this is especially true in the review article where each statement, followed by a reference, is very much like an index entry, with the addition of the critical approach, which has its advantages as well as disadvantages (12).
In order to get a more practical view of what the Citation Index could offer it was decided to track down a single article as much as possible, so as to compile a sample entry for the Citation Index. To make the search productive it was decided to select a significant contribution to the literature since it would be expected that such an article would be cited more frequently. At the suggestion of Mr. Erich Meyerhoff, I selected Hans Selye’s famous article on the General Adaptation Syndrome (13). A systematic search was then made of all papers that were published in The Journal of Clinical Endocrinology subsequent to Selye's paper up to 1951 -- a period of five years, including well over 500 articles. Every bibliography in each of the 500 articles was checked for a reference to Selye’s article. Twenty-three articles did make such a reference, each of which was then checked for the character of the information provided.
Examination of the citation list (see appendix) shows the great variety of subject matter included. Actual examination of the articles indicates that the information would, in this case, be of little value to the scientist from the point of view of the information contained in the citing paragraphs. For example, “In thyrotoxicosis some of the adrenal changes probably are manifestations of a chronic alarm reaction.” A reference to Selye follows.
Usually the references (as in this case) are of a general nature, merely citing Selye to confirm their own opinions or observations. One thing does become quite clear though, even to the relatively uninitiated, i.e. the influence of the Selye article is quite pronounced. Such evidence would be extremely valuable to the historian.
It is interesting to note that although all the articles cited were indexed in Quarterly Cumulative Index Medicus, not one is found under the heading for adaptation. In fact, it is surprising not to find any articles from this journal under that subject heading.
It also becomes quite obvious that many references to this work were absolutely unnecessary and contribute nothing to the readers’ enlightenment, since there is never an exact page reference. In several articles the Selye article is cited and not referred to in the text. These references would have been more properly included in a list of additional readings, and this is reminiscent of observations made by Murray and Kopech (14) concerning ghost references in compiling their bibliography on tissue culture. Selye’s influence on all of these authors is quite apparent. In certain instances the citations are of value in finding confirmation of some of his theories, as in “Selye also reported a lowering of the BMR after severe physical stress.”
It becomes apparent then, that in the case of a very significant article, the importance of the Citation Index is a quantitative one, in that it may help the historian measure its influence, i.e. its “Impact Factor”. In a less significant work one would suspect that the bibliographical advantages would be increased, providing the scientist or librarian with references that would not be found in conventional indexes. Undoubtedly a much more careful study and analysis of the Citation Index is in order, but it would seem from the evidence presented that it offers interesting possibilities for another approach to bibliographical control.
The next step would be to seek out additional references to the Selye article in the more peripheral journals. It should be obvious that the further one gets away from the subject area the fewer articles will be located; yet these are probably the most useful references, since the cross fertilization of subject fields is one of our most important problems in science literature.
In order to make sure that the reader has a very clear idea of what the Citation Index would be like, it might be wise to close with a brief description of how the Citation Index might be compiled.
The first step would be the selection of the group of periodicals to be covered. A decision would then have to be made concerning the time period to be covered. For example, one might specify that nothing earlier than 1900 would be included.
Actually this is a two facet problem. One is the selection of periodicals to be covered from the point of view of obtaining citations. The other is the focus on the articles for which we would be interested in having a citation record. As a concrete example it could be specified that all articles appearing in journals on the list of the Current List of Medical Literature that have been in continuous publication since 1900 would be included. Thus, the Journal of Clinical Endocrinology would not be included. However, we would include in our coding coverage all journals covered by the Current List. Thus, we could and would code the bibliographies appearing in articles in the Journal of Clinical Endocrinology because they will contain references to our basic group of articles.
Each coder would be assigned a group of articles. The first step would be to number each article in ascending order, starting with the first article published. This is a rather mechanical job, but it immediately indicates the need for a large library. This step could be expanded by stating that the coder is preparing a complete list of the contents of that journal from its inception. In many cases, such a list is actually available. A copy of this list could be made and the code numbers assigned. The disadvantage of such a simple code thus becomes immediately apparent. A code based on volume number and pagination might be much easier, but also has its disadvantages.
Assuming that we have now assigned the proper code to each article, we would then assign the proper codes to each periodical. This could be done by using the number appearing in the World List. New numbers would have to be made for any periodicals located that are not on this list.
We are now ready for actual coding. Starting with the first article in a particular periodical, the coder prepares a 3 x 5 card for every reference made in each article. On each card there appears the code number for the citing article and then the code number for the article cited. There would also appear a note as to whether it (the citing article) is an original contribution, review article, abstract, etc. Thus, for the first article there might be 25 such cards prepared. Proceeding to the second article the same thing would be done. It will be apparent that at first there may be relatively few cards needed because most of the references will be to works out of the time period specified. Thus, if the first article we are coding appeared in 1900, probably none of the articles cited will be of value since we have decided to include only references later than 1900. As we get into the century more and more pertinent references will be found.
There will also be many further references that we will exclude by virtue of our scope definition. All journal articles not on the list must be excluded. If the Physical Review is not one of the journals in the group to be covered, no references to articles in this journal will be coded. On the other hand, if this journal had been on the Current List, then its articles would be coded. An article in the Physical Review could easily make references to one of our basic group of articles.
All books are excluded unless it is specified that they are to be included, in which case the card for references to books would contain the code for the citing article and the code for the book, which could be the LC card number.
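The coding and scope rules above can be sketched as a small Python routine that prepares one card per reference and drops references falling outside the scope definition. The names Card and make_cards, the code numbers, and the years are invented for illustration. Because the simple code carries no date, the year of each cited article is assumed to be supplied separately, and the cutoff is taken here to mean 1900 or later. References to books are not handled in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One hypothetical 3 x 5 card: a single reference made by a citing article."""
    citing: str  # code of the citing article
    cited: str   # code of the article cited
    kind: str    # "O" original article, "R" review, "A" abstract


def make_cards(citing, kind, references, basic_periodicals, first_year=1900):
    """Prepare one card per reference that falls inside the scope definition.

    `references` is a list of (cited_code, year_of_cited_article) pairs.
    References to periodicals outside the basic group, or to articles
    published before `first_year`, produce no card.
    """
    cards = []
    for cited, year in references:
        periodical = cited.rpartition("-")[0]
        if periodical in basic_periodicals and year >= first_year:
            cards.append(Card(citing, cited, kind))
    return cards


basic = {"3001", "3002"}      # invented periodical codes forming the basic group
refs = [
    ("3001-6789", 1946),      # in scope
    ("3001-12", 1895),        # earlier than the time period specified
    ("7777-5", 1950),         # periodical not in the basic group
]
cards = make_cards("3002-40", "O", refs, basic)
print(cards)
```

Note that the citing article need not itself belong to the basic group; only the cited article is tested against the scope, which matches the two facet distinction drawn earlier.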
After all articles are coded it is necessary to sort the cards by the code numbers for the items cited. When this sorting is completed there will be a group of cards for each article cited. It will then be necessary to sort these by the code numbers for the citing articles. This completes the coding and sorting. The next step would be preparation for the printer.
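The two sorting passes described above give the same order as a single sort on a compound key: the cited code first, then the citing code. The Python sketch below illustrates this with invented cards; the helper code_key is hypothetical and assumes codes of the form periodical-article, with the article number compared numerically and the periodical number compared as text.

```python
# Cards as (citing code, cited code, kind); values invented for illustration.
cards = [
    ("3002-40", "3001-6789", "O"),
    ("3002-15", "3001-6789", "O"),
    ("3002-40", "3001-12", "O"),
    ("3050-200", "3001-6789", "R"),
]


def code_key(code):
    """Sort key for a 'periodical-article' code; the article number is compared as an integer."""
    periodical, _, article = code.rpartition("-")
    return (periodical, int(article))


# Cited code is the primary key, citing code the secondary key.
ordered = sorted(cards, key=lambda card: (code_key(card[1]), code_key(card[0])))

# Print each cited code once, followed by its citing codes, as copy for the printer.
current = None
for citing, cited, kind in ordered:
    if cited != current:
        print(cited)
        current = cited
    print(f"    {citing} ({kind})")
```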
From this brief but detailed description it will be apparent that although a great volume of material is covered relatively unskilled people could perform the coding and filing. Professional supervision would be required since certain problems are not always easy to handle. For example, in a bibliography the term ibid may be used, or locus citatus. These have to be carefully interpreted sometimes. Then there are many publications which use the footnote technique, making coding somewhat more cumbersome. By designing a different code it would be possible to handle coding more simply and even arrange for coding without access to the original articles in many instances, but that is more properly the subject of another paper. The code described above is merely by way of example, and its use would be justified more from the point of view of making future coding simple. The logical conclusion here would be that in the future every author would be required to include the serial number of every item he referred to, so as to facilitate not only the compilation of citation indexes, but also other operations such as requests for reprints. (15)
In a certain sense a Citation Index is not greatly different from a compendium like Beilstein. We find a rather complete history of a compound here, but certainly the methods of compilation are rather similar. In the case of Beilstein the editing work is of a much higher order. Undoubtedly, a Citation Index for the literature of chemistry would make the preparation of such works as Beilstein much easier. If one takes the attitude that any bibliographical tool is just a starting point in literature research operation then it is possible to consider the main merits of new approaches, without being disappointed that not all our problems are solved.
Index Sample Based on article by Hans Selye.
“The General Adaptation Syndrome” J. Clinical Endocrinology, 6:117-230 (1946)
The code number for this journal in the World List is 11,123a.
The article number will be arbitrarily taken as 687.
The code number for the article will be 11123a-687.
The list of citing articles follows:
1. Williams, R. H. Thyroid & Adrenal Interrelations, 7:52-57(1947)
2. Venning, E. H. Glycogenic Corticoids, 7:79-101(1947)
3. Forbes et al. 17-Ketosteroids in Trauma and Disease, 7:264-88(1947)
4. Talbot et al. Excretion of 11-Oxycorticosteroids, 7:331-50(1947)
5. Castillo, E. B. del, et al. Syndrome of Rudimentary Ovaries, 7:385-422(1947)
6. Forsham, P. H. et al. Pituitary Adrenocorticotropin, 8:15-66(1948)
7. Pincus, G. et al. Rhythm in Lipid Excretion, 8:221-6(1948)
8. LeCompte, P. M., Width of Adrenal Cortex in Lymphatic Leukemia, 9:158-62(1949)
9. Wolfson, W. Q., 17-Ketosteroids in Gout, 9:497-513(1949)
10. Stein, H. J. et al. Hormonal Response to Heat and Cold, 9:529-547(1949)
11. Davis, M. E. Eosinophils in Pregnancy and Labor, 9:714-24(1949)
12. Conn, J. V., Na and Cl of Sweat as cortical index, 10:12-23(1950)
13. Recant, L. et al. Effect of Epinephrine on Eosinophils, 10:187-229(1950)
14. McArthur, J. V. et al. Urinary excretion of corticosteroids in Diabetic
15. Bors, E., Fertility in Paraplegic Males, 10:381-398(1950)
16. Grossman, S. et al. Idiopathic Lactation following Thoracoplasty, 10:729-734
17. Cooper, J. S. et al. Metabolic consequences of Spinal Cord Injury,
18. Hiaco, D. Adrenal Metabolites in Bronchial Asthma, 10:1570-8(1950)
19. Jailer, J. V., Pituitary-adrenal system in Infants, 11:186-192(1951)
20. Deane, H. W., The Adrenals in Experimental Hypertension, 11:193-208(1951)
21. Hiaco, D. et al. Epinephrine and ACTH in Bronchial Asthma, 11:395-407(1951)
22. Scuaffenberg, C. A. et al. p-Hydroxypropiophenone (PHP) and other so-called pituitary inhibitors, 11:1215-1223(1951)
23. Talbot, N. B. et al. Urinary Water-soluble Corticosteroids, 11:1223-1236
The Citation Index entry for this item might look as follows:
R - Review
A - Abstract
O - Original Article
464-9789(R)
1. a. Shepard’s Citations: A detailed presentation of the scope and functions of Shepard’s Citation books, with illustrative examples and an analysis of their relation to other methods of legal research. (cover title: “How to use Shepard’s Citations”).
Shepard's Citations, Inc., Colorado Springs, 1954. 33 p.
b. Problems, Questions and Answers in the use of Shepard’s Citations. (To be used with the 1873-1954 edition of “How to use Shepard’s Citations”.) Shepard’s Citations, Inc., Colorado Springs, 1954. 15 p.
2. Lehman, Harvey C., “Man’s creative production rate at different ages and in different countries”, Scientific Monthly 78, 321(1954).
3. Dennis, L., “Bibliographies of Eminent Scientists”, Scientific Monthly 79, 180-3(1954).
4. Gross, P. L. K. and Gross, E. M., “College libraries and chemical education”, Science 66, 385-9(1927).
5. Brodman, E., “Choosing physiology journals”, Medical Library Assoc. Bull.
6. Fussler, H. H., “The characteristics of the research literature used by chemists and physicists in the United States”, Library Quarterly 19, 19-35, 119-43(1949).
7. Bitner, A. and Price, M. O., Personal Communication, April (1954).
8. Zworykin, V. K. and Flory, L. E., “An electronic reading aid for the blind”, Proc. Amer. Philosophical Soc. 91, 139-42(1947).
9. Shaw, R. R., “Machines and the bibliographical problems of the twentieth century”, p. 19, March 1951. Reprinted from “Bibliography in an age of science.” U. of Illinois, Urbana, 1951.
10. Busa, R., “Mechanisierung der philologische Analyse”, Nachrichten fur Dokumentation 3, 14-19(1952).
11. Andrew, A. M., “Information Theory”, Electronic Engineering, Nov. 1953.
12. Garfield, E., “The Review literature as a source of critical entries for scientific indexes”. Unpublished paper. December 1952.
13. Selye, H., “The General Adaptation Syndrome”, J. Clinical Endocrinology
14. Murray, M. R. and Kopech, G., “Bibliography of the Research in Tissue Culture 1884-1950”, New York, Academic Press, 1953.
15. Garfield, E., “Unique identification tags for literature citations”, Communication to be published in Science, December 1954.
16. Adair, W. C., “Citation Indexes for Scientific Literature?”, To be published in January 1955 issue of American Documentation.