Random Thoughts of a Pseudo-Visionary
about the Present and Future of the Internet
Publisher, The Scientist®
Chairman Emeritus, ISI®
3501 Market Street
Philadelphia, PA 19104
National Chemical Information Symposium
July 17, 1996
When Charlie Bragg asked me to speak, he said he wanted a visionary. I once leaped at the opportunity to address such an audience. Today it is harder to live up to the over-inflated image of a visionary. Just because I founded a company like ISI [which, like Topsy, just grew and grew (Topsy is a character in Uncle Tom's Cabin)] doesn't qualify me as a professional visionary like Alvin Toffler of Future Shock.1 Some of my early essays may have nurtured my image as a prognosticator. For example, in 1971, as Chairman of the ASIS National Meeting, I selected the theme "Information Conscious Society." In my keynote address2, I mentioned the problems people have using telephone books -- a problem rarely discussed in rarefied settings such as this one. I quoted an article on legal aid by E. Lauritis, which I have abbreviated here:
"My name is Mary Doe, and I am poor. I live in Johnstown, Pennsylvania. My son is in trouble with the law, my landlord said he'd evict me, throw me out. I need a lawyer, but, I don't have the money and can't afford a lawyer.
I remember somebody saying that if you need a lawyer and can't pay, you should go to legal aid -- or maybe it was public defender. So I'm really desperate, see, so I go to the phone book, and I look for legal aid, and there's nothing in the yellow pages and it says "See Attorneys," I look there but I can't see legal aid or public defender, and I'm stuck."
Poor Mary Doe! Maybe if she asks enough people somebody will tell her she's really looking for the Cambria County Office of Legal Aid, listed under C."2
It's depressing to realize that the situation I described 25 years ago is almost identical today.
It is still impossible to find the Cambria County Legal Aid Society under "Legal Aid," because all such entries are listed under the county or city name rather than under their function. Nor will you find it on the Internet. If, however, you dial the information operator, you may be referred to the local Legal Aid Society. But don't count on it. If you dial information in Philadelphia (555-1212), you will be told there is a listing for Community Legal Services at 215-981-3700. A few miles away, in Montgomery County, there is a Legal Aid Society in Norristown (610-275-5400).
This unchanged situation bothered me so much that I searched for someone at Bell Atlantic who ought to be interested in the problem. When I finally found him, I asked if he had ever used the Internet. He said yes. Then I asked if he had ever heard of the Feedback section, the place on a Web site where you can send comments. Then I asked where the equivalent was located in the phone book. He agreed that they might learn something from the Internet experience.
You can't yet use a web crawler to find your local Legal Aid Society, but there is hope. The non-profit Legal Services Corporation in Washington, DC, says it has a home page in the works. When it is available, you'll be able to zero in on legal aid anywhere in the country. [This government-funded group is at 750 First Street, NE (202/336-8800). It is not to be confused with the local Legal Aid Society of Washington (202/628-1161).]
You all know that the Internet is a mixed bag, producing lots of frustration but also occasional exhilaration. It suffers not only from information overload and redundancy but also from a lack of real-time access. The delay you encounter in accessing most URLs is somewhere between bearable and intolerable. As I'll demonstrate later, unless we have real-time access to the URLs identified by crawlers, we will be frustrated more often than not.
Gopher as KWIC Index
Last year at ACS I described how to search the gopher file of The Scientist®, the newspaper I've published for 10 years. When we started that file on NSFnet three years ago, it was kind of gee-whiz -- isn't that wonderful? By the time we launched our Web site over two years later, the gopher file at AT&T already left something to be desired. Here, for example, is a sample search on combinatorial chemistry.
Figure 1: Gopher menu -- shows beginning and end of three-page printout.
Figure 2: Gopher contents page.
Figure 3: Gopher search page showing a search on "combinatorial chemistry" (top) and the results of that search (bottom).
The gopher presentation of a keyword search is reminiscent of the KWIC (keyword-in-context) indexes developed by Herb Ohlman at SDC and Hans Peter Luhn of IBM in the late 1950s. You would think that AT&T could come up with something more sophisticated than a KWIC index display.
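For readers who never used one, a KWIC index lists every significant word of each title in alphabetical order. Each word is aligned in a fixed column, with the words around it shown on either side. Below is a minimal Python sketch. The stop-word list, the column width, and the function name are illustrative assumptions. It is not a reconstruction of the Luhn or Ohlman programs, or of AT&T's gopher display.

```python
# Minimal KWIC (keyword-in-context) index sketch.
# STOP_WORDS is a hypothetical, illustrative list, not a standard one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "using"}


def kwic(titles, width=30):
    """Return KWIC lines sorted by keyword.

    Each line shows the left context right-aligned in `width` characters,
    two spaces, then the keyword followed by the rest of the title.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower().strip('.,:;"')
            if not key or key in STOP_WORDS:
                continue  # skip insignificant words
            left = " ".join(words[:i])[-width:]  # keep the tail of the left context
            right = " ".join(words[i:])
            entries.append((key, f"{left:>{width}}  {right}"))
    entries.sort(key=lambda entry: entry[0])  # alphabetize by keyword
    return [line for _, line in entries]


if __name__ == "__main__":
    sample = [
        "Using the Internet for Technology Transfer",
        "Advances in Combinatorial Chemistry",
    ]
    for line in kwic(sample):
        print(line)
```

Each title appears once for every significant word it contains. That redundancy is what makes a KWIC display tedious to scan when a search returns many titles.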
In the next series of slides, we have the corresponding displays for the Web site.
Figure 4: Web contents.
Figure 5: Listing of "stories in this issue" for June 24, 1996.
Figure 6: Expanded contents page for June 24, 1996, including short abstracts.
Figure 7: Searches on the Web site for "combinatorial chemistry" (top) and "molecular modeling" (bottom).
Three years ago, I was glad to be able to do a search at all; now even this more attractive display [provided through the University of Arizona's Glimpse search engine at Penn] still leaves room for improvement.
Search engines are unforgiving. Once you create a file, special programming is needed to include information such as the date of issue in the display without re-indexing every document.
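One common way to get that flexibility is to keep the inverted index of article text separate from a small metadata table (issue date, title, and so on) keyed by document ID. The two are joined only when the results display is built. This is a sketch under my own assumptions, not a description of Glimpse or any other particular engine. The names `build_index` and `search` and the ISO date format are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index over article text: term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, metadata, query):
    """AND-search the text index, then attach display fields such as the date.

    The metadata table is consulted only at display time, so adding or
    correcting a date never requires rebuilding the index.
    """
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    rows = [(metadata.get(d, {}).get("date", ""), d) for d in hits]
    return sorted(rows, reverse=True)  # ISO dates sort newest first; undated last


docs = {
    "a1": "combinatorial chemistry advances",
    "a2": "molecular modeling and combinatorial chemistry",
}
metadata = {"a1": {"date": "1996-06-24"}}
index = build_index(docs)
print(search(index, metadata, "combinatorial chemistry"))
metadata["a2"] = {"date": "1996-07-08"}  # add a date without re-indexing
print(search(index, metadata, "combinatorial chemistry"))
```

Only changes to the article text itself would require rebuilding the index.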
To illustrate how you proceed from the first stage of a search to the next, I've done a search on "technology transfer."
Figure 8: Search on the phrase "technology transfer." You then click on the second choice: "Using the Internet for Technology Transfer."
In the next slide, I've called up the second hit, "Using the Internet for Technology Transfer."
Figure 9: Article by Lee Katterman showing 13 hot-linked connections. You then choose the first one, "Indiana University Technology Transfer Office."
After viewing this list, I clicked on the entry for Indiana University's Technology Transfer Office.
Figure 10: Home page of Indiana University's Technology Transfer Office.
When we agreed to put The Scientist on the NSFnet, we were frequently asked whether this would endanger its survival as a subscription service. Nothing in our experience has confirmed that fear. Since the electronic version was a strictly ASCII file, I was confident that few subscribers would give up the pleasure and value of the printed edition. Three years later I still cannot tell you how many people "read" The Scientist in electronic form. Each month we receive statistics on the number of searches done and on how often FTPs occur. But one FTP could be serving the needs of every scientist in New Zealand. And when I've asked people who FTP files whether they have any idea how many people read the electronic version of The Scientist, they really can't say. Everything is expressed in terms of searches and hits.
I had hoped it would be possible to use the Internet proactively to disseminate The Scientist's articles to all relevant readers of listservs, bulletin boards, etc. So far it has proven too time-consuming to identify all relevant bulletin boards for each new story. It would also be too time-consuming to correspond with the individual bulletin board moderators or editors. The rules of etiquette on the original Internet prohibited us from broadcasting the contents page of each issue widely. Even the now-commercialized Internet has not changed the fact that unsolicited direct mail of any kind offends many scientists, particularly as information overload has increased.
Of course, individual readers who know of an article that would interest colleagues can go to our Web or gopher sites to transmit that article. But we have no way of knowing how often that occurs. We can only estimate.
Having The Scientist at my fingertips has saved me considerable time in responding to a variety of requests for information. Students worldwide seem to think we are authorities on any subject that has been covered in a Scientist news story. Recently, someone asked me how to find a school where he could obtain an MBA as part of his plan to switch from microbiology to management, simply because we had published an article last year on cross-training in career development.
A recent experience with the staff at The Scientist illustrates the cultural change involved in going from traditional indexes to the new electronic access. We have produced manual printed indexes to The Scientist since its inception. These indexes cover 1986 to 1995. It is no longer urgent for us to keep this index fully current, because we can search The Scientist on our Web site. When we planned this year's budget, we excluded the cost of manual indexing. Then in May someone asked when the index would be updated. The advantages of the printed index were cited. This is illustrated in the next slide.
Figure 11: The Scientist subject index, with the entry "technology transfer" highlighted.
But when I asked how long it took to get to the original stories, they seemed perplexed. I pointed out that you had to take into account the time required to locate the article in the library. Long before you've found the issue in the library, I can pull the article up on the screen.
I mentioned earlier the problems of selectively disseminating the content of The Scientist in the present Internet environment. These problems will be partially solved when web crawlers like Alta Vista can update searches without repeating what you retrieved in last week's search. Later on I will demonstrate an Alta Vista search. For the moment, it may be of interest to look at a comparable DialIndex search on Dialog. Continuing with "technology transfer," here is the result of that search.
Figure 12: DialIndex search on "technology transfer."
Just as you can search thousands of URLs via a crawler, you can search hundreds of databases using DialIndex. In a recent talk to NFAIS, Roger Summit pointed out that Alta Vista had 23 million records while Dialog gave access to 334 million, half of them in science.3
So if the editors of Internet bulletin boards are doing their jobs, they can also use one or more crawlers to find out which articles in the latest issue of The Scientist concern the interests of their invisible college. Of course, this means they must create term profiles that anticipate the varieties of natural and scientific language used in their specialty. In the meantime, they'll continue to use Current Contents, Chemical Abstracts, and other current awareness services to keep up with the scientific literature.
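A term profile of this kind can be combined with the week-to-week updating mentioned above. Each concept in the profile is a set of alternative terms (synonyms, variant spellings, abbreviations), and an item matches when every concept is represented. Items already reported in earlier runs are suppressed. The sketch below is a minimal Python version under those assumptions. The profile, item IDs, and function names are hypothetical, and real current awareness services are considerably more elaborate.

```python
import re


def normalize(text):
    """Lower-case, keep letters and digits, and pad with spaces so that
    substring tests only match whole words and phrases."""
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def matches(profile, text):
    """A profile is a list of concepts, each a set of alternative terms.
    The text matches when every concept has at least one term present."""
    padded = normalize(text)
    return all(any(normalize(term) in padded for term in concept) for concept in profile)


def alert(profile, items, seen):
    """Return IDs of matching items not reported in earlier runs.

    `seen` is updated in place; to carry it from week to week it would
    have to be saved between runs (for example, to a file).
    """
    hits = []
    for item_id, text in items:
        if item_id not in seen and matches(profile, text):
            seen.add(item_id)
            hits.append(item_id)
    return hits


# Hypothetical profile: (technology transfer OR tech transfer OR licensing)
# AND (university OR universities OR academic)
profile = [
    {"technology transfer", "tech transfer", "licensing"},
    {"university", "universities", "academic"},
]
week1 = [("s1", "University licensing offices expand"),
         ("s2", "Technology transfer at national labs")]
week2 = [("s1", "University licensing offices expand"),
         ("s3", "Academic tech-transfer offices grow")]
seen = set()
print(alert(profile, week1, seen))
print(alert(profile, week2, seen))
```

In the second run the already-reported item is skipped. The hyphenated "tech-transfer" still matches, because punctuation is normalized away before comparison.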
Electronic vs. Printed Outputs
We have been hearing for quite some time how the Internet is going to displace the printed word. It used to be called the paperless society. But we are still consuming enormous quantities of paper to print our e-mail and Internet output. I, for one, can't sit for hours reading a screen. Even if I could, my mobile existence requires portable printouts. I frequently see people on planes and trains using portable PCs, but somehow they never seem to be reading. We have a long way to go before the printed page is displaced. Are you ready to replace your morning paper with the New York Times Online? Some surveys suggest this may be happening among the younger generation.
Does the same reasoning apply to reading journals?
Considerable progress has been made in providing electronic access to printed journals. I have tested the Journal of Biological Chemistry and similar publications. With the Adobe Acrobat software, the printouts are of excellent quality, and the search engine is remarkable. I would certainly use these electronic files to retrieve individual articles I need immediately. But I am not yet comfortable with the idea of giving up all my current journal subscriptions. That could change gradually. I'm fully prepared to rely on electronic storage to locate single articles. But browsing journals involves much more than meets the eye. And perhaps the JBC, like other monstrous journals, needs to think about twigging so that each member gets 5,000 instead of 25,000 pages per year.
I am also ready to compromise on how I publish. My own experiences with writing for journals reminded me how frustrating it is to wait over one year to get an article published. The original manuscript could have been disseminated in one day over the Internet. But unlike the physics community, neither the information science nor chemical communities have yet adopted the pre-print culture -- nor have the scientific communities represented by Science and Nature and other leading journals.
Future of SDI Systems
Figure 13: ISI Research Alert report from the end of June.
I continue to be an avid user of ISI's SDI system. The ISI Research Alert® (formerly ASCA) has now been around for over 30 years4. It was the first commercially available SDI system. But for over 20 years it was essentially a financial flop since it could not survive without the SCI® database. Scientifically it was an enormous success and continues to serve the needs of many types of applied scientists. But for basic researchers it still has to be combined with scanning of journal contents pages. In fact, the success of Current Contents® was, and still is, primarily due to the inability of most researchers to define their reading needs precisely. If they knew exactly what they needed, would research be necessary?
The quantity of information published today is much greater than it was 40 years ago when CC® started. Yet I can honestly say that 50% of what I read would not turn up in any ordinary keyword search. But that might change as the full-text revolution matures. I might add that Research Alert can now be delivered over the Internet, thereby making it more timely and also susceptible to further electronic manipulation. I myself continue to use the printed version.
Cited Reference Searching
When I was more active in research, I literally scanned the contents of seven editions of Current Contents. As time passed, I increasingly relied on the unique ability of cited reference searching to support my reading needs. Cited reference searching is simply a variant of hypersearching. It will be an integral part of the fully electronic journal world of the future. For the moment, however, it is not practical to use the Internet to find all articles that have cited your work, or some specific paper, book, or author -- at least not without considerable effort. JBC Online, mentioned earlier, can hyperlink to cited and citing articles within its own file. But the only way a journal like JBC can offer a complete citation search for each published article is to incorporate the relevant citation links from the Science Citation Index® or from other electronic journals, as is happening between PNAS and JBC.
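At its core, a citation index inverts the reference lists of citing articles. Instead of asking what a paper cites, you ask what cites it. Below is a minimal Python sketch. It assumes every reference has already been reduced to a single standard identifier, which in practice is the hard part. The article names and reference strings are hypothetical.

```python
from collections import defaultdict


def build_citation_index(reference_lists):
    """Invert reference lists into a citation index.

    reference_lists maps each citing article to the items it cites;
    the result maps each cited item to the sorted articles citing it.
    """
    cited_by = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing)
    return {cited: sorted(citers) for cited, citers in cited_by.items()}


# Hypothetical, already-normalized references.
references = {
    "Article A": ["GARFIELD E 1955 SCIENCE", "SMALL HG 1978 SOC STUD SCI"],
    "Article B": ["GARFIELD E 1955 SCIENCE"],
    "Article C": [],
}
index = build_citation_index(references)
print(index["GARFIELD E 1955 SCIENCE"])
```

This inversion is what lets a searcher start from a known paper and move forward in time to the later work that cites it.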
A group at the University of Southampton in the UK is also developing what it calls linking databases.5 Making journals and the SCI® transparent to each other in the near future is an increasingly realistic expectation. ISI® is developing an Intranet SCI® capability, which has been demonstrated and will eventually cover the entire file from 1945 to the present. ISI has also already announced the use of the Internet for Current Contents® and Research Alert.
Figure 14: Alta Vista search on "Eugene Garfield" using the advanced query.
Using today's Internet crawlers to conduct a cited reference search is rather frustrating and time-consuming. If you search for my name on Alta Vista in advanced query mode, the first 11 entries retrieved are quite relevant. But there appears to be a Garfield Street in Eugene, Oregon, which Alta Vista couldn't differentiate.
Figure 15: Index of Persons.
Nor could it recognize that the entries for Eugene and Garfield Cantrell are not relevant. This adjacency problem can be solved, but what happens when we start to look for all those references cited simply as E. Garfield? Once again, let me stress the importance of display or visualization.
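The adjacency problem arises when a query is treated as a bag of words. A page about Garfield Street in Eugene contains both words, in any order and at any distance. Requiring the words to appear together and in order eliminates such false drops. Below is a minimal Python sketch using token positions. It illustrates the idea only and makes no claim about how Alta Vista works. It also does nothing for the harder case of references cited as E. Garfield.

```python
import re


def words(text):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_all_words(text, query):
    """Bag-of-words match: every query word appears somewhere in the text."""
    return set(words(query)) <= set(words(text))


def has_phrase(text, query):
    """Adjacency match: the query words appear consecutively and in order."""
    t, q = words(text), words(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


street = "Garfield Street, Eugene, Oregon"
print(has_all_words(street, "Eugene Garfield"))  # bag of words: a false drop
print(has_phrase(street, "Eugene Garfield"))     # adjacency: correctly rejected
```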
With Alta Vista and other crawlers, you must hyperlink to each URL in turn to find out why you were directed to it. In a typical SCI cited reference search on CD-ROM, here is what you see initially.
Figure 16: SCI search on Eugene Garfield, from the January-May 1996 disc.
In the next slide, you see the expanded result. The next step is a list of article titles and/or abstracts, which can be printed one at a time or as a group.
Figure 17: SCI search results.
In the next slide (#19), I've printed one full citing record.
Figure 18: Sample full record of an article that cites you.
In the near future you will be able to do these SCI searches via the ISI Intranet.
It will also be possible, via the Current Contents/ISI Electronic Library, to access the electronic versions of life sciences journals. That project has been widely announced and briefly described as providing "users with immediate desktop access to the tables-of-contents, bibliographic data, and abstracts of the approximately 1,350 prestigious journals indexed in Current Contents/Life Sciences, as well as to the full images of those journals in the system for which publishers have given permission." ISI has partnered with a number of players to expedite this initiative.
Figures 20 and 20a: Slides on the Electronic Library.
Scientific Reviewing as Profession
I think that this conference symbolizes the fulfillment of one of the earliest visionary themes of chemical information pioneers. We often talked about the future shift from the wet laboratory to the dry laboratory. Doing dry research in the library was not always considered a reasonable surrogate for lab research. The wholesale adoption of the Internet culture seems to be the culmination of that metaphor. Without realizing it, people who use the Internet almost obsessively are learning to use information in creative pursuits. The lure of the Internet is infectious, but it remains to be seen how long it will last. Suppose we can solve the problem of information overload by designing easy-to-use search engines and providing enough wideband circuits. Then we can expect that not only will scientists and scholars enjoy the fruits of this information revolution, but it will also empower ordinary people in their dealings with doctors, lawyers, and others who are defined as professionals simply because they command a knowledge of the information resources. The April 29th issue of The Scientist carried a story on the so-called Rocket Man, Jeff Baxter, who has played guitar for the Doobie Brothers and Steely Dan6:
Figure 21: Notebook piece on Jeff Baxter.
"Jeff Baxter has played guitar for the Doobie Brothers and Steely Dan, is a two-time Grammy Award winner, and continues to produce albums and write music for commercials, films, and television shows. But his new gig is considerably more highbrow: serving on a congressional advisory panel on missile-defense technology. The assignment, from Rep. Curt Weldon (R-Pa.), created a bit of a stir. While the task may seem like heady stuff for a rock-and-roller, Baxter actually has a bit of a background as an amateur scientist. His review of defense publications on new technology that could be used in musical instruments and recording equipment led to a concern for missile-technology issues. He wrote an unpublished paper on the AEGIS naval missile defense system six years ago which caught the eye of Rep. Dana Rohrabacher (R-Calif.), whom Baxter now advises on science and technology issues. '[Weldon] met him when he was holding field hearings in California with Rohrabacher. He was amazed at how knowledgeable Baxter was. He knows missile defense technology inside and out.' The 47-year-old college dropout has also written op-ed pieces about NATO (Los Angeles Times, Jan. 9, 1994, page M-5; Washington Times, Jan. 10, 1994, page A-23). His reply to critics who demean his credentials: 'They're more than welcome to say what they want, but when I talk to people in the Navy and in the aerospace industry, they want to know if you can walk it as well as you talk it. And I've gotten very favorable comments... There's a great deal of information available out there. If you do your homework, you can avail yourself of a tremendous amount of knowledge.'"
This is related to another theme I have often discussed, that of scientific reviewing. As you demonstrate daily in your own work, using the literature and databases in a constructive way creates new connections and new syntheses. The SCI actually evolved out of my early interest in the use of review literature as an auto-indexing system. Since then I have promoted the theme of the professional science critic7 and the professional science reviewer.8 It is also reflected in my involvement with Annual Reviews, the publisher of scholarly scientific review articles. ISI and Annual Reviews jointly sponsor the National Academy of Sciences annual award for scientific reviewing. This past April, Dr. Jeffrey S. Banks received the award for his "influential reviews of work on the theory of games of incomplete information, theory of automata, and the theory of repeated play games as they apply to political relationships, as well as for his extensive editorial work." All recipients of the award are listed in the next slide (see the following table).
Recipients of the NAS Award for Excellence in Scientific Reviewing

Year   Winner                 Discipline
1995   Jeffrey S. Banks       Economics/Political Science
1994   Thomas M. Jessell      Developmental Neurobiology/Biology
1993   Janet T. Spence        Psychology
1992   Robert T. Watson       Atmospheric Chemistry
1991   Alexander N. Glazer    Biochemistry/Molecular Biology
1990   James N. Spuhler       Cultural and Biological Anthropology
1989   Sidney Coleman         Theoretical Physics
1988   Eric R. Kandel         Life Sciences
1987   Gardner Lindzey        Psychology
1986   Virginia Trimble       Astronomy/Astrophysics
1985   Ira Herskowitz         Biochemistry/Biophysics
1984   Ernest R. Hilgard      Psychology
1983   Michael E. Fisher      Physics
1982   Victor A. McKusick     Human Genetics
1981   John S. Chipman        Economics
1980   Conyers Herring        Solid State Physics
1979   G. Alan Robison        Biochemistry

NAS Award winners for scientific reviewing, 1979-1995.
Shortly before I sold my interest in ISI, I launched the ISI Atlas of Science® to promote the idea of using the Science Citation Index® database to fulfill the dreams of people like Henry G. Small and Derek Price. Henry envisioned a future version of a citation index that would support citation context analysis.9 In such a version of the SCI, the display of citing papers would include not only citing titles and authors but also the citing sentences or paragraphs. This information would also need to be displayed in an organized, readable fashion, as a reviewer does, and supplemented with visual maps of the interrelated links. Visualization has become a hot new area of information science thanks to people like Ed Tufte10 and Jock Mackinlay, Ramana Rao, and Stuart Card at Xerox.
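The citation-context display Small envisioned can be illustrated on a small scale. Given the full text of a citing paper, pull out the sentences in which it cites a particular reference. Below is a minimal Python sketch. It assumes numbered bracket citations such as [5] or [2, 5] and naive sentence splitting at end punctuation. Real citing texts use many citation styles and would need far more robust parsing. The function name and sample text are hypothetical.

```python
import re

# Bracketed citation numbers, e.g. [5] or [2, 5] (an assumed citation style).
CITATION = re.compile(r"\[([\d,\s]+)\]")


def citing_sentences(full_text, ref_number):
    """Return the sentences of full_text that cite reference number ref_number."""
    sentences = re.split(r"(?<=[.!?])\s+", full_text.strip())
    found = []
    for sentence in sentences:
        for group in CITATION.findall(sentence):
            numbers = {int(n) for n in re.split(r"[\s,]+", group) if n}
            if ref_number in numbers:
                found.append(sentence)
                break  # one hit per sentence is enough
    return found


text = ("Citation indexes were proposed in 1955 [1]. "
        "Co-citation links related papers [2, 5]. "
        "Such links can be mapped [5].")
for sentence in citing_sentences(text, 5):
    print(sentence)
```

Grouping such sentences under the cited work gives a crude version of the reviewer-style display described above. The organizing and mapping would still have to be added.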
Figure 23: Slide from Xerox paper.
They are doing some interesting work at Xerox in Palo Alto.11 In the next slide you can see how citation networks are visualized in their butterfly model of the citing and cited pathways to a key paper. Unfortunately, I can't fully demonstrate here how this three-dimensional portrayal works. It moves around in 3-D, much as you rotate molecular models.
It is difficult to predict whether our poor coal miner's wife will be using the Internet in a few years. But she will undoubtedly be able to locate the Legal Aid Society in Cambria County, if one still exists, since Congress has just taken a hatchet to such luxuries for the poor. If we return in a few years, I have little doubt that searching will be easier, index displays will be friendlier, and hypersearching, at least within Intranets, will provide real-time access.
Whether the Internet can survive the next stage of its exponential growth has been seriously questioned. Separating the scientific portion of the URL universe from the other public information it now contains may therefore be essential to its technical survival. In 1963, John W. Senders estimated the information content of the world's libraries.12 He also discussed the impact of the exponential growth of the indexes needed to search these files. It will be interesting to see how long it takes for the Internet's exponential growth to reach saturation. Huge investments must be made worldwide to provide real-time access to every user.
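One standard model of growth that is exponential at first and then saturates is the logistic curve. It is offered here only as an illustration, not as a forecast for the Internet or as a model Senders used. Let $N(t)$ be the size at time $t$, $N_0$ the initial size, $r$ the early growth rate, and $K$ the saturation level:

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right), \qquad N(t) = \frac{K}{1 + \dfrac{K - N_0}{N_0}\, e^{-rt}}$$

While $N \ll K$, the equation reduces to $dN/dt \approx rN$, which is ordinary exponential growth. Growth is fastest at $N = K/2$ and slows toward zero as $N$ approaches $K$.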
References
1. Toffler, Alvin. Future Shock. New York: Random House, 505 pp., 1970.
2. Lauritis, E. The Legal Aid Briefcase 28(6): 205 (May 1970).
3. Summit, Roger. "The New Information Paradigm: Threat or Opportunity (Or Both)?" NFAIS Newsletter 38(4/5): 57-69 (April/May 1996).
4. Garfield, E. "Introducing ASCA IV, An SDI System with Exclusive Features." Current Contents, May 2, 1969. Reprinted in Essays of an Information Scientist, Volume 1, p. 38.
5. Carr, L., De Roure, D., Hall, W., and Hill, G. "The Distributed Link Service: A Tool for Publishers, Authors and Readers." Proceedings of The Web Revolution: Fourth International World Wide Web Conference, 1995.
6. Anonymous. "Rocket Man." The Scientist (Notebook Section) 10(9): 30 (April 29, 1996).
7. Garfield, E. "From Information Scientist to Science Critic." Current Contents No. 36, pp. 3-7, September 4, 1989. Reprinted in Essays of an Information Scientist, Volume 12, pp. 251-255.
8. Garfield, E. "Proposal for a New Profession: Scientific Reviewer." Current Contents No. 14, pp. 5-8, April 4, 1977. Reprinted in Essays of an Information Scientist, Volume 3, pp. 84-87.
9. Small, H. G. "Cited Documents as Concept Symbols." Social Studies of Science 8: 327-340 (1978).
10. Tufte, E. R. Envisioning Information. Cheshire, CT: Graphics Press, 128 pp., 1990.
11. Mackinlay, J. D., Rao, R., and Card, S. K. "An Organic User Interface for Searching Citation Links." In Human Factors in Computing Systems Proceedings, Annual Conference Series, ACM SIGCHI, pp. 67-73 (1995).
12. Senders, J. W. "Information Storage Requirements for the Contents of the World's Libraries." Science 141: 1067 (1963).