Library News: The Evolution of Search

A look at the History, Vision, Innovators, and Future of Information Accessibility

1. Foundations

A. The Beginning - In the pre-WWII era, information sharing was in its infancy compared to today. Without the help of modern electronics, we had reached the upper limit of how efficiently information could be stored and shared. Libraries and archives had largely perfected the organization and cataloging of information, but the retrieval and dissemination of that information were held back by the technology of the day.

B. The Vision - In the burgeoning world of scientific advancement that characterized the United States during and after WWII, astute observers like Vannevar Bush began to recognize the need for a better system of information sharing. In his 1945 article “As We May Think,” published in The Atlantic Monthly, Bush observed:

“The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record.”

Bush also noted that an additional obstacle was the inefficiency of existing data organization. He posited that the most efficient way to share information would be to structure it the way our brains process data: by association.

C. The Game Changer - As the field of computer science advanced in the decades after Bush’s insights, his notion of a better way to retrieve and organize information began to come to fruition. One of the earliest pioneers of this new type of associative data retrieval was Gerard Salton, a professor of computer science at Cornell University. Salton developed SMART (System for the Manipulation and Retrieval of Text), an early and highly influential information retrieval system. Salton and others continued to expand the knowledge base of information retrieval through the 1970s and 1980s, laying the theoretical groundwork for more sophisticated information retrieval systems and, ultimately, search engines.

2. Connecting the Dots

A. The Beginning - Although computers greatly expanded humans’ ability to store and analyze data, they did little to help us share data, collaborate, or communicate more effectively.

B. The Vision - At the same time that computer scientists began to see a need for better information retrieval systems, others saw the importance of being able to share and collaborate on information across distances and between computers. J.C.R. Licklider of Bolt, Beranek and Newman was the first to conceptualize a worldwide computer network, in memos describing what he called the “Intergalactic Computer Network.”

C. The Game Changer - The first of these widespread networks was developed by the United States Department of Defense under the acronym ARPANET (Advanced Research Projects Agency Network). ARPANET initially connected a handful of major research universities and institutions in the United States. During the 1970s and early 1980s, the network expanded and branched out, eventually forming the basis of today’s Internet.

3. The Early Internet

A. The Beginning - The early Internet looked nothing like it does today. Sharing files or information required FTP (File Transfer Protocol), so anyone who wanted to make data openly available on the Internet served it from a public anonymous FTP server. The problem was that there was no way to know what was available unless you already knew where to look.

B. The Vision - As the number of computers connected to the Internet continued to grow in the early 1990s, it became apparent to many that there was a disconnect between the amount of information available and one’s ability to access it. A tool was needed to make the Internet more accessible to users.

C. The Game Changer - This tool became the first real search engine. Dubbed “Archie,” it was developed by Alan Emtage, a student at McGill University in Montreal. Archie downloaded the directory listings of files available on various public FTP sites and built a searchable database from them, making it possible to search the Internet for the first time. A year later, Mark McCahill of the University of Minnesota took the idea one step further and created “Gopher,” a tool that could index (download) plain-text documents. Later, two new tools named Veronica and Jughead made it possible to search the plain-text documents indexed by Gopher, allowing users for the first time to quickly find contextual information on the Internet.
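Archie’s core idea, turning scattered directory listings into one searchable list of filenames, can be sketched in a few lines of Python. This is a simplified illustration, not Archie’s actual code: the site names, file paths, and the `build_index` and `search` helpers are hypothetical, and the search is a plain case-insensitive substring match on filenames.

```python
# Hypothetical directory listings, as an Archie-style indexer might have
# fetched them from anonymous FTP sites (site names use the reserved .example TLD).
LISTINGS = {
    "ftp.site-a.example": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/rfc1000.txt"],
    "ftp.site-b.example": ["/mirrors/gnu/gcc-1.40.tar.Z", "/pub/games/rogue.tar.Z"],
}


def build_index(listings):
    """Flatten per-site listings into (filename, site, path) records."""
    index = []
    for site, paths in listings.items():
        for path in paths:
            filename = path.rsplit("/", 1)[-1]  # keep only the last path component
            index.append((filename, site, path))
    return index


def search(index, pattern):
    """Return (site, path) pairs whose filename contains the pattern, ignoring case."""
    pattern = pattern.lower()
    return [(site, path) for filename, site, path in index if pattern in filename.lower()]
```

For example, `search(build_index(LISTINGS), "gcc")` returns `[("ftp.site-b.example", "/mirrors/gnu/gcc-1.40.tar.Z")]`. The index knows only filenames and locations, not what is inside the files.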

4. Bring it to the Masses

A. The Beginning - The network of computers that made up the early Internet was accessible only to a small technological elite, based mostly at universities and research facilities. Although home computer ownership was soaring and the technology existed, the early Internet remained almost completely inaccessible to the average user.

B. The Vision - Tim Berners-Lee is widely known as the father of the World Wide Web. During the 1980s Berners-Lee worked at CERN, the European Organization for Nuclear Research, which was at the time the largest Internet node in Europe. While at CERN, he saw a great need for a better, more efficient way for researchers to share and collaborate on research. In his words:

“I just had to take the hypertext idea and connect it to the TCP and DNS ideas and - ta-da! - the World Wide Web.”

C. The Game Changer - With the goal of making the Internet a more creative and accessible place, Berners-Lee developed the world’s first web browser and editor, improving on the clunky, difficult FTP-based model by using hypertext (links) to let users point-and-click their way across the Internet. On August 1st, 1991, the first website went online: http://info.cern.ch/, a guide to what Berners-Lee dubbed “The World Wide Web.”

5. A Glut of Information

A. The Beginning - Before the web became the multi-billion-page monstrosity it is today, there was disagreement over the best way to organize all of the available websites and information. During these early years, directories and search engines battled for supremacy in web search. Tim Berners-Lee, the father of the World Wide Web, was also responsible for the first web directory, the WWW Virtual Library. Other popular directories from the early years of the web included the EINet Galaxy directory and the Yahoo! directory, both launched in 1994.

B. The Vision - As the number of pages on the World Wide Web continued to grow, shrewd observers realized that while directories were useful, manually organizing all of the information available online was quickly becoming impossible. A better solution was needed. To catalog, or index, the growing Internet more thoroughly, the first “spiders” were created: programs that navigate, or “crawl,” around the Internet to find and index new websites and web pages.

C. The Game Changer - The first web crawler, or spider, was the “World Wide Web Wanderer,” developed in 1993 by Matthew Gray. The Wanderer and other primitive spiders gathered a great deal of information about the early World Wide Web, making far more of it discoverable than manual directories could and laying the groundwork for modern search engines.
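The crawling process described above can be sketched as a breadth-first traversal of the link graph. In this hypothetical sketch an in-memory dictionary (`WEB`, with made-up `.example` URLs) stands in for the Web, and the `get_links` callback stands in for fetching a page and extracting its links; it is not the Wanderer’s actual code.

```python
from collections import deque

# Hypothetical link graph standing in for the early Web: each URL maps to
# the URLs its page links to. A real spider would fetch and parse HTML instead.
WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl from a seed URL; returns URLs in discovery order."""
    seen = {seed}          # URLs already queued, so cycles are not revisited
    queue = deque([seed])  # frontier of URLs waiting to be visited
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)  # "index" the page by recording that it was visited
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Calling `crawl("http://a.example/", lambda url: WEB.get(url, []))` visits a, b, c, and d in that order. The `seen` set keeps the spider from looping forever when pages link back to each other (c links back to a), and `max_pages` bounds the size of the crawl.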

6. Search Engines Get Smarter

A. The Beginning - As web-crawling spiders became more capable of navigating and indexing the Internet, new tools were needed to organize the data they accumulated. Early search engines could help users find some of what they were looking for, but they were extremely simplistic. Their algorithms simply matched the query terms against the crawled pages and returned the matching pages in the order in which they had originally been crawled. Clearly there was a better way.
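A minimal sketch of this kind of early matching, assuming pages are stored in crawl order as hypothetical (URL, text) pairs: a page matches if it contains every query word, and results come back in crawl order with no relevance ranking at all. The data and the `naive_search` name are illustrative, not taken from any particular engine.

```python
# Pages in the order a spider happened to crawl them (hypothetical data).
CRAWLED = [
    ("http://a.example/", "welcome to our physics department home page"),
    ("http://b.example/", "lecture notes on quantum physics"),
    ("http://c.example/", "department of history course list"),
]


def naive_search(pages, query):
    """Return URLs of pages containing every query word, in crawl order.

    There is no scoring step: the result order is simply the crawl order."""
    terms = query.lower().split()
    results = []
    for url, text in pages:
        words = set(text.lower().split())
        if all(term in words for term in terms):
            results.append(url)
    return results
```

Here `naive_search(CRAWLED, "physics")` lists the department home page before the lecture notes only because it happened to be crawled first.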

B. The Vision - In February 1993, six Stanford University students began working on a new generation of search engine designed to cope with the increasingly large amount of data on the growing Internet. Their idea was to apply statistical analysis of word relationships to make searching more accurate and efficient. This idea eventually became the Excite search engine. These initial steps by Excite and others sparked a competitive revolution, and over the next few years many new competitors entered the search engine game.

C. The Game Changer - On July 20th, 1994, the Lycos search engine launched to the public, introducing the world to the next great advance in search engine technology. Lycos was the first search engine that not only fully indexed the content of websites but also ranked them using more sophisticated word-relationship algorithms. Lycos also led in the size of its index: by November 1996 it had the largest index of any search engine, with more than 60 million documents.

7. Spammers and Manipulators

A. The Beginning - During the mid- and late 1990s, there was a virtual free-for-all of new search engines coming to market, each with slightly different offerings, but none that stood out significantly from the others in technology or in the relevance of its results. It was during this time that spammers and scammers began to realize the profit potential of manipulating search engines for their own benefit.

B. The Vision - Most search engines of this period used algorithms that relied heavily on the content of the pages they crawled to determine what those pages were about and how they should rank in the search results. This invited manipulation by anyone looking to improve their rankings. Webmasters could simply stuff keywords into the titles, content, and meta tags of their sites and trick the search engines into ranking them for highly trafficked keywords.
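To see why content-only ranking was so easy to game, consider a deliberately simplified scorer that counts how often the query words appear on a page. The page texts and the `term_frequency_score` and `rank` names are hypothetical, and real engines of the era used more elaborate formulas, but the weakness illustrated is the same: repeating a keyword raises the score.

```python
def term_frequency_score(text, query):
    """Score a page by counting occurrences of each query word in its text.

    A simplified stand-in for ranking that relies only on on-page content."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


# Hypothetical pages: one written for readers, one stuffed with keywords.
PAGES = {
    "honest": "compare cheap flights and airline fares from many carriers",
    "stuffed": "cheap flights cheap flights cheap flights cheap flights click here",
}


def rank(pages, query):
    """Return page names sorted from highest to lowest score."""
    return sorted(pages, key=lambda name: term_frequency_score(pages[name], query), reverse=True)
```

For the query "cheap flights", the honest page scores 2 and the stuffed page scores 8, so `rank(PAGES, "cheap flights")` puts the stuffed page first even though it offers readers nothing.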

C. The Game Changer - Although universally regarded as a problem, spammers and manipulators played a vital role in pushing search engines to become more sophisticated. Each manipulation technique exposed a weakness that had to be solved, and solving those weaknesses led directly to a fairer, more efficient, and ultimately more successful generation of search engines, the biggest of which has become synonymous with search itself.

8. Google’s Brilliance

A. The Beginning - Search engines before Google relied on the content of the pages they indexed as the best way to determine what a page was about. As discussed, this method was very prone to manipulation because every variable used in ranking was under the webmaster’s own control. These inherent problems led to an entirely new model for search engines, one that relied on impartial outside sources to help determine what a site was about and how it should rank in the search results.

B. The Vision - In January 1996, Larry Page, a Ph.D. student at Stanford University, chose web search engines as his dissertation topic. His idea was to understand the web in terms of its link structure, that is, how each page on the Internet links to other pages. For this project, originally named “Backrub,” he partnered with another Stanford Ph.D. student, Sergey Brin. Together they began to develop an entirely new search engine technology. Like existing search engines, Backrub ranked results based on the content of the pages it indexed, but it went one step further: it also took into account the number, quality, and context of the links pointing to each website when deciding where that site should rank in the search results.
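The link-based idea behind Backrub was later published by Brin and Page as PageRank. The sketch below is a simplified power-iteration version of that idea, not Google’s implementation: each page repeatedly shares its score among the pages it links to, plus a small uniform “teleport” share controlled by a damping factor (0.85, the value commonly cited for PageRank). The four-page link graph is hypothetical, and the sketch models only the number and quality of incoming links, not their context.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration over an adjacency list."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores summing to 1
    for _ in range(iterations):
        # Every page gets the same baseline "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its score to the pages it links to, split evenly,
                # so a link from a high-scoring page is worth more.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked from a, b, and d; nothing links to "d".
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
```

On this graph, `pagerank(LINKS)` gives page `c` the highest score because three pages link to it, while `d`, which no page links to, keeps only the baseline share of 0.15 / 4 = 0.0375.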

C. The Game Changer - On September 7, 1998, Larry Page and Sergey Brin officially incorporated their search engine under the now-ubiquitous name Google Inc. Over the next few years Google’s popularity soared. Its search results were clearly superior in quality to those of its competitors, leading to the failure of many search engines with inferior technology that had enjoyed success in the earlier days of the Internet.

9. Social Media, Bookmarking

A. The Beginning - Over the past 15 years, search engines have become increasingly proficient at understanding and organizing the information on the Internet. Their ability to deliver relevant results to searchers is nothing short of remarkable. Despite this, many users still crave a more personal way to discover and understand content on the Internet.

B. The Vision - Although the ability of computers to organize and serve up relevant data continues to grow exponentially, there may always be a need for human-edited, or handpicked, content. Computers may be able to track, mirror, or simulate our personal preferences, but only human beings can say for sure which content they find most relevant or interesting (we built ChunkIt! to address this problem: to make it as easy as possible for people to decide which search results are truly relevant to them).

Although there is incredible diversity within our species, the need for and popularity of person-to-person information sharing show that we think alike in ways computers may never understand. This idea is one of the core concepts behind what many call Web 2.0. This more recent vision for the future of the Internet is concerned with facilitating information sharing and collaboration between people in a more intimate way. These new types of interaction should be complementary to search, helping people not only to find what they’re looking for but also to discover new and interesting content.

C. The Game Changer - In the world of social media, a few new ideas have truly broken the mold, paving the way for two brand-new types of interaction among web users. The first is social voting: Digg.com became one of the first “social voting sites,” allowing users to submit and vote on content. The site has grown quickly since its launch in 2004 and currently brings in over 250 million users per year. The rapid rise of Digg.com and similar sites demonstrates how much users want to share content in ways that were formerly unavailable. A second, equally important idea for disseminating human-selected content is social bookmarking. Delicious.com, the most popular such site, lets users search within content that has been tagged and bookmarked by their peers. No information exists in the Delicious index that hasn’t been chosen, reviewed, or tagged by a member of the Delicious community. These two examples serve as archetypes of how social media is influencing search, and as any Digg or Delicious user may tell you, these sites can be invaluable resources for discovering the best of the web.

10. Enhancing the Offering

A. The Paradigm - There is no doubt that the Internet has made huge strides in efficiency, usability, and its ability to organize information. There are now a great number of quality resources and tools for finding what you are looking for, sharing your opinions, and interacting with others. While having a large variety of resources available online is a good thing, there is often a gap in how those resources fit together, interact, and enhance each other.

B. The Vision - This need for cohesion between resources has led to the development of meta-resources: tools, websites, or programs that enhance the user experience in multiple ways, across websites, or between websites. These often take the form of innovative technologies such as RSS, podcasting, content aggregation tools and sites, comparison shopping tools and sites, tools for automating processes, and tools for enhancing the user experience.

C. The Game Changer - In the world of search, one of these tools is changing the way we use search engines altogether. By taking an already great resource like Google, Yahoo, or any other search engine and making it more accessible to users, the browser add-on ChunkIt! enhances that resource further. ChunkIt! searches within the pages of your engine’s results to find your search terms in context. You can then preview the resulting multicolored “chunks” for relevance without clicking through to the actual page. Tools like this add another layer of usability and accessibility to the plethora of resources already available online.
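Finding search terms in context is essentially keyword-in-context (KWIC) extraction. The following is a hypothetical sketch of that general technique, not ChunkIt!’s actual code: it splits a page into words and returns a window of neighboring words around each occurrence of a query term. The `find_chunks` name and its `window` parameter are illustrative assumptions.

```python
def find_chunks(page_text: str, terms: list[str], window: int = 5) -> list[str]:
    """Return a snippet of surrounding words for each occurrence of a search term.

    Matching ignores case and trailing or leading punctuation on each word."""
    words = page_text.split()
    wanted = {t.lower() for t in terms}
    chunks = []
    for i, word in enumerate(words):
        if word.lower().strip(".,;:!?\"'()") in wanted:
            start = max(0, i - window)           # up to `window` words before the hit
            end = min(len(words), i + window + 1)  # up to `window` words after it
            chunks.append(" ".join(words[start:end]))
    return chunks
```

For example, `find_chunks("The first website went online at CERN in 1991.", ["cern"], window=2)` returns `["online at CERN in 1991."]`, a preview a reader could scan without opening the page.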

THE FUTURE OF SEARCH

A. The Beginning - Until now, search has been all about accurately matching the query you type to the content available on the Internet. This idea was the basis for the original search engines, and it is still how we all understand search. But search can be so much more. Sometimes we know what we are looking for, but often we don’t know enough to find the information we seek, or we begin with only a concept and need to be led to more concrete definitions or ideas. Search as it stands today is robotic and one-dimensional, and most certainly still in its infancy.

B. The Vision - If you’re a Star Trek fan, you’re likely quite familiar with the future of search. The computer on the USS Enterprise is perhaps one of the best-known and most relatable examples of the direction in which search is moving. Crew members ask the computer any question, phrased in any way, and the computer understands both the intent of the question and its main message. Searchers of the future will make queries that return not only the information they asked for but also content that is related in any way: semantically, conceptually, and so on. People will use search as a guide to their understanding in a way that is not yet fully conceptualized, but that promises to combine advanced artificial intelligence, sophisticated computational understanding of human language, and huge amounts of human behavioral data that will tell these systems what is most relevant.

C. The Game Changer - Every major search engine considers the above to be its end goal. Companies like Google and Yahoo are pouring large amounts of money and intellectual resources into the technologies that will form the platform for the search of the future. Judging from the research and development being done at the Googleplex on areas like speech recognition and linguistics, we are trending toward ever more sophisticated implementations of what was once a very simple concept. As we move toward the future, the game changers will be the developers of new technology that harnesses the vast quantities of information online and seeks to understand that information as fully as possible, along with the ways in which we seek it.

Thursday, December 18, 2008