Wednesday, July 18, 2007

On Google's Monetization of Libraries

By Rory Litwin

Google's announcement Monday (1) of plans to digitize millions of books in
the collections of the University of Michigan, Harvard, Stanford, NYPL and
Oxford and to make them accessible through that ultra-simple search box is
causing a new outbreak of Google-fever, for which the cure is to remember
some of the principles of librarianship.

Already, Lynn Neary on NPR's "Talk of the Nation" (Dec. 15th) has framed
any potential criticism of this development as "sentimental" attachment to
brick-and-mortar libraries, but it is not sentimentality that sees the
dark side of this development. (2) It is a rational concern for the
preservation of a number of the attributes of libraries that give them
their inestimable value in a society that aspires to democracy and to the
full development of human potential. Google's back-room deal with these
universities (which was not worked out in cooperation with the library
community though it has implications for libraries as an institution)
carries with it a host of problems about which librarians should think
carefully before cheering for this corporate giant in its grand plan to
assimilate the world's cultural heritage.

Monetization

Google co-founder Larry Page is cited in an article that appeared in
Tuesday's Information World Review as being a "firm believer in academic
libraries being able to 'monetise' the information they hold." (3) Paul
Courant, provost at the University of Michigan, is quoted in the Chronicle
of Higher Education as saying the project is worth "hundreds of millions"
of dollars to his University alone. (4) Google obviously considers that
kind of money to be a good investment, which means they expect many
hundreds of millions in revenue from these collections, through
advertising in the near term and probably other means in the longer term.
Already the Google Print (TM) service, of which this deal is to be a part,
provides links to booksellers as well as to libraries. Though they have
not announced plans to offer the full text of copyrighted materials on a
pay-per-view basis, with fees turned over to copyright owners, it is a
technical possibility, and in a corporate context it exerts the natural
pull of an economic vacuum. Logically it would seem to be only a matter of time
before this mode of access becomes a reality, providing a channel for
bypassing both public-interest information policies and the librarian's
professional service. The fact that Google is putting libraries "on page
A1 above the fold" (in the front page NY Times article), as Barbara Fister
put it in an email to COLLIB-L, is not a victory for libraries if the real
meaning of this development is simply the transfer of all of this
information out of our humanistic institution and into the marketplace.
The weighty, loveable, historic notion of "The Library" will doubtless be
prominent in Google's marketing of its great reservoir of text, but we
would be mistaken to think this means that the values indicated by that word
(equity of access, collective ownership, privacy, organization,
bibliography, and librarianship as a profession) were somehow in play in
Google's collection. (Note how Google is already attempting to pander to
"sentimental" librarians: "Even before we started Google, we dreamed of
making the incredible breadth of information that librarians so lovingly
organize searchable online," Larry Page is quoted as saying in Google's
Dec. 14 press release. By implication, our "lovingness" only needs their
technology to be made useful, and our "loving organization" of those works
is ultimately unneeded.)

To spell out the obvious, what this development means is the
commercialization of the greatest research libraries in the world with a
handshake, suddenly and epochally (and not because of technological
inevitability - there are other ways that the digitization of these
collections could be handled). The commercialization of libraries has
implications both for the institution's democratic character and for the
quality of people's research. As Mark Rosenzweig wrote in an email
message to multiple lists on Wednesday,

"There is something mind-boggling about the ability of a single,
for-profit company being able to shape the future of a whole sphere
of life. Even more so when it enlists the cooperation of the public
stewards of that sphere in what amounts to a relinquishment of key
elements of responsibility to a unabashedly profit-driven
mega-corporation."

I want to examine more closely the implications of the Googlization of
research libraries, with just the beginnings of the needed attention to
the loss of privacy, the introduction of commercial bias, questions about
democratization and equity of access, the issues of disintermediation, the
decontextualization of knowledge, and the closing of the information
commons.

Privacy

The privacy of library users in their reading choices has long been held
sacred in the library world. (5) In this world, the privacy of individual
citizens is understood as a precondition for their autonomous development
and their freedom of thought. This is in contrast to the corporate world,
where information about individuals as consumers - demographic
information, interests, identities, choices - is a commodity that is
bought and sold for the purpose of gaining an advantage in the great game
of selling you more stuff (6). Individuals - treated as citizens by
libraries and as consumers by the corporate world - have their privacy at
stake in Google's conquest of the information commons. (7) As Peter
McDonald pointed out in an email to the Progressive Librarians Guild and
Social Responsibilities Round Table listservs on Tuesday, Google collects
a shocking amount of personal information as it tracks users' searches
over time (see Google-watch.org for details (8)). This personal
information can be correlated with individual identities with the
cooperation of ISPs or with commercial sites that share data. At
present, this identifying information isn't shared with Google, but the
potential and the motive are both there, and the public mood is complacent
and compliant. Additionally, if Google itself decides to enter the business
of selling access to these works, it will have direct access to users'
identifying information which it would undoubtedly connect to collected
information on search patterns. While libraries and library vendors do a
certain amount of usage-tracking for statistical purposes themselves, the
strong privacy ethic in libraries militates against the misuse of this
information. For example, most public libraries have adopted a policy of
destroying personal information once it is no longer absolutely needed,
making it unavailable to intelligence agencies whose ability to demand it
has been bolstered by the USA PATRIOT Act. (9) If people are using Google
to search or access these millions of works, they may naturally expect
their privacy as readers and citizens to be respected just as it would be
in any library, when in actuality they are being treated as consumers and
data sources for the purpose of marketing and with the possibility of
political repression. When the ultimate aim of the disposition of
these works shifts from that of enlightenment to that of making money,
privacy is one major value that is lost. The value of our privacy is not
a matter of mere "sentimentality" but is ultimately a protection of our
freedom.

The bias introduced by commercialism

Some say, "What's wrong with advertisements? The business of America is
business, and companies have a right to promote their products. How else
would we find out about them?" We certainly agree, as a society, that
there is a large (apparently ever growing) place for advertisements in our
lives. But the field of research, scholarship and education has mostly
been off-limits to commercialism, for a simple reason. The aim of
research, scholarship and education is truth, and people sense correctly
that commercial interests have the potential to distort the discovery and
the spread of truth. To a large extent they already do, by funding
"friendly" researchers, by suppressing research they don't like (10), by
directly spreading disinformation via the public relations industry (11),
by influencing journalism with advertising dollars (12), and by
influencing people directly with dishonest advertising. But however
compromised it may be, in the world of scholarship and education there is
a genuine culture of intellectual honesty that stems from the communal
project of seeking and spreading truth for the common good. You do not
see advertisements for particular historic works of literature in research
libraries, or for particular publishing companies. When a work appears in
a bibliography, it is there because of the independent judgment of a
scholar or a librarian as to the significance and the relevance of that
work; it is not there because somebody is trying to sell it and make money
from it. Libraries are full of "pointers" to information, in the form of
online catalogs, indexes, large and small bibliographies in books and
articles, web-based pathfinders and the personal interactions of
librarians and researchers. These "pointers" have the value that they do
in part because of the independent judgment behind them and the ability of
the professional to match the reader to the right book for them. When a
commercial element is added, the "right book" becomes "the book I want to
sell." The commercial interest is representing only itself while the
unbiased professional is under no pressure to favor any particular vendor
or publisher, and is therefore free to attend to the user's personal quest
for truth and their efforts to contribute to society's shared store of
knowledge. Truth-seekers outside of the context of educational
institutions have an equal interest in unbiased information undistorted by
commercial interests, but in the wider world they tend to be more
vulnerable to that distortion.

Google Print (TM), even in its introductory phase, plays a major role in
introducing advertising into the field of education, scholarship and
research, all the more so the more it attempts to enter the higher
education "market." At the present time Google claims not to allow
commercial interests to distort its search results (though many people,
noting the prominence of commercial clutter in their search results, are
skeptical of this). But Google's status as a private near-monopoly (in
certain respects) means that no public policy can guarantee its search
results will stay "clean," and they could be transformed into pure
e-commerce at any time. (If we find this alarming, I should point out, it
is not because of "sentimentality" but simply because of our strong
values. We should demand that these values be respected.)

Democratization?

Google is claiming that their digitization project promises to democratize
access to these collections of millions of works. I have to admit that
research libraries do not really represent paragons of democracy and are
not readily accessible to most people, and not only because of geographic
barriers. I also have to admit that to the extent that a person will be
able to freely download an out-of-copyright work that Google has scanned,
access to that particular work has been democratized, and I forgive even
librarians' excitement about this development. However, there is a deeper
sense in which Google's claim to represent the democratization of
information that is presently "locked up" in libraries is a reversal of
the truth, and that reversal is dependent upon what is ultimately an odd
sense of the meaning of democracy.

When these collections are digitized and made available through that simple
search box on the web, something very strange begins to happen. They
begin to take on the character of "stuff" in the same way that everything
else we download and view in web browsers has the character of
"stuff" (similarly to the way that money is "stuff"). There is a bleeding
of contexts; with no physical separation and everything on a flat plane,
there is little contextual separation between our browsing of personals
ads, our online banking, our travel reservations, our eBay, our comics,
our news and our Spinoza. All of these activities and contexts become
"democratized" in a certain sense, but not the sense we mean when we talk
about trying to build a democratic society. Web pages of 7000 words are
called "books" and look identical to, or even more impressive than, true
online repositories of literature. The information carried by graphic
design has increasing importance, and may not bear any relation to truth.
The character of everything on the web becomes conditioned by the
character of the web itself, and the character of the web is strongly
determined by its overall consumer orientation and its relation to the
experience of shopping - seeing, choosing, and consuming. As the contents
of research libraries become "web content," the mode of the use of these
materials will be transformed according to the mode of use of the web
medium, which sees us skimming, jumping from point to point, impatient,
critical by reflex rather than by reflection, superficial and
narcissistic. In other words, the web medium tends to "dumb down" the use
of what is in it (a phenomenon that may be connected to our relationship
with the medium of television). Consumer society has indeed interpreted
democracy as something we increase as we dumb down mass media
communication and even the educational process in general. So while freer
access to out-of-copyright works is undeniably a democratic thing, we
should also pay attention to the underpinnings of that mode of access and
ask ourselves certain questions: What kind of use of these works is the
web medium itself likely to encourage; that is, what does the
commercial web do to the nature of research and scholarship? And what
does that do to the character of our democracy? And how will these works
become connected, via a few short hyperlinks, to the distorting influence
of e-commerce?

Here is a less abstract question about how truly democratizing this project
will be: How long will it take before the copyright-protected works in
these collections are available on a pay-per-download basis, turning the
equity-of-access principle of libraries, which is what gives libraries
their essential democratic character, into the principle of access for
those who can afford it? Contrary to the view of free-marketeers, who see the market
as the truest expression of democracy, there is a contradiction inherent
between the needs of democracy and the prerogatives of the market. The
notion of democracy assumes a rational polity, assumes that the
preconditions for an intelligent, thoughtful society exist, while the
market tends to nurture what is most stupid in people, preferring to fool
them rather than to help build independent minds. Transferring these
millions of works from research libraries, even ones at ivory-tower
institutions, into a commercial enterprise such as Google, which will make
money off of them in any way it can, is superficially democratizing but
deeply contrary to democracy's need for information in the public sphere,
as useful as it might be to the more fortunate among us who have the
ability to make use of it.

Disintermediation and decontextualization

Disintermediation, the substitution of "software solutions" for
professional services, has affected most areas of economic activity since
the start of the computer revolution, in librarianship no less than in any
other field. Information seekers often choose the convenience of the
internet over consultation with an information professional, or even the
consultation of a bibliography or an index. The stable exception, up to
this point, has been in the area of serious research of the kind that
requires the use of highly specialized writings, often including very
old works. To access those materials, and to find them in their
proper context, a researcher needs to use a library and some of the many
research aids that are produced by librarians and scholars. Google's plan
will put those works in a giant bucket (so democratizingly) and enable you
to pull them out with keywords, kind of like catching fish with a net. So
much of this material requires expert knowledge even to comprehend, let
alone situate in its proper context, that disintermediated access can in
some cases be worse than no access at all.

At this point I should distinguish between disintermediation in general and
its specific manifestation in the Google search box. It is possible to
build quite a lot of knowledge into a search interface to an information
resource. Access to a thesaurus of the controlled vocabulary used by an
index can be connected to the search. Reverse-citation information can be
built into the display of search results, with linking provided. Multiple
search fields can take advantage of extensive cataloging. Even when all
of this work is done, the results for the searcher are dependent on her
own knowledge level and skill at searching, and many users go away
frustrated or go away happy with material that they don't realize is of
poor quality or not as relevant as it could be. This is the major problem
librarians face with the tools offered over the web by their own
institutions.

With the Google interface the problems created by disintermediation reach a
new level, because years and years of careful organization of the
materials in question will be dissolved in favor of Google's relevance
ranking system, which treats every web page and every book in Google Print
(TM) outside of its original context, funneling them all through a single
keyword search. (That librarians may have done that organizational work
"lovingly," as Larry Page put it, is irrelevant and a trivializing thing
to say, if it could even be known. More to the point is that this
organizational work was done with the aim of providing access in a
meaningful way.)

There is no accommodation, in the Google world, for the myriad scholarly as
well as popular jargon and dialects, even within single subject areas.
This is especially significant when works spanning hundreds of years are
in the mix. The result is a loss of recall, as searches based on
idiosyncratic keywords miss relevant works that use other terms, and a
loss of relevance, as searches pick up works that use the same keywords in
totally different ways. This is part of the reason that subject
cataloging and indexing are useful and worth the time of professional
catalogers and indexers.

In the Google world, there is no real intelligence determining what
documents (or books) are going to be the most helpful to an information
seeker, according to their intellectual problem and their knowledge
background. Making that determination is not a simple thing; it requires
knowledge of intellectual disciplines, an ability to understand people
well, and a creative mind. Keyword searches can be useful in certain
contexts, but a single keyword search for what is offered as a "whole
universe" is no substitute for a reference librarian, no matter how
sophisticated the search engine. (To say that librarians are "the most
effective search engines yet invented," as Johns Hopkins University
President William Brody wrote recently (13), is quite demeaning to
librarians, for whom search engines are only one brainless tool in a large
tool set.) This is one of the reasons libraries employ professional
reference librarians to help people with their information needs.

The organization of information in a library, through its catalogs,
indexes, and numerous bibliographical sources, is not something to be
regarded as having mainly a sentimental value. It is incredibly
practical. The "bucket effect" of dumping millions of texts into a
database searchable only by keywords, no matter how sophisticated the
search engine, represents a major loss of value if access to those works
via Google is compared to access through a library.

I am not forgetting that these research libraries will retain ownership of
the original works, and will also own digital copies of the works that
they will be able to share in any way that they like, which I concede will
be a major benefit of the deal. Realistically, however, as Marc Meola
pointed out on COLLIB-L on Wednesday, information seekers will probably
just "Google it," trusting an algorithm and thinking they are searching
the universe, even more than they do already.

Conclusion

A member of the LiveJournal community "libraries" posted a link to the New
York Times article Tuesday, commenting, "We're not being taken over, we're
just becoming the greatest information conglomerate of all time." (14)
This illustrates the confusion of so many internet librarians who identify
with "the Web." "The Web" is not us; it is a medium with its own effects.
And Google is not us. Google is not staffed by librarians, and does not
operate according to policies that flow out of long traditions of library
practice guaranteeing privacy, equity of access, collective ownership of
information, information in context, and personal service. This project,
as Larry Page has already put it, is about monetizing the holdings of
research libraries. It is about commercializing library collections that
it has taken centuries to build. It may be the "greatest information
conglomerate of all time," but it is not us. We are nowhere in it; we do
not control it or even influence it. We may be invited to imagine that it
is "us," that it is "a library" or even that it is "Library," and we may
be flattered by the attention, but we should take care to remember what
librarianship means in contradistinction to commercialized information, to
remember the difference between individuals-as-citizens and
individuals-as-consumers and to remember that as librarians we are public
stewards of the information commons and have an obligation to preserve and
protect it. And, to say it one last time, we must not let anyone write
off these concerns as "sentimental." They are not; what they are is
simply values-driven.

Now, I suspect that there is no stopping this (though the project is likely
to be a great deal more difficult than Google anticipates), and I know that
there is no hope of nationalizing Google as a public monopoly, and no hope
of raising comparable public funds for a similarly massive public
digitization project, at least not the way things are going right now.
I also know that in ten years' time I will most likely be making good use
of some of the material in Google Print (TM); I don't think I will boycott
it. But I hope that by articulating these problems (most of which relate
more to general trends than to Google specifically) I can help to advance a
critical perspective that will allow us to at least see clearly and to be
of use when crucial questions arise where the public interest is at stake.

References


Date: Tue, 17 Jul 2007 08:35:57 -0500
From: Paul Bramscher
Subject: Re: [A-librarians] Google digitization
To: Anarchist and Radical Librarians


I work at the University of Minnesota Libraries, which was one of the
CIC institutions to sign on to the Googlization. I wrote a blog entry
about it here:
http://paulbramscher.blogspot.com/2007/06/google-cic-library-agreement.html

The main points I want to emphasize in the blog entry:

1) The decision was made in a complete vacuum at my institution -- even
the people involved in the decision weren't named. So, given the lack
of professional/intellectual/technical/technological/scholarly input in
the handover to Google, we're left to conclude that this was strictly
top-down autocratic deal-making. And since the process was autocratic,
we might expect the results to produce more of the same. The lack of
transparency is pretty creepy -- it was totally black-box.

2) The Google digitization is little more than another example of the
great sell-off of the public domain to the private sector, and the sell-off
of public service/occupations as well. You'll note that many of the
currently scanned Google books have fingers, even whole fists, etc. in
some of the images. It appears to be a highly manual process, they even
tried to white out some of the thumbs. I've calculated that they must
have ~100-200 employees working around the clock.

3) The contract has a double indemnification clause in it, and the
libraries don't even know the details of how the scanning works.

4) Also, Google gets to keep books on the near-horizon of public domain
in escrow indefinitely.

So, for all we know, the scanning of CIC materials is done in an Asian
sweatshop.

So it's pretty interesting when you think about it. We've got over a
century of public libraries, and perhaps 2 centuries of academic
libraries in public or at least scholarly curatorial domain. Instead of
hiring a team of student workers, clerical workers, etc. they've
outsourced the job to god-knows-where, and transferred the
role/history/soul of libraries to the private sector.

I'm sure glad my degree was in CSci and not library science -- at the
rate top library managers are selling out the institution as even a
concept, I doubt there will be much left in 10 years.

Paul Bramscher
Web Applications Developer
Digital Library Development Lab
University of Minnesota Libraries
