Panel 4: Digitizing Collections

The Digitizing Collections panel, moderated by Associate Yale University Librarian for Collections and International Programs Ann Okerson, is the final panel of the Library 2.0 conference. Panelists include:

Jeff Cunard, Partner, Debevoise & Plimpton

Guy Pessach, Lecturer, Faculty of Law, Hebrew University of Jerusalem

Frank Pasquale, Visiting Professor of Law, Yale Law School

Brewster Kahle, Digital librarian and co-founder of the Internet Archive

Several of the panelists address the proposed Google Book Search settlement as part of their remarks. Commentary on the proposed settlement can be found in today's New York Times.

A more detailed commentary can be found in this article by former Yale ISP resident fellow James Grimmelmann, now teaching at New York Law School. You can access PDFs of the settlement documents at this site. The full-length settlement draft is 134 pages (without attachments), but the 32-page Notice to Authors (Attachment I) provides a fairly detailed summary.

Jeff Cunard, counsel to the Publishers subclass in the Google Book Search (GBS) litigation and settlement:

From the standpoint of the authors and publishers, there is enthusiastic support of the settlement. The GBS cases were brought because Google was scanning millions of books from libraries and displaying snippets of those books without express permission from the copyright holders. Google claimed that this was a fair use under the Copyright Act; authors and publishers disagreed. Libraries were not sued by the authors or publishers.

The authors and publishers objected to the idea that Google could copy first without obtaining permission. They thought Google should have asked for permission before scanning and displaying books. Google said its action was lawful, and that pursuing a massive scanning project like this makes it impossible to get the permission of every person who holds a copyright to the material.

The question at issue was whether digitization and the display of a snippet of a book was a fair use. The two sides disagreed, and this question ultimately won't be answered in the GBS case, because the parties reached a settlement, coming to the conclusion that a better, broader deal could be struck in a settlement.

Why did the authors and publishers decide to settle?

1. It's a way of allowing readers in the US to access whole books in a meaningful way. Snippets can be frustrating, and the settlement will be particularly useful for making out-of-print books accessible. The terms of the settlement set different defaults for in-print and out-of-print books. Out-of-print books can be displayed to differing degrees based on various pay schemes and rightsholder permission. The default for in-print books is no-display, unless the rightsholder authorizes it. This will help breathe new commercial life into out-of-print books.

2. The settlement provides a comprehensive rights authorization scheme through class action. It would be very difficult to get rights authorization from this large a body of rightsholders in any way other than a class action settlement.

3. The settlement provides new revenue models for making these books available. Consumers can purchase access to a book online from any computer connected to the internet in the US. Google will make available institutional subscriptions, where research institutions can buy a subscription to the entire digital corpus. Rightsholders can derive ad-based revenues, and in the future, may be able to permit print-on-demand and downloadable PDF services. All revenue is split 67% to the rightsholders, 33% to Google.

The settlement establishes the Book Rights Registry. It is similar to the ASCAP and BMI rights registries, in that it serves to distribute earned revenues to rightsholders.

Libraries get a digital copy of the books that they provide for digitization. Libraries were represented by a lawyer in the settlement discussions, and the settlement has provisions about what a library is allowed to do. The settlement reflects a series of rights and obligations regarding what a library can and can't do with its Library Digital Copy. Libraries can make it available for non-consumptive research, enhanced access for people with disabilities, and classroom and research uses.

In sum: The settlement will allow readers in the United States to find, access, and read millions and millions of books. The publishers subclass is very enthusiastic about the settlement and looks forward to it being approved by the court.

Guy Pessach:

Pessach discussed digital archives in Europe, which differ from the commercial and privatized scheme presented in the GBS settlement. In Europe, there is more focus on public provision of digitization and cultural preservation.

Example: The Digital Images Collection of the Victoria and Albert Museum in London. The collection may be used for personal and educational uses, with no permission or royalties required.

Example: The Europeana project, a search platform that directs searchers to European books and artworks from more than 1000 participating institutions. Each institution holds its content separately, and the databases are linked through the Europeana search engine. Users must respond to the rights claims of both the original copyright holders and the Europeana search platform.

Concerns with the European system: European copyright law recognizes a copyright in most kinds of databases. Digital images of public domain materials may be recognized as protected by copyright. And only certified cultural institutions can be included in the Europeana platform.

Moving toward legal policy in the EU, the reform proposals presented are institutional licensing schemes between institutions, not legislative reform.

The difference in the approaches to digital collections between the US and Europe is similar to what happened with radio and tv: Europe goes public, US goes private, but both concentrate around licensing regimes that replace legal frameworks and block any hope of legal reforms. This places incredibly high barriers to entry for any entity that isn't the dominant licensee (Google or the EU), and allows the dominant players to leverage copyright law.

Concerns: digital archives have major social roles to play; they shape our views, ideologies, and perceptions. In the history of media regulation, we can see that a concentrated map of public or private archives doesn't follow goals of democratic culture. We need a more hybrid system that allows multiple forms of digital collections.

The comprehensiveness of GBS is a double-edged sword: desirable but also dangerous for cultural preservation if it leads us to ignore or foreclose other options.

Where do we go from here?
Possibility of copyright reforms (such as compulsory licensing, broader interpretations of fair use, digital deposit requirements) probably won't be pursued.

Another option: Pursue new directions to incorporate free speech theory into the evaluation of the settlement agreement. We need to think of free speech not in the contemporary way, but from the Internet perspective. This should come up as a main consideration in digital preservation debates, thinking about the rights of future generations to be exposed to a diverse range of materials and opinions of the past, as well as the right of contemporary speakers to have their imprint on history.

--- Q&A ---
Q: Is there a useful comparison to be made between GBS and the Kindle's text-to-speech feature?

Cunard: Not really. Kindle is distributing books with the permission of rightsholders. In the case of GBS, there are millions of books being copied, and there's no contractual relationship among the various rightsholders. The principal issue for GBS is whether scanning and displaying snippets is fair use; for Kindle, it would be whether there's a commercially significant non-infringing use of the text-to-speech function. Fair use determinations are highly fact-specific, so it's hard to compare the two cases.

Q: With regard to the most-favored-nation clause in the GBS settlement, how will other organizations compete in book digitization? How does the publishers subclass contemplate competitors to Google?

Cunard: The licenses provided for in the settlement are non-exclusive; the rightsholder can withdraw from the registry before the end of the opt-out period (May 5th, 2009), turn off uses or access to their work after the opt-out period, and can make separate deals with any other licensee. The Registry is not precluded from making deals with anyone else [though, under the terms of the settlement, it can't make a better deal with a competitor than it has with Google], though the "somewhat challenging" question is how the Registry would get the authorization from the rightsholders to grant a license to a competitor.

The most-favored-nation clause only applies if the Registry can find a way of licensing rights from a substantial portion of rightsholders who haven't come forward and claimed rights under the settlement. If 20 million rightsholders' works are present in Google's corpus, and only 1 million rightsholders claim their rights, Google still has 20 million licenses under the settlement. The most-favored-nation clause only applies if the Registry is licensing a substantial portion of the 19 million unclaimed works to a third party. In this case, the third party might need to pursue its own class action.

Q: Copyright and scholarly communications are increasingly global and international; how far away are we from getting this type of access in other countries, of other kinds of books?

Cunard: The settlement only covers the rights of copyright owners in the US. Under the settlement, Google is only authorized to make display uses inside the US, but the settlement doesn't actively prohibit Google from scanning and displaying outside the US, so they can try that. Google should get permission for that, but this settlement only deals with the US, so Google can figure out how to pursue analogous results elsewhere.

Frank Pasquale, Visiting Professor of Law, Yale Law School

From Managed Care to Managed Knowledge? A Health Industry Perspective on GBS

There’s a similarity between private managed care organizations in the health care industry and Google. Under the Google Book Settlement, we are going to have middlemen between those who provide info and those who request it. Just as access costs and quality tradeoffs have guided the costs of middlemen in the medical industry, there will be similar access costs/quality tradeoffs when Google distributes digitized books.

Some reasons why Google has become the middleman:

1) Market failure – too many publishers to coordinate universal access

2) Public failure – Congress didn’t contribute to the creation of a digital library – no one imposed a governmental digital deposit requirement

3) Google has imposed order in digital books -- just as Apple iTunes imposed order on a chaotic market for digital music, Google has made a similar contribution to the digitized book industry

After the settlement, Google may be viewed like the "good insurer" – just like managed care organizations, Google will drive providers' costs down. As it does so, however, it may potentially tier coverage (just as the best treatments and providers get the best coverage) and reimburse less to certain groups (different entities may be treated differently).

On tiered access to Google Books, imagine Harvard gets everything, Yale gets almost everything, and a school that can “only” pay $100,000 a year gets a “disabled version” of the same product. Is this just?

There may also be concerns about the manual change of book rankings – Google could downrank books that criticize it or uprank books that match its politics. There may be secret payola deals that induce Google to rank certain books first on a consistent basis.

If books are eliminated there is a risk as well – the settlement allows books to be excluded for both editorial and non-editorial reasons. Although Google has to tell the Book Rights Registry about these excluded books, there is still uncertainty. Who will post those books? How will the public obtain high levels of ubiquitous access to them through another book database?

Extant law probably won’t help the situation.

1) First Amendment – unassailable outputs.

2) Trade secrecy

3) Antitrust -- The problem here is that it's a very slow way to resolve things. A rule of reason decision could take years and years (we would want a quick look or per se illegality). However, proactive government regulation could be useful in this area.

Pricing Concerns

1) Cocaine pricing – the worry is that users will be hooked by cheap early access and then face a bait and switch. There is an analogous worry about how insurers avoid covering the sick.

2) In addition, Google may not care about reaching the people it doesn’t want to give access to. It is also not clear if the settlement will prefer a low margin/high volume model or a high margin/low volume model. The pricing model should be further scrutinized.

3) Concern about gradual price increases -- prices may start very low and be raised later on in ways that are anticompetitive and exploitative.

Public Alternatives

1) Digital deposit -- The U.S. government could have required deposits of all digitized books into a government-held database.

2) Mandatory Disclosure of the Ranking System -- This would be in the public interest. For example, in New York there was a Doctor Rating Settlement that required transparency in the reporting of pricing/ranking practices by insurers who rated doctors. Those doctors were only rated according to price. Once doctors realized that insurers were doing this, they filed suit.

There is a legitimate case for secrecy of the web search rankings that is not there for Google Book Search – we wouldn't want Google to disclose its web ranking algorithm because we wouldn't want spammers to game the system. However, in the books case the search domain is bounded by books that have been published.

Medicare could potentially be a model for the settlement. First, it provides care to all in a certain vulnerable population. It also sets a baseline of pricing in the case of collusion -- we may want to look at existing government structures if we consider regulation.

Ultimately, it is up to us to alter the terms of the agreement. To summarize, responsive regulation should balance:

  1. Incentives for private companies to innovate
  2. Public interest in universal access to knowledge

Brewster Kahle, Digital librarian and co-founder of the Internet Archive

Book – Libraries of the Future – Licklider – he said that everything is going to be digitized by 2000 – Kahle couldn’t find a reasonable copy of it, and it was published in 1965. It is an out-of-print book, and such books are at the core of the Google Books issue.

Some statistics:

The Open Content Alliance (OCA) scans 1000 books a day in 5 countries and has 1.2 million books on the internet. In court filings Google says it has 7 million – OCA will be at about 2 million by the end of the year – all told they’ve done about 2 million books. There is an alternative going on in a really big way. However, OCA doesn't have the same level of resources.

He finally found the book mentioned above at the University of Toronto website – the book is available for $130 so it is very out of print – he took the risk and asked for it to be digitized. The book kept getting pulled offline – automatic threads pull it off. He overrode all the threads. He wrote to the head of MIT Press to get approval to post it, and this process took a very long time -- days of labor.

Interestingly, the book predicting a digital library of the future did not mention copyright at all -- this is very ironic given the current situation.

He referenced the article in today's New York Times, which is definitely an interesting read for more background.

Google’s search tool has become a digital book store – it will be selling subscriptions to Google’s new “exclusive” library. The original lawsuit would’ve at least allowed us to define fair use – this is a new and unsettling form of media consolidation. If approved, there will be 2 court-sanctioned monopolies:

1) Google will be the only organization with the ability to produce out-of-print books – no other provider will enjoy the same legal protection

2) The Book Rights Registry will set prices for all digital books

Broad access is ok but providing so much is a danger to principles we hold dear. Free speech, A2K, educational access, etc. will be compromised.


1) Libraries are already digitizing books so that people can borrow, purchase, and read millions of books – it is not that expensive

a. For the cost of 60 miles of highway, we can have a digital library of 10,000,000 books

Through a simple search of the internet, readers can find books from many sources, instead of books available for a fee controllable by a corporation.

We have wrestled with high-tech monopolies in the past – such strongholds restrict innovation – in these cases the courts stepped in. Here we have a proposal for monopolies to be created by the court. Monopolies will hinder progress and innovation in this area.

Ultimately, we need legislation to address works that are caught in copyright limbo – we need to stop monopolies to create a vibrant library system.

We are close to universal access to all knowledge -- we must do what we can to continue that trend, otherwise...

  • Google will feed everything into Android
  • Amazon will feed everything into Kindle

The control of devices will dictate the entire future of this industry, which will be very unfortunate.

He advocates open standards – distributed vending and lending – readers can use search engines to find lists of books, etc. and then connect directly from devices to libraries or booksellers.

The Google Settlement is effectively a legislative trick.

Brewster reiterated this in a recent article, saying: "They are doing an end run around the legislative process."


Question: Can you provide a little more detail on how you see the current project moving towards the distributed form?

Google was a great entry – but they needed publishers to come to the table, and it is hard to get DRM that closes every hole. The distributed system requires relaxation of DRM – track record of what happened with the music guys – iTunes controlled a lot of their music – music guys going before – conversations they are having with publishers are very productive

Can loan out of print works – a lot less contentious – publishers make things accessible – libraries loaning out-of-print works and relaxing DRM – up to the libraries to step forward in the loan area

Jeff: Brewster and Jeff agree – the model should not be exclusive – thorny problem on how you get the rights to the 10 million books – authors and publishers have the rights to the books – rights would have completely reverted to the authors. 10 million books in copyright limbo. The hard question we have been grappling with is how do you get the rights to those orphan works – if one was copying a book and lending the book out – libraries don’t have the rights to authorize distribution through lending.

Brewster: Exclusivity – issue is around the orphan works – things that someone did not come forward and put in the registry. Anytime we have gotten people to sign up for anything – it’s “all the rest that is the issue.” Under a class action Google is the only one that has explicit legal permission to sell those works – a Book Rights Registry may come after the Open Content Alliance saying they don’t have the rights for those orphan works. Google and Google alone does.

Question: Can Microsoft get a similar settlement going? Medicare analogy – it is pretty expensive –

Frank: Terry Fisher’s book is very useful – although there are articles. Comparison between public and private sector unfair – always know in public regime because disclosure is forced – Frank’s big concern is about the trade secrets

Question: In light of how libraries are coming out – huge capital projects – what are they getting out of it

Jeff: A number of libraries have entered into digitization agreements with Google covering everything from making replacement copies to certain classroom uses to integrating digital copies into finding tools, etc. – can get snippets from the books

Frank: One challenge – what is happening with Google is that it’s an amazing algorithm but it’s secret – librarians should be a vital part of organizing the web and this knowledge – part of the registry

Jeff: Some Numbers

Google is expert at exploiting the Balkanization of the libraries – they see themselves as competing with each other – everything is guarded through non-disclosure – Google stopped all of that conversation

Bait and switch – settlement requires more restrictions on the books that were digitized

Costs – Internet Archive – 10 cents/page – about 30 dollars per book

It is much less at Google – 5-10 dollars a book – 7 million books at $5 each is $35 million

Yale’s library budget is $18-19 million per year – library budget in US $12 billion/year – 25-33% goes to publishers’ products – 3-4 billion per year is spent buying things from publishers

If you were to go and build a 10 million book library (Harvard, New York Public) at highest grade quality, it would be a $300 million project – a couple percent of one year’s budget for
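
The figures quoted above can be cross-checked with a rough back-of-the-envelope calculation. The sketch below is only an illustration: the dollar amounts are the speakers' estimates as recorded in these notes, the 300 pages per book is an assumption chosen so that 10 cents a page matches the quoted $30 per book, and the function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the digitization figures quoted in the panel.
# Inputs are the panelists' rough estimates; PAGES_PER_BOOK is an assumption
# (it is not stated in the notes) chosen to match "about 30 dollars per book".

IA_COST_PER_PAGE_CENTS = 10          # Internet Archive: 10 cents per page
PAGES_PER_BOOK = 300                 # assumed average book length
GOOGLE_COST_PER_BOOK = 5             # low end of the quoted $5-10 range
GOOGLE_BOOKS = 7_000_000             # books Google reports having scanned
TARGET_LIBRARY_BOOKS = 10_000_000    # size of the hypothetical library
US_LIBRARY_BUDGET = 12_000_000_000   # quoted yearly US library budget


def scanning_estimates():
    """Return the derived dollar figures as a dictionary."""
    # Work in cents first so the per-book cost stays an exact integer.
    ia_cost_per_book = IA_COST_PER_PAGE_CENTS * PAGES_PER_BOOK // 100
    google_total = GOOGLE_COST_PER_BOOK * GOOGLE_BOOKS
    library_total = ia_cost_per_book * TARGET_LIBRARY_BOOKS
    return {
        "ia_cost_per_book": ia_cost_per_book,
        "google_total": google_total,
        "library_total": library_total,
        # Share of one year's US library budget the project would consume.
        "budget_share": library_total / US_LIBRARY_BUDGET,
        # 25-33% of the yearly budget going to publishers' products.
        "publisher_spend_low": US_LIBRARY_BUDGET * 25 // 100,
        "publisher_spend_high": US_LIBRARY_BUDGET * 33 // 100,
    }


if __name__ == "__main__":
    for name, value in scanning_estimates().items():
        print(f"{name}: {value:,}")
```

Under these assumptions the 10 million book library comes to $300 million, or 2.5% of the quoted yearly US library budget, which is consistent with "a couple percent" as stated in the session.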

Ann: Pushback – no library can begin to meet the needs of all of its users

Allocation of resources – quite user driven – pay attention to student-faculty aids – something in Ann that can’t wait to get her hands on those 7 million books – there needs to be a chance for the library community to sit at the table – know that sounds like heresy but the second this resource becomes available there will be insistence. What are the needs of our users – Google settlement might become a template for the needs of our users on a smaller scale