Thursday, December 27, 2007

Project to produce comprehensive digital archive of 60 million pages of federal government documents

Public.Resource.Org, the Internet Archive, and the Boston Public Library have announced the start of Phase 1 of a project that aims to create a comprehensive digital archive of 60 million pages of federal government documents over the next two years.

Phase 1 of the project will produce at least 2.5 million pages of digital text using a scanning and optical character recognition (OCR) technology suite developed by the Internet Archive. The Boston Public Library is the first Contributing Library in the program and has agreed to lend a 50-year run of Congressional Hearings, from 1936 to 1986, as well as a complete copy of the Catalog of Copyright Entries. Scanning will take place at the Boston Library Consortium's Northeast Regional Scanning Center.
