Philosophy of Content Selection


The Million Book Project aims to build a collection of one million digital books through the staged approach described below. The project will adhere to copyright law.

1. Coordination of Selection

Creating a single digital copy of each book and mirroring it at several locations will suffice and will support multiple simultaneous uses. Preliminary discussions are underway with OCLC about hosting a registry of scanned items. Certain key projects, such as the Making of America project, are already represented in the OCLC database as digital books. Other large digitization projects may require some data entry of their content so that the same items are not scanned twice.

2. Non-copyrighted materials

Materials that are free of copyright under the Indian Copyright Act, 1957 may be scanned for this project. To reduce the costs of selection, the project will probably develop a strategy of selecting key topics and then removing large runs of books and journals from a selected repository for scanning. A reasonable turn-around time will be essential to the success of the project. A strategy has been devised to understand the logistics of shipping the materials and the impact of their absence from the home library.

4. Best books approach

The project will seek publisher permission to scan books from Books for College Libraries (BCL), one source for core academic books in English. A previous study done at Carnegie Mellon University Libraries indicates that 22% of publishers granted permission for scanning and mounting on the web. The materials in the study were a random sample of Carnegie Mellon Libraries’ books and covered a broad range of dates and publishers, as well as both in-print and out-of-print titles. Numerous difficulties were identified, including publishers that had gone out of business, missing publisher records, the return of copyright to authors, and other circumstances. Subsequently, Carol Hughes, the collections development officer for Questia, corroborated Carnegie Mellon’s experience.

OCLC owns a database of books from the latest edition of Books for College Libraries. BCL contains about 50,000 titles. A 22% success rate in clearing copyright would result in roughly 11,000 of the best books for college students being included in the project. Clearing copyright is labor intensive and expensive. Bradd Burningham’s recent article estimated those costs (“Copyright Permissions” in Journal of Interlibrary Loan, Document Delivery, and Information Supply, 11:2 (2000), 95-111). The BCL database, however, will allow for sorting by publisher, so that a single permission request can list several books from the same publisher. A quick sample indicates that as many as 25,000 publishers may be represented there. Despite the expense, this commitment to quality should be attempted. Carnegie Mellon University Libraries will seek private foundation funding to undertake this project.
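
As a rough illustration of the two ideas above, the following Python sketch groups titles by publisher so that each publisher receives one combined permission request, and estimates the number of cleared titles from a given success rate. The function names, the record layout, and the sample records are hypothetical and are not drawn from the BCL database or any OCLC interface; only the 50,000-title and 22% figures come from the text.

```python
from collections import defaultdict


def group_titles_by_publisher(records):
    """Group (publisher, title) pairs so each publisher gets one request.

    `records` is an assumed layout: an iterable of (publisher, title) tuples.
    """
    grouped = defaultdict(list)
    for publisher, title in records:
        grouped[publisher.strip()].append(title.strip())
    # Sort publishers and titles so the resulting request list is stable.
    return {pub: sorted(titles) for pub, titles in sorted(grouped.items())}


def estimate_cleared_titles(total_titles, success_rate):
    """Estimate how many titles would be cleared at a given success rate."""
    return round(total_titles * success_rate)


# Hypothetical sample records, for illustration only.
sample_records = [
    ("Publisher A", "Title One"),
    ("Publisher B", "Title Two"),
    ("Publisher A", "Title Three"),
]

requests = group_titles_by_publisher(sample_records)
for publisher, titles in requests.items():
    print(f"{publisher}: request covers {len(titles)} title(s)")

# Figures from the text: about 50,000 BCL titles, 22% permission rate.
expected = estimate_cleared_titles(50_000, 0.22)
print(f"Expected cleared titles: {expected}")
```

Grouping before sending requests matters because the cost of clearing copyright is driven largely by the number of separate requests, not the number of titles; the estimate treats the 22% publisher permission rate as if it applied uniformly to titles, which is a simplifying assumption.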

Publishers increasingly see that digital presentation of their works can attract buyers. They are interested in exploring ways in which their out-of-print titles may be returned to profitability. Continued work with publishers over the course of this project may attract many of them to it, which would greatly enrich the content made available.