The primary long-term objective is to capture all books in digital format. Many believe that such a task is impossible: it could take hundreds of years and might never be completed. Thus, as a first step, we plan to demonstrate feasibility by digitizing 1 million books (less than 1% of all books in all languages ever published) by 2005. We believe such a project has the potential to change how education is conducted in much of the world.
A secondary objective of this project is to provide a test bed for other researchers working on improved scanning techniques, optical character recognition, and indexing. The corpus this project creates will be one to three orders of magnitude larger than any existing free resource.