First, I don’t ever copy anything online that is not from a public domain book. If there are hits in a book that is not public domain then I will see if I can get it from my local library through the Inter-Library Loan program. There are some Usage Guidelines from Google that are important to know:
———————
Usage guidelines
Google is proud to partner with libraries to digitize public domain materials and make them widely accessible. Public domain books belong to the public and we are merely their custodians. Nevertheless, this work is expensive, so in order to keep providing this resource, we have taken steps to prevent abuse by commercial parties, including placing technical restrictions on automated querying.
We also ask that you:
+ Make non-commercial use of the files We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.
+ Refrain from automated querying Do not send automated queries of any sort to Google’s system: If you are conducting research on machine translation, optical character recognition or other areas where access to a large amount of text is helpful, please contact us. We encourage the use of public domain materials for these purposes and may be able to help.
+ Maintain attribution The Google “watermark” you see on each file is essential for informing people about this project and helping them find additional materials through Google Book Search. Please do not remove it.
+ Keep it legal Whatever your use, remember that you are responsible for ensuring that what you are doing is legal. Do not assume that just because we believe a book is in the public domain for users in the United States, that the work is also in the public domain for users in other countries. Whether a book is still in copyright varies from country to country, and we can’t offer guidance on whether any specific use of any specific book is allowed. Please do not assume that a book’s appearance in Google Book Search means it can be used in any manner anywhere in the world. Copyright infringement liability can be quite severe.
———————
In short, make sure it is public domain, don’t use it for commercial reasons and keep the Google “watermark”.
I normally only copy a few pages from the PDF files and here is how to do that:
Step 1: Get the PDFTK tool from http://www.accesspdf.com/pdftk/
This tool will let you copy only certain pages from a PDF file. It is a very powerful tool and I will only touch on one of the things it can do. For the rest of these steps, I am assuming that you have followed the download directions and have the above mentioned PDFTK tool on your computer (and you have a Public Domain book available).
Step 2: I normally put the book and the PDFTK program in the same folder. For this example I have found the book “Pioneers of old Hopewell” (saved as Pioneers_of_old_Hopewell.pdf, with underscores instead of spaces) and I am going to extract some information about the Parke family in New Jersey.
Step 3: Identify the pages in the PDF file you need. In this case I am going to extract page 12, which contains the title and publishing information, and pages 200 through 202. Here is the tricky part: the pages in the PDF file may not match the page numbering of the book. In most of the Acrobat readers there is a “pages” tab. Looking on that tab tells you which page of the PDF file you are on. In this case, book page 200 was actually PDF page 211. Therefore we will be extracting PDF pages 12, 211, 212 and 213.
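The page arithmetic above can be sketched in a few lines of Python. This is only an illustration and not part of PDFTK. The function names are made up for this example, and it assumes the gap between book pages and PDF pages stays the same across the range you want, which may not hold if the scan has unnumbered plates or inserts in between.

```python
def pdf_page_offset(book_page, pdf_page):
    """Return how far the PDF numbering runs ahead of the printed numbering."""
    return pdf_page - book_page


def book_to_pdf_range(first_book_page, last_book_page, offset):
    """Shift a range of printed page numbers to PDF page numbers.

    Assumes the offset is constant over the whole range.
    """
    return first_book_page + offset, last_book_page + offset


# Book page 200 was found on PDF page 211, so the offset is 11.
offset = pdf_page_offset(200, 211)
first, last = book_to_pdf_range(200, 202, offset)
print(offset, first, last)  # prints: 11 211 213
```

It is still worth checking the last page of the range in the reader before extracting, in case the offset shifts partway through.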
Step 4: PDFTK is a command line program so you can do all this through a Command Prompt window. I prefer to just write a small Windows Batch file and execute it. That way if I make a mistake on the page numbers I can quickly correct it and go on (or come back later and do some more).
In this case I created a file called Do_pdftk.bat in this directory and using “right-click->edit” put in the following line:
pdftk Pioneers_of_old_Hopewell.pdf cat 12 211-213 output Pioneers_of_old_Hopewell_pp200_202.pdf
CAUTION: DO NOT name the batch file pdftk.bat or it won’t be able to find the program to run.
Let’s dissect this: pdftk is the name of the utility we are using, Pioneers_of_old_Hopewell.pdf is the original book, cat 12 211-213 tells the utility to get pages 12, 211, 212 and 213, and output Pioneers_of_old_Hopewell_pp200_202.pdf is where to put the pages when done.
**whew**, not as hard as it looks really….
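To make the pieces of that command easier to see, here is a minimal Python sketch that assembles the same line from its parts. The helper name build_pdftk_command is hypothetical, and Python is my own choice for the illustration. It only builds and prints the command, it does not run PDFTK.

```python
def build_pdftk_command(source_pdf, page_specs, output_pdf):
    """Assemble: pdftk <book> cat <pages...> output <new file>."""
    return ["pdftk", source_pdf, "cat", *page_specs, "output", output_pdf]


command = build_pdftk_command(
    "Pioneers_of_old_Hopewell.pdf",            # the original book
    ["12", "211-213"],                         # a single page and a page range
    "Pioneers_of_old_Hopewell_pp200_202.pdf",  # where the pages end up
)

# Joined with spaces, this is the same line that goes into the batch file.
print(" ".join(command))
```

If PDFTK is installed and on the path, the same list could be handed to Python's subprocess.run(command, check=True) instead of a batch file, but the batch file does the job just as well.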
Step 5: Close the batch file editor and double-click the batch file. If everything is typed correctly it will execute (you will see a black command window come up and then go away) and you will now have another file in the directory.
Notice the new file is only a fraction of the size of the original and if you open it, it will only have the pages we listed above.
Also notice that the Google Books “watermark” is preserved.
NOTE: If it didn’t create the new file, take a look at the spelling of the command first. Spaces and other special characters in filenames are typically not allowed, so stick to letters, numbers and the underscore character. Also, you need to give it a new output name; if you put in the original book name it will fail. If you made a mistake on the page numbers, just delete the new file, change the numbers in Do_pdftk.bat and try again.
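The two filename checks in that note can be sketched in Python as well. This is a rough illustration of my rule of thumb, not something PDFTK requires or provides. The check_names helper and the SAFE_NAME pattern are invented for this example, and the comparison ignores upper and lower case on the assumption that you are on Windows.

```python
import re

# Letters, numbers and underscores only, followed by ".pdf".
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+\.pdf")


def check_names(source_pdf, output_pdf):
    """Return a list of problems; an empty list means both names look fine."""
    problems = []
    for name in (source_pdf, output_pdf):
        if not SAFE_NAME.fullmatch(name):
            problems.append(f"{name!r} should use only letters, numbers and underscores")
    # Assumption: Windows treats Book.pdf and book.pdf as the same file.
    if source_pdf.lower() == output_pdf.lower():
        problems.append("the output name must differ from the original book name")
    return problems


print(check_names("Pioneers_of_old_Hopewell.pdf",
                  "Pioneers_of_old_Hopewell_pp200_202.pdf"))
print(check_names("Pioneers of old Hopewell.pdf", "pages.pdf"))
```

The first call prints an empty list, and the second reports the name with spaces in it.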
Obviously you can do a lot more with this program, so keep it legal and use it when you want to preserve some pages. I use the rule of thumb that if I am not willing to go and photocopy it then I won’t save it here.
Good luck.