Project Ideas
Digitizing Historic Theses and Dissertations
The Eccles Health Sciences Library has recently started digitizing theses and dissertations, using public services staff to do most of the work. The instructions listed below describe the digitization process at Eccles in detail.
There are three separate workstations used in the digitization process: one for scanning, one for PDF bookmarking using Adobe Acrobat Professional, and one for OCR using ABBYY FineReader.
The binding of the thesis is cut, so it can be fed through a scanner hopper several pages at a time. The physical copies of the scanned theses are discarded, but an additional paper copy is preserved in Marriott Library Special Collections.
Scanning Station Instructions and Procedures
1. Before you start scanning
- Check document for color photos. Any pages with color photos need to be scanned according to the directions below—see “For Color and Poor Quality Images”.
- Check document for any double-sided pages (this mostly applies to older theses from the 1940s through the 1970s). Often the figure is on the right side with its description on the left side. Scan those double-sided pages so that both sides are included.
2. Open Adobe Acrobat Professional
3. Choose Create PDF From Scanner
- For Black and White Text
  - Multiple Page hopper
  - 400X400 Pix
  - Black and White
  - 7.800X10.800 in
- For Color and Poor Quality Images
  - Multiple Page hopper
  - 600X600 Pix
  - Color
  - 7.800X10.800 in
4. Center the box around the text
This will ensure that all text gets included and that there will be no black line at the top of the page
5. Saving Scanned Document
Save in a common thesis folder on a network drive. Create an individual thesis subfolder named after the title on the spine. For example, the PDF version of a thesis called “A Pinyon…” goes into a folder called “A Pinyon…” (or whatever title is on the spine).
6. Check Scanned Thesis
Check to make sure that all pages are there and all are facing the right direction.
If there is a problem:
- If the physical thesis copy is missing a page, put the thesis in a folder marked "Physical Copy Problems".
- If the scanned file is missing a page or is otherwise incorrect, but the physical copy is fine, rescan.
7. Place in a shared folder marked "Thesis to Abbyy"
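The folder-naming rule in step 5 can be sketched in a few lines of Python. This is only an illustration, not part of the Eccles workflow: the function names are hypothetical, the list of stripped characters assumes a Windows network drive, and naming the PDF after its folder is an assumption rather than something the steps above require.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file or folder names (assumption:
# the common thesis folder lives on a Windows share).
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def thesis_folder_name(spine_title: str) -> str:
    """Turn the title printed on the spine into a safe folder name."""
    cleaned = _ILLEGAL.sub("", spine_title)
    # Collapse runs of whitespace, then drop trailing dots and spaces,
    # which Windows also rejects at the end of a name.
    cleaned = re.sub(r"\s+", " ", cleaned).strip().rstrip(". ")
    return cleaned


def thesis_pdf_path(common_root: Path, spine_title: str) -> Path:
    """Hypothetical layout: <common folder>/<title>/<title>.pdf."""
    name = thesis_folder_name(spine_title)
    return common_root / name / f"{name}.pdf"
```

For instance, `thesis_pdf_path(Path("theses"), "A Pinyon")` yields `theses/A Pinyon/A Pinyon.pdf`.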
Abbyying Station
Use ABBYY FineReader or another OCR software program to generate a searchable transcript of your thesis.
1) Choose a thesis from the “Theses to be Abbyyed” folder
2) Go to Start > Programs > ABBYY FineReader 8.0
- Click on Scan and Read
- Choose Scan from file
- Browse to common folder>IR>theses>PDF folder
- Double click on PDF
- It may take a while depending on the size
- You can be doing other things while it’s scanning
- In Scan and Read Wizard
- Choose English
- Click Next
- It will take a while (do other things while it’s reading)
- In Wizard
- Click Next, choose No.
- Click Next, click on ‘Save Pages’ at the top of the list
- Click Next
- Browse to common folder>IR>theses>PDF folder
- Make a new folder within the PDF folder called ‘text’
- Give the text file the same name as the PDF.
- Change “Save as type” to .txt
- Click Save
3) Move the entire folder for the PDF (not just the text folder) into the “Ready to be Bookmarked” folder
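The hand-off in step 3 could be checked and performed by a small script. The sketch below assumes the layout described above (a PDF in the thesis folder and a same-named .txt file in a ‘text’ subfolder); the function names are hypothetical and the folder paths are passed in by the caller rather than taken from the real network drive.

```python
import shutil
from pathlib import Path


def ready_for_bookmarking(thesis_dir: Path) -> bool:
    """True when every PDF in the folder has a same-named .txt in text/."""
    pdfs = list(thesis_dir.glob("*.pdf"))
    if not pdfs:
        return False
    return all(
        (thesis_dir / "text" / f"{pdf.stem}.txt").is_file() for pdf in pdfs
    )


def move_to_bookmark_queue(thesis_dir: Path, queue_dir: Path) -> Path:
    """Move the whole thesis folder (not just text/) into the queue folder."""
    if not ready_for_bookmarking(thesis_dir):
        raise ValueError(f"{thesis_dir.name}: missing PDF or matching text file")
    queue_dir.mkdir(parents=True, exist_ok=True)
    destination = queue_dir / thesis_dir.name
    return Path(shutil.move(str(thesis_dir), str(destination)))
```

Refusing to move a folder that lacks its text file keeps incomplete theses out of the bookmarking queue.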
Bookmarking Station
1) Pick a thesis from the TOP of the list of theses in the folder “Ready to be Bookmarked”
2) Bookmark the thesis
You will need to renumber the document using Adobe Acrobat Professional:
- Go to ‘Advanced’ on the menu bar
- Select Number Pages…
- Under ‘Pages’ choose All
- Under ‘Numbering’ choose “None”
- Click OK
- Go through the document counting the number of pages from the title page to the “Introduction” or “Chapter One” page. Note the number of pages, including the first page but not including the “Introduction” page.
- Under ’Numbering’ choose roman numeral style (i, ii, iii, iv…)
- Click OK
- Click through until you get to the Introduction page
- Go back to Advanced > Number Pages
- Under ‘Pages’, choose From. Leave the To field as is, and in the From field enter the last page number listed in the parentheses
- Under ‘Numbering’ change the style to 1,2,3…
- Click OK
- Click through the document to make sure it worked OK
- Go to the cover page (the very first page of the document)
- Go to Advanced > Number Pages
- Under ‘Pages’ make sure From is selected
- Under ‘Numbering’ change it to none
- Click OK
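The result of the renumbering steps can be checked against a small model of the page labels. The Python sketch below is an illustration only, with hypothetical function names. It assumes that the front matter is counted from the very first page, that the cover page ends up with no label, and that the roman numerals on the remaining front-matter pages keep their position (so the second page is "ii"); Acrobat's actual behavior when the cover is reset to "None" may differ.

```python
def to_roman(n: int) -> str:
    """Lowercase roman numeral for a positive integer (i, ii, iii, iv, ...)."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_labels(total_pages: int, front_matter_pages: int) -> list[str]:
    """Expected label for each physical page, in order.

    front_matter_pages is the count noted while paging through the
    document: first page included, "Introduction" page excluded.
    """
    labels = []
    for index in range(1, total_pages + 1):
        if index == 1:
            labels.append("")  # cover page: numbering style "None"
        elif index <= front_matter_pages:
            labels.append(to_roman(index))  # front matter: ii, iii, ...
        else:
            labels.append(str(index - front_matter_pages))  # body: 1, 2, ...
    return labels
```

Under these assumptions, an 8-page file with 4 front-matter pages gives `['', 'ii', 'iii', 'iv', '1', '2', '3', '4']`, so the "Introduction" page is labeled 1.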
Adding Bookmarks:
- Click on the New Bookmark icon (Next to the trash can)
- Replace ‘Untitled’ with ‘Title Page’
- Go to ‘Options’ and select ‘Set Bookmark Destination’
- Say ‘Yes’ to question
- Go to Page 3
- Click on the New Bookmark icon
- Enter ‘Supervisory Committee Approval Form’
- Go to ‘Options’ and select ‘Set Bookmark Destination’
- Say ‘Yes’ to question
- Go to Page 4
- Follow the same steps for every page that would require a bookmark: the TOC, acknowledgments, dedication, list of tables and figures, every new chapter, section, or subsection noted by the author in the TOC.
- Make sure that bookmarks are all applied to the correct destination.
Scanning content provided by Allyson Mower, Eccles Health Sciences Library.
Processing Scanned Theses for the Institutional Repository
The digitized thesis is uploaded to the University of Utah IR using CONTENTdm. Since the theses are already cataloged, data from the original library catalog record is used for metadata in the IR record. For a visual guide on uploading items to CONTENTdm, see Adding Items to CONTENTdm for library staff. The text in bold represents the metadata fields used to describe the thesis.
- Log in to CONTENTdm.
- If the thesis you are uploading has a catalog record, open the catalog record in another browser window.
- Copy and paste any relevant information from the catalog record to the appropriate metadata fields. You should be able to copy the title of the thesis, author, and LCSH subject headings.
- Look at the catalog record for additional information to fill in metadata fields for degree, department, and school/college.
- "University of Utah" is listed as both the publisher and degree granting institution.
- For the rights management field, list the author as the copyright holder, for example "(c) author name".
- Put "application/pdf" for the format medium field.
- Look at the title page of the thesis for the defense date.
- List the source of the thesis that was scanned, for example "Original: University of Utah J. Willard Marriott Library Special Collections".
- Put "eng" in the language field, unless the thesis is written in a language other than English.
- Copy and paste the abstract from the thesis for the description field.
- In the relation field, include the information "Digital reproduction of [Title of Thesis], location of physical copy of thesis." For example: "Digital reproduction of A Study of Oil-Shale Reduction in a Computer Controlled Retort, J. Willard Marriott Library Special Collections."
- Copy and paste the text file generated by ABBYY FineReader into the transcript field.
- Locate the scanned PDF of the thesis and add it to your collection.
- After the thesis is uploaded to CONTENTdm, and a reference URL is available, add the reference URL to the 856 field of the library catalog record for the thesis.
- Update OCLC's holdings to include the reference URL.
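The fixed and derived metadata values in the list above can be summarized as a small Python sketch. It is illustrative only: the function name and the dictionary keys are hypothetical, they are not CONTENTdm field identifiers, and nothing here calls CONTENTdm itself. Fields that come from the catalog record or title page (subject headings, degree, department, school/college, defense date) are left out.

```python
def build_ir_record(title, author, abstract, transcript,
                    source_location, language="eng"):
    """Assemble the metadata values described in the steps above.

    source_location is where the physical copy is held, for example
    "J. Willard Marriott Library Special Collections".
    """
    return {
        "title": title,
        "author": author,
        # The university is both publisher and degree-granting institution.
        "publisher": "University of Utah",
        "degree_granting_institution": "University of Utah",
        # The author is the copyright holder.
        "rights_management": f"(c) {author}",
        "format_medium": "application/pdf",
        "source": f"Original: University of Utah {source_location}",
        # "eng" unless the thesis is written in another language.
        "language": language,
        "description": abstract,
        "relation": f"Digital reproduction of {title}, {source_location}.",
        # Text file produced at the OCR station.
        "transcript": transcript,
    }
```

Building the record in one place makes it easier to see which values are constant for every thesis and which are copied from the catalog record or the thesis itself.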