Institutional Repository Document Workflow
How do documents in different formats get placed in a repository? The information here is based on practices used for the University of Utah Institutional Repository.
Publisher PDF
If we have permission from the publisher to upload their PDF of a document, the publisher’s PDF is downloaded. The PDF is then examined to determine whether it has embedded text; if it does, it is full-text searchable.
To tell whether a PDF has embedded text, use the “Select text” option on the Acrobat toolbar. If you can select text with your mouse, the document has embedded text; if you cannot, it is an “Image only” PDF.
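When many PDFs need to be screened, the same check can be approximated in a script. The following is a minimal sketch and not part of the workflow described here. It assumes the third-party pypdf library is installed, and the function names and the 20-character threshold are illustrative choices.

```python
from typing import Iterable, Optional


def has_embedded_text(page_texts: Iterable[Optional[str]], min_chars: int = 20) -> bool:
    """Return True once the pages hold at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption meant to ignore stray characters.
    """
    total = 0
    for text in page_texts:
        # Drop all whitespace so blank or near-blank pages do not count.
        total += len("".join((text or "").split()))
        if total >= min_chars:
            return True
    return False


def pdf_has_embedded_text(path: str) -> bool:
    """Classify a PDF file on disk; requires pypdf (imported only when called)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    return has_embedded_text(page.extract_text() for page in reader.pages)
```

A result of False suggests an “Image only” PDF that would need OCR before it can be full-text searchable. Text extraction is imperfect, so a manual check in Acrobat remains the reference test.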
Metadata for the PDF is added in CONTENTdm and the PDF is uploaded to CONTENTdm Administration. In CONTENTdm Administration, the uploaded document is approved and the collection is indexed. Once the collection has been indexed, the item appears in the collection.
Image Only PDF
Hard Copy
If a hard copy of the article is provided, it is scanned and saved as a PDF. The PDF is run through an optical character recognition (OCR) program, such as ABBYY FineReader, to extract the text in the document. The extracted text is saved in a separate file and pasted into the transcript field during metadata creation, which makes the document full-text searchable. The PDF is then uploaded to CONTENTdm Administration. In CONTENTdm Administration, the uploaded document is approved and the collection is indexed. Once the collection has been indexed, the item appears in the collection.
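The step of moving OCR output into the transcript field can be pictured with a small script. This is only a sketch under stated assumptions: the record layout and the field names "title", "file", and "transcript" are hypothetical and do not reflect CONTENTdm's actual schema, and collapsing whitespace is an illustrative cleanup choice rather than a documented requirement.

```python
from pathlib import Path


def normalize_transcript(raw: str) -> str:
    """Collapse runs of whitespace so scan line breaks do not clutter the field."""
    return " ".join(raw.split())


def build_record(title: str, pdf_name: str, ocr_text: str) -> dict:
    """Build a hypothetical metadata record with the OCR text as its transcript."""
    return {
        "title": title,
        "file": pdf_name,
        "transcript": normalize_transcript(ocr_text),
    }


def build_record_from_file(title: str, pdf_name: str, ocr_text_path: str) -> dict:
    """Same as build_record, reading the OCR text from the separately saved file."""
    raw = Path(ocr_text_path).read_text(encoding="utf-8")
    return build_record(title, pdf_name, raw)
```

The transcript is what makes the scanned document searchable, so a record with an empty transcript would be worth flagging before upload.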
Electronic Copy from a Word Processing Program
If the document provided is an electronic file from Microsoft Word or another word processing program, the file can be converted to PDF format with Adobe Acrobat Professional in a variety of ways. The three most commonly used are:
- In Microsoft Word or another word processing program, select “Print” under the File menu, change the printer to the “Adobe PDF” option, and select “Print”. Adobe Acrobat Professional generates a PDF of the document.
- In Microsoft Word or another word processing program, select “Adobe PDF” from the menu bar, then “Convert to Adobe PDF”.
- In Adobe Acrobat Professional, select “Create a PDF” from the menu bar and “From a file” from the drop-down menu, then navigate to where the file is housed and select it. A PDF is created from the file.
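The three routes above are manual and rely on Adobe Acrobat Professional. Where a batch of word processing files has to be converted, the step can also be scripted. The sketch below swaps in a different tool, LibreOffice in headless mode, and is not one of the Acrobat methods described here. It assumes the `soffice` executable is installed and on the PATH, and the function names are illustrative.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, out_dir: str) -> List[str]:
    """Return the LibreOffice command line that converts one file to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, source]


def expected_pdf_path(source: str, out_dir: str) -> Path:
    """LibreOffice keeps the base name and swaps the extension for .pdf."""
    return Path(out_dir) / (Path(source).stem + ".pdf")


def convert_to_pdf(source: str, out_dir: str) -> Path:
    """Run the conversion; raises CalledProcessError if soffice reports failure."""
    subprocess.run(build_convert_command(source, out_dir), check=True)
    return expected_pdf_path(source, out_dir)
```

A PDF produced from a word processing file normally carries embedded text, so it should not need the OCR step used for scanned hard copies.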
Metadata for the PDF is added in CONTENTdm and the PDF is uploaded to CONTENTdm Administration. In CONTENTdm Administration, the uploaded document is approved and the collection is indexed. Once the collection has been indexed, the item appears in the collection.
Content provided by Anne Morrow, University of Utah Marriott Library.