Preparation
Preparation for this workshop will take about 60 to 90 minutes, depending on your experience with the tools used during this session.
To get the most out of the workshop, it is very important that you complete the OpenRefine and Jupyter Notebook setup, and at least work through some of the introductory modules.
OpenRefine
For this workshop, you will need OpenRefine and a web browser. Follow the instructions provided by Library Carpentry to install OpenRefine on your system (whether it is Windows, Mac, or Linux).
- NOTE: When opening OpenRefine for the first time on a Mac, you may need to open your security preferences and permit OpenRefine to run. See Apple Support’s “Open a Mac app from an unidentified developer”.
Once you have installed OpenRefine, please complete modules 1-4 of the Library Carpentry OpenRefine lesson to familiarize yourself with basic OpenRefine operations.
To save time during the workshop, please create the demonstration projects ahead of time.
Project 1: Soul of Reason metadata
- Download the Soul of Reason metadata
- On the OpenRefine home screen, select the “Create Project” tab, upload the CSV, and click “Next”
- On the preview screen, ensure that:
  - “Character encoding” is set to UTF-8
  - “Columns are separated by” is set to comma
  - “Trim leading & trailing whitespace...” is checked
  - “Parse next 1 line(s) as column headers” is checked
  - “Use character " to enclose cells...” is checked
  - “Store blank rows” is checked
  - “Store blank cells as nulls” is checked
- Click “Create Project”
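For readers curious how these import options translate to code, here is a minimal Python sketch that approximates them with the standard `csv` module. It is an illustration only, not how OpenRefine itself parses files; the function name `read_metadata` and the sample rows are hypothetical and not taken from the workshop data.

```python
import csv
import io


def read_metadata(stream):
    """Read CSV rows roughly the way the import options above describe."""
    # Comma as the separator, double quote as the enclosing character.
    reader = csv.reader(stream, delimiter=",", quotechar='"')
    # Parse the first line as column headers.
    headers = [header.strip() for header in next(reader)]
    rows = []
    for record in reader:
        # csv.reader yields [] for an empty line; pad it so the blank
        # row is kept rather than dropped.
        padded = record + [""] * (len(headers) - len(record))
        # Trim leading and trailing whitespace, then store blanks as None.
        cells = [cell.strip() or None for cell in padded]
        rows.append(dict(zip(headers, cells)))
    return rows


# Hypothetical sample data standing in for the real CSV.
sample = io.StringIO(
    'title,date\n'
    '"  Episode, part one ",1971\n'
    '\n'
    'Untitled,\n',
    newline="",
)
rows = read_metadata(sample)
for row in rows:
    print(row)
```

To try this on a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")` in place of the sample stream.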
Project 2: Soul of Reason transcript
- Download the Soul of Reason transcript
- On the OpenRefine home screen, select the “Create Project” tab, upload the txt file, and click “Next”
- On the preview screen, ensure that:
  - “Parse data as” is set to Line-based text files
  - “Character encoding” is set to UTF-8
  - “Store blank rows” is unchecked
  - “Store blank cells as nulls” is checked
- Note that you can import multiple text files into a single OpenRefine project; make sure to check “Store file source” when creating the project so that you keep a record of which file each row came from.
- Click “Create Project”
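As a rough Python analogue of the line-based import described above, the sketch below reads several text files into one list, skips blank lines, and records which file each line came from. The function name `read_lines`, the `source` and `text` keys, and the demo file names are all hypothetical; treating whitespace-only lines as blank is also an assumption rather than documented OpenRefine behavior.

```python
from pathlib import Path
import tempfile


def read_lines(paths):
    """Return one record per non-blank line, tagged with its source file."""
    records = []
    for path in paths:
        path = Path(path)
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                text = line.rstrip("\n")
                # Blank rows are not stored: skip lines with no content.
                if not text.strip():
                    continue
                # Keep the file name so each line can be traced back.
                records.append({"source": path.name, "text": text})
    return records


# Hypothetical demo files written to a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    first = Path(folder) / "part_one.txt"
    second = Path(folder) / "part_two.txt"
    first.write_text(
        "Hello and welcome.\n\nToday's guest is here.\n", encoding="utf-8"
    )
    second.write_text("Thank you for listening.\n", encoding="utf-8")
    records = read_lines([first, second])

for record in records:
    print(record)
```

Keeping the source file name on every record serves the same purpose as “Store file source”: once lines from several files are mixed together, you can still filter or group by origin.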
Project 3: OCR text file
- Download the OCR text file of “Some Comments on Correggio in Connection with His Pictures in Dresden” by Bernard Berenson. The article is available for download from JSTOR.
- On the OpenRefine home screen, select the “Create Project” tab, upload the txt file, and click “Next”
- On the preview screen, ensure that:
  - “Parse data as” is set to Line-based text files
  - “Character encoding” is set to UTF-8
  - “Store blank rows” is unchecked
  - “Store blank cells as nulls” is checked
- Click “Create Project”
If you prefer, you may also open pre-made OpenRefine projects provided at the links below:
- Soul of Reason metadata OpenRefine project (SoR_metadata)
- Soul of Reason transcript OpenRefine project (RG_9_8_184_01-draft-en-txt)
- OCR text file of “Some Comments on Correggio in Connection with His Pictures in Dresden” by Bernard Berenson OpenRefine project (25515893-txt)
Download the tar.gz files and load them in the “Import” tab on OpenRefine’s home screen.
Jupyter Notebooks & Python
For our exploration of programmatic approaches to text analysis with Python, we’ll be using Jupyter Notebook and the excellent Constellate tutorials. To ensure that you have the base knowledge required for success in this workshop, it is:
- required that you complete the Getting Started with Jupyter Notebooks lesson, and
- strongly recommended that you work through the Python Basics 1 lesson.
When you have completed the pre-workshop tasks, proceed to the introductory presentation to explore the key concepts.