
Preparation

Preparation for this workshop will take about 60 to 90 minutes, depending on your experience with the tools used during this session.
To get the most out of the workshop, it is very important that you complete the OpenRefine and Jupyter Notebook setup and work through at least some of the introductory modules.

OpenRefine

For this workshop, you will need OpenRefine and a web browser. Follow the instructions provided by Library Carpentry to install OpenRefine on your system (Windows, Mac, or Linux).

Once you have installed OpenRefine, please complete modules 1-4 of the Library Carpentry OpenRefine lesson to familiarize yourself with basic OpenRefine operations.

To save time during the workshop, please create the demonstration projects ahead of time.

Project 1: Soul of Reason metadata

  1. Download the Soul of Reason metadata
  2. On the OpenRefine home screen, select the Create Project tab, upload the CSV, and click “Next”
  3. On the preview screen, ensure that:
    • “Character encoding” is set to UTF-8
    • “Columns are separated by” is set to comma
    • “Trim leading & trailing whitespace...” is checked
    • “Parse next 1 line(s) as column headers” is checked
    • “Use character " to enclose cells...” is checked
    • “Store blank rows” is checked
    • “Store blank cells as nulls” is checked
  4. Click Create Project

Configuration for creating the metadata project
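If you later want to work with the same metadata in Python, the preview settings above correspond roughly to the reading below, which uses the standard `csv` module. This is a minimal sketch under stated assumptions, not a description of how OpenRefine works internally: it assumes the first line holds the column headers, trims whitespace and turns empty cells into `None` to mirror “Store blank cells as nulls”, and keeps blank rows as all-`None` records. The function name `read_metadata` and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO


def read_metadata(f: TextIO) -> List[Dict[str, Optional[str]]]:
    """Read CSV rows roughly as the Project 1 preview settings describe.

    Assumptions (not OpenRefine internals): the first line is the header,
    fields are comma-separated and may be enclosed in double quotes.
    """
    reader = csv.reader(f, delimiter=",", quotechar='"')
    header = [name.strip() for name in next(reader)]
    records = []
    for row in reader:
        # Trim leading/trailing whitespace; empty cells become None ("nulls").
        cells = [cell.strip() or None for cell in row]
        # Keep blank rows: a completely empty line becomes an all-None record.
        cells += [None] * (len(header) - len(cells))
        records.append(dict(zip(header, cells)))
    return records


# Invented sample data, for illustration only.
sample = io.StringIO('Title,Date\n"Example, with a comma",  \n\n')
for record in read_metadata(sample):
    print(record)
# {'Title': 'Example, with a comma', 'Date': None}
# {'Title': None, 'Date': None}
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` so that the UTF-8 setting applies and the `csv` module handles line endings inside quoted fields correctly.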

Project 2: Soul of Reason transcript

  1. Download the Soul of Reason transcript
  2. On the OpenRefine home screen, select the Create Project tab, upload the txt file, and click “Next”
  3. On the preview screen, ensure that:
    • “Parse data as” is set to Line-based text files
    • “Character encoding” is set to UTF-8
    • “Store blank rows” is unchecked
    • “Store blank cells as nulls” is checked
    • Note that you can import multiple text files into a single OpenRefine project; if you do, check “Store file source” when creating the project so that each row records the file it came from.
  4. Click Create Project

Configuration for creating the transcript text project
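The line-based import used for the transcript can be approximated in Python. This minimal sketch assumes that each line of a UTF-8 text file becomes one row, that only completely empty lines count as blank rows (dropped here, since “Store blank rows” is unchecked), and that the file name is kept next to each line in the spirit of “Store file source”. The function names `lines_to_rows` and `import_text_files` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def lines_to_rows(source: str, text: str) -> List[Tuple[str, str]]:
    """Turn text into one row per line, tagged with its source file name.

    Empty lines are dropped, mirroring "Store blank rows" left unchecked.
    """
    return [(source, line) for line in text.splitlines() if line]


def import_text_files(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Read each UTF-8 text file and collect (file name, line) rows."""
    rows: List[Tuple[str, str]] = []
    for path in paths:
        p = Path(path)
        rows.extend(lines_to_rows(p.name, p.read_text(encoding="utf-8")))
    return rows


# Invented sample text, for illustration only.
print(lines_to_rows("transcript.txt", "First line\n\nSecond line\n"))
# [('transcript.txt', 'First line'), ('transcript.txt', 'Second line')]
```

Passing several paths to `import_text_files` combines multiple files into one list of rows, much like importing several text files into a single project with “Store file source” checked.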

Project 3: OCR text file

  1. Download the OCR text file of “Some Comments on Correggio in Connection with His Pictures in Dresden” by Bernard Berenson (the article is available from JSTOR)
  2. On the OpenRefine home screen, select the Create Project tab, upload the txt file, and click “Next”
  3. On the preview screen, ensure that:
    • “Parse data as” is set to Line-based text files
    • “Character encoding” is set to UTF-8
    • “Store blank rows” is unchecked
    • “Store blank cells as nulls” is checked
  4. Click Create Project

Configuration for creating the OCR text project

If you prefer, you may also open pre-made OpenRefine projects provided at the links below:

  • Soul of Reason metadata OpenRefine project (SoR_metadata)
  • Soul of Reason transcript OpenRefine project (RG_9_8_184_01-draft-en-txt)
  • OCR text file of “Some Comments on Correggio in Connection with His Pictures in Dresden” by Bernard Berenson OpenRefine project (25515893-txt)

Download the tar.gz files and load them using the Import Project tab on OpenRefine’s home screen:

OpenRefine import tab
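If OpenRefine refuses one of the downloaded project files, one quick check is whether the file still opens as a gzip-compressed tar archive. A minimal sketch using Python's standard `tarfile` module; the function name `list_project_archive` is hypothetical.

```python
import tarfile
from typing import List


def list_project_archive(path: str) -> List[str]:
    """Return the member names of a .tar.gz file.

    Raises tarfile.ReadError if the file is not a gzip-compressed tar archive.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()
```

If the call returns a list of names, the archive itself is intact; a `tarfile.ReadError` suggests the file is not a gzip-compressed tar archive and is worth downloading again.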

Jupyter Notebooks & Python

For our exploration of programmatic approaches to text analysis with Python, we’ll be using Jupyter Notebook and the excellent Constellate tutorials. To ensure that you have the base knowledge required for success in this workshop, it is:


When you have completed the pre-workshop tasks, proceed to the introductory presentation to explore the key concepts.