About this workshop

Learning Objectives

By the end of this module, you will be able to:

  • List the common methodological approaches used in text preparation and identify when and how to use them based on source materials and analysis objectives.
  • Use OpenRefine to define subsets of a dataset for further processing and to normalize textual data and/or metadata.
  • Explain the benefits and challenges of applying a scripted or semi-scripted approach to text preparation and analysis; identify situations where scripting your work will be beneficial.
  • Apply prepared computational techniques to perform common text preparation steps and basic analyses.
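As a rough illustration of what a scripted text preparation step can look like, here is a minimal Python sketch. It is not taken from the workshop materials: the function name `prepare_text` and the sample sentence are hypothetical, and the steps shown (lowercasing, stripping punctuation, splitting on whitespace, counting words) are only one possible combination of common techniques.

```python
import re
from collections import Counter


def prepare_text(raw: str) -> list[str]:
    """Lowercase the text, strip punctuation, and split it into word tokens.

    Hypothetical helper for illustration only; not part of the workshop code.
    """
    lowered = raw.lower()
    # Replace every character that is not a word character
    # (letter, digit, underscore) or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", lowered)
    # Split on runs of whitespace to get individual tokens.
    return cleaned.split()


# Made-up sample text, used only to demonstrate the steps.
sample = "The Cat sat. The cat, again, SAT!"
tokens = prepare_text(sample)

# A basic analysis: count how often each token occurs.
counts = Counter(tokens)

print(tokens)
print(counts["sat"])
```

Because the steps live in a function, the same preparation can be rerun unchanged on every document in a collection, which is one of the main arguments for scripting this kind of work.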

Schedule

| Activity | Time Allotted | Key Topics / Activities |
|---|---|---|
| Pre-workshop activities (self-guided) | 60-90 minutes (completed before workshop) | Install OpenRefine; Introduction to OpenRefine; Introduction to Python and Jupyter Notebooks |
| Introductory lecture + discussion | 15 minutes | Introduction to text preparation and analysis: why prepare your text?; overview of concepts and methods; key considerations for different source materials and analyses |
| OpenRefine Part 1 (hands-on) | 20 minutes | Introduction to OpenRefine; manual cleanup (e.g. find and replace); faceting |
| Break | 10 minutes | Break |
| OpenRefine Part 2 (hands-on) | 20 minutes | Stemming and clustering; GREL/regular expressions |
| Programming & Python (lecture + hands-on) | 20 minutes | Overview of programmatic approaches; the ‘what’ and ‘when’ to program; using Python for text preparation |
| Final Thoughts (lecture + discussion) | 10 minutes | Final thoughts & key considerations; where to learn more |

Head to the Preparation page to get started with your pre-workshop activities.