
Doing Stuff with Data - Learning Resources

Tutorials, how-tos, tips & tricks

Data Cleaning & Text Prep

We’re sorry that we’ll be unable to meet you all in June to explore Data Cleaning and Text Preparation at DSI 2020. Listed below are some suggested introductory readings, lessons, and software packages that may be of interest to those looking to do some self-directed learning on this topic.

Hopefully, we’ll have the opportunity to work with you all in 2021.

Alex Provo, NYU Libraries
Jay Brodeur, McMaster University Library

1. Suggested readings

2. Recommended lessons and useful resources

Below are some recommended self-directed lessons and utilities that cover a variety of approaches to data cleaning and text preparation using common tools. Some of what you’ll learn here is specific to the software package being used, while other approaches and concepts can be used in combination (spreadsheets and regular expressions, for example).
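As a rough illustration of combining the two approaches, the sketch below applies regular expressions to a small table exported from a spreadsheet as CSV. It is a minimal example under stated assumptions, not a method taken from any of the lessons listed here: the sample data, the column names (`name`, `phone`), and the helper functions are all hypothetical, and it uses only Python's standard `csv` and `re` modules.

```python
import csv
import io
import re

# Hypothetical sample data standing in for a spreadsheet exported as CSV.
RAW = """name,phone
  Ada   Lovelace ,(905) 555-0101
Grace Hopper,905.555.0102
"""


def clean_name(value):
    # Collapse runs of whitespace into one space, then trim both ends.
    return re.sub(r"\s+", " ", value).strip()


def clean_phone(value):
    # Drop every non-digit character; reformat only if exactly ten digits remain.
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    # Leave anything unexpected untouched (apart from trimming) for manual review.
    return value.strip()


def clean_rows(text):
    # Read the CSV text into dictionaries keyed by the header row.
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"name": clean_name(row["name"]), "phone": clean_phone(row["phone"])}
        for row in reader
    ]


rows = clean_rows(RAW)
for row in rows:
    print(row["name"], row["phone"])
```

The same patterns (`\s+` for runs of whitespace, `\D` for non-digits) can generally be reused in the find-and-replace features of tools that support regular expressions, which is what makes the concepts portable across packages.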

Spreadsheets for data cleaning

OpenRefine

Regular Expressions

Other software packages

3. Software packages for data cleaning & text preparation

Below is a list of software packages and services commonly used for data cleaning and text preparation.

Free/open tools

Paid/freemium/commercial tools