Learn More
OpenRefine
- Library Carpentry lesson on OpenRefine
- University of Toronto Libraries OpenRefine tutorials
- OpenRefine Manual on Regular Expressions
- Using regular expressions in OpenRefine: tutorial by Peter Green; includes examples with non-Latin scripts.
- Regular expression testers
  - https://www.regular-expressions.info/
  - https://regex101.com/
  - Regexr: Interactive regular expression (regex) coder and explainer
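OpenRefine's clustering feature groups variant spellings of the same value. The following is a minimal Python sketch of its default "fingerprint" key-collision method. It assumes the steps described in OpenRefine's clustering documentation: trim, lowercase, remove punctuation, fold accented characters, then sort and de-duplicate tokens. It approximates that behaviour and is not OpenRefine's own code. The helper names `fingerprint` and `cluster` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate OpenRefine's fingerprint key for one cell value."""
    value = value.strip().lower()
    # Remove punctuation and control characters (anything not a word char or whitespace).
    value = re.sub(r"[^\w\s]|_", "", value)
    # Fold accents: decompose characters, then drop combining marks ("ë" -> "e").
    value = "".join(c for c in unicodedata.normalize("NFKD", value)
                    if not unicodedata.combining(c))
    # Split on whitespace, remove duplicate tokens, sort, and rejoin with single spaces.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values whose fingerprints collide; return only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


names = ["Smith, John", "John Smith", "john  smith", "Zoë Brown", "Zoe Brown", "Jane Doe"]
print(cluster(names))
```

Because tokens are sorted, word order ("Smith, John" vs. "John Smith") does not prevent a match. Because case and accents are folded, "Zoë" and "Zoe" collide as well.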
Python
Python Integrated Development Environments
- There are many different Python IDEs; try a few to find the one that suits you best. Jay is partial to Pyzo.
Python packages for text prep and Natural Language Processing
- PyTesseract: a Python wrapper for the Tesseract optical character recognition (OCR) engine
- spaCy NLP library and documentation
- NLTK NLP library and documentation
- natas: Library for processing historical English corpora, especially for studying neologisms
- Python phonetics package, which includes methods for matching and clustering words by phonetic similarity
- pyspellchecker: A simple Python-based spell checking algorithm
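Spell checkers such as pyspellchecker can correct OCR errors by comparing tokens against a vocabulary. pyspellchecker's documentation credits Peter Norvig's spelling-corrector essay. That approach generates every string one edit away from a word (deletion, transposition, replacement, insertion) and keeps the most frequent known word. The sketch below is a simplified pure-Python illustration of that idea, not pyspellchecker's API. The tiny reference corpus and the names `edits1` and `correct` are hypothetical.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"  # lowercase ASCII only, for simplicity


def edits1(word):
    """Return every string one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else `word` unchanged."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    # Break frequency ties by string comparison so the result is deterministic.
    return max(candidates, key=lambda w: (counts[w], w))


# Hypothetical reference corpus; in practice use a large, period-appropriate text.
corpus = "the quick brown fox jumps over the lazy dog and the fox runs"
counts = Counter(re.findall(r"[a-z]+", corpus.lower()))

for ocr_token in ["tbe", "qnick", "foxx", "zzz"]:
    print(ocr_token, "->", correct(ocr_token, counts))
```

Typical OCR confusions ("b" for "h", "n" for "u") are single replacements, so they fall within one edit. Tokens with no known neighbour are left unchanged rather than guessed.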
Other tutorials and resources
- Constellate: a comprehensive set of resources for building your text and data mining skills.
- How to Clean Text for Machine Learning with Python. An excellent step-by-step walkthrough of the fundamentals of text prep with Python.
- Python Regex (Regular Expressions) for Data Scientists
- Cleaning OCR’d text with Regular Expressions by Laura Turner O’Hara for The Programming Historian.
- Natural Language Processing With Python’s NLTK Package: An excellent end-to-end tutorial using the nltk package
- Natural Language Processing with Python: Introduction. An excellent step-by-step introduction to basic pre-processing steps (though it does not cover clustering or find-and-replace error correction).
- Using Binder to connect GitHub repositories to Jupyter Notebooks
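Several of the tutorials above walk through the same basic pre-processing steps. The following is a minimal standard-library sketch of a few of them, assuming plain-text OCR output. It rejoins words hyphenated across line breaks, lowercases, tokenizes, and drops stopwords. The function name `prep` and the tiny `STOPWORDS` set are hypothetical; NLTK and spaCy provide fuller stopword lists and tokenizers.

```python
import re

# Hypothetical, deliberately tiny stopword list; NLTK and spaCy ship much fuller ones.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to"}


def prep(text):
    """Return a list of cleaned, lowercased tokens from raw OCR text."""
    # Rejoin words hyphenated across a line break ("docu-\nmented" -> "documented").
    # Caveat: this also merges genuine compounds that happen to break at a hyphen.
    text = re.sub(r"(\w)-[ \t]*\n\s*(\w)", r"\1\2", text)
    # Lowercase and keep runs of letters only (drops punctuation and digits).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Drop stopwords.
    return [t for t in tokens if t not in STOPWORDS]


sample = "The history of the print-\ning press, 1450-1500, is well docu-\nmented."
print(prep(sample))
```

The pattern `[^\W\d_]` matches any Unicode letter rather than only `a-z`, so accented and non-Latin words survive tokenization.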