Data Carpentry - Organization/Spreadsheets and Cleaning/OpenRefine (Social Science Dataset)

Learn how to organize tabular data, handle date formatting, carry out quality control and quality assurance and export data to use with downstream applications.

A part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting made consistent. This step must be taken with the same care and attention to reproducibility as the analysis.

This lesson will teach you to use OpenRefine to clean and format data effectively and to automatically track any changes that you make. Many people report that this tool saves them months of work they would otherwise spend making these edits by hand.

General Information

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain.

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Online via Zoom

When: Thursday, November 19, 2020. 9:00am - 12:00pm

Requirements: This lesson requires OpenRefine and a working copy of spreadsheet software, such as Microsoft Excel, LibreOffice, or OpenOffice.org (see “Setup” for details). To use these materials most effectively, please install everything before working through this lesson.

Materials will be provided in advance of the workshop. If we can help make learning easier for you (e.g. sign-language interpreters, additional accommodations), please get in touch using the contact details below and we will attempt to provide them.

Workshop organization note: A full Carpentry workshop typically consists of two days of in-person instruction, covering 4 half-day lessons. Due to moving online to maintain safety and compliance with COVID-19 guidelines, we have separated this curriculum into a series of workshops. For the full workshop curriculum, suggested schedules are below:

Data Carpentry lessons use domain-specific data files and examples (though the techniques are widely applicable), so mixing and matching is not recommended:

Data Carpentry - Ecology Example Data
09/17/20 - Data Carpentry (Ecology) - Organization/Spreadsheets and Cleaning/OpenRefine
09/24/20 - Data Carpentry (Ecology) - SQL
09/25/20 - Data Carpentry (Ecology) - R, Part I
10/02/20 - Data Carpentry (Ecology) - R, Part II


Data Carpentry - Social Science Example Data
11/19/20 - Data Carpentry (Social Science) - Organization/Spreadsheets and Cleaning/OpenRefine
12/03/20 - Data Carpentry (Social Science) - SQL
12/04/20 - Data Carpentry (Social Science) - R, Part I
12/11/20 - Data Carpentry (Social Science) - R, Part II


Other Carpentries workshop sessions:

Software Carpentry - Python
09/04/20 - Unix Shell or 10/16/20 - Unix Shell

09/11/20 - Python, Part I and
09/18/20 - Python, Part II

or 10/23/20 - Python, Part I and
10/30/20 - Python, Part II

10/09/20 - Git or 11/06/20 - Git


Software Carpentry - R
09/04/20 - Unix Shell or 10/16/20 - Unix Shell

09/25/20 - R, Part I and
10/02/20 - R, Part II

or 11/13/20 - R, Part I and
11/20/20 - R, Part II

10/09/20 - Git or 11/06/20 - Git

Related LibGuide: Digital Scholarship Resources by Steven Pryor

Online Workshop Software Carpentry Workshop – Registration Required

Registration is required. There are 31 seats available.

Event Organizer

Steven Pryor
Digital Scholarship Librarian
University of Missouri Libraries