
Research Data Management

The importance of documenting data and creating metadata is discussed on the Managing Active or Working Data and Plan & Design pages.

During the analysis stage, it is important to document the steps taken to clean, process or analyse the data.

Imperial College outlines the following reasons why data should be documented:

Data documentation is essential for the reproducibility and replication of research findings and the re-analysis of data. Ensuring that data are adequately documented and described supports research transparency and facilitates data sharing and reuse. Documenting your data also minimises the risk of your data being misused or misinterpreted.

To a certain extent, OpenRefine can assist with documenting the data cleaning stage. It can help with:

  • getting an overview of the dataset, e.g., applying facets to columns to see how often each value occurs
  • resolving inconsistencies in a dataset, e.g., variant or misspelled terms in the same column
  • splitting data into more granular parts, e.g., splitting cells that hold multiple values into separate columns
  • reconciling terms in the data against external services, e.g., a controlled vocabulary that offers an OpenRefine-compatible reconciliation service
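
The first three tasks above can be illustrated outside OpenRefine with a short Python sketch. This is not OpenRefine's code: it is a minimal sketch, assuming a single column held as a list of strings, and all function names (text_facet, fingerprint, cluster, split_cell) are hypothetical. The fingerprint function roughly follows the key-collision "fingerprint" idea OpenRefine describes for clustering (trim, lowercase, remove accents and punctuation, then sort and de-duplicate the words).

```python
import re
import unicodedata
from collections import Counter

def text_facet(values):
    """Count how often each distinct value occurs (similar in spirit to a text facet)."""
    return Counter(values)

def fingerprint(value):
    """Build a key-collision key: trim, lowercase, strip accents and punctuation,
    then sort and de-duplicate the words."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = "".join(c for c in value if not unicodedata.combining(c))
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group distinct values that share the same fingerprint key."""
    groups = {}
    for v in set(values):
        groups.setdefault(fingerprint(v), set()).add(v)
    # Only groups with more than one variant are likely inconsistencies.
    return [sorted(g) for g in groups.values() if len(g) > 1]

def split_cell(value, sep=";"):
    """Split a multi-valued cell into trimmed parts (one part per new column)."""
    return [part.strip() for part in value.split(sep) if part.strip()]

countries = ["United Kingdom", "united kingdom ", "Kingdom, United", "France", "France"]
print(text_facet(countries))
print(cluster(countries))
print(split_cell("soil; water;air"))
```

Here the three spellings of "United Kingdom" fall into one cluster, which a person would then review and merge into a single agreed form, much as OpenRefine asks you to confirm each cluster before merging.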

When you perform any action in OpenRefine, the application will:

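  • NOT modify or alter the original data; changes are made to a copy held in an OpenRefine project, and a new spreadsheet is generated when you export it
  • save project files locally on your computer
  • NOT need an internet connection (except when reconciling against external services)
  • keep a record of every action and alteration made to the data (the Undo/Redo history), which can be extracted as JSON and reapplied to other data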
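
The two most useful properties for documentation, leaving the original untouched and keeping a list of every change, can be sketched in a few lines of Python. This is a hypothetical illustration, not OpenRefine's own history format: the CleaningLog class and its fields ("op", "column", "cells_changed") are assumptions made for the example, and the data is assumed to be a list of dictionaries, one per row.

```python
import copy
import json

class CleaningLog:
    """Hypothetical helper: edit a copy of the data and record each step."""

    def __init__(self, rows):
        self.original = rows                # never modified
        self.working = copy.deepcopy(rows)  # all edits are made here
        self.history = []                   # one entry per cleaning step

    def apply(self, description, column, func):
        """Apply func to every cell in column and log how many cells changed."""
        changed = 0
        for row in self.working:
            new_value = func(row[column])
            if new_value != row[column]:
                row[column] = new_value
                changed += 1
        self.history.append(
            {"op": description, "column": column, "cells_changed": changed}
        )

    def export_history(self):
        """Return the recorded steps as JSON, e.g. to store alongside the data."""
        return json.dumps(self.history, indent=2)

rows = [{"site": " Leeds "}, {"site": "leeds"}, {"site": "York"}]
log = CleaningLog(rows)
log.apply("trim whitespace", "site", str.strip)
log.apply("title case", "site", str.title)
print(log.export_history())
```

Saving the exported history with the cleaned dataset gives others a step-by-step account of what was done, which supports the reproducibility goals quoted above.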

Further Reading