Research Data Management

Data organisation and machine readability

Data organisation is an important part of maintaining research data and making the data findable by both humans and machines. Data organisation may also be known as 'Tidy Data', particularly if R or other programming languages are being used to analyse or visualise the data.


 

The elements of data organisation that must be decided upon and used consistently are:

 

File naming

File and folder names should be chosen carefully so that humans can understand the contents at a glance and so that machines can read them and sort them sensibly under default ordering. There is much advice on the internet about file naming; the advice below is taken from a blog entry called 'Naming conventions and folder structures'.

  • Agree on the file naming conventions with your collaborators, document them, and then apply them consistently.
  • Decide what is the most important element for you and your collaborators to see first
    • For example dates or description of the contents?
  • Use reverse dates, e.g., YYYYMMDD (20240402), YYYY-MM-DD or YYMMDD.
    • The YYYY-MM-DD and YYYYMMDD formats follow the ISO 8601 standard. Writing dates in Year-Month-Day order removes the uncertainty caused by regional date conventions, e.g., 04/02/2024 could be 4 February 2024 in the UK or Australia, or 2 April 2024 in the US.
  • Do not use spaces or special characters/punctuation in file names (e.g. “!#*^%@|’?/<>); hyphens and underscores are the exception and can be used as separators
  • Try to keep file names shorter than 50 characters
  • To promote human readability:
    • Break the file name into descriptive elements such as 'Date-Description' e.g., 20240402_lab-report1
    • Use hyphens and CamelCase within elements e.g., 2024-04-02_LabReport1
  • If numbers are used to differentiate similar file names, the computer's default sorting compares them character by character rather than by value, which can make the correct file difficult to find
    • E.g., files numbered 1, 2, 3, 4, 5-10, 11-20, 21-30, 31-40 will display under default ordering as 1, 11-20, 2, 21-30, 3, 31-40, 4, 5-10
    • To work with the computer's default ordering, numbers should be padded with leading zeros: 01, 02, 03, 04, 05-10.
      If it is known that the numbering may reach the hundreds, the numbers should appear as 001, 002, 003 etc.
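
The naming advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_file_name, its parameters and the 'Date_Description' layout are hypothetical choices, not part of the blog entry's advice.

```python
import re
from datetime import date
from typing import Optional

# Anything other than letters, digits, hyphens and underscores is dropped.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def build_file_name(day: date, description: str,
                    number: Optional[int] = None,
                    width: int = 2, max_length: int = 50) -> str:
    """Return a name of the form YYYY-MM-DD_Description[NN] (hypothetical helper)."""
    # Turn "lab report" into "LabReport" (CamelCase), then strip unsafe characters.
    words = re.split(r"[\s_]+", description.strip())
    camel = "".join(word[:1].upper() + word[1:] for word in words if word)
    camel = _UNSAFE.sub("", camel)
    # Zero-pad the sequence number so default sorting matches numeric order.
    suffix = "" if number is None else f"{number:0{width}d}"
    name = f"{day.isoformat()}_{camel}{suffix}"
    if len(name) > max_length:
        raise ValueError(f"{name!r} is longer than {max_length} characters")
    return name


print(build_file_name(date(2024, 4, 2), "lab report", 1))
# 2024-04-02_LabReport01

# Default (character-by-character) sorting, with and without leading zeros.
unpadded = ["1", "2", "3", "4", "5-10", "11-20", "21-30", "31-40"]
padded = ["01", "02", "03", "04", "05-10", "11-20", "21-30", "31-40"]
print(sorted(unpadded))
# ['1', '11-20', '2', '21-30', '3', '31-40', '4', '5-10']
print(sorted(padded))
# ['01', '02', '03', '04', '05-10', '11-20', '21-30', '31-40']
```

Passing width=3 pads to 001, 002, 003 for projects where the numbering may reach the hundreds.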
 

File structure

Every project will have its own requirements, and there is no one-size-fits-all recommendation for research project file structures. Searching the internet for advice will yield results pertaining to computer programming and business project management, which may prove useful for your research project or preferred way of working.

The blog article 'Setting up an organised folder structure for research projects' outlines one person's approach to file structures and provides their folder structure as a zip file download. Their structure is in line with advice from other sites:

  • Start with a top-level project folder
    • Add mid-level folders such as ProjectManagement
      • Add lower-level folders within each, such as Proposal, Finance, Reports
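
A nested structure like the one above could be created with a short script rather than by hand. The following is a minimal sketch, assuming Python is available: the folder names ProjectManagement, Proposal, Finance and Reports come from the example above, while the function name create_project_folders and the top-level name MyResearchProject are hypothetical. The demonstration writes to a temporary directory so that it is safe to run.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Mid-level folder -> lower-level folders (names taken from the example above).
LAYOUT: Dict[str, List[str]] = {
    "ProjectManagement": ["Proposal", "Finance", "Reports"],
}


def create_project_folders(root: Path,
                           layout: Dict[str, List[str]] = LAYOUT) -> List[Path]:
    """Create root/mid/lower folders and return the lower-level paths."""
    created = []
    for mid_level, lower_levels in layout.items():
        for lower_level in lower_levels:
            folder = root / mid_level / lower_level
            # parents=True builds the whole chain; exist_ok=True makes reruns safe.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# Demonstration in a throwaway directory; "MyResearchProject" is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "MyResearchProject"
    for folder in create_project_folders(project_root):
        print(folder.relative_to(tmp))
```

Further mid-level folders can be added as extra keys in LAYOUT once the team has agreed on them.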

 

Versioning

File versioning is important to maintain data integrity and document management in collaborations or iterative work environments. Versioning involves:

  • Documenting changes: showing a history of edits, and who made those changes. Documenting this information is crucial for accountability and understanding the evolution of a document.
    • If you are using Microsoft 365, versioning is enabled, so you can view a document's version history, view an earlier version and restore previous versions.
  • Error recovery: file storage options provided by the University, such as SharePoint or OneDrive, keep file versions, so a previous version of a document can be recovered if important information is lost. However, if a document becomes corrupted, it may be necessary to access backed-up versions from other sources.
  • Collaborations: if a project has multiple team members working on the same document, it is important to ensure that everyone is working on the same version. Include a version number in the file name, e.g., 2024-04-02_LabReport01_v2
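
Where version numbers are kept in file names, as in the example above, the next name can be derived programmatically to avoid typing mistakes. The sketch below assumes the version always appears as a trailing '_v' followed by digits, as in 2024-04-02_LabReport01_v2; the function name bump_version is hypothetical, and it expects a bare file name rather than a full path.

```python
import re
from pathlib import PurePath

# A name ending in "_v" followed by digits, e.g. "..._v2" or "..._v02".
_VERSION = re.compile(r"^(?P<base>.+)_v(?P<digits>\d+)$")


def bump_version(file_name: str) -> str:
    """Return file_name with its trailing _vN version increased by one."""
    path = PurePath(file_name)
    match = _VERSION.match(path.stem)
    if match is None:
        raise ValueError(f"{file_name!r} has no _vN version suffix")
    digits = match.group("digits")
    # Keep any zero padding so that v09 becomes v10 rather than v010.
    bumped = str(int(digits) + 1).zfill(len(digits))
    # Reattach the extension (empty if the name has none).
    return f"{match.group('base')}_v{bumped}{path.suffix}"


print(bump_version("2024-04-02_LabReport01_v2"))
# 2024-04-02_LabReport01_v3
print(bump_version("2024-04-02_LabReport01_v09.docx"))
# 2024-04-02_LabReport01_v10.docx
```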

 

Resources