Data organisation is an important part of maintaining research data and of making that data findable by both humans and machines. Data organisation may also be described as 'tidy data', particularly when R or other programming languages are used to analyse or visualise the data.
The elements of data organisation that must be decided upon and used consistently are:
File naming
File and folder names should be chosen carefully so that humans can understand the contents at a glance and machines can read the names and sort them sensibly using their default ordering. There is a lot of advice on file naming online; the advice below is drawn from a blog post titled 'Naming conventions and folder structures'.
Agree with your collaborators on the file naming convention to adopt, document it, and then apply it consistently.
Decide which element is most important for you and your collaborators to see first
For example, the date or a description of the contents.
Use reverse dates, e.g., YYYYMMDD (20240402), YYYY-MM-DD (2024-04-02) or YYMMDD (240402).
The YYYYMMDD and YYYY-MM-DD forms follow the ISO 8601 standard. Writing the year, then the month, then the day removes the ambiguity of culturally specific date formats: 04/02/2024 could mean 4 February 2024 in the UK or Australia but 2 April 2024 in the US. Because the year comes first, file names that begin with the date also sort chronologically.
Do not use spaces or special characters/punctuation in file names (e.g., " ! # * ^ % @ | ' ? / < >)
Try to keep file names shorter than 50 characters
This promotes human readability.
Break the file name into descriptive elements such as 'Date_Description', e.g., 20240402_lab-report1
Separate elements with underscores, and use hyphens or CamelCase within an element, e.g., 2024-04-02_LabReport1
If you use numbers to distinguish otherwise similar file names, be aware that many programs sort names character by character, so report10 is listed before report2, which makes the correct file harder to find.
To work with this default ordering, add leading zeros: 01, 02, 03 and so on up to 10.
If numbering may reach the hundreds, use three digits: 001, 002, 003, and so on.
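The naming rules above can be sketched in Python, for example to generate consistent names in a script or to check names before sharing files. This is a minimal illustration under stated assumptions, not a tool the guide prescribes: `make_file_name` and `check_file_name` are hypothetical helpers, the 'Date_DescriptionNN' pattern follows the examples above, the forbidden-character set is assumed from the list above, and the 50-character limit is applied to the whole name including the extension.

```python
from datetime import date

# Characters to avoid in file names (assumed set, based on the list above, plus space)
FORBIDDEN = set(' "!#*^%@|\'?/<>')
MAX_LENGTH = 50  # suggested upper limit, applied here to the whole name


def make_file_name(day: date, description: str, number: int,
                   width: int = 2, extension: str = "docx") -> str:
    """Build a hypothetical name such as 2024-04-02_LabReport01.docx."""
    # ISO 8601 date (YYYY-MM-DD): year first, so names sort chronologically
    date_part = day.isoformat()
    # CamelCase the description: "lab report" -> "LabReport"
    camel = "".join(word[:1].upper() + word[1:] for word in description.split())
    # Zero-pad the number to `width` digits so 02 sorts before 10
    return f"{date_part}_{camel}{number:0{width}d}.{extension}"


def check_file_name(name: str) -> list[str]:
    """Return the problems found in a file name (an empty list means none)."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + ", ".join(repr(c) for c in bad))
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(make_file_name(date(2024, 4, 2), "lab report", 1))  # 2024-04-02_LabReport01.docx
print(check_file_name("lab report?.docx"))  # ["forbidden characters: ' ', '?'"]

# Plain text sorting compares character by character, so unpadded numbers misorder
print(sorted(["report1", "report2", "report10"]))    # ['report1', 'report10', 'report2']
print(sorted(["report01", "report02", "report10"]))  # ['report01', 'report02', 'report10']
```

Passing `width=3` gives three-digit numbers (001, 002, ...) for series that may run into the hundreds. For the compact YYYYMMDD form, `day.strftime("%Y%m%d")` returns 20240402 instead of 2024-04-02.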
File structure
Every project will have its own requirements, and there is not necessarily a one-size-fits-all recommendation for research project file structures. Searching the internet for advice will yield results relating to computer programming and business project management, which may prove useful for your research project or preferred way of working. For example, a project folder might contain:
Several mid-level folders, such as ProjectManagement
Lower-level folders within these, such as Proposal, Finance and Reports
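Once a structure has been agreed, a short script can create it so that every collaborator starts from the same layout. The sketch below is illustrative only: `create_project_folders` and the `LAYOUT` dictionary are hypothetical, and the folder names simply mirror the example above; substitute your own.

```python
from pathlib import Path

# Hypothetical layout mirroring the example: mid-level folder -> lower-level folders
LAYOUT: dict[str, list[str]] = {
    "ProjectManagement": ["Proposal", "Finance", "Reports"],
}


def create_project_folders(root: Path, layout: dict[str, list[str]]) -> list[Path]:
    """Create root/mid-level/lower-level folders and return the lower-level paths."""
    created = []
    for mid, lowers in layout.items():
        # Create the mid-level folder even if it has no lower-level folders
        (root / mid).mkdir(parents=True, exist_ok=True)
        for low in lowers:
            folder = root / mid / low
            # exist_ok=True leaves existing folders (and their contents) untouched
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

For example, `create_project_folders(Path("MyProject"), LAYOUT)` would create MyProject/ProjectManagement with Proposal, Finance and Reports inside it. Because `exist_ok=True` is used, running the script again is harmless, which makes it easy to add folders to `LAYOUT` later.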
Versioning
File versioning helps to maintain data integrity and to manage documents in collaborative or iterative work. Versioning involves:
Documenting changes: keeping a history of edits and of who made them. This record is crucial for accountability and for understanding how a document has evolved.
If you are using Microsoft 365, versioning is enabled, so you can view a document's version history, open a previous version and restore it.
Error recovery: file storage services provided by the University, such as SharePoint or OneDrive, keep previous versions of a document, so earlier content can be recovered if important information is lost. If a document becomes corrupted, however, you may need to restore it from a backup held elsewhere.
Collaborations: when several team members work on the same document, it is important that everyone is working on the same version. Include a version number in the file name, e.g., 2024-04-02_LabReport01_v2
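When version numbers are kept in file names, the next name can be derived mechanically so that increments are never skipped or mistyped. The helper below is a hypothetical sketch: it assumes the '_vN' suffix sits at the end of the name, just before the extension, and it treats a file without a suffix as version 1 (an assumed convention; adjust it to whatever your team agrees).

```python
import os
import re

# Matches a "_v<digits>" suffix at the end of the name stem, e.g. "_v2" or "_v02"
VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def next_version(file_name: str) -> str:
    """Return the file name with its _vN version suffix incremented by one."""
    stem, ext = os.path.splitext(file_name)
    match = VERSION_SUFFIX.search(stem)
    if match is None:
        # No suffix yet: treat this file as version 1 (assumed convention)
        return f"{stem}_v2{ext}"
    digits = match.group(1)
    # Keep any zero-padding, e.g. _v09 -> _v10
    new_number = str(int(digits) + 1).zfill(len(digits))
    return f"{stem[:match.start()]}_v{new_number}{ext}"


print(next_version("2024-04-02_LabReport01_v2.docx"))  # 2024-04-02_LabReport01_v3.docx
```

Only the final '_vN' part is changed, so the date and description elements of the name stay exactly as agreed.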