Organising your data

It is very easy for research (and any other) data to become disorganised, and data organisation is unlikely to be at the top of your list of priorities.

However, time spent organising your data is time well spent.

Below you can find some guidance on:

  • File formats

    There are important things to consider when choosing a file format for digital data, and the choice should be planned early in the research cycle to ensure that the format suits every purpose the data may need to serve.

    Formats for long-term accessibility

    When thinking about long-term accessibility and usability of research data, sustainable digital file formats and software are needed. For many formats, there is a danger that they will become obsolete in the future, which would make the data impossible to read and interpret.

    Although many software packages can import data created in their earlier versions, and competing popular programs can often read each other's files, the safest way to guarantee long-term data access is to convert data to standard or open formats.

    Not only can most software packages interpret these, but they are also suitable for data interchange and transformation, and are likely to stand a better chance of being reused well into the future.

    Information on the file formats recommended by the UK Data Archive for long-term preservation

    File formats can be proprietary or open

    • Proprietary formats are owned by a company that claims intellectual property rights and controls use of the software by granting licences. Some proprietary formats are so widely used that they have become de facto standards, for example the Microsoft Office formats (MS Word, Rich Text Format and MS Excel) and the popular SPSS format. Because they are so widely used, these are likely to have long-term sustainability.
    • Open formats have a publicly documented specification. Examples are PDF/A, CSV (comma-separated values), TIFF, OpenDocument Format (ODF), ASCII, tab-delimited format and XML.
    • File formats can also be lossy or lossless. Lossy formats save space by removing detailed information that is assumed to be unimportant. For example, the lossy format JPEG removes fine detail in images, whilst the lossless format TIFF keeps all the detail. Also, repeatedly editing and saving files in lossy format results in a greater loss of information.

    Is the format suitable for conversion?

    While researchers will use the data formats and software best suited to their planned analyses during the research, data conversion must be considered once analysis is complete and the data are being prepared for long-term storage. Using open, standard, interchangeable and longer-lasting formats reduces the risk of being unable to use the data in the future. This is also recommended for any backups. For long-term digital preservation, data centres and archives hold data in open and standard formats.
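
As a minimal sketch of converting data to an open format, the example below writes records to CSV using Python's standard library and reads them back to check that nothing was lost. The function names and the survey records are hypothetical and invented for illustration; real conversions from proprietary formats would normally start from that software's own export facility.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialise a list of dict records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_records(text):
    """Read CSV text back into a list of dicts (all values become strings)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical survey responses, invented for illustration only.
responses = [
    {"participant": "P001", "age_group": "25-34", "comment": "Helpful, clear"},
    {"participant": "P002", "age_group": "35-44", "comment": "Too long"},
]

csv_text = records_to_csv(responses, ["participant", "age_group", "comment"])
restored = csv_to_records(csv_text)
```

Reading the converted file back and comparing it with the original records is a simple way to confirm that a conversion has not altered the data. Note that CSV stores every value as text, so numeric or date types need to be documented separately.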

  • Versioning

    It can be difficult to locate the correct version of a file, or to know how versions differ, after some time has elapsed. A suitable version control strategy depends on whether files are used by one or several users, in one or several locations, and whether versions need to be synchronised across users or locations so that when information is altered in one location, the related information elsewhere is also updated.

    Version control can be done through the following:

    • The date recorded in the file name or within the file, for example, HealthTest-2008-04-06.
    • Version numbering in the file name, for example, HealthTest-00-02 or HealthTest_v2.
    • A file history, version control table or notes included within a file, where versions, dates, authors and details of changes to the file are recorded.
    • Version control facilities within the software used.
    • Using versioning software, e.g. Subversion.
    • Using file-sharing services, such as Dropbox or Google Docs.
    • Controlling rights to file-editing.
    • Manual merging of entries or edits by multiple users.

    Examples of version control from the UK Data Service.
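
The date-based and number-based naming schemes listed above could be applied consistently with small helper functions. The sketch below is one possible approach, not a prescribed convention: the function names, the two-digit zero padding and the `_v` separator are assumptions chosen for illustration.

```python
import re
from datetime import date


def dated_name(stem, day, extension):
    """Build a name such as HealthTest-2008-04-06.csv (ISO dates sort correctly)."""
    return f"{stem}-{day.isoformat()}{extension}"


def numbered_name(stem, version, extension):
    """Build a name such as HealthTest_v02.csv with a zero-padded version."""
    return f"{stem}_v{version:02d}{extension}"


def next_version(stem, existing_names):
    """Return one more than the highest version found for this stem, or 1."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)(?:\.[^.]+)?$")
    highest = 0
    for name in existing_names:
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1


# Hypothetical file names, following the HealthTest example in the text.
existing = ["HealthTest_v01.csv", "HealthTest_v2.csv", "Other_v09.csv"]
new_name = numbered_name("HealthTest", next_version("HealthTest", existing), ".csv")
```

Generating names in code rather than typing them by hand keeps the scheme uniform across a project, which makes it easier to sort files and to see which version is the latest.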

  • Documentation

    Why is it important?

    Good documentation makes material understandable, verifiable, and reusable.

    Simply making data available to others does not make it usable or useful. If you or others come back to your data after some time, documentation is needed to understand, for example, when, why and by whom the data was created, what methods were used, and what any acronyms or jargon mean.

    Creating good metadata is part of good practice in research data management.

Metadata are a critical component of your data documentation

Metadata are data about data. They allow research data to become findable, accessible, interoperable and reusable by both humans and machines.

Metadata have to be added continuously to your research data, not just at the beginning or at the end of a project.

Three types of metadata you should maintain for your research data

Types of metadata

  • Administrative metadata: data about a project or resource that are relevant for managing it; for example, project/resource owner, principal investigator, project collaborators, funder, project period. They are usually assigned to the data before you collect or create them.
  • Descriptive or citation metadata: data about a dataset or resource that allow people to discover and identify it; for example, authors, title, abstract, keywords, persistent identifier, related publications.
  • Structural metadata: data about how a dataset or resource came about, but also how it is internally structured. Structural metadata describe, for example, the unit of analysis, collection method, sampling procedure, sample size, categories, variables.
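
One way to keep all three types of metadata together is a small machine-readable record stored alongside the dataset. The sketch below uses JSON via Python's standard library. The field names are assumptions loosely based on the examples above rather than a formal metadata standard, and every value is a placeholder.

```python
import json

# All values below are placeholders invented for illustration.
metadata = {
    "administrative": {
        "project_owner": "Example Research Group",
        "principal_investigator": "A. Researcher",
        "funder": "Example Funder",
        "project_period": {"start": "2024-01-01", "end": "2026-12-31"},
    },
    "descriptive": {
        "title": "Example health survey dataset",
        "authors": ["A. Researcher", "B. Collaborator"],
        "keywords": ["health", "survey"],
        "persistent_identifier": None,  # fill in once an identifier is assigned
    },
    "structural": {
        "unit_of_analysis": "individual",
        "collection_method": "online questionnaire",
        "sample_size": 250,
        "variables": ["participant", "age_group", "comment"],
    },
}


def missing_sections(record, required=("administrative", "descriptive", "structural")):
    """List the required metadata sections that are absent or empty."""
    return [name for name in required if not record.get(name)]


# Serialise to JSON text that could be saved next to the data files.
metadata_json = json.dumps(metadata, indent=2, ensure_ascii=False)
```

A check such as `missing_sections` can be run whenever the record is updated, which supports adding metadata continuously rather than only at the start or end of a project.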

Storing active data during your research

What is Active Data?

Active research data is data that is currently being used, or is planned to be used in the near future.

Backing up your active research data held in electronic form is an essential part of research data management. To 'back-up' data is to make and store copies of your data in more than one place. This means that if one copy fails, for example the copy on your laptop, other copies are still available.

University guidance on data back-up

Key requirements to follow when storing and backing up your research data:

  • All original, irreplaceable electronic project data and electronic data from which individuals might be identified must be stored on University-supported media, preferably appropriate centrally-allocated secure server space or similar; such data must never be stored on portable devices or temporary storage media.
  • All other electronic project data must be held on appropriate centrally-allocated secure server space which is accessible to members of the project team; such data must not be held on personal or portable devices unless these are encrypted in line with University requirements and except when this is necessary for the purposes of working off-site; amended documents must be returned to the appropriate University-maintained shared space when the work has been completed.
  • Under no circumstances should original, irreplaceable data or sensitive personal data be stored using cloud storage services as this can place data outside UK and EU legal control.

Suitable for storing active research data:

  • Faculty/School storage systems
  • Network drives
  • SharePoint

Not suitable for storing active research data:

  • External hard drives and USB sticks
  • Local storage that is not backed up
  • Third party cloud storage

  • How often should I back up my data?

    Consider how often you make changes to your data, and how much changed data you are prepared to lose between backups. Consider backing up after each change to a data file or at regular intervals, such as daily or weekly.

  • How many copies should I make?

    Most back-up policies recommend keeping at least three copies of the data, with at least one stored offsite.

  • How should I organise my backups?

    If you are making your own backups on removable media, make sure they are well-labelled, indicating the content and date/time, and well-organised. Without some management, achieving the ultimate aim of restoring lost data may prove difficult.
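
The labelling advice above can be sketched in code: the example below copies a file into a backup folder under a name that records the date and time, then compares checksums to confirm the copy is intact. The function names and the timestamp format are assumptions made for illustration, and the backup folder is hypothetical; in practice it should be a storage location permitted by the University guidance above.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir, when):
    """Copy source into backup_dir under a timestamped name and verify the copy."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = when.strftime("%Y-%m-%dT%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves modification times
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup verification failed for {target}")
    return target
```

For example, `backup_file("results.csv", "backups", datetime.now())` would create a copy named after the original file with the current date and time appended. Verifying the checksum at backup time helps ensure that a restore will actually work when it is needed.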

Data Security

Ensuring the security of data requires attention to physical security, network security, and the security of computer systems and files, in order to prevent unauthorised access to data, unwanted changes, disclosure or destruction.

Researchers should think carefully about their research processes to ensure data is stored and transferred securely throughout the research project.

This might involve:

  • Physical security
    • Ensuring physical data is stored in secure locations and digitised where appropriate.
    • Logging the removal of, and access to, media or hardcopy material in storerooms.
  • Network security
    • Not storing confidential data, such as data containing personal information, on servers or computers connected to an external network, particularly servers that host internet services.
    • Using firewall protection and applying security-related upgrades and patches to operating systems to guard against viruses, trojans and other malicious code.
  • Computer systems and files
    • Anonymising data, or pseudonymising data and storing the key code in a separate location from the data.
    • Implementing password protection of, and controlled access to, individual data files, for example, allocating ‘no access’, ‘read only’, ‘read and write’ or ‘administrator only’ permissions.
    • Locking your computer whenever you leave it temporarily unattended, for example by pressing Ctrl-Alt-Del and clicking on 'Lock Computer'.
    • Not sending personal or confidential data via email; such data should be encrypted and transferred by a secure means instead.
    • Destroying data in a consistent and robust manner when needed.
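
The pseudonymisation step mentioned above can be sketched as follows: each identifier is replaced with a random code, and the mapping from codes back to identifiers (the key) is returned separately so that it can be stored in a different location from the data. The function name, the field names and the code length are assumptions for illustration. Replacing a direct identifier does not by itself make a dataset anonymous, since other fields may still identify individuals.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace id_field with a random code; return (records, key).

    The key maps each code back to the original identifier and is meant to be
    stored separately from the pseudonymised records.
    """
    code_for = {}
    output = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = secrets.token_hex(8)
            while code in code_for.values():  # extremely unlikely collision
                code = secrets.token_hex(8)
            code_for[original] = code
        cleaned = dict(record)  # copy, so the input records are left unchanged
        cleaned[id_field] = code_for[original]
        output.append(cleaned)
    key = {code: original for original, code in code_for.items()}
    return output, key


# Hypothetical records, invented for illustration only.
participants = [
    {"name": "Alice Example", "score": 4},
    {"name": "Bob Example", "score": 5},
    {"name": "Alice Example", "score": 3},
]
pseudonymised, key = pseudonymise(participants, "name")
```

The same person always receives the same code, so records can still be linked during analysis, while re-identification requires access to the separately stored key.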

Ulster University Information Services Directorate maintains a range of policies, standards and guidelines for the secure and reliable delivery of services across the University.

A simplified and practical overview for staff of how these policies help to protect University information, and of how to handle electronic data, is available in a staff information handbook.