Reuse of Research Data

Properly functioning data search services make the effective reuse of research data possible. Universities, research institutes and other similar organisations have numerous search services for their own data and publications, but there has been no centralised service to date. The Etsin - Research Data Finder fulfills this need. It retrieves the metadata for data from other services, thus making it possible to enter that data directly into the Kata service.

There are several existing metadata services in Finland. For example, the Finnish Social Science Data Archive (FSD) is a metadata service that contains data related to the social sciences. The service also offers a search interface for its approximately 1,000 pieces of data. HELDA, the Digital Repository of the University of Helsinki, offers a search interface for university data and publications. There are over 30,000 pieces of data and publications in HELDA.

The Etsin service will be able to retrieve data from both FSD and HELDA, but it will not allow access to publications in HELDA. When other universities and research institutes establish interfaces for the machine retrieval of data, the metadata for this data will then be added to Kata. In addition to serving as a centralised service, Kata will also contain advanced search capabilities for the effective utilisation of large volumes of data.

There are also metadata and data services at various universities and research facilities.

The services offer the use of data under various licenses. Typically, the license is the same for all data in a service, but it can also vary between datasets. License information is included in the metadata. The Minimum metadata model developed in the National Research Data Initiative also reserves a specific field for license information: terms of use. License information can limit the reuse of data, but it can also protect the data from any reuse that the author does not want. Typically, the data is intended only for educational and research purposes and not, for example, for commercial use. Read more on licenses in the Data management planning section.

Restrictions on reuse

It is rare for research data created by someone else to be made available for use without any terms or conditions. Even if the data was found in Etsin, you still need the author's permission to copy and edit it. Etsin may state that the data is available for your use without any terms of use, but even then the minimum requirement is that you follow best scientific practices. This means that you take the work and achievements of other researchers into proper consideration: you respect the effort they have made, refer to their publications in the appropriate manner and give their achievements the recognition they deserve in your own research work and in the publication of its results.

The data may have a license attached to it. In the license, the author has already specified the terms under which you may use the data. Data has various degrees of openness: for example, you might be allowed only to copy the data, but not to edit it, link it to other data or transfer it to a third party.
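The different degrees of permission described above can be illustrated with a minimal sketch. The class TermsOfUse, its field names and the function is_permitted are hypothetical and are not part of Etsin, Kata or any real license format; a real license must always be read in full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermsOfUse:
    """Hypothetical, simplified model of a dataset's terms of use."""
    may_copy: bool = False
    may_edit: bool = False
    may_link: bool = False
    may_redistribute: bool = False
    # Assumption: data is restricted to research and education by default.
    research_and_education_only: bool = True


def is_permitted(terms: TermsOfUse, action: str, commercial: bool = False) -> bool:
    """Return True only if the terms explicitly allow the action."""
    if commercial and terms.research_and_education_only:
        return False
    allowed = {
        "copy": terms.may_copy,
        "edit": terms.may_edit,
        "link": terms.may_link,
        "redistribute": terms.may_redistribute,
    }
    # Unknown actions are refused rather than guessed at.
    return allowed.get(action, False)


# Example: a dataset that may be copied but not edited.
copy_only = TermsOfUse(may_copy=True)
print(is_permitted(copy_only, "copy"))
print(is_permitted(copy_only, "edit"))
```

The sketch refuses anything that is not explicitly allowed, which mirrors the principle that you need the author's permission unless the terms say otherwise.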

In some cases, the data may be difficult to understand or sensitive, or its use may require special software. When dealing with this kind of data, you must contact the contact person specified in the metadata.

Provenance information

Provenance information of research data is information on the origin and processing history of the data: addenda, omissions, software used, etc. At present, the research data services of the Ministry of Education and Culture do not collect provenance information. The importance of provenance information is emphasised for data in long-term storage.

Provenance information is important for data that will be stored for longer periods for the following reasons:

  • Proper documentation of how the data was produced helps ensure that the data is processed correctly and that the resulting data answers the question it is meant to answer.
  • Research is more reproducible if the various phases of data analysis are properly documented.
  • The documentation of sources and contributors improves the transparency of the research in that users of the data can independently assess the reliability and impartiality of the data.

Today, when data processing is mainly automatic, the programs and methods used in processing should be included as-is in the supplementary materials. Data processing should otherwise be automated as much as possible, so that the research is easy to reproduce, including in peer review.
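One possible way to document an automated processing step is sketched below. It is only an illustration: the function names run_step and drop_blank_lines, the fields of the log entry and the sample input are assumptions made for this example, not a format used by any of the services mentioned here.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def run_step(name, func, data: bytes, log: list) -> bytes:
    """Apply func to data and append a provenance entry to log."""
    result = func(data)
    log.append({
        "step": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "input_sha256": sha256_of(data),
        "output_sha256": sha256_of(result),
    })
    return result


def drop_blank_lines(data: bytes) -> bytes:
    """Hypothetical processing step: remove empty lines."""
    lines = [line for line in data.splitlines() if line.strip()]
    return b"\n".join(lines)


log = []
raw = b"a,1\n\nb,2\n"  # made-up sample input
cleaned = run_step("drop_blank_lines", drop_blank_lines, raw, log)
print(json.dumps(log, indent=2))
```

Because each entry stores checksums of the input and output, a reviewer who reruns the same step on the same input can check whether the result matches.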

The following is a short checklist for creating provenance information:

  • Have the events in the data production chain been documented?
  • Are the data processing methods available?
  • Has the data change history been recorded?
  • Has the data been linked with its sources?
  • When collecting data, has information that might affect the quality of the data been recorded? Examples include the calibration of sensors for measurement data and special features of the interview situation for interview data.
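
The checklist above could also be applied mechanically to a provenance record. The following is a minimal sketch under stated assumptions: the field names (events, methods, change_history, sources, quality_notes) and the sample record are invented for illustration and do not correspond to any existing metadata model.

```python
# Hypothetical mapping from record fields to the checklist questions.
CHECKLIST = {
    "events": "Have the events in the data production chain been documented?",
    "methods": "Are the data processing methods available?",
    "change_history": "Has the data change history been recorded?",
    "sources": "Has the data been linked with its sources?",
    "quality_notes": "Has information that might affect data quality been recorded?",
}


def unanswered_items(record: dict) -> list:
    """Return the checklist questions whose field is missing or empty."""
    return [question for field, question in CHECKLIST.items()
            if not record.get(field)]


# Made-up example record: change history is empty, quality notes are missing.
record = {
    "events": ["measurement campaign"],
    "methods": ["clean.py"],
    "change_history": [],
    "sources": ["sensor A"],
}

for question in unanswered_items(record):
    print(question)
```

A check like this only shows that a field has been filled in. Whether the documentation is adequate still has to be judged by a person.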