Reuse of Research Data
Properly functioning data search services make the effective reuse of research data possible. Universities, research institutes and other similar organisations have numerous search services for their own data and publications, but there has been no centralised service to date. The Etsin - Research Data Finder fulfills this need. It retrieves the metadata for data from other services, thus making it possible to enter that data directly into the Kata service.
There are several existing metadata services in Finland. For example, the Finnish Social Science Data Archive (FSD) is a metadata service that contains data related to the social sciences. The service also offers a search user interface for conducting effective searches of the service's approximately 1,000 pieces of data. HELDA, the Digital Repository of the University of Helsinki, offers a search user interface for university data and publications. There are over 30,000 pieces of data and publications in HELDA.
The Etsin service will be able to retrieve data from both FSD and HELDA, but it will not allow access to publications in HELDA. When other universities and research institutes establish interfaces for the machine retrieval of data, the metadata for this data will then be added to Kata. In addition to serving as a centralised service, Kata will also contain advanced search capabilities for the effective utilisation of large volumes of data.
There are also metadata and data services at various universities and research facilities.
Restrictions on reuse
The data may have a license attached to it. In the license, the author has specified the terms under which you may use the data. Data differs in how openly it may be used: for example, you might be allowed to copy the data but not to edit it, link it to other data or transfer it to a third party.
In some cases, the data may be difficult to understand or sensitive, or its use may require special software. When dealing with this kind of data, you must contact the contact person specified in the metadata.
Provenance information of research data is information on the origin and processing history of the data: addenda, omissions, software used, etc. At present, the research data services of the Ministry of Education and Culture do not collect provenance information. The importance of provenance information is emphasised for data in long-term storage.
Provenance information is important for data that will be stored for longer periods for the following reasons:
- Proper documentation of how the data was produced helps ensure that the data is processed correctly and that the resulting data answers the question it is meant to answer.
- Research is more reproducible if the various phases of data analysis are properly documented.
- The documentation of sources and contributors improves the transparency of the research in that users of the data can independently assess the reliability and impartiality of the data.
Today, when data processing is mainly automatic, the programs and methods used in processing should be included unchanged in the supplementary materials. Beyond that, data processing should be automated as much as possible, so that the research is easy to reproduce, whether by hand or in peer review.
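As a minimal sketch of how an automated processing step could document itself, the following Python example builds a provenance entry that contains checksums of the program and of the input and output data. It assumes the processing is done with scripts. The function and field names (`record_step`, `program_sha256` and so on) are hypothetical and do not belong to any existing research data service.

```python
import hashlib
import platform
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 checksum of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def record_step(description: str, program_source: bytes,
                input_data: bytes, output_data: bytes) -> dict:
    """Build a provenance entry for one automated processing step.

    All field names are hypothetical; adapt them to your own metadata schema.
    """
    return {
        "description": description,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "program_sha256": sha256_of(program_source),
        "input_sha256": sha256_of(input_data),
        "output_sha256": sha256_of(output_data),
    }


# Illustrative processing step: drop empty lines from raw measurement text.
raw = b"1.0\n\n2.5\n"
program = b"# the source code of the cleaning script would go here"
cleaned = b"\n".join(line for line in raw.split(b"\n") if line)

entry = record_step("Drop empty lines", program, raw, cleaned)
```

Storing the checksums alongside the supplementary materials lets a reviewer confirm that the program and data they received are the ones that were actually used.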
The following is a short checklist for creating provenance information:
- Have the events in the data production chain been documented?
- Are the data processing methods available?
- Has the data change history been recorded?
- Has the data been linked with its sources?
- When collecting data, has information that might affect the quality of the data been recorded? Examples include the calibration of sensors for measurement data and special features of the interview situation for interview data.
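The checklist above could be mirrored in a simple data structure that reports which items are still undocumented. The following is only a sketch: the class `ProvenanceRecord`, its field names and the example values are invented for illustration and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical container with one field per checklist item."""
    production_events: list = field(default_factory=list)   # events in the data production chain
    processing_methods: list = field(default_factory=list)  # programs and methods used
    change_history: list = field(default_factory=list)      # recorded changes to the data
    sources: list = field(default_factory=list)             # links to source data
    quality_notes: list = field(default_factory=list)       # e.g. calibration, interview situation

    def missing_items(self) -> list:
        """Return the names of checklist items that have no entries yet."""
        return [name for name, value in vars(self).items() if not value]


# Example values are invented for illustration only.
record = ProvenanceRecord(
    production_events=["Temperature measured at a field station"],
    quality_notes=["Sensor calibrated before the measurement series"],
)
print(record.missing_items())
```

An empty result from `missing_items()` would mean that every checklist item has at least one entry. It says nothing about the quality of those entries, which still has to be assessed by a person.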