Reusing Existing Data

Can I use someone else’s data?

The short answer is yes!

One of the ideals of open data is that data should be as freely and openly accessible as possible, to encourage its reuse. Data can be expensive and time-consuming to collect, so reusing it where possible makes efficient use of available resources and promotes collaboration. Funders view the reuse of data favourably. In fact, the ESRC requires researchers to justify the collection of new data as part of their application for funding. Therefore, before collecting research data, you should consider whether any suitable data already exists that you could reuse.

(One caveat: if you are a doctoral student, your study programme may require you to collect your own data. Please consult your supervisor.)

If you’re planning to use someone else’s data, there are a number of things you should be aware of. First, consider the scope and limitations of the data: for example, when it was collected and whether it can be used to answer your research question. Does the associated metadata (information describing the data) give you enough information to make that judgement? Second, check the licence under which the data has been released, to ensure that you can use and share the data as you intend.

Where can I find existing data to reuse?

There are a variety of resources to explore for secondary datasets, whether you are looking for new data for a study, verifying your own data, calibrating models or preparing teaching materials.

Where you look may depend on the type of data you need. Options range from general purpose and subject-specific repositories to repository directories, aggregators, portals and search engines. Some examples are listed below (this is by no means an exhaustive list).

Don’t forget to also make use of any personal contacts, including academic colleagues and supervisors!

Within research articles themselves: 

It’s worth highlighting that you may also find datasets through the relevant publications themselves. Publishers increasingly expect authors to include information about how to access the underlying research data. So when exploring related literature (e.g. journal articles), you may find that the publication itself tells you how to access the data (e.g. via a DOI or other link) and how to use it.

Data journals: 

Another avenue for finding data (and publishing data, for that matter) is data journals. Data journals are journals that publish and share datasets for other people to access and reuse. The figure below from Candela et al. (2015) shows the concept of data journals.

Data Journal examples:

Data repositories:

Data repositories are national and international online databases that contain research data. Typically, this data can be downloaded and reused.

The Registry of Research Data Repositories is a good place to start a search, as it provides a global registry of data repositories from a variety of academic disciplines. To use the website (below), select Browse from the drop-down menu and then click on your area of interest on the wheel.

Some examples of subject-orientated repositories are:

Library subject pages: 

The Library’s subject pages are a useful place to start.

Other places to look for research data:

Other database, repository and search options you may like to explore include:

General purpose repositories: Dryad, Figshare, Zenodo - General purpose repositories where you can find data from a wide variety of subject areas.

Government web pages in many countries are a source of public data: (UK) (USA) (Australia), Data.gouv (France)

Services: Statista (Campus License) - Statistics and data within 600 industries and 50+ countries.

Google Dataset Search - Enables users to find datasets stored across the web through a simple keyword search.