Available datasets
Throughout the course I will do my best to illustrate network-related ideas with varied data: from organizational settings to pupils’ interactions, and from well-known “modal” datasets (those typically used in tutorials) to ones found in recent academic and non-academic publications or collected by me. Still, for the exam project and for the 3 out of 4 home assignments that involve programming, you will need to pick a dataset to work with. The list below provides some basic sources of data that you can use for your investigations:
R packages, like “manynet” and, especially, “networkdata”, assemble many network datasets and provide easy access to them. Be aware that the “networkdata” package can take a while to install, as it includes nearly a thousand datasets. I recommend installing both, as I will sometimes use data from these libraries during practical sessions.
The UCI Network Data Repository is another wonderful source, where datasets are tagged by their subject and the phenomena they represent. Please note that some files may be in atypical formats, so you might need to convert them into an R-suitable format for your work.
Datasets provided by the UCINET software developers are in Pajek format (these can be used in R with some manipulation), but you will likely find most of these networks in the aforementioned “networkdata” package.
Other highly recommended platforms include GitHub and similar data-sharing sites (see more examples in the list below). Many enthusiasts collect and upload network data there, and sometimes these authors also provide accompanying learning materials, e.g. here.
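As a minimal sketch of the "manipulation" that Pajek files need, the R code below reads a Pajek-format file into a graph object. The use of the igraph package and its read_graph() function is an assumption here, since the sources above do not prescribe a tool, and the three-node network is made up purely so that the example runs on its own.

```r
# Assumption: the igraph package is installed; it is not named in the list above.
library(igraph)

# Write a tiny, made-up Pajek file so the example is self-contained.
# With a real download, skip this step and point `path` at the file instead.
pajek_lines <- c(
  "*Vertices 3",
  "1 \"a\"",
  "2 \"b\"",
  "3 \"c\"",
  "*Edges",
  "1 2",
  "2 3"
)
path <- tempfile(fileext = ".net")
writeLines(pajek_lines, path)

# read_graph() parses several file formats; "pajek" is one of them.
g <- read_graph(path, format = "pajek")

# Quick sanity checks on what was read.
n_nodes <- vcount(g)  # number of nodes
n_ties  <- ecount(g)  # number of ties
print(c(nodes = n_nodes, ties = n_ties))
```

Checking the node and tie counts against the dataset's description right after import is a cheap way to catch a file that was parsed incorrectly.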
Obviously, these sources do not cover the full variety of networks available. Here are some examples of datasets that previous cohorts of students have found on their own:
Networks of Deezer users in three countries, available at kaggle.com
A collection of networks based on character interactions in Russian drama literature (similar collections are also available for other languages and literatures).
Russian-European literary connections of the 18th century, another literature-related source.
Finally, remember that you can collect data yourself. During the course, I will show you examples of not-too-large datasets collected in seconds using web-scraping techniques.