Toy Father

A toy dataset for predicting who is the father of whom.

Toy Cancer

A toy dataset for predicting properties of social networks: whether a person has cancer based on friendships and smoking habits.

IMDB: Internet Movie Database

The Internet Movie Database (IMDB) is an online database of movies, television shows, and related media. The goal is to predict whether a person is female.

Cora: Citation Matching

Cora is a dataset based on citations in scientific papers. The goal is to match citation information.


WebKB

WebKB is a dataset consisting of web pages and hyperlinks from four computer science departments: Cornell University, The University of Texas, The University...

CiteSeer: Citation Matching

A relational dataset consisting of publication citations for Alchemy. This version has modifications to work with BoostSRL.

Financial NLP

A relational dataset extracted from SEC Form S-1 documents. The task is to predict which words are present in a sentence.

NELL Sports

A relational dataset consisting of players and teams. The task is to predict whether a team plays a particular sport.

ICML Co-authors

A relational dataset of publication data from ICML 2018. The task is to predict co-authors.