GoTriple

Abstract

These datasets have been used to evaluate the EXODuS approach (EXploratory OLAP over Document Stores).

- The games dataset was collected by Sports Reference LLC. It contains around 32K nested documents representing NBA games in the period 1985-2013. Each document represents a game between two teams with at least 11 players each. It contains 47 attributes, 40 of which are numeric and represent team and player results.
- The DBLP dataset contains 2M documents scraped from DBLP in XML format and converted into JSON. Documents are flat and represent eight kinds of publications, including conference proceedings, journal articles, books, and theses. A third of the dataset represents author pages, which contain half as many fields as the other kinds. Documents therefore have shared attributes such as title, author, type, and year, and unshared ones such as journal and booktitle.
- The Twitter dataset contains 2M tweets collected from the Twitter API. Each document represents a tweet message and its metadata, which contains some nested objects: a user object that represents the author of the tweet, a place object that gives its location, and a retweet object if it is a reply. The dataset is heterogeneous: it mixes tweets with documents returned by the API for tweet deletions.

The sources of the datasets are listed in the Related links section.
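
The distinction between shared and unshared attributes in the DBLP documents can be illustrated with a minimal Python sketch. It is not part of EXODuS; the function name `attribute_profile` and the sample records are hypothetical placeholders, and only the field names (title, author, type, year, journal, booktitle) come from the description above.

```python
from collections import Counter


def attribute_profile(documents):
    """Split top-level attribute names into shared and unshared sets.

    An attribute is "shared" if it appears in every document and
    "unshared" if it is missing from at least one document.
    """
    # Each document contributes each of its keys once, so the count of a
    # key equals the number of documents that contain it.
    counts = Counter(key for doc in documents for key in doc)
    total = len(documents)
    shared = {key for key, n in counts.items() if n == total}
    unshared = set(counts) - shared
    return shared, unshared


# Hypothetical placeholder records, NOT taken from the dataset.
sample_docs = [
    {"type": "article", "title": "A", "author": "X", "year": 2001, "journal": "J"},
    {"type": "inproceedings", "title": "B", "author": "Y", "year": 2002, "booktitle": "C"},
]

shared, unshared = attribute_profile(sample_docs)
print(sorted(shared))
print(sorted(unshared))
```

On a real collection, the same function could be applied to documents loaded from the JSON files; nested datasets such as games or Twitter would additionally need their nested objects flattened first, which this sketch does not attempt.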