The following datasets are licensed under a Creative Commons Attribution 3.0 Unported License.
The dataset contains sampled, anonymized action and reaction events for posts that were created between October 15, 2014 and February 11, 2015 and received at least one reaction. To preserve privacy, timestamps were slightly perturbed, and user and post IDs were anonymized using custom fingerprint functions. All timestamps are Unix epoch time in milliseconds.
- For Facebook, the dataset contains post and comment event timestamps.
- For Twitter, the dataset contains tweet and retweet event timestamps.
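
Because the timestamps are stored as Unix epoch milliseconds rather than seconds, they need to be scaled before they can be used with most date libraries. The following is a minimal sketch of that conversion in Python. The helper name `ms_to_utc` and the sample value are illustrative assumptions, not part of the dataset or its tooling.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert Unix epoch milliseconds to a timezone-aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)


# Illustrative value only (not taken from the dataset):
# midnight UTC on October 15, 2014, expressed in milliseconds.
example_ms = 1413331200000
print(ms_to_utc(example_ms).isoformat())
```

Since the published timestamps were slightly perturbed, converted values should be treated as approximate event times.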
Count | Facebook | Twitter |
---|---|---|
Users | 1,067,026 | 1,423,445 |
Posts | 25,937,525 | 119,435,659 |
Reactions | 104,364,591 | 1,192,210,822 |
Sample | 1K Sample | 1K Sample |
Download | 2.3 GB tar.gz | 24 GB tar.gz (parts 1 2 3 4 5 6) |
The data is encoded as UTF-8 text in tab-separated format and compressed with gzip. The columns in the dataset are defined as:
- user_id_fingerprint (signed int64)
- actor_id_fingerprint (signed int64)
- post_id_fingerprint (signed int64)
- post_timestamp (Unix epoch time in milliseconds)
- action_timestamp (Unix epoch time in milliseconds)
- user_timezone (Freebase timezone name, e.g. Central Standard Time)
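
The column layout above can be read with the Python standard library alone. The sketch below is one possible reader, not an official loader. It assumes that each file extracted from the tar archive is a gzip-compressed TSV with no header row and with the six columns in the order listed. The names `Event`, `parse_row`, `read_events`, and `reaction_delay_ms` are hypothetical.

```python
import csv
import gzip
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Event:
    user_id_fingerprint: int   # signed int64
    actor_id_fingerprint: int  # signed int64
    post_id_fingerprint: int   # signed int64
    post_timestamp: int        # Unix epoch milliseconds
    action_timestamp: int      # Unix epoch milliseconds
    user_timezone: str         # Freebase timezone name

    @property
    def reaction_delay_ms(self) -> int:
        """Milliseconds between post creation and the recorded action."""
        return self.action_timestamp - self.post_timestamp


def parse_row(fields: List[str]) -> Event:
    """Build an Event from one six-column row (column order is assumed)."""
    user, actor, post, post_ts, action_ts, tz = fields
    return Event(int(user), int(actor), int(post),
                 int(post_ts), int(action_ts), tz)


def read_events(path: str) -> Iterator[Event]:
    """Stream events from a gzip-compressed, tab-separated file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for fields in reader:
            if fields:  # skip blank lines
                yield parse_row(fields)


# Made-up row for illustration; the values do not come from the dataset.
sample = "-42\t7\t99\t1413331200000\t1413331260000\tCentral Standard Time"
event = parse_row(sample.split("\t"))
print(event.reaction_delay_ms)
```

Python integers are arbitrary precision, so the signed 64-bit fingerprints parse without overflow. Streaming rows with a generator keeps memory use flat, which matters for the larger archive.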
The dataset is divided into sets grouped by the date of the post creation.
Paper (ACM, arXiv), Video, Slides
If you use the dataset, please cite:
Nemanja Spasojevic, Zhisheng Li, Adithya Rao, Prantik Bhattacharyya,
When-To-Post on Social Networks,
Proceedings of ACM Conference on Knowledge Discovery and Data Mining (KDD), 2015.
BibTeX:
@inproceedings{Spasojevic:when-to-post,
author = {Spasojevic, Nemanja and Li, Zhisheng and Rao, Adithya and Bhattacharyya, Prantik},
title = {When-To-Post on Social Networks},
booktitle = {Proc. of ACM Conference on Knowledge Discovery and Data Mining (KDD)},
series = {KDD '15},
year = {2015}
}