Incremental Lossless Graph Summarization
Overview
MoSSo is a streaming algorithm for lossless summarization of fully dynamic graphs.
MoSSo has the following advantages:
- Fast and 'any time': processing each change in near-constant time, up to 7-orders of magnitude faster than running state-of-the-art batch methods.
- Scalable: summarizing graphs with hundreds of millions of edges, requiring sub-linear memory during the process.
- Effective: achieving comparable compression ratios even to state-of-the-art batch methods.
MoSSo is described in the following paper:
-
Incremental Lossless Graph Summarization
Jihoon Ko*, Yunbum Kook*, and Kijung Shin
26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (To Appear)
[PDF] [BIBTEX]
Code
The source code used in the paper is available.
[GitHub Repository]
Datasets
Insertion Only Graphs
Name |
#Nodes |
#Edges |
Description |
Source |
Download |
Protein (PR) |
6,229 |
146,160 |
Protein Interaction |
KONECT |
Link |
Email-Enron (EN) |
86,977 |
297,456 |
Email |
Network Repository |
Link |
Facebook (FB) |
61,095 |
614,797 |
Friendship |
Network Repository |
Link |
Web-EU-05 (EU) |
862,664 |
16,138,468 |
Hyperlinks |
LAW |
Link |
Hollywood (HW) |
1,985,306 |
114,492,816 |
Collaboration |
LAW |
Link |
Web-UK-02 (UK) |
18,483,186 |
261,787,258 |
Hyperlinks |
LAW |
Link |
Fully Dynamic Graphs
Name |
#Nodes |
#Additions |
#Deletions |
Description |
Source |
Download |
DBLP (DB) |
317,080 |
1,049,866 |
116,042 |
Protein Interaction |
SNAP |
Link |
YouTube (YT) |
1,134,890 |
2,987,624 |
331,305 |
Friendship |
SNAP |
Link |
Skitter (SK) |
1,696,415 |
11,095,298 |
1,233,226 |
Internet |
SNAP |
Link |
LiveJournal (LJ) |
3,997,962 |
34,681,189 |
3,854,423 |
Friendship |
SNAP |
Link |
People