D-Cube

Overview

D-Cube (Disk-based Dense-block Detection) is an algorithm for detecting dense subtensors in web-scale tensors.
D-Cube has the following properties:

Scalable: D-Cube handles data too large to fit in memory or even on a disk.
Fast: Even when the data fit in memory, D-Cube runs faster than competing methods.
Accurate: D-Cube detects dense subtensors in real-world tensors accurately and provides theoretical accuracy guarantees.
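
To make the notion of a "dense subtensor" concrete, the following is a minimal sketch, not the D-Cube algorithm itself. It assumes a tensor stored as a list of tuples (attribute values followed by a numeric value) and uses arithmetic average mass, that is, the total value inside a block divided by the average number of distinct attribute values per mode, as the density measure. The function name `arithmetic_density` and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def arithmetic_density(
    tuples: Iterable[Tuple],
    block: Sequence[Set[Hashable]],
) -> float:
    """Density of a block under arithmetic average mass (assumed measure).

    tuples: records of the form (a_1, ..., a_N, value).
    block:  one set of selected attribute values per mode (N sets).
    """
    n_modes = len(block)
    # Mass: sum of the values of all tuples that lie inside the block.
    mass = sum(
        t[-1]
        for t in tuples
        if all(t[i] in block[i] for i in range(n_modes))
    )
    # Average number of selected attribute values per mode.
    avg_size = sum(len(s) for s in block) / n_modes
    return mass / avg_size if avg_size else 0.0


# Hypothetical toy 3-way tensor: (user, item, day, count).
toy = [
    ("u1", "i1", 1, 5),
    ("u1", "i2", 1, 4),
    ("u2", "i1", 1, 6),
    ("u2", "i2", 1, 5),
    ("u3", "i3", 2, 1),
    ("u4", "i4", 3, 1),
]

whole = [{"u1", "u2", "u3", "u4"}, {"i1", "i2", "i3", "i4"}, {1, 2, 3}]
block = [{"u1", "u2"}, {"i1", "i2"}, {1}]

whole_density = arithmetic_density(toy, whole)
block_density = arithmetic_density(toy, block)
print(whole_density, block_density)
```

On this toy data the block formed by users u1 and u2, items i1 and i2, and day 1 scores about 12, while the whole tensor scores about 6. A dense-subtensor detector searches for blocks that score high under such a measure.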

D-Cube is described in the following papers:

D-Cube: Dense-Block Detection in Terabyte-Scale Tensors
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.
The 10th ACM International Conference on Web Search and Data Mining (WSDM) 2017, Cambridge, UK
[PDF] [Supplementary Document] [BIBTEX]

Detecting Group Anomalies in Tera-Scale Multi-Aspect Data via Dense-Subtensor Mining
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.
Frontiers in Big Data, 2021
[PDF] [Supplementary Document] [BIBTEX]

The source code used in the papers is available in the [GitHub Repository].

Datasets

Name	Size	#Tuples	Source	Download
Rating Data (User, Item, Timestamp, Rating, 1)
Yelp	552K X 77.1K X 3.80K X 5	2.23M	Yelp	Link
Android	1.32M X 61.3K X 1.28K X 5	2.64M	UCSD	Link
SWM	967K X 15.1K X 1.38K X 5	1.13M	ODDS
Netflix	480K X 17.8K X 2.18K X 5	99.1M	Netflix	Link
YahooM.	1.00M X 625K X 84.4K X 101	253M	Yahoo Labs	Link
Wikipedia Revision History (User, Page, Timestamp, #Revisions)
KoWiki	470K X 1.18M X 101K	11.0M	Wikimedia	Link
EnWiki	44.1M X 38.5M X 129K	483M	Wikimedia	Link
Temporal Social Network (User, User, Timestamp, #Interactions)
Youtube	3.22M X 3.22M X 203	18.7M	MPI SWS	Link
SMS	1.25M X 7.00M X 4.39K	103M	NDA	NDA
TCP Dump (Src IP, Dst IP, Timestamp, #Connections)
DARPA	9.48K X 23.4K X 46.6K	522K	Lincoln Lab	Link
TCP Dump (Protocol, Service, Flags, Src Bytes, Dst Bytes, Counts, Srv Counts, #Connections)
AirForce	3 X 70 X 11 X 7.20K X 21.5K X 512 X 512	648K	UCI KDD Archive	Link
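
The Size and #Tuples columns above can be recomputed from a tuple list. The sketch below assumes that each line of a dataset file is one comma-separated tuple whose last field is the numeric value, and that #Tuples counts the lines. The delimiter, the file layout, and the helper name `tensor_stats` are assumptions made for illustration, not a documented format of the downloads.

```python
import csv
import io
from typing import Iterable, List, Set, Tuple


def tensor_stats(lines: Iterable[str], delimiter: str = ",") -> Tuple[List[int], int]:
    """Return (number of distinct values per mode, number of tuples).

    Every field except the last is treated as a categorical attribute;
    the last field is treated as the value of the entry.
    """
    distinct: List[Set[str]] = []
    n_tuples = 0
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        attrs = row[:-1]
        if not distinct:
            distinct = [set() for _ in attrs]
        if len(attrs) != len(distinct):
            raise ValueError(f"inconsistent number of fields: {row}")
        for seen, attr in zip(distinct, attrs):
            seen.add(attr)
        n_tuples += 1
    return [len(s) for s in distinct], n_tuples


# Hypothetical rating tuples: user, item, date, rating, value.
sample = io.StringIO(
    "u1,i1,2015-01-01,5,1\n"
    "u1,i2,2015-01-01,4,1\n"
    "u2,i1,2015-01-02,5,1\n"
)
sizes, n_tuples = tensor_stats(sample)
print(" X ".join(str(s) for s in sizes), n_tuples)
```

For a real file, an open file handle can be passed in place of the `io.StringIO` sample, since the function reads line by line and keeps only the sets of distinct attribute values in memory.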