Detecting Group Anomalies in Terabyte-Scale Multi-Aspect Data

Overview

D-Cube (Disk-based Dense-block Detection) is an algorithm for detecting dense subtensors in web-scale tensors.
D-Cube has the following properties:

Paper

D-Cube is described in the following papers:


Code

The source code used in the papers is available in the GitHub repository.

Datasets

Name | Size | #Tuples | Source | Download
Rating Data (User, Item, Timestamp, Rating, 1)
Yelp | 552K X 77.1K X 3.80K X 5 | 2.23M | Yelp | Link
Android | 1.32M X 61.3K X 1.28K X 5 | 2.64M | UCSD | Link
SWM | 967K X 15.1K X 1.38K X 5 | 1.13M | ODDS |
Netflix | 480K X 17.8K X 2.18K X 5 | 99.1M | Netflix | Link
YahooM. | 1.00M X 625K X 84.4K X 101 | 253M | Yahoo Labs | Link
Wikipedia Revision History (User, Page, Timestamp, Time, #Revisions)
KoWiki | 470K X 1.18M X 101K | 11.0M | Wikimedia | Link
EnWiki | 44.1M X 38.5M X 129K | 483M | Wikimedia | Link
Temporal Social Network (User, User, Timestamp, #Interactions)
Youtube | 3.22M X 3.22M X 203 | 18.7M | MPI SWS | Link
SMS | 1.25M X 7.00M X 4.39K | 103M | NDA | NDA
TCP Dump (Src IP, Dst IP, Timestamp, #Connections)
DARPA | 9.48K X 23.4K X 46.6K | 522K | Lincoln Lab | Link
TCP Dump (Protocol, Service, Flags, Src Bytes, Dst Bytes, Counts, Srv Counts, #Connections)
AirForce | 3 X 70 X 11 X 7.20K X 21.5K X 512 X 512 | 648K | UCI KDD Archive | Link
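
Each dataset above can be read as a list of tuples: one value per dimension attribute, followed by a numeric measure (for example, #Connections). The following is a minimal sketch, not the D-Cube implementation, of how the density of a candidate block in such a tuple list could be scored. It assumes the arithmetic average mass (block mass divided by the average number of selected values per dimension) as the density measure; the function name block_density and the toy tuples are hypothetical and chosen only for illustration.

```python
def block_density(tuples, block):
    """Arithmetic average mass of a block (an assumed density measure).

    tuples: iterable of (attr_1, ..., attr_N, measure)
    block:  list of N sets, the selected attribute values per dimension
    """
    n_dims = len(block)
    total_cardinality = sum(len(values) for values in block)
    if total_cardinality == 0:
        return 0.0
    # Mass = sum of the measure over tuples that fall inside the block.
    mass = 0
    for *attrs, measure in tuples:
        if all(a in values for a, values in zip(attrs, block)):
            mass += measure
    # mass / (total_cardinality / n_dims), rearranged to avoid rounding error.
    return mass * n_dims / total_cardinality


# Toy tuples in the (Src IP, Dst IP, Timestamp, #Connections) layout.
tuples = [
    ("10.0.0.1", "10.0.0.9", "t1", 5),
    ("10.0.0.1", "10.0.0.9", "t2", 4),
    ("10.0.0.2", "10.0.0.9", "t1", 6),
    ("10.0.0.3", "10.0.0.7", "t3", 1),
]

# Candidate block: two sources hitting one destination in two time slots.
block = [{"10.0.0.1", "10.0.0.2"}, {"10.0.0.9"}, {"t1", "t2"}]

# The whole tensor, for comparison.
whole = [
    {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    {"10.0.0.9", "10.0.0.7"},
    {"t1", "t2", "t3"},
]

print(block_density(tuples, block))  # 9.0
print(block_density(tuples, whole))  # 6.0
```

In this toy example the candidate block scores higher than the full tensor, which is the kind of contrast a dense-block detector looks for. A scan like this only scores one given block; searching over all possible blocks at terabyte scale is the hard part that the algorithm itself addresses.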

People