Enron words
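This is the bipartite document–word network of the Enron dataset. Left nodes are
documents and right nodes are words. An edge connects a document to a word it
contains, and the edge's multiplicity is the number of times that word occurs in
the document.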
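As a minimal sketch of this representation, the following Python snippet builds a tiny document–word network from bag-of-words triples and computes a few of the basic counts. The triples, the function name `bipartite_summary`, and the choice to count degrees with multiplicity are illustrative assumptions, not part of the dataset or its tooling.

```python
from collections import Counter

# Hypothetical toy data: (document, word, occurrences) triples.
# These are NOT taken from the Enron dataset.
triples = [
    ("doc1", "energy", 2),
    ("doc1", "market", 1),
    ("doc2", "energy", 1),
    ("doc2", "meeting", 3),
]


def bipartite_summary(triples):
    """Summarize a bipartite multigraph given as (left, right, multiplicity)."""
    left_nodes = {d for d, _, _ in triples}
    right_nodes = {w for _, w, _ in triples}

    # Volume counts every edge with its multiplicity;
    # the unique edge count ignores multiplicities.
    volume = sum(c for _, _, c in triples)
    unique_edges = len({(d, w) for d, w, _ in triples})

    # Assumption: degrees count edges with multiplicity.
    left_degree = Counter()
    right_degree = Counter()
    for d, w, c in triples:
        left_degree[d] += c
        right_degree[w] += c

    return {
        "n1": len(left_nodes),
        "n2": len(right_nodes),
        "volume": volume,
        "unique_edges": unique_edges,
        "left_degree": left_degree,
        "right_degree": right_degree,
    }


summary = bipartite_summary(triples)
print(summary["n1"], summary["n2"], summary["volume"], summary["unique_edges"])
```

In this toy network the word "energy" appears in both documents, so it is a right node with two distinct neighbors.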
Statistics
| Statistic | Symbol | Value |
|---|---|---|
| Size | n | 67,960 |
| Left size | n1 | 39,861 |
| Right size | n2 | 28,099 |
| Volume | m | 6,412,172 |
| Unique edge count | m̿ | 3,710,420 |
| Wedge count | s | 3,214,624,476 |
| Claw count | z | 2,510,007,422,598 |
| Cross count | x | 2,191,825,474,071,012 |
| Square count | q | 45,471,014,642 |
| 4-Tour count | T4 | 376,634,510,028 |
| Maximum degree | dmax | 7,190 |
| Maximum left degree | d1max | 2,120 |
| Maximum right degree | d2max | 7,190 |
| Average degree | d | 188.704 |
| Average left degree | d1 | 160.863 |
| Average right degree | d2 | 228.199 |
| Fill | p | 0.003 312 71 |
| Average edge multiplicity | m̃ | 1.728 15 |
| Size of LCC | N | 67,960 |
| Diameter | δ | 6 |
| 50-Percentile effective diameter | δ0.5 | 2.492 21 |
| 90-Percentile effective diameter | δ0.9 | 3.606 21 |
| Median distance | δM | 3 |
| Mean distance | δm | 2.992 72 |
| Gini coefficient | G | 0.707 894 |
| Balanced inequality ratio | P | 0.224 254 |
| Left balanced inequality ratio | P1 | 0.225 645 |
| Right balanced inequality ratio | P2 | 0.156 346 |
| Relative edge distribution entropy | Her | 0.897 344 |
| Power law exponent | γ | 1.269 14 |
| Tail power law exponent | γt | 1.991 00 |
| Degree assortativity | ρ | −0.174 109 |
| Degree assortativity p-value | pρ | 0.000 00 |
| Spectral separation | \|λ1[A] / λ2[A]\| | 1.700 69 |
| Controllability | C | 14,724 |
| Relative controllability | Cr | 0.216 657 |
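Several of the statistics above appear to follow directly from the basic counts. The sketch below recomputes them in Python from n1, n2, m, m̿ and C. The formulas (for example d = 2m / n and p = m̿ / (n1 · n2)) are assumptions about how these quantities are defined; they reproduce the listed values to the printed precision, but the page itself does not state them. The function name `derived_statistics` is hypothetical.

```python
# Basic counts copied from the statistics table.
n1 = 39_861          # left size (documents)
n2 = 28_099          # right size (words)
m = 6_412_172        # volume (edges counted with multiplicity)
m_unique = 3_710_420  # unique edge count
C = 14_724           # controllability


def derived_statistics(n1, n2, m, m_unique, controllability):
    """Recompute derived statistics under the assumed definitions."""
    n = n1 + n2
    return {
        "size": n,
        "avg_degree": 2 * m / n,           # assumed: d = 2m / n
        "avg_left_degree": m / n1,         # assumed: d1 = m / n1
        "avg_right_degree": m / n2,        # assumed: d2 = m / n2
        "fill": m_unique / (n1 * n2),      # assumed: share of possible edges present
        "avg_edge_multiplicity": m / m_unique,
        "relative_controllability": controllability / n,
    }


stats = derived_statistics(n1, n2, m, m_unique, C)
for name, value in stats.items():
    print(f"{name}: {value:.6g}")
```

Because the size of the largest connected component equals n, the network is connected, which is consistent with the finite diameter reported in the table.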
References
[1] Jérôme Kunegis. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion, pages 1343–1350, 2013.
[2] M. Lichman. UCI Machine Learning Repository, 2013.