The construction and study of systems that can extract useful knowledge from massively growing data volumes have become extremely challenging. Conventional knowledge extraction tools, which rely on matrix and graph computations, typically do not scale to very large data sizes. Major difficulties arise in particular when the data correlations are dense, as in large-scale Internet security and malware analysis: the underlying matrices/graphs cannot fit into a single machine, nor can they be effectively partitioned, parallelized, and communicated across multiple processing units due to the system's bandwidth limitations.