Selected Publications

Efficient Calculation of Triangle Centrality in Big Data Networks

  • The notion of “centrality” within graph analytics has led to the creation of well-known metrics such as Google's Page Rank [1], which is an extension of eigenvector centrality [2].
  • Triangle centrality is a related metric [3] that utilizes the presence of triangles, which play an important role in network analysis, to quantitatively determine the relative “importance” of a node in a network. Efficiently counting and enumerating these triangles are a major backbone to understanding network characteristics, and linear algebraic methods have utilized the correspondence between sparse adjacency matrices and graphs to perform such calculations, with sparse matrix-matrix multiplication as the main computational kernel.
  • In this paper, we use an intersection representation of graph data implemented as a sparse matrix, and engineer an algorithm to compute the triangle centrality of each vertex within a graph.
  • The main computational task of calculating these sparse matrix-vector products is carefully crafted by employing compressed vectors as accumulators. As with other state-of-the-art algorithms [4], our method avoids redundant work by counting and enumerating each triangle exactly once. We present results from extensive computational experiments on large-scale real-world and synthetic graph in-stances that demonstrate good scalability of our method.
  • We also present a shared memory parallel implementation of our algorithm.

A Sparse Matrix Approach for Covering Large Complex Networks by Cliques

  • A classical NP-hard problem is the Minimum Edge Clique Cover (minECC) problem, which is concerned with covering the edges of a network (graph) with the minimum number of cliques.
  • There are many real-life applications of this problem, such as in food science, computational biology, efficient representation of pairwise information, and so on.
  • Borrowing ideas from [8], we propose using a compact representation, the intersection representation, of network data and design an efficient and scalable algorithm for minECC.
  • Edges are considered for inclusion in cliques in degree-based orders during the clique construction step.
  • The intersection representation of the input graph enabled efficient computer implementation of the algorithm by utilizing an existing sparse matrix package [11].
  • We present results from numerical experiments on a representative set of real-world and synthetically constructed benchmark graph instances.
  • Our algorithm significantly outperforms the current state-of-the-art heuristic algorithm of [4] in terms of the quality of the edge clique covers returned and running time performance on the benchmark test instances.
  • On some of the largest graph instances whilst existing heuristics failed to terminate, our algorithm could finish the computation within a reasonable amount of time.

Intersection Representation of Big Data Networks and Triangle Enumeration

  • Triangles are an essential part of network analysis, representing metrics such as transitivity and clustering coefficient.
  • Using the correspondence between sparse adjacency matrices and graphs, linear algebraic methods have been developed for triangle counting and enumeration, where the main computational kernel is sparse matrix-matrix multiplication.
  • In this paper, we use an intersection representation of graph data implemented as a sparse matrix, and engineer an algorithm to compute the “k-count” distribution of the triangles of the graph.
  • The main computational task of computing sparse matrix-vector products is carefully crafted by employing compressed vectors as accumulators.
  • Our method avoids redundant work by counting and enumerating each triangle exactly once.
  • We present results from extensive computational experiments on large-scale real-world and synthetic graph instances that demonstrate good scalability of our method.
  • In terms of run-time performance, our algorithm has been found to be orders of magnitude faster than the reference implementations of the miniTri data analytics application [18].

Intersection Representation of Big Data Networks and Triangle Counting

  • Triangles are an essential part of network analysis, representing metrics such as transitivity ratio and clustering coefficient because of its diverse applications, enumeration and counting of triangles in large networks has been extensively studied, and continues to draw much interest from many different fields.
  • This has only increased with the introduction of approximate counting, parallel and distributed implementations, and restricted and streaming data access scenarios.
  • We propose a compact and efficient representation of network data based on the intersection of edge labels, and use sparse matrix data structures for its computer implementation.
  • We then present a scalable algorithm that uses this structure to count triangles.
  • On a set of large (the largest with more that 3.6 billion edges) real-world and synthetic networks, our algorithm performs significantly better than the reference implementation miniTri.

Covering Large Complex Networks by Cliques---A Sparse Matrix Approach

  • The Edge Clique Cover (ECC) problem is concerned with covering edges of a graph with the minimum number of cliques, which is an NP-hard problem.
  • This work proposes using a compact representation of network data based on sparse matrix data structures. Building upon an existing ECC heuristic due to Kellerman, we proffer adding vertices during the clique-growing step of the algorithm in judiciously chosen degree-based orders.
  • Our vertex-ordered approach produced a smaller clique cover on a set of standard benchmark instances compared to unordered processing.

A Sparse Matrix Approach for Covering Large Complex Networks by Cliques