Md. Mostofa Ali Patwary
Software: This page is under development...
Source Code released under the GNU General Public License (GPL)- DBSCAN: A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure
- Union-Find algorithms for the Disjoint-Set Data Structure
- Fast algorithms to find maximum clique in massive graphs
- Parallel Graph Matching Algoriths on Distributed Memory Computers
- Graph Ordering and Coloring: New Multithreaded Ordering and Coloring Algorithms for Multicore Architectures
Abstract: DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.
Source code: Will be released soon... Informal version is available on request...
Abstract: A package on different union-find algorithms (sequential, multicore and massively parallel)...
Source code:
Abstract: The maximum clique problem is a well known NP-Hard problem with applications in data mining, network analysis, informatics, and many other areas. Although there exist several algorithms with acceptable runtimes for certain classes of graphs, many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques to very quickly find maximum cliques in large sparse graphs. Extensive experiments on several types of synthetic and real-world graphs show that our new algorithm is up to several orders of magnitude faster than existing algorithms for most instances. We also present a heuristic variant that runs orders of magnitude faster than the exact algorithm, while providing optimal or near-optimal solutions.
Source code:
The details on the maximum clique algorithm and source code are available online.
Abstract:
Source code:
Abstract:
Source code: