~$ ./projects
K-medoids Clustering
The k-medoids methods for modeling clustered data have many desirable properties such as robustness to noise and the ability to use non-numerical values. In this project we developed AGORAS, a novel heuristic algorithm for the k-medoids problem where the algorithmic complexity is driven by, k, the number of clusters, rather than, n, the number of data points. Our algorithm attempts to isolate a sample from each individual cluster within a sequence of uniformly drawn samples taken from the complete data. As a result, computing the k-medoids solution using our method only involves solving k trivial sub-problems of centrality. We evaluate AGORAS experimentally against PAM and CLARANS -- two of the best-known existing algorithms for the k-medoids problem. AGORAS can outperform PAM by up to four orders of magnitude for data sets with less than 10,000 points, and it outperforms CLARANS by two orders of magnitude on a dataset of just 64,000 points. Moreover, we find in some cases that AGORAS also outperforms in terms of cluster quality.An implementation of AGORAS, PAM, and CLARANS, as described in our paper AGORAS: A Fast Algorithm for Estimating Medoids in Large Datasets, is publicly available on GitHub. Click here for the project's code repository.
Building Halo Merger Trees
High Performance Surface Density Field Reconstruction
Reconstructing a continuous field from a discrete point set is a fundamental operation in scientific data analysis. From a given set of irregularly distributed points in 3D, the task is to create a field defined on a regular grid from some property defined on the points. This work has an application in cosmology and astrophysics for gravitational lensing simulations, where accurate surface density estimation is a critical and costly step, but the techniques developed generalize to any application requiring high-performance grid interpolation with low noise properties. Shown here is a visualization of a typical surface density field computed during a strong lensing study from an N-body particle simulation.
An implementation of the shared memory kernel described in our paper, Parallel DTFE Surface Density Field Reconstruction, is publicly available on GitHub. Click here for the project's code repository.