Robust Machine Learning Applied to Terascale Astronomical Datasets

Astrophysics 2008-04-29 v1 Distributed, Parallel, and Cluster Computing

Abstract

We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This application is novel in astronomy because we use such resources for data mining, not only for performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to date and further issues anticipated in the transition to petascale. In particular, disk I/O will become a major limiting factor unless the necessary infrastructure is implemented.
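
For readers unfamiliar with the k-nearest neighbor algorithm mentioned above, the following is a minimal brute-force sketch in Python. It is not the Data-to-Knowledge implementation used by the collaboration. The function name `knn_classify`, the two-dimensional feature vectors, and the toy training values are all assumptions invented for illustration. The idea is to label a query object by majority vote among the k training objects closest to it in feature space.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration only: brute-force search with
    Euclidean distance, which costs O(N) per query.
    """
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Distance from the query to every training point, paired with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up two-feature training set (not survey data).
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
                (1.0, 1.1), (1.2, 0.9), (0.9, 1.0)]
train_labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]

result = knn_classify(train_points, train_labels, (0.18, 0.18), k=3)
print(result)
```

A brute-force search like this compares each query against every training object, so its cost grows with the product of the two set sizes. At the scale described in the abstract, one would presumably need parallel execution or a spatial index rather than this direct loop.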

Cite

@article{arxiv.0804.3417,
  title  = {Robust Machine Learning Applied to Terascale Astronomical Datasets},
  author = {Nicholas M. Ball and Robert J. Brunner and Adam D. Myers},
  journal= {arXiv preprint arXiv:0804.3417},
  year   = {2008}
}

Comments

11 pages, 2 figures, uses llncs.cls. To appear in the 9th LCI International Conference on High-Performance Clustered Computing
