A pseudo-parallel Python environment for database curation

Eckhard Sutorius; Johann Bryant; Ross Collins; Nicholas Cross; Nigel Hambly; Mike Read

Astrophysics 2007-11-14 v1

Abstract

One of the major challenges in providing large databases such as the WFCAM Science Archive (WSA) is minimizing ingest times for pixel/image metadata and catalogue data. In this article we describe how the pipeline-processed data are ingested into the database as the first stage in building a release database, which is followed by advanced processing (source merging, seaming, detection quality flagging, etc.). To make ingestion as fast as possible we use a mixed Python/C++ environment and run the required tasks in a simple parallel mode: the data are split into daily chunks, which are then processed on different computers. The resulting data files can be ingested into the database as soon as they are available. This flexible way of handling the data makes the best use of the available CPUs, as a comparison with sequential processing shows.
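
The chunk-and-ingest pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the WSA code: the function names (split_by_night, process_chunk, ingest, curate) are hypothetical, a thread pool stands in for the separate computers, and a plain dictionary stands in for the database. It only shows the idea of splitting the data by night, processing the chunks concurrently, and ingesting each result as soon as it is ready.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import date


def split_by_night(files):
    """Group (observation_date, filename) pairs into daily chunks."""
    chunks = defaultdict(list)
    for obs_date, name in files:
        chunks[obs_date].append(name)
    return dict(chunks)


def process_chunk(obs_date, names):
    """Hypothetical stand-in for the per-night processing step.

    Returns the name of the data file it would create and a row count.
    """
    return f"ingest_{obs_date.isoformat()}.dat", len(names)


def ingest(data_file, n_rows, database):
    """Hypothetical stand-in for loading one data file into the database."""
    database[data_file] = n_rows


def curate(files, max_workers=4):
    """Process daily chunks concurrently and ingest results as they finish."""
    database = {}  # assumption: a dict plays the role of the database
    chunks = split_by_night(files)
    # Assumption: worker threads play the role of the different computers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_chunk, d, n) for d, n in chunks.items()]
        # Ingest in completion order rather than waiting for all chunks.
        for future in as_completed(futures):
            data_file, n_rows = future.result()
            ingest(data_file, n_rows, database)
    return database
```

Calling curate with a list such as [(date(2007, 11, 1), "a.fit"), (date(2007, 11, 2), "b.fit")] returns a dictionary with one entry per night. Ingestion happens in the calling thread, so only the per-night processing runs concurrently in this sketch.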

Keywords

scientific software and data analysis tools; computational physics software; graphics processing unit computing

Cite

@article{arxiv.0711.2042,
  title  = {A pseudo-parallel Python environment for database curation},
  author = {Eckhard Sutorius and Johann Bryant and Ross Collins and Nicholas Cross and Nigel Hambly and Mike Read},
  journal= {arXiv preprint arXiv:0711.2042},
  year   = {2007}
}

Comments

4 pages, 2 figures, ADASS XVII conference proceedings, ASP Conference Series
