Related papers: Virtual Data in CMS Production
The use of virtual data for enhancing the collaboration between large groups of scientists is explored in several ways: by defining "virtual" parameter spaces which can be searched and shared in an organized way by a collaboration of…
The CMS collaboration has a long term need to perform large-scale simulation efforts, in which physics events are generated and their manifestations in the CMS detector are simulated. Simulated data are then reconstructed and analyzed by…
We present the current status of CMS data analysis architecture and describe work on future Grid-based distributed analysis prototypes. CMS has two main software frameworks related to data analysis: COBRA, the main framework, and IGUANA,…
Grid computing (GC) systems are large-scale virtual machines, built upon a massive pool of resources (processing time, storage, software) that often span multiple distributed domains. Concurrent users interact with the grid by adding new…
McRunjob is a powerful grid workflow manager used to manage the generation of large numbers of production processing jobs in High Energy Physics. In use at both the DZero and CMS experiments, McRunjob has been used to manage large Monte…
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the…
GlideinWMS is a workload manager provisioning resources for many experiments, including CMS and DUNE. The software is distributed both as native packages and specialized production containers. Following an approach used in other communities…
Moving scientific experiments that are embarrassingly parallel, long-running, and data-intensive into a cloud-based execution environment is a desirable, though complex, undertaking for many researchers. The management of such…
The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM across diverse applications and systems. To…
Monte Carlo simulation studies are at the core of the modern applied, computational, and theoretical statistical literature. Simulation is a broadly applicable research tool, used to collect data on the relative performance of methods or…
Quantum circuit simulation is crucial for quantum computing such as validating quantum algorithms. We present Qymera, a system that repurposes relational database management systems (RDBMSs) for simulation by translating circuits into SQL…
Over the last few years, with the growth in time-series collection and storage, there has been a great demand for tools and software for temporal data engineering and modeling. This paper presents a generic workflow for time series data…
Nowadays, machine learning (ML) teams have multiple concurrent ML workflows for different applications. Each workflow typically involves many experiments, iterations, and collaborative activities and commonly takes months and sometimes…
Platform virtualization helps solve major grid computing challenges: sharing resources through flexible, user-controlled, custom execution environments while isolating failures and malicious code. Grid resource management tools…
Nowadays simulations can produce petabytes of data to be stored in parallel filesystems or large-scale databases. This data is accessed over the course of decades often by thousands of analysts and scientists. However, storing these volumes…
As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovations while complying with various…
Many kinds of industrial projects involve the use of prefabricated modules built offsite, and installation on-site using mobile cranes. Due to their costly operation and safety concerns, utilization of such heavy lift mobile cranes requires…
RefDB is the CMS Monte Carlo Reference Database. It is used for recording and managing all details of physics simulation, reconstruction and analysis requests, for coordinating task assignments to world-wide distributed Regional Centers,…
Delivering a reproducible environment along with complex and up-to-date software stacks on thousands of distributed and heterogeneous worker nodes is a critical task. The CernVM-File System (CVMFS) has been designed to help various…
In this paper, we describe a multidatabase system as 4-tiered client-server DBMS architectures. We discuss their functional components and provide an overview of their performance characteristics. The first component of this proposed system…