Related papers: Data processing model for the CDF experiment
The data production farm for the CDF experiment is designed and constructed to meet the needs of the Run II data collection at a maximum rate of 20 MByte/sec during the run. The system is composed of a large cluster of personal computers…
The data production for the CDF experiment is conducted on a large Linux PC farm designed to meet the needs of data collection at a maximum rate of 40 MByte/sec. We present two data production models that exploits advances in computing and…
Information Fusion Systems are now widely used in different fusion contexts, like scientific processing, sensor networks, video and image processing. One of the current trends in this area is to cope with distributed systems. In this…
We review the status of, and prospects for, real-time data processing for collider experiments in experimental High Energy Physics. We discuss the historical evolution of data rates and volumes in the field and place them in the context of…
More and more massive parallel codes running on several hundreds of thousands of cores enter the computational science and engineering domain, allowing high-fidelity computations on up to trillions of unknowns for very detailed analyses of…
Each LHC experiment will produce datasets with sizes of order one petabyte per year. All of this data must be stored, processed, transferred, simulated and analyzed, which requires a computing system of a larger scale than ever mounted for…
Efficient matching of incoming events of data streams to persistent queries is fundamental to event stream processing systems. These applications require dealing with high volume and continuous data streams with fast processing time on…
The ALICE experiment has undergone a major upgrade for LHC Run 3 and will collect data at an interaction rate 50 times larger than before. The new computing scheme for Run 3 replaces the traditionally separate online and offline frameworks…
In the upcoming upgrades for Run 3 and 4, the LHC will significantly increase Pb--Pb and pp interaction rates. This goes along with upgrades of all experiments, ALICE, ATLAS, CMS, and LHCb, related to both the detectors and the computing.…
Physics experiments produce enormous amount of raw data, counted in petabytes per day. Hence, there is large effort to reduce this amount, mainly by using some filters. The situation can be improved by additionally applying some data…
In modeling time series data, we often need to augment the existing data records to increase the modeling accuracy. In this work, we describe a number of techniques to extract dynamic information about the current state of a large…
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…
During the upcoming Runs 3 and 4 of the LHC, ALICE will take data at a peak Pb-Pb collision rate of 50 kHz. This will be made possible thanks to the upgrade of the main tracking detectors of the experiment, and with a new data processing…
The design complexity of CNNs has been steadily increasing to improve accuracy. To cope with the massive amount of computation needed for such complex CNNs, the latest solutions utilize blocking of an image over the available dimensions and…
The majority of modern systems exhibit sophisticated concurrent behaviour, where several system components modify and observe the system state with fine-grained atomicity. Many systems (e.g., multi-core processors, real-time controllers)…
In this research, a model is proposed to learn from event log and predict future events of a system. The proposed PEDF model learns based on events' sequences, durations, and extra features. The PEDF model is built by a network made of…
We propose CFS, a distributed file system for large scale container platforms. CFS supports both sequential and random file accesses with optimized storage for both large files and small files, and adopts different replication protocols for…
Over the last two decades, scientific workflow management systems (SWfMS) have emerged as a means to facilitate the design, execution, and monitoring of reusable scientific data processing pipelines. At the same time, the amounts of data…
The CDF and D0 experiments probe the high-energy frontier and as they do so have accumulated hundreds of Terabytes of data on the way to petabytes of data over the next two years. The experiments have made a commitment to use the developing…
Data intensive applications often involve the analysis of large datasets that require large amounts of compute and storage resources. While dedicated compute and/or storage farms offer good task/data throughput, they suffer low resource…