Alex Sim
Monitoring data transfer performance is a crucial task in scientific computing networks. By predicting performance early in the communication phase, potentially sluggish transfers can be identified and selectively monitored, optimizing…
Neural networks are rapidly gaining popularity in scientific research, but training the models is often very time-consuming. Particularly when the training data samples are large high-dimensional arrays, efficient training methodologies…
Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which create repeated accesses to the large amounts of shared data located far away. These data accesses have…
To ensure reliability, power transformers are monitored for partial discharge (PD) events, which are symptoms of transformer failure. Since failures can have catastrophic cascading consequences, it is critical to preempt them as early as…
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and…
In modeling time series data, we often need to augment the existing data records to increase the modeling accuracy. In this work, we describe a number of techniques to extract dynamic information about the current state of a large…
The XRootD system is used to transfer, store, and cache large datasets from high-energy physics (HEP). In this study we focus on its capability as distributed on-demand storage cache. Through exploring a large set of daily log files between…
Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a…
The importance of intelligent data placement, management, and analysis has become apparent as scientific data volumes across the network continue to increase. To that end, we describe the use of in-network caching service deployments as a…
The volume of data moving through a network increases with new scientific experiments and simulations. Network bandwidth requirements also increase proportionally to deliver data within a certain time frame. We observe that a significant…
Botnet detection is a critical step in stopping the spread of botnets and preventing malicious activities. However, reliable detection is still a challenging task, due to a wide variety of botnets involving ever-increasing types of devices…
Domain science applications and workflow processes are currently forced to view the network as an opaque infrastructure into which they inject data and hope that it emerges at the destination with an acceptable Quality of Experience. There…
Botnets are increasingly used by malicious actors, creating increasing threat to a large number of internet users. To address this growing danger, we propose to study methods to detect botnets, especially those that are hard to capture with…
Many applications such as scientific simulation, sensing, and power grid monitoring tend to generate massive amounts of data, which should be compressed first prior to storage and transmission. These data, mostly comprised of floating-point…
While variable selection is essential to optimize the learning complexity by prioritizing features, automating the selection process is preferred since it requires laborious efforts with intensive analysis otherwise. However, it is not an…
A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in…
Understanding the earth's climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array…