Related papers: A Partitioning Methodology for Accelerating Applic…
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When a topology of a distributed system is known an important task…
Hardware accelerators, such as those based on GPUs and FPGAs, offer an excellent opportunity to efficiently parallelize functionalities. Recently, modern embedded platforms started being equipped with such accelerators, resulting in a…
In this treatise, my research on methods to improve efficiency, reliability, and security of reconfigurable hardware systems, i.e., FPGAs, through partial dynamic reconfiguration is outlined. The efficiency of reconfigurable systems can be…
Domain-specific accelerators are used in various computing systems ranging from edge devices to data centers. Coarse-grained reconfigurable arrays (CGRAs) represent an architectural midpoint between the flexibility of an FPGA and the…
This paper proposes an optimized mapping of the FIR filter algorithm that enhances the rate of a reconfigurable computer over a basic mapping previously proposed [1]. It also presents a new interconnection scheme in the reconfigurable part…
Modern embedded and cyber-physical systems require every day more performance, power efficiency and flexibility, to execute several profiles and functionalities targeting the ever growing adaptivity needs and preserving execution…
In this paper, a novel modulation scheme called set partition modulation (SPM) is proposed. In this scheme, set partitioning and ordered subsets in the set partitions are used to form codewords. We define different SPM variants and depict a…
Hardware specialization is becoming a key enabler of energyefficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands.…
Increasing AI computing demands and slowing transistor scaling have led to the advent of Multi-Chip-Module (MCMs) based accelerators. MCMs enable cost-effective scalability, higher yield, and modular reuse by partitioning large chips into…
Modern embedded computing platforms consist of a high amount of heterogeneous resources, which allows executing multiple applications on a single device. The number of running application on the system varies with time and so does the…
Spectral efficiency is a key design issue for all wireless communication systems. Orthogonal frequency division multiplexing (OFDM) is a very well-known technique for efficient data transmission over many carriers overlapped in frequency.…
Coarse-grained reconfigurable architectures aim to achieve both goals of high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain.…
The large time and length scales and, not least, the vast number of particles involved in industrial-scale simulations inflate the computational costs of the Discrete Element Method (DEM) excessively. Coarse grain models can help to lower…
Programmable Logic Devices (PLDs) continue to grow in size and currently contain several millions of gates. At the same time, research effort is going into higher-level hardware synthesis methodologies for reconfigurable computing that can…
This paper applies the N-block PCPM algorithm to solve multi-scale multi-stage stochastic programs, with the application to electricity capacity expansion models. Numerical results show that the proposed simplified N-block PCPM algorithm,…
This paper proposes a highly efficient global coded-multiplexing scheme, conceptualized as Orthogonal Frequency Division Multiplexing over a finite field (FF-OFDM), for reliable multiuser communications. By utilizing a prime length cyclic…
This article describes a geometric partitioning software that can be used for quick computation of data partitions on many-core HPC machines. It is most suited for dynamic applications with load distributions that vary with time.…
The operation of large-scale infrastructure networks requires scalable optimization schemes. To guarantee safe system operation, a high degree of feasibility in a small number of iterations is important. Decomposition schemes can help to…
Despite the numerous uses of semidefinite programming (SDP) and its universal solvability via interior point methods (IPMs), it is rarely applied to practical large-scale problems. This mainly owes to the computational cost of IPMs that…
We employ chordal decomposition to reformulate a large and sparse semidefinite program (SDP), either in primal or dual standard form, into an equivalent SDP with smaller positive semidefinite (PSD) constraints. In contrast to previous…