Related papers: An Efficient Synchronous Static Memory design for …
Programming embedded systems applications involve writing concurrent, event-driven and timing-aware programs. Traditionally, such programs are written in low-level machine-oriented programming languages like C or Assembly. We present an…
A theoretical memory with limited processing power and internal connectivity at each element is proposed. This memory carries out parallel processing within itself to solve generic array problems. The applicability of this in-memory…
Hybrid memory systems, comprised of emerging non-volatile memory (NVM) and DRAM, have been proposed to address the growing memory demand of applications. Emerging NVM technologies, such as phase-change memories (PCM), memristor, and 3D…
Application specific simulation is challenging task in various real time high performance embedded devices. In this study specific application is implemented with the help of Xilinx. Xilinx provides SDK and XPS tools, XPS tools used for…
Synchronous programs are used extensively in implementation of safety critical embedded software. Imperative synchronous programming languages model multiple Finite State Machines (FSMs) executing in lockstep at logical clock ticks. The…
In-memory computing (IMC) with single instruction multiple data (SIMD) setup enables memory to perform operations on the stored data in parallel to achieve high throughput and energy saving. To instruct a SIMD IMC hardware to compute a…
An Efficient Simulation of application specific instruction-set processors (ASIP) is a challenging onus in the area of VLSI design. This paper reconnoiters the possibility of use of ASIP simulators for ASIP Simulation. This proposed study…
Deep learning hardware designs have been bottlenecked by conventional memories such as SRAM due to density, leakage and parallel computing challenges. Resistive devices can address the density and volatility issues, but have been limited by…
State space models (SSMs) have recently emerged as a powerful framework for long sequence processing, outperforming traditional methods on diverse benchmarks. Fundamentally, SSMs can generalize both recurrent and convolutional networks and…
In embedded vision systems, parallel computation of the integral image presents several design challenges in terms of hardware resources, speed and power consumption. Although recursive equations significantly reduce the number of…
For the last thirty years, a large variety of memory allocators have been proposed. Since performance, memory usage and energy consumption of each memory allocator differs, software engineers often face difficult choices in selecting the…
This thesis develops signal-processing algorithms and implementation schemes under constraints of minimal parallelism and memory space, with the goal of improving energy efficiency of low-power computing hardware. We propose (i) a…
Plenty of research efforts have been devoted to FPGA-based acceleration, due to its low latency and high energy efficiency. However, using the original low-level hardware description languages like Verilog to program FPGAs requires…
In this paper is proposed a technique to integrate and simulate a dynamic memory in a multiprocessor framework based on C/C++/SystemC. Using host machine's memory management capabilities, dynamic data processing is supported without…
Compute-in-memory (CiM) is a promising approach to improving the computing speed and energy efficiency in dataintensive applications. Beyond existing CiM techniques of bitwise logic-in-memory operations and dot product operations, this…
Electrical static random memory (E-SRAM) is the current standard for internal static memory in Field Programmable Gate Array (FPGA). Despite the dramatic improvement in E-SRAM technology over the past decade, the goal of ultra-fast,…
Computing-in-memory (CIM) has attracted significant attentions in recent years due to its massive parallelism and low power consumption. However, current CIM designs suffer from large area overhead of small CIM macros and bad programmablity…
Long Short-term Memory Networks (LSTMs) are a vital Deep Learning technique suitable for performing on-device time series analysis on local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design…
While general-purpose computing follows Von Neumann's architecture, the data movement between memory and processor elements dictates the processor's performance. The evolving compute-in-memory (CiM) paradigm tackles this issue by…
Simulating a shared register can mask the intricacies of designing algorithms for asynchronous message-passing systems subject to crash failures, since it allows them to run algorithms designed for the simpler shared-memory model. Typically…