Tim Fischer — Scifaro

A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators

The exponential increase in Machine Learning (ML) model size and complexity has driven unprecedented demand for high-performance acceleration systems. As technology scaling enables the integration of thousands of computing elements onto a…

Hardware Architecture · Computer Science 2026-05-13 Luca Colagrande, Lorenzo Leone, Chen Wu, Tim Fischer, Raphael Roth, Luca Benini

EPAC: The Last Dance

This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosystem. EPAC is implemented in…

Hardware Architecture · Computer Science 2026-04-15 Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger

Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite

This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities (DH) scholars to explore and organize large, unstructured document collections. Perspectives implements…

Computation and Language · Computer Science 2026-02-18 Tim Fischer, Chris Biemann

Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond

We present a roadmap for open-source chiplet-based RISC-V systems targeting high-performance computing and artificial intelligence, aiming to close the performance gap to proprietary designs. Starting with Occamy, the first open,…

Hardware Architecture · Computer Science 2025-11-20 Paul Scheffler, Thomas Benz, Tim Fischer, Lorenzo Leone, Sina Arjmandpour, Luca Benini

AraXL: A Physically Scalable, Ultra-Wide RISC-V Vector Processor Design for Fast and Efficient Computation on Long Vectors

The ever-growing scale of data parallelism in today's HPC and ML applications presents a big challenge for computing architectures' energy efficiency and performance. Vector processors address the scale-up challenge by decoupling Vector…

Hardware Architecture · Computer Science 2025-08-14 Navaneeth Kunhi Purayil, Matteo Perotti, Tim Fischer, Luca Benini

ControlPULPlet: A Flexible Real-time Multi-core RISC-V Controller for 2.5D Systems-in-package

The growing complexity of real-time control algorithms with increasing performance demands, along with the shift to 2.5D technology, drive the need for scalable controllers to manage chiplets' coupled operation in 2.5D systems-in-package.…

Hardware Architecture · Computer Science 2025-08-14 Alessandro Ottaviano, Robert Balas, Tim Fischer, Thomas Benz, Andrea Bartolini, Luca Benini

TeraNoC: A Multi-Channel 32-bit Fine-Grained, Hybrid Mesh-Crossbar NoC for Efficient Scale-up of 1000+ Core Shared-L1-Memory Clusters

A key challenge in on-chip interconnect design is to scale up bandwidth while maintaining low latency and high area efficiency. 2D-meshes scale with low wiring area and congestion overhead; however, their end-to-end latency increases with…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-05 Yichao Zhang, Zexin Fu, Tim Fischer, Yinrong Li, Marco Bertuletti, Luca Benini

HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation

Despite regulations imposed by nations and social media platforms, e.g. (Government of India, 2021; European Parliament and Council of the European Union, 2022), inter alia, hateful content persists as a significant challenge. Existing…

Computation and Language · Computer Science 2025-07-08 Naquee Rizwan, Seid Muhie Yimam, Daryna Dementieva, Florian Skupin, Tim Fischer, Daniil Moskovskiy, Aarushi Ajay Borkar, Robert Geislinger, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

FlooNoC: A 645 Gbps/link 0.15 pJ/B/hop Open-Source NoC with Wide Physical Links and End-to-End AXI4 Parallel Multi-Stream Support

The new generation of domain-specific AI accelerators is characterized by rapidly increasing demands for bulk data transfers, as opposed to small, latency-critical cache line transfers typical of traditional cache-coherent systems. In this…

Hardware Architecture · Computer Science 2025-03-28 Tim Fischer, Michael Rogenmoser, Thomas Benz, Frank K. Gürkaynak, Luca Benini

Occamy: A 432-Core Dual-Chiplet Dual-HBM2E 768-DP-GFLOP/s RISC-V System for 8-to-64-bit Dense and Sparse Computing in 12nm FinFET

ML and HPC applications increasingly combine dense and sparse memory access computations to maximize storage efficiency. However, existing CPUs and GPUs struggle to flexibly handle these heterogeneous workloads with consistently high…

Hardware Architecture · Computer Science 2025-01-14 Paul Scheffler, Thomas Benz, Viviane Potocnik, Tim Fischer, Luca Colagrande, Nils Wistoff, Yichao Zhang, Luca Bertaccini, Gianmarco Ottavi, Manuel Eggimann, Matheus Cavalcante, Gianna Paulin, Frank K. Gürkaynak, Davide Rossi, Luca Benini

MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores

Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of the NN models and improving the energy efficiency of the underlying hardware architectures.…

Hardware Architecture · Computer Science 2024-10-28 Luca Bertaccini, Gianna Paulin, Tim Fischer, Stefan Mach, Luca Benini

Large Language Models Are Overparameterized Text Encoders

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that…

Computation and Language · Computer Science 2024-10-21 Thennal D K, Tim Fischer, Chris Biemann

ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers

Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration…

Hardware Architecture · Computer Science 2024-07-29 Gamze İslamoğlu, Moritz Scherer, Gianna Paulin, Tim Fischer, Victor J. B. Jung, Angelo Garofalo, Luca Benini

Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such…

Computation and Language · Computer Science 2024-07-01 Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom…

Hardware Architecture · Computer Science 2024-06-24 Gianna Paulin, Paul Scheffler, Thomas Benz, Matheus Cavalcante, Tim Fischer, Manuel Eggimann, Yichao Zhang, Nils Wistoff, Luca Bertaccini, Luca Colagrande, Gianmarco Ottavi, Frank K. Gürkaynak, Davide Rossi, Luca Benini

Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform

Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) or computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-30 Viviane Potocnik, Luca Colagrande, Tim Fischer, Luca Bertaccini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement

2.5D integration is an important technique to tackle the growing cost of manufacturing chips in advanced technology nodes. This poses the challenge of providing high-performance inter-chiplet interconnects (ICIs). As the number of chiplets…

Hardware Architecture · Computer Science 2023-10-10 Patrick Iff, Maciej Besta, Matheus Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler

FlooNoC: A Multi-Tbps Wide NoC for Heterogeneous AXI4 Traffic

Meeting the staggering bandwidth requirements of today's applications challenges the traditional narrow and serialized NoCs, which hit hard bounds on the maximum operating frequency. This paper proposes FlooNoC, an open-source, low-latency,…

Hardware Architecture · Computer Science 2023-08-29 Tim Fischer, Michael Rogenmoser, Matheus Cavalcante, Frank K. Gürkaynak, Luca Benini

Sparse Hamming Graph: A Customizable Network-on-Chip Topology

Chips with hundreds to thousands of cores require scalable networks-on-chip (NoCs). Customization of the NoC topology is necessary to reach the diverse design goals of different chips. We introduce sparse Hamming graph, a novel NoC topology…

Hardware Architecture · Computer Science 2023-06-29 Patrick Iff, Maciej Besta, Matheus Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler

TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology

Tiny Machine Learning (TinyML) applications impose uJ/Inference constraints, with a maximum power consumption of tens of mW. It is extremely challenging to meet these requirements at a reasonable accuracy level. This work addresses the…

Hardware Architecture · Computer Science 2022-12-02 Moritz Scherer, Alfio Di Mauro, Tim Fischer, Georg Rutishauser, Luca Benini