Related papers: A Simple Multi-Processor Computer Based on Subleq

AISC: Approximate Instruction Set Computer

This paper makes the case for a single-ISA heterogeneous computing platform, AISC, where each compute engine (be it a core or an accelerator) supports a different subset of the very same ISA. An ISA subset may not be functionally complete,…

Hardware Architecture · Computer Science 2018-03-20 Alexandra Ferreron, Jesus Alastruey-Benede, Dario Suarez-Gracia, Ulya R. Karpuzcu

Small and Practical BERT Models for Sequence Labeling

We propose a practical scheme to train a single multilingual sequence labeling model that yields state of the art results and is small and fast enough to run on a single CPU. Starting from a public multilingual BERT checkpoint, our final…

Computation and Language · Computer Science 2019-09-04 Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer

Retrofitting Parallelism onto OCaml

OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared memory parallel programming.…

Programming Languages · Computer Science 2020-07-03 KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, Anil Madhavapeddy

staq -- A full-stack quantum processing toolkit

We describe 'staq', a full-stack quantum processing toolkit written in standard C++. 'staq' is a quantum compiler toolkit, comprising of tools that range from quantum optimizers and translators to physical mappers for quantum devices with…

Quantum Physics · Physics 2020-08-07 Matthew Amy, Vlad Gheorghiu

An Asynchronous Parallel Stochastic Coordinate Descent Algorithm

We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong…

Optimization and Control · Mathematics 2014-11-12 Ji Liu, Stephen J. Wright, Christopher Ré, Victor Bittorf, Srikrishna Sridhar

Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

We describe an asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function plus a separable convex function. In contrast to previous…

Optimization and Control · Mathematics 2015-12-14 Ji Liu, Stephen J. Wright

Improving Inference Performance of Machine Learning with the Divide-and-Conquer Principle

Many popular machine learning models scale poorly when deployed on CPUs. In this paper we explore the reasons why and propose a simple, yet effective approach based on the well-known Divide-and-Conquer Principle to tackle this problem of…

Machine Learning · Computer Science 2023-03-03 Alex Kogan

Sequential & Parallel Algorithms for Big-Integer Numbers Subtraction

Many emerging computer applications require the processing of large numbers, larger than what a CPU can handle. In fact, the top of the line PCs can only manipulate numbers not longer than 32 bits or 64 bits. This is due to the size of the…

Data Structures and Algorithms · Computer Science 2012-04-03 Youssef Bassil, Aziz Barbar

A Logic Programming Framework for Combinational Circuit Synthesis

Logic Programming languages and combinational circuit synthesis tools share a common "combinatorial search over logic formulae" background. This paper attempts to reconnect the two fields with a fresh look at Prolog encodings for the…

Logic in Computer Science · Computer Science 2008-12-18 Paul Tarau, Brenda Luderman

Scaling LLM Inference with Optimized Sample Compute Allocation

Sampling is a basic operation in many inference-time algorithms of large language models (LLMs). To scale up inference efficiently with a limited compute, it is crucial to find an optimal allocation for sample compute budgets: Which…

Computation and Language · Computer Science 2024-10-31 Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li

C-slow Technique vs Multiprocessor in designing Low Area Customized Instruction set Processor for Embedded Applications

The demand for high performance embedded processors, for consumer electronics, is rapidly increasing for the past few years. Many of these embedded processors depend upon custom built Instruction Set Architecture (ISA) such as game…

Hardware Architecture · Computer Science 2012-04-06 Muhammad Adeel Akram, Aamir Khan, Muhammad Masood Sarfaraz

SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

Test-time compute scaling has emerged as a powerful paradigm for enhancing mathematical reasoning in large language models (LLMs) by allocating additional computational resources during inference. However, current methods employ uniform…

Computation and Language · Computer Science 2025-12-02 Yang Xiao, Chunpu Xu, Ruifeng Yuan, Jiashuo Wang, Wenjie Li, Pengfei Liu

SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving

As large language models (LLMs) scale out with tensor parallelism (TP) and pipeline parallelism (PP) and production stacks have aggressively optimized the data plane (attention/GEMM and KV cache), sampling, the decision plane that turns…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-02 Bohan Zhao, Zane Cao, Yongchao He

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and…

Computation and Language · Computer Science 2019-04-03 Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

Automatic Parallelization of Sequential Programs

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Peter Kraft, Amos Waterland, Daniel Y Fu, Anitha Gollamudi, Shai Szulanski, Margo Seltzer

Quantitative Expressiveness of Instruction Sequence Classes for Computation on Single Bit Registers

The number of instructions of an instruction sequence is taken for its logical SLOC, and is abbreviated with LLOC. A notion of quantitative expressiveness is based on LLOC and in the special case of operation over a family of single bit…

Programming Languages · Computer Science 2019-04-19 Jan A. Bergstra

The Case for RISP: A Reduced Instruction Spiking Processor

In this paper, we introduce RISP, a reduced instruction spiking processor. While most spiking neuroprocessors are based on the brain, or notions from the brain, we present the case for a spiking processor that simplifies rather than…

Neural and Evolutionary Computing · Computer Science 2022-06-29 James S. Plank, ChaoHui Zheng, Bryson Gullett, Nicholas Skuda, Charles Rizzo, Catherine D. Schuman, Garrett S. Rose

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue…

Machine Learning · Computer Science 2020-03-03 Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré

Segmentation of Subspaces in Sequential Data

We propose Ordered Subspace Clustering (OSC) to segment data drawn from a sequentially ordered union of subspaces. Similar to Sparse Subspace Clustering (SSC) we formulate the problem as one of finding a sparse representation but include an…

Computer Vision and Pattern Recognition · Computer Science 2015-04-17 Stephen Tierney, Yi Guo, Junbin Gao

Periodic Single-Pass Instruction Sequences

A program is a finite piece of data that produces a (possibly infinite) sequence of primitive instructions. From scratch we develop a linear notation for sequential, imperative programs, using a familiar class of primitive instructions and…

Programming Languages · Computer Science 2013-04-17 Jan A. Bergstra, Alban Ponse