Related papers: Benchmarking JSON BinPack

A Survey of JSON-compatible Binary Serialization Specifications

In this paper, we present the recent advances that highlight the characteristics of JSON-compatible binary serialization specifications. We motivate the discussion by covering the history and evolution of binary serialization specifications…

Databases · Computer Science 2022-01-11 Juan Cruz Viotti, Mital Kinderkhedia

A Benchmark of JSON-compatible Binary Serialization Specifications

We present a comprehensive benchmark of JSON-compatible binary serialization specifications using the SchemaStore open-source test suite collection of over 400 JSON documents matching their respective schemas and representative of their use…

Software Engineering · Computer Science 2022-01-11 Juan Cruz Viotti, Mital Kinderkhedia

JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models

Reliably generating structured outputs has become a critical capability for modern language model (LM) applications. Constrained decoding has emerged as the dominant technology across sectors for enforcing structured outputs during…

Computation and Language · Computer Science 2025-02-28 Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori

BiBench: Benchmarking and Analyzing Network Binarization

Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width. However, recent research has shown that applying existing binarization…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu

Synthesizing JSON Schema Transformers

JSON (JavaScript Object Notation) is a data encoding that allows structured data to be used in a standardized and straightforward manner across systems. Schemas for JSON-formatted data can be constructed using the JSON Schema standard,…

Programming Languages · Computer Science 2025-08-13 Jack Stanek, Daniel Killough

JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

JSON Schema is the de facto standard for describing the structure of JSON documents. Reasoning about JSON Schema inclusion -- whether every instance satisfying a schema S1 also satisfies a schema S2 -- is a key building block for a variety…

Databases · Computer Science 2026-04-14 Mohamed-Amine Baazizi, Nour El Houda Ben Ali, Dario Colazzo, Giorgio Ghelli, Stefan Klessinger, Carlo Sartiani, Stefanie Scherzinger

Consistent Weighted Sampling Made Fast, Small, and Easy

Document sketching using Jaccard similarity has been a workable, effective technique in reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning…

Data Structures and Algorithms · Computer Science 2014-10-17 Bernhard Haeupler, Mark Manasse, Kunal Talwar

JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models

When LLMs process structured data, the serialization format directly affects cost and context utilization. Standard JSON wastes tokens repeating key names in every row of a tabular array--overhead that scales linearly with row count. This…

Artificial Intelligence · Computer Science 2026-04-08 Gowthamkumar Nandakishore

Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation

Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While showing solid accuracy in LLM comprehension, there is…

Computation and Language · Computer Science 2026-03-05 Ivan Matveev

Binsparse: A Specification for Cross-Platform Storage of Sparse Matrices and Tensors

Sparse matrices and tensors are ubiquitous throughout multiple subfields of computing. The widespread usage of sparse data has inspired many in-memory and on-disk storage formats, but the only widely adopted storage specifications are the…

Mathematical Software · Computer Science 2025-06-25 Benjamin Brock, Willow Ahrens, Hameer Abbasi, Timothy A. Davis, Juni Kim, James Kitchen, Spencer Patty, Isaac Virshup, Erik Welch

Bloscpack: a compressed lightweight serialization format for numerical data

This paper introduces the Bloscpack file format and the accompanying Python reference implementation. Bloscpack is a lightweight, compressed binary file-format based on the Blosc codec and is designed for lightweight, fast serialization of…

Mathematical Software · Computer Science 2014-04-30 Valentin Haenel

JSONoid: Monoid-based Enrichment for Configurable and Scalable Data-Driven Schema Discovery

Schema discovery is an important aspect of working with data in formats such as JSON. Unlike relational databases, JSON data sets often do not have associated structural information. Consumers of such datasets are often left to browse…

Databases · Computer Science 2023-07-07 Michael J. Mior

Parsing Gigabytes of JSON per Second

JavaScript Object Notation or JSON is a ubiquitous data exchange format on the Web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as…

Databases · Computer Science 2024-07-25 Geoff Langdale, Daniel Lemire

Sequential File Programming Patterns and Performance with .NET

Programming patterns for sequential file access in the .NET Framework are described and the performance is measured. The default behavior provides excellent performance on a single disk - 50 MBps both reading and writing. Using large…

Performance · Computer Science 2007-05-23 Peter Kukol, Jim Gray

BinSub: The Simple Essence of Polymorphic Type Inference for Machine Code

Recovering high-level type information in binaries is a key task in reverse engineering and binary analysis. Binaries contain very little explicit type information. The structure of binary code is incredibly flexible allowing for ad-hoc…

Programming Languages · Computer Science 2024-09-04 Ian Smith

APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference

Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We present APack, a simple and effective, lossless, off-chip memory compression…

Hardware Architecture · Computer Science 2022-01-24 Alberto Delmas Lascorz, Mostafa Mahmoud, Andreas Moshovos

ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction

Unstructured documents like PDFs contain valuable structured information, but downstream systems require this data in reliable, standardized formats. LLMs are increasingly deployed to automate this extraction, making accuracy and…

Machine Learning · Computer Science 2026-02-17 Nick Ferguson, Josh Pennington, Narek Beghian, Aravind Mohan, Douwe Kiela, Sheshansh Agrawal, Thien Hang Nguyen

The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes

In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve…

Databases · Computer Science 2021-02-05 Ciprian-Octavian Truică, Elena-Simona Apostol, Jérôme Darmont, Torben Bach Pedersen

Serializing Java Objects in Plain Code

In managed languages, serialization of objects is typically done in bespoke binary formats such as Protobuf, or markup languages such as XML or JSON. The major limitation of these formats is readability. Human developers cannot read binary…

Software Engineering · Computer Science 2025-12-16 Julian Wachter, Deepika Tiwari, Martin Monperrus, Benoit Baudry

The Behavioral Diversity of Java JSON Libraries

JSON is an essential file and data format in domains that span scientific computing, web APIs or configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON…

Software Engineering · Computer Science 2021-08-30 Nicolas Harrand, Thomas Durieux, David Broman, Benoit Baudry