Related papers: Information Planning for Text Data

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While…

Computation and Language · Computer Science · 2023-08-24 · Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

An Entropy-Based Model for Hierarchical Learning

Machine learning is the dominant approach to artificial intelligence, through which computers learn from data and experience. In the framework of supervised learning, a necessity for a computer to learn from data accurately and efficiently…

Machine Learning · Statistics · 2023-01-25 · Amir R. Asadi

Patterns for Learning with Side Information

Supervised, semi-supervised, and unsupervised learning estimate a function given input/output samples. Generalization of the learned function to unseen data can be improved by incorporating side information into learning. Side information…

Machine Learning · Computer Science · 2016-02-11 · Rico Jonschkowski, Sebastian Höfer, Oliver Brock

Multi-view Information Bottleneck Without Variational Approximation

By "intelligently" fusing the complementary information across different views, multi-view learning is able to improve the performance of classification tasks. In this work, we extend the information bottleneck principle to a supervised…

Machine Learning · Computer Science · 2022-04-25 · Qi Zhang, Shujian Yu, Jingmin Xin, Badong Chen

Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining

Most studies on language model pretraining focus on large datasets, leaving open questions about optimization in data-constrained settings. In such settings, the effects of training data order and of including alternative versions of the…

Computation and Language · Computer Science · 2025-09-30 · Matthew Theodore Roque, Dan John Velasco

Making Better Use of Unlabelled Data in Bayesian Active Learning

Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed…

Machine Learning · Computer Science · 2024-04-29 · Freddie Bickford Smith, Adam Foster, Tom Rainforth

A Survey of Naïve Bayes Machine Learning approach in Text Document Classification

Text document classification aims to associate documents with one or more predefined categories based on the likelihood suggested by the training set of labeled documents. Many machine learning algorithms play a vital role in training the system with…

Machine Learning · Computer Science · 2010-03-10 · Vidhya. K. A, G. Aghila

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Improvements in language models are often driven by improving the quality of the data we train them on, which can be limiting when strong supervision is scarce. In this work, we show that paired preference data consisting of individually…

Artificial Intelligence · Computer Science · 2025-07-09 · Scott Geng, Hamish Ivison, Chun-Liang Li, Maarten Sap, Jerry Li, Ranjay Krishna, Pang Wei Koh

Information plane and compression-gnostic feedback in quantum machine learning

The information plane (Tishby et al. arXiv:physics/0004057, Shwartz-Ziv et al. arXiv:1703.00810) has been proposed as an analytical tool for studying the learning dynamics of neural networks. It provides quantitative insight on how the…

Quantum Physics · Physics · 2024-11-05 · Nathan Haboury, Mo Kordzanganeh, Alexey Melnikov, Pavel Sekatski

Iterative Data Programming for Expanding Text Classification Corpora

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science · 2020-02-05 · Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta

Focus of Attention Improves Information Transfer in Visual Features

Unsupervised learning from continuous visual streams is a challenging problem that cannot be naturally and efficiently managed in the classic batch-mode setting of computation. The information stream must be carefully processed accordingly…

Machine Learning · Computer Science · 2020-06-17 · Matteo Tiezzi, Stefano Melacci, Alessandro Betti, Marco Maggini, Marco Gori

Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models

We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future…

Machine Learning · Computer Science · 2022-05-06 · Rickard K. A. Karlsson, Martin Willbo, Zeshan Hussain, Rahul G. Krishnan, David Sontag, Fredrik D. Johansson

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning

Curriculum learning (organizing training data from easy to hard) has improved efficiency across machine learning domains, yet remains underexplored for language model pretraining. We present the first systematic investigation of curriculum…

Computation and Language · Computer Science · 2026-01-29 · Yang Zhang, Amr Mohamed, Hadi Abdine, Guokan Shang, Michalis Vazirgiannis

Content Reduction, Surprisal and Information Density Estimation for Long Documents

Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content…

Computation and Language · Computer Science · 2023-09-13 · Shaoxiong Ji, Wei Sun, Pekka Marttinen

The Minimum Information Principle for Discriminative Learning

Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we…

Machine Learning · Computer Science · 2012-07-19 · Amir Globerson, Naftali Tishby

Improved and Efficient Text Adversarial Attacks using Target Information

There has recently been growing interest in studying adversarial examples on natural language models in the black-box setting. These methods attack natural language classifiers by perturbing certain important words until the classifier…

Machine Learning · Computer Science · 2021-05-04 · Mahmoud Hossam, Trung Le, He Zhao, Viet Huynh, Dinh Phung

DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

Large language models (LLMs) augmented with multi-step reasoning and action generation abilities have shown promise in leveraging external tools to tackle complex tasks that require long-horizon planning. However, existing approaches either…

Artificial Intelligence · Computer Science · 2025-10-16 · Wei Fan, Wenlin Yao, Zheng Li, Feng Yao, Xin Liu, Liang Qiu, Qingyu Yin, Yangqiu Song, Bing Yin

Learning the Information Divergence

Information divergence that measures the difference between two nonnegative matrices or tensors has found its use in a variety of machine learning problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor…

Machine Learning · Computer Science · 2014-06-06 · Onur Dikmen, Zhirong Yang, Erkki Oja

Prioritizing Informative Features and Examples for Deep Learning from Noisy Data

In this dissertation, we propose a systemic framework that prioritizes informative features and examples to enhance each stage of the development process. Specifically, we prioritize informative features and examples and improve the…

Machine Learning · Computer Science · 2024-08-13 · Dongmin Park

A New View on Planning in Online Reinforcement Learning

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with…

Machine Learning · Computer Science · 2024-06-04 · Kevin Roice, Parham Mohammad Panahi, Scott M. Jordan, Adam White, Martha White