Philipp Singer — Scifaro

TabPFN-3: Technical Report

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale…

Machine Learning · Computer Science 2026-05-29 Léo Grinsztajn , Klemens Flöge , Oscar Key , Felix Birkel , Philipp Jund , Brendan Roof , Mihir Manium , Shi Bin Hoo , Magnus Bühler , Anurag Garg , Dominik Safaric , Jake Robertson , Benjamin Jäger , Simone Alessi , Adrian Hayler , Vladyslav Moroshan , Lennart Purucker , Philipp Singer , Alan Arazi , Julien Siems , Jan Hendrik Metzen , Georg Grab , Nick Erickson , Siyuan Guo , Eliott Kalfon , Simon Bing , David Salinas , Clara Cornu , Lilly Charlotte Wehrhahn , Diana Kriuchkova , Kursat Kaya , Lydia Sidhoum , Marie Salmon , Jerry Chen , Madelon Hulsebos , Yann LeCun , Samuel Müller , Bernhard Schölkopf , Sauraj Gambhir , Noah Hollmann , Frank Hutter

H2O-Danube3 Technical Report

We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality Web data consisting of primarily English…

Computation and Language · Computer Science 2024-07-15 Pascal Pfeiffer , Philipp Singer , Yauhen Babakhin , Gabor Fodor , Nischay Dhankhar , Sri Satish Ambati

H2O-Danube-1.8B Technical Report

We present H2O-Danube, a series of small 1.8B language models consisting of H2O-Danube-1.8B, trained on 1T tokens, and the incrementally improved H2O-Danube2-1.8B, trained on an additional 2T tokens. Our models exhibit highly competitive…

Computation and Language · Computer Science 2024-04-16 Philipp Singer , Pascal Pfeiffer , Yauhen Babakhin , Maximilian Jeblick , Nischay Dhankhar , Gabor Fodor , Sri Satish Ambati

H2O Open Ecosystem for State-of-the-art Large Language Models

Large Language Models (LLMs) represent a revolution in AI. However, they also pose many significant risks, such as the presence of biased, private, copyrighted or harmful text. For this reason, we need open, transparent and safe solutions.…

Computation and Language · Computer Science 2023-10-24 Arno Candel , Jon McKinney , Philipp Singer , Pascal Pfeiffer , Maximilian Jeblick , Chun Ming Lee , Marcos V. Conde

h2oGPT: Democratizing Large Language Models

Applications built on top of Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their human-level capabilities in natural language processing. However, they also pose many significant risks such as the presence…

Computation and Language · Computer Science 2023-06-19 Arno Candel , Jon McKinney , Philipp Singer , Pascal Pfeiffer , Maximilian Jeblick , Prithvi Prabhu , Jeff Gambera , Mark Landry , Shivam Bansal , Ryan Chesler , Chun Ming Lee , Marcos V. Conde , Pasha Stetsenko , Olivier Grellier , SriSatish Ambati

Recognizing bird species in diverse soundscapes under weak supervision

We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF2021 challenge. We illustrate how to make full use of pre-trained convolutional neural networks, by…

Sound · Computer Science 2021-07-19 Christof Henkel , Pascal Pfeiffer , Philipp Singer

Supporting large-scale image recognition with out-of-domain samples

This article presents an efficient end-to-end method to perform instance-level recognition applied to the task of labeling and ranking landmark images. In a first step, we embed images in a high-dimensional feature space using…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Christof Henkel , Philipp Singer

Visibility of minorities in social networks

Homophily can put minority groups at a disadvantage by restricting their ability to establish links with people from a majority group. This can limit the overall visibility of minorities in the network. Building on a Barabási-Albert…

Physics and Society · Physics 2020-10-06 Fariba Karimi , Mathieu Génois , Claudia Wagner , Philipp Singer , Markus Strohmaier

Backtesting the predictability of COVID-19

The advent of the COVID-19 pandemic has instigated unprecedented changes in many countries around the globe, putting a significant burden on the health sectors, affecting the macroeconomic conditions, and altering social interactions…

Physics and Society · Physics 2020-07-23 Dmitry Gordeev , Philipp Singer , Marios Michailidis , Mathias Müller , SriSatish Ambati

MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data

Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding factors explaining the production of sequence data is a…

Social and Information Networks · Computer Science 2017-07-12 Martin Becker , Florian Lemmerich , Philipp Singer , Markus Strohmaier , Andreas Hotho

Why We Read Wikipedia

Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to…

Social and Information Networks · Computer Science 2017-03-17 Philipp Singer , Florian Lemmerich , Robert West , Leila Zia , Ellery Wulczyn , Markus Strohmaier , Jure Leskovec

Evidence of Online Performance Deterioration in User Sessions on Reddit

This article presents evidence of performance deterioration in online user sessions quantified by studying a massive dataset containing over 55 million comments posted on Reddit in April 2015. After segmenting the sessions (i.e., periods of…

Social and Information Networks · Computer Science 2017-03-07 Philipp Singer , Emilio Ferrara , Farshad Kooti , Markus Strohmaier , Kristina Lerman

What Makes a Link Successful on Wikipedia?

While a plethora of hypertext links exist on the Web, only a small number of them are regularly clicked. Starting from this observation, we set out to study large-scale click data from Wikipedia in order to understand what makes a link…

Social and Information Networks · Computer Science 2017-02-21 Dimitar Dimitrov , Philipp Singer , Florian Lemmerich , Markus Strohmaier

Sampling from Social Networks with Attributes

Sampling from large networks represents a fundamental challenge for social network research. In this paper, we explore the sensitivity of different sampling techniques (node sampling, edge sampling, random walk sampling, and snowball…

Social and Information Networks · Computer Science 2017-02-20 Claudia Wagner , Philipp Singer , Fariba Karimi , Jürgen Pfeffer , Markus Strohmaier

Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases (ICD) as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring,…

Social and Information Networks · Computer Science 2016-03-01 Simon Walk , Philipp Singer , Markus Strohmaier , Tania Tudorache , Mark A. Musen , Natalya F. Noy

How to Apply Markov Chains for Modeling Sequential Edit Patterns in Collaborative Ontology-Engineering Projects

With the growing popularity of large-scale collaborative ontology-engineering projects, such as the creation of the 11th revision of the International Classification of Diseases, we need new methods and insights to help project- and…

Human-Computer Interaction · Computer Science 2016-02-17 Simon Walk , Philipp Singer , Markus Strohmaier , Denis Helic , Natalya F. Noy , Mark Musen

Discovering and Characterizing Mobility Patterns in Urban Spaces: A Study of Manhattan Taxi Data

Nowadays, human movement in urban spaces can be traced digitally in many cases. It can be observed that movement patterns are not constant, but vary across time and space. In this work, we characterize such spatio-temporal patterns with an…

Social and Information Networks · Computer Science 2016-02-11 Lisette Espín-Noboa , Florian Lemmerich , Philipp Singer , Markus Strohmaier

HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music playlists. Understanding the…

Social and Information Networks · Computer Science 2015-03-27 Philipp Singer , Denis Helic , Andreas Hotho , Markus Strohmaier

Detecting Memory and Structure in Human Navigation Patterns Using Markov Chain Models of Varying Order

One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, where Web pages are represented as states and hyperlinks as probabilities of navigating from one page to another. Predominantly,…

Social and Information Networks · Computer Science 2014-07-15 Philipp Singer , Denis Helic , Behnam Taraghi , Markus Strohmaier

Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?

In the past few years, Reddit -- a community-driven platform for submitting, commenting and rating links and text posts -- has grown exponentially, from a small community of users into one of the largest online communities on the Web. To…

Social and Information Networks · Computer Science 2014-06-24 Philipp Singer , Fabian Flöck , Clemens Meinhart , Elias Zeitfogel , Markus Strohmaier