Kai Puolamäki — Scifaro

ExplainReduce: Generating global explanations from many local explanations

Most commonly used non-linear machine learning methods are closed-box models, uninterpretable to humans. The field of explainable artificial intelligence (XAI) aims to develop tools to examine the inner workings of these closed boxes. An…

Machine Learning · Computer Science 2026-05-26 Lauri Seppäläinen, Mudong Guo, Kai Puolamäki

Information Hidden in Gradients of Regression with Target Noise

Second-order information -- such as curvature or data covariance -- is critical for optimisation, diagnostics, and robustness. However, in many modern settings, only the gradients are observable. We show that the gradients alone can reveal…

Machine Learning · Computer Science 2026-04-08 Arash Jamshidi, Katsiaryna Haitsiukevich, Kai Puolamäki

PhiPlot: A Web-Based Interactive EDA Environment for Atmospherically Relevant Molecules

Advances in computational chemistry have produced high-dimensional datasets on atmospherically relevant molecules. To aid exploration of such datasets, particularly for the study of atmospheric aerosol formation, we introduce PhiPlot: a…

Human-Computer Interaction · Computer Science 2026-03-13 Matias Loukojärvi, Ananth Mahadevan, Katsiaryna Haitsiukevich, Kai Puolamäki

GRADSTOP: Early Stopping of Gradient Descent via Posterior Sampling

Machine learning models are often learned by minimising a loss function on the training data using a gradient descent algorithm. These models often suffer from overfitting, leading to a decline in predictive performance on unseen data. A…

Machine Learning · Computer Science 2026-01-28 Arash Jamshidi, Lauri Seppäläinen, Katsiaryna Haitsiukevich, Hoang Phuc Hau Luu, Anton Björklund, Kai Puolamäki

Fast and Interpretable Machine Learning Modelling of Atmospheric Molecular Clusters

Understanding how atmospheric molecular clusters form and grow is key to resolving one of the biggest uncertainties in climate modelling: the formation of new aerosol particles. While quantum chemistry offers accurate insights into these…

Machine Learning · Computer Science 2025-09-16 Lauri Seppäläinen, Jakub Kubečka, Jonas Elm, Kai Puolamäki

Non-geodesically-convex optimization in the Wasserstein space

We study a class of optimization problems in the Wasserstein space (the space of probability measures) where the objective function is nonconvex along generalized geodesics. Specifically, the objective exhibits some difference-of-convex…

Optimization and Control · Mathematics 2025-01-08 Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, Petrus Mikkola, Marcelo Hartmann, Kai Puolamäki, Arto Klami

Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction

A fundamental problem in supervised learning is to find a good set of features or distance measures. If the new set of features is of lower dimensionality and can be obtained by a simple transformation of the original data, they can make…

Machine Learning · Computer Science 2024-05-15 Anri Patron, Ayush Prasad, Hoang Phuc Hau Luu, Kai Puolamäki

Using Slisemap to interpret physical data

Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper we apply a recently introduced manifold visualisation method, called Slisemap, on datasets from physics and…

Machine Learning · Computer Science 2024-01-29 Lauri Seppäläinen, Anton Björklund, Vitus Besel, Kai Puolamäki

$\chi$iplot: web-first visualisation platform for multidimensional data

$\chi$iplot is an HTML5-based system for interactive exploration of data and machine learning models. A key aspect is interaction, not only for the interactive plots but also between plots. Even though $\chi$iplot is not restricted to any…

Human-Computer Interaction · Computer Science 2023-10-16 Akihiro Tanaka, Juniper Tyree, Anton Björklund, Jarmo Mäkelä, Kai Puolamäki

SLISEMAP: Supervised dimensionality reduction through local explanations

Existing methods for explaining black box learning models often focus on building local explanations of model behaviour for a particular data item. It is possible to create global explanations for all data items, but these explanations…

Machine Learning · Computer Science 2023-10-16 Anton Björklund, Jarmo Mäkelä, Kai Puolamäki

Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach

Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing methods use predefined criteria to choose the representation of data. There is a lack of methods that (i) elicit…

Machine Learning · Statistics 2021-11-08 Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, Tijl De Bie

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely…

Machine Learning · Statistics 2021-11-08 Jefrey Lijffijt, Bo Kang, Wouter Duivesteijn, Kai Puolamäki, Emilia Oikarinen, Tijl De Bie

Interactive Causal Structure Discovery in Earth System Sciences

Causal structure discovery (CSD) models are making inroads into several domains, including Earth system sciences. Their widespread adoption is however hampered by the fact that the resulting models often do not take into account the…

Data Analysis, Statistics and Probability · Physics 2021-07-05 Laila Melkas, Rafael Savvides, Suyog Chandramouli, Jarmo Mäkelä, Tuomo Nieminen, Ivan Mammarella, Kai Puolamäki

Guided Visual Exploration of Relations in Data Sets

Efficient explorative data analysis systems must take into account both what a user knows and wants to know. This paper proposes a principled framework for interactive visual exploration of relations in data, through views most informative…

Machine Learning · Statistics 2021-07-02 Kai Puolamäki, Emilia Oikarinen, Andreas Henelius

Low-Cost Outdoor Air Quality Monitoring and Sensor Calibration: A Survey and Critical Analysis

The significance of air pollution and the problems associated with it are fueling deployments of air quality monitoring stations worldwide. The most common approach for air quality monitoring is to rely on environmental monitoring stations,…

Signal Processing · Electrical Eng. & Systems 2021-01-26 Francesco Concas, Julien Mineraud, Eemil Lagerspetz, Samu Varjonen, Xiaoli Liu, Kai Puolamäki, Petteri Nurmi, Sasu Tarkoma

Tell Me Something I Don't Know: Randomization Strategies for Iterative Data Mining

There is a wide variety of data mining methods available, and it is generally useful in exploratory data analysis to use many different methods for the same dataset. This, however, leads to the problem of whether the results found by one…

Machine Learning · Computer Science 2020-06-18 Sami Hanhijärvi, Markus Ojala, Niko Vuokko, Kai Puolamäki, Nikolaj Tatti, Heikki Mannila

Estimating regression errors without ground truth values

Regression analysis is a standard supervised machine learning method used to model an outcome variable in terms of a set of predictor variables. In most real-world applications we do not know the true value of the outcome variable being…

Machine Learning · Statistics 2019-10-10 Henri Tiittanen, Emilia Oikarinen, Andreas Henelius, Kai Puolamäki

Randomisation Algorithms for Large Sparse Matrices

In many domains it is necessary to generate surrogate networks, e.g., for hypothesis testing of different properties of a network. Furthermore, generating surrogate networks typically requires that different properties of the network are…

Data Structures and Algorithms · Computer Science 2019-06-05 Kai Puolamäki, Andreas Henelius, Antti Ukkonen

Human-guided data exploration using randomisation

An explorative data analysis system should be aware of what the user already knows and what the user wants to know of the data: otherwise the system cannot provide the user with the most informative and useful views of the data. We propose…

Machine Learning · Statistics 2019-01-01 Kai Puolamäki, Emilia Oikarinen, Buse Atli, Andreas Henelius

Human-Guided Data Exploration

The outcome of the explorative data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative…

Machine Learning · Statistics 2018-04-11 Andreas Henelius, Emilia Oikarinen, Kai Puolamäki