Related papers: Factor Engine: A Python Library for Systematic Fin…
We study alpha factor mining, the automated discovery of predictive signals from noisy, non-stationary market data-under a practical requirement that mined factors be directly executable and auditable, and that the discovery process remain…
Factor analysis is a flexible technique for assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often…
Formulaic alpha factor mining is a critical yet challenging task in quantitative investment, characterized by a vast search space and the need for domain-informed, interpretable signals. However, finding novel signals becomes increasingly…
This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such…
The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated…
We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection,…
Predictive systems, in particular machine learning algorithms, can take important, and sometimes legally binding, decisions about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified.…
NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring.…
Pattern matching is a powerful tool for symbolic computations, based on the well-defined theory of term rewriting systems. Application domains include algebraic expressions, abstract syntax trees, and XML and JSON data. Unfortunately, no…
Factor modeling is a powerful statistical technique that permits to capture the common dynamics in a large panel of data with a few latent variables, or factors, thus alleviating the curse of dimensionality. Despite its popularity and…
Data-driven modeling is an approach in energy systems modeling that has been gaining popularity. In data-driven modeling, machine learning methods such as linear regression, neural networks or decision-tree based methods are being applied.…
Pattern matching is a powerful tool for symbolic computations. Applications include term rewriting systems, as well as the manipulation of symbolic expressions, abstract syntax trees, and XML and JSON data. It also allows for an intuitive…
Factorised databases are relational databases that use compact factorised representations at the physical layer to reduce data redundancy and boost query performance. This paper introduces FDB, an in-memory query engine for…
The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data…
This paper proposes the design and implementation strategy of a novel computing architecture, the Factor Machine. The work is a step towards a general-purpose parallel system operating in a non-sequential manner, exploiting…
Alpha factor mining aims to discover investment signals from the historical financial market data, which can be used to predict asset returns and gain excess profits. Powerful deep learning methods for alpha factor mining lack…
Factor analysis or sometimes referred to as variable analysis has been extensively used in classification problems for identifying specific factors that are significant to particular classes. This type of analysis has been widely used in…
A large amount of data is produced every second from modern information systems such as mobile devices, the world wide web, Internet of Things, social media, etc. Analysis and mining of this massive data requires a lot of advanced tools and…
This paper presents Deepchecks, a Python library for comprehensively validating machine learning models and data. Our goal is to provide an easy-to-use library comprising of many checks related to various types of issues, such as model…
Machine learning solutions are very popular in the field of chemoinformatics, where they have numerous applications, such as novel drug discovery or molecular property prediction. Molecular fingerprints are algorithms commonly used for…