Related papers: Deep Data Flow Analysis

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

The increasing complexity of computing systems places a tremendous burden on optimizing compilers, requiring ever more accurate and aggressive optimizations. Machine learning offers significant benefits for constructing optimization…

Machine Learning · Computer Science 2020-03-25 Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather

CompilerGPT: Leveraging Large Language Models for Analyzing and Acting on Compiler Optimization Reports

Current compiler optimization reports often present complex, technical information that is difficult for programmers to interpret and act upon effectively. This paper assesses the capability of large language models (LLM) to understand…

Programming Languages · Computer Science 2025-06-16 Peter Pirkelbauer, Chunhua Liao

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Machine Learning · Computer Science 2025-12-19 Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng, Runming He, Chengyu Shen, Qifeng Cai, Zhaoyang Han, Meiyi Qiang, Yalin Feng, Tianyi Bai, Zewei Pan, Ziyi Guo, Yizhen Jiang, Jingwen Deng, Qijie You, Peichao Lai, Tianyu Guo, Chi Hsu Tsai, Hengyi Feng, Rui Hu, Wenkai Yu, Junbo Niu, Bohan Zeng, Ruichuan An, Lu Ma, Jihao Huang, Yaowei Zheng, Conghui He, Linpeng Tang, Bin Cui, Weinan E, Wentao Zhang

Capturing Semantic Flow of ML-based Systems

ML-based systems are software systems that incorporate machine learning components such as Deep Neural Networks (DNNs) or Large Language Models (LLMs). While such systems enable advanced features such as high-performance computer vision,…

Software Engineering · Computer Science 2025-03-14 Shin Yoo, Robert Feldt, Somin Kim, Naryeong Kim

Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans

Machine Learning (ML) is increasingly used to automate impactful decisions, which leads to concerns regarding their correctness, reliability, and fairness. We envision highly-automated software platforms to assist data scientists with…

Databases · Computer Science 2024-09-04 Stefan Grafberger

ML-driven Hardware Cost Model for MLIR

During early optimization passes, compilers must make predictions for machine-dependent characteristics such as execution unit utilization, number of register spills, latency, throughput etc. to generate better code. Often a hand-written…

Machine Learning · Computer Science 2023-02-23 Dibyendu Das, Sandya Mannarswamy

Scheduling That Speaks: An Interpretable Programmatic Reinforcement Learning Framework

Deep reinforcement learning (DRL) has recently emerged as a promising approach to solve combinatorial optimization problems such as job shop scheduling. However, the policies learned by DRL are typically represented by deep neural networks…

Machine Learning · Computer Science 2026-05-19 Chengpeng Hu, Yingqian Zhang, Hendrik Baier

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling the ability of large language models (LLMs) to process graphs is an important step…

Computation and Language · Computer Science 2025-11-04 Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang

STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison

Machine learning (ML) offers powerful methods for detecting and modeling associations often in data with large feature spaces and complex associations. Many useful tools/packages (e.g. scikit-learn) have been developed to make the various…

Machine Learning · Computer Science 2022-06-27 Ryan J. Urbanowicz, Robert Zhang, Yuhan Cui, Pranshu Suri

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimization remains underexplored. Training…

Programming Languages · Computer Science 2024-07-04 Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, Hugh Leather

Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction

Defects in software components are unavoidable and lead not only to wasted time and money but also to many serious consequences. To build predictive models, previous studies focus on manually extracting features or using tree…

Software Engineering · Computer Science 2018-02-15 Anh Viet Phan, Minh Le Nguyen, Lam Thu Bui

A Scalable AutoML Approach Based on Graph Neural Networks

AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search…

Machine Learning · Computer Science 2022-07-18 Mossad Helali, Essam Mansour, Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas

PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis

The remarkable growth and significant success of machine learning have expanded its applications into programming languages and program analysis. However, a key challenge in adopting the latest machine learning methods is the representation…

Programming Languages · Computer Science 2023-12-01 Ali TehraniJamsaz, Quazi Ishtiaque Mahmud, Le Chen, Nesreen K. Ahmed, Ali Jannesari

Benchmark and Survey of Automated Machine Learning Frameworks

Machine learning (ML) has become a vital part in many aspects of our daily life. However, building well performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning…

Machine Learning · Computer Science 2021-01-27 Marc-André Zöller, Marco F. Huber

Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models

High-level automation is increasingly critical in AI, driven by rapid advances in large language models (LLMs) and AI agents. However, LLMs, despite their general reasoning power, struggle significantly in specialized, data-sensitive tasks…

Machine Learning · Computer Science 2026-02-12 Jialiang Wang, Hanmo Liu, Shimin Di, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou

Incremental Search Space Construction for Machine Learning Pipeline Synthesis

Automated machine learning (AutoML) aims to construct machine learning (ML) pipelines automatically. Many studies have investigated efficient methods for algorithm selection and hyperparameter optimization. However, methods for ML…

Machine Learning · Computer Science 2021-01-27 Marc-André Zöller, Tien-Dung Nguyen, Marco F. Huber

ComPile: A Large IR Dataset from Production Sources

Code is increasingly becoming a core data modality of modern machine learning research impacting not only the way we write code with conversational agents like OpenAI's ChatGPT, Google's Bard, or Anthropic's Claude, the way we translate…

Programming Languages · Computer Science 2024-05-01 Aiden Grossman, Ludger Paehler, Konstantinos Parasyris, Tal Ben-Nun, Jacob Hegna, William Moses, Jose M Monsalve Diaz, Mircea Trofin, Johannes Doerfert

Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models

Data Pipeline plays an indispensable role in tasks such as modeling machine learning and developing data products. With the increasing diversification and complexity of Data sources, as well as the rapid growth of data volumes, building an…

Machine Learning · Computer Science 2024-02-21 Jiang Wu, Hongbo Wang, Chunhe Ni, Chenwei Zhang, Wenran Lu

End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

To enable heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that is capable of mining the complexity of…

Machine Learning · Computer Science 2022-04-27 Yao Xiao, Guixiang Ma, Nesreen K. Ahmed, Mihai Capota, Theodore Willke, Shahin Nazarian, Paul Bogdan

LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts

Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of…

Software Engineering · Computer Science 2025-03-21 Pankaj Thorat, Adnan Qidwai, Adrija Dhar, Aishwariya Chakraborty, Anand Eswaran, Hima Patel, Praveen Jayachandran