Related papers: Code4ML: a Large-scale Dataset of annotated Machin…

200 papers

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation.…

Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase…

The availability of vast amounts of publicly accessible data of source code and the advances in modern language models, coupled with increasing computational resources, have led to a remarkable surge in the development of large language…

Software Engineering · Computer Science · 2024-10-01 · Zhou Yang, Jieke Shi, Premkumar Devanbu, David Lo

Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap marked by the absence…

Software Engineering · Computer Science · 2023-12-21 · Martin Weyssow, Claudio Di Sipio, Davide Di Ruscio, Houari Sahraoui

The availability of large-scale datasets on which to train, benchmark and test algorithms has been central to the rapid development of machine learning as a discipline and its maturity as a research discipline. Despite considerable…

Quantum Physics · Physics · 2021-08-17 · Elija Perrier, Akram Youssry, Chris Ferrie

One of the central tasks in software maintenance is being able to understand and develop code changes. Thus, given a natural language description of the desired new operation of a function, an agent (human or AI) might be asked to generate…

Software Engineering · Computer Science · 2025-02-05 · Kunal Pai, Premkumar Devanbu, Toufique Ahmed

Machine Learning for Source Code (ML4Code) is an active research field in which extensive experimentation is needed to discover how to best use source code's richly structured information. With this in mind, we introduce JEMMA, an…

Software Engineering · Computer Science · 2022-12-20 · Anjan Karmakar, Miltiadis Allamanis, Romain Robbes

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only…

Computation and Language · Computer Science · 2023-06-28 · Ryo Sekizawa, Nan Duan, Shuai Lu, Hitomi Yanaka

The coding capabilities of large language models (LLMs) have opened up new opportunities for automatic statistical analysis in machine learning and data science. However, before their widespread adoption, it is crucial to assess the…

Applications · Statistics · 2025-02-26 · Xinyi Song, Lina Lee, Kexin Xie, Xueying Liu, Xinwei Deng, Yili Hong

Effective code documentation is essential for collaboration, comprehension, and long-term software maintainability, yet developers often neglect it due to its repetitive nature. Automated documentation generation has evolved from heuristic…

Software Engineering · Computer Science · 2026-02-10 · Mojtaba Mostafavi Ghahfarokhi, Hamed Jahantigh, Alireza Asadi, Abbas Heydarnoori

As big data grows ubiquitous across many domains, more and more stakeholders seek to develop Machine Learning (ML) applications on their data. The success of an ML application usually depends on the close collaboration of ML experts and…

Software Engineering · Computer Science · 2022-11-10 · Md Abdullah Al Alamin, Gias Uddin

Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems,…

Machine Learning · Computer Science · 2024-03-18 · Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science · 2025-10-28 · Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

In recent years, a lot of technological advances in computer science have aided software programmers to create innovative and real-time user-friendly software. With the creation of the software and the urging interest of people to learn to…

Software Engineering · Computer Science · 2025-03-25 · Kuldeep Gautam, S. VenkataKeerthy, Ramakrishna Upadrasta

Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair, and test case…

Software Engineering · Computer Science · 2023-10-30 · Xinyu She, Yue Liu, Yanjie Zhao, Yiling He, Li Li, Chakkrit Tantithamthavorn, Zhan Qin, Haoyu Wang

The adoption of Large Language Models (LLMs) for code generation in data science offers substantial potential for enhancing tasks such as data manipulation, statistical analysis, and visualization. However, the effectiveness of these models…

Software Engineering · Computer Science · 2024-11-20 · Nathalia Nascimento, Everton Guimaraes, Sai Sanjna Chintakunta, Santhosh Anitha Boominathan

The generation of large, high-quality datasets for code understanding and generation remains a significant challenge, particularly when aligning decompiled binaries with their original source code. To address this, we present CodableLLM, a…

Software Engineering · Computer Science · 2025-07-31 · Dylan Manuel, Paul Rad

Machine Learning (ML) has revamped every domain of life as it provides powerful tools to build complex systems that learn and improve from experience and data. Our key insight is that to solve a machine learning problem, data scientists do…

Software Engineering · Computer Science · 2018-02-07 · Muhammad Zubair Malik, Muhammad Nawaz, Nimrah Mustafa, Junaid Haroon Siddiqui

Classifying network traffic is the basis for important network applications. Prior research in this area has faced challenges on the availability of representative datasets, and many of the results cannot be readily reproduced. Such a…

Cryptography and Security · Computer Science · 2020-04-29 · Onur Barut, Yan Luo, Tong Zhang, Weigang Li, Peilong Li

Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring,…

Machine Learning · Computer Science · 2026-02-05 · Nadia Daoudi, Jordi Cabot