Related papers: A Toolkit for Generating Code Knowledge Graphs
Current generative knowledge graph construction approaches usually fail to capture structural knowledge by simply flattening natural language into serialized texts or a specification language. However, large generative language model…
The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is related to the fact that its analysis has become difficult due to the high volume of published papers for which…
We tackle the problem of automatic generation of computer programs from a few pairs of input-output examples. The starting point of this work is the observation that in many applications a solution program must use external knowledge not…
Code search aims to retrieve accurate code snippets based on a natural language query to improve software productivity and quality. With the massive amount of available programs such as (on GitHub or Stack Overflow), identifying and…
Understanding large software systems is a challenging task, especially when code is distributed across multiple repositories and microservices. Developers often need to reason not only about the structure of the code, but also about its…
Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. Albeit, there is a limited number of publicly available graph-structured…
Software comprehension can be extremely time-consuming due to the ever-growing size of codebases. Consequently, there is an increasing need to accelerate the code comprehension process to facilitate maintenance and reduce associated costs.…
Generative models for source code are an interesting structured prediction problem, requiring to reason about both hard syntactic and semantic constraints as well as about natural, likely programs. We present a novel model for this problem…
Virtually every sector of society is experiencing a dramatic growth in the volume of unstructured textual data that is generated and published, from news and social media online interactions, through open access scholarly communications and…
The scientific literature is a rich source of information for data mining with conceptual knowledge graphs; the open science movement has enriched this literature with complementary source code that implements scientific models. To exploit…
Significant advancements have been made in the capabilities of code large language models, leading to their rapid adoption and application across a wide range of domains. However, their further advancements are often constrained by the…
Chart-to-code reconstruction -- the task of recovering executable plotting scripts from chart images -- provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing…
As the demands for large-scale information processing have grown, knowledge graph-based approaches have gained prominence for representing general and domain knowledge. The development of such general representations is essential,…
The availability of vast amounts of visual data with heterogeneous features is a key factor for developing, testing, and benchmarking of new computer vision (CV) algorithms and architectures. Most visual datasets are created and curated for…
This study explores Graph Neural Networks (GNNs) as a transformative tool for code refactoring, using abstract syntax trees (ASTs) to boost software maintainability. It analyzes a dataset of 2 million snippets from CodeSearchNet and a…
Fine-tuning for large language models (LLMs) typically requires substantial amounts of high-quality supervised data, which is both costly and labor-intensive to acquire. While synthetic data generation has emerged as a promising solution,…
We introduce graphcodes, a novel multi-scale summary of the topological properties of a dataset that is based on the well-established theory of persistent homology. Graphcodes handle datasets that are filtered along two real-valued scale…
Enterprise level software is implemented using multi-layer architecture. These layers are often implemented using de-coupled solutions with millions of lines of code. Programmers often have to track and debug a function call from user…
Knowledge graphs are an efficient method for representing and connecting information across various concepts, useful in reasoning, question answering, and knowledge base completion tasks. They organize data by linking points, enabling…
The online programing services, such as Github,TopCoder, and EduCoder, have promoted a lot of social interactions among the service users. However, the existing social interactions is rather limited and inefficient due to the rapid…