Related papers: Lightweight Multilingual Software Analysis
Analysis of multilingual codebases is a topic of increasing importance. In prior work, we have proposed the MLSA (MultiLingual Software Analysis) architecture, an approach to the lightweight analysis of multilingual codebases, and have…
Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing…
Polyglot programming is gaining popularity as developers integrate multiple programming languages to harness their individual strengths. With the recent popularity of platforms like GraalVM and other multi-language runtimes, creating and…
Identifying dependency call graphs of multilanguage software systems using static code analysis is challenging. The different languages used in developing today's systems often have different lexical, syntactical, and semantic rules that…
This paper presents LOLA, a massively multilingual large language model trained on more than 160 languages using a sparse Mixture-of-Experts Transformer architecture. Our architectural and implementation choices address the challenge of…
Large Language Models (LLMs) are used for many different software engineering tasks. In software architecture, they have been applied to tasks such as classification of design decisions, detection of design patterns, and generation of…
The growing need to integrate information from a large number of diverse sources poses significant scalability challenges for data integration systems. These systems often rely on manually written schema mappings, which are complex,…
Despite various approaches being employed to detect vulnerabilities, the number of reported vulnerabilities shows an upward trend over the years. This suggests the problems are not caught before the code is released, which could be caused…
The increasing complexity of software systems has driven significant advancements in program analysis, as traditional methods unable to meet the demands of modern software development. To address these limitations, deep learning techniques,…
Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift in the research landscape by offering cognitive abilities that are competitive with human planning and reasoning. This paper explores the…
Understanding large-scale, complex software systems is a major challenge for developers, who spend a significant portion of their time on program comprehension. Traditional tools such as static visualizations and reverse engineering…
Programming languages possess rich semantic information - such as data flow - that is represented by graphs and not available from the surface form of source code. Recent code language models have scaled to billions of parameters, but model…
Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…
The rapid evolution of large language models (LLMs) has opened new possibilities for automating various tasks in software development. This paper evaluates the capabilities of the Llama 2-70B model in automating these tasks for scientific…
Large Language Models (LLMs) face critical challenges when analyzing security vulnerabilities in real world codebases: token limits prevent loading entire repositories, code embeddings fail to capture inter procedural data flows, and LLMs…
Large Language Models (LLMs) have demonstrated substantial efficacy in advancing graph-structured data analysis. Prevailing LLM-based graph methods excel in adapting LLMs to text-rich graphs, wherein node attributes are text descriptions.…
While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose…
Large Language Models (LLMs) have become increasingly popular for their advanced text generation capabilities across various domains. However, like any software, they face security challenges, including the risk of 'jailbreak' attacks that…
The application of Large Language Models (LLMs) in software engineering, particularly in static analysis tasks, represents a paradigm shift in the field. In this paper, we investigate the role that current LLMs can play in improving…
Large language models have transformed AI-assisted software engineering, but current research remains biased toward high-resource languages such as Python, with weaker performance in languages like Rust and OCaml. Since real-world systems…