Related papers: Fine-Tuning Pre-Trained Code Models for AI-Generat…

UCSC-NLP at SemEval-2026 Task 13: Multi-View Generalization and Diagnostic Analysis of Machine-Generated Code Detection

With the rapid growth of large language models for code generation, distinguishing between human-written and AI-generated code has become increasingly critical for academic integrity, hiring evaluations, and software security. We present…

Software Engineering · Computer Science 2026-05-01 Kargi Chauhan, Sadiba Nusrat Nur

mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

Multi-domain detection of the machine-generated code snippets in various programming languages is a challenging task. SemEval-2026 Task 13 copes with this challenge in various angles, as a binary detection problem as well as attribution of…

Machine Learning · Computer Science 2026-04-24 Adam Skurla, Dominik Macko, Jakub Simko

AI-generated Text Detection: A Multifaceted Approach to Binary and Multiclass Classification

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing across a wide range of styles and genres. However, such capabilities are prone to potential misuse, such as fake…

Computation and Language · Computer Science 2025-05-20 Harika Abburi, Sanmitra Bhattacharya, Edward Bowen, Nirmala Pudota

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our…

Computation and Language · Computer Science 2026-05-07 Elitsa Yotkova, Violeta Kastreva, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov

AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text

SemEval-2024 Task 8 provides a challenge to detect human-written and machine-generated text. There are 3 subtasks for different detection scenarios. This paper proposes a system that mainly deals with Subtask B. It aims to detect if given…

Computation and Language · Computer Science 2024-04-02 Renhua Gu, Xiangfeng Meng

Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking

The paper describes a system designed by Advacheck team to recognise machine-generated and human-written texts in the monolingual subtask of GenAI Detection Task 1 competition. Our developed system is a multi-task architecture with shared…

Computation and Language · Computer Science 2024-11-19 German Gritsai, Anastasia Voznyuk, Ildar Khabutdinov, Andrey Grabovoy

Fine-tuning Large Language Models for Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection

SemEval-2024 Task 8 introduces the challenge of identifying machine-generated texts from diverse Large Language Models (LLMs) in various languages and domains. The task comprises three subtasks: binary classification in monolingual and…

Computation and Language · Computer Science 2024-01-24 Feng Xiong, Thanet Markchom, Ziwei Zheng, Subin Jung, Varun Ojha, Huizhi Liang

SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining…

Computation and Language · Computer Science 2024-04-23 Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

CodeT: Code Generation with Generated Tests

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select…

Computation and Language · Computer Science 2022-11-24 Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

Findings of the Counter Turing Test: AI-Generated Text Detection

The growing capability of large language models to produce fluent, contextually coherent text has created mounting pressure on the systems and institutions responsible for ensuring the authenticity of digital content. Advanced generative…

Computation and Language · Computer Science 2026-05-26 Rajarshi Roy, Gurpreet Singh, Ashhar Aziz, Shashwat Bajpai, Nasrin Imanpour, Shwetangshu Biswas, Kapil Wanaskar, Parth Patwa, Subhankar Ghosh, Shreyas Dixit, Nilesh Ranjan Pal, Vipula Rawte, Ritvik Garimella, Amitava Das, Amit Sheth, Vasu Sharma, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha

KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection

SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection. Such a detection is important for preventing a potential misuse of large language models (LLMs), the newest of which…

Computation and Language · Computer Science 2024-06-18 Michal Spiegel, Dominik Macko

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

The growing collaboration between humans and AI models in generative tasks has introduced new challenges in distinguishing between human-written, LLM-generated, and human-LLM collaborative texts. In this work, we collect a multilingual,…

Computation and Language · Computer Science 2026-02-10 Minh Ngoc Ta, Dong Cao Van, Duc-Anh Hoang, Minh Le-Anh, Truong Nguyen, My Anh Tran Nguyen, Yuxia Wang, Preslav Nakov, Sang Dinh

Fine-Grained Detection of AI-Generated Text Using Sentence-Level Segmentation

Generation of Artificial Intelligence (AI) texts in important works has become a common practice that can be used to misuse and abuse AI at various levels. Traditional AI detectors often rely on document-level classification, which…

Computation and Language · Computer Science 2025-09-24 Lekkala Sai Teja, Annepaka Yadagiri, Partha Pakray, Chukhu Chunka, Mangadoddi Srikar Vardhan

LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification

Misogyny and sexism are growing problems in social media. Advances have been made in online sexism detection but the systems are often uninterpretable. SemEval-2023 Task 10 on Explainable Detection of Online Sexism aims at increasing…

Computation and Language · Computer Science 2023-06-09 Konstantin Chernyshev, Ekaterina Garanina, Duygu Bayram, Qiankun Zheng, Lukas Edman

Automatic Code Generation using Pre-Trained Language Models

Recent advancements in natural language processing (GPT-2, BERT) have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…

Computation and Language · Computer Science 2021-02-23 Luis Perez, Lizi Ottens, Sudharshan Viswanathan

Attention at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS)

In this paper, we have worked on interpretability, trust, and understanding of the decisions made by models in the form of classification tasks. The task is divided into 3 subtasks. The first task consists of determining Binary Sexism…

Computation and Language · Computer Science 2023-04-11 Debashish Roy, Manish Shrivastava

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

RFBES at SemEval-2024 Task 8: Investigating Syntactic and Semantic Features for Distinguishing AI-Generated and Human-Written Texts

Nowadays, the usage of Large Language Models (LLMs) has increased, and LLMs have been used to generate texts in different languages and for different tasks. Additionally, due to the participation of remarkable companies such as Google and…

Computation and Language · Computer Science 2024-02-26 Mohammad Heydari Rad, Farhan Farsi, Shayan Bali, Romina Etezadi, Mehrnoush Shamsfard

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we…

Machine Learning · Computer Science 2024-07-11 Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang