Related papers: Native Language Identification using Stacked Gener…

Native Language Identification on Text and Speech

This paper presents an ensemble system combining the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the NLI Shared Task 2017 fusion track, which featured student essays and spoken…

Computation and Language · Computer Science 2017-07-25 Marcos Zampieri , Alina Maria Ciobanu , Liviu P. Dinu

Unravelling Interlanguage Facts via Explainable Machine Learning

Native language identification (NLI) is the task of training (via supervised machine learning) a classifier that guesses the native language of the author of a text. This task has been extensively researched in the last decade, and the…

Computation and Language · Computer Science 2022-08-03 Barbara Berti , Andrea Esuli , Fabrizio Sebastiani

Scaling Native Language Identification with Transformer Adapters

Native language identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is useful for a variety of purposes including marketing,…

Computation and Language · Computer Science 2022-11-21 Ahmet Yavuz Uluslu , Gerold Schneider

Native Language Identification with Big Bird Embeddings

Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and…

Computation and Language · Computer Science 2023-09-14 Sergey Kramp , Giovanni Cassani , Chris Emmery

Native Language Identification with Large Language Models

We present the first experiments on Native Language Identification (NLI) using LLMs such as GPT-4. NLI is the task of predicting a writer's first language by analyzing their writings in a second language, and is used in second language…

Computation and Language · Computer Science 2023-12-14 Wei Zhang , Alexandre Salle

Leveraging Open-Source Large Language Models for Native Language Identification

Native Language Identification (NLI) - the task of identifying the native language (L1) of a person based on their writing in the second language (L2) - has applications in forensics, marketing, and second language acquisition.…

Computation and Language · Computer Science 2025-01-22 Yee Man Ng , Ilia Markov

Robust Native Language Identification through Agentic Decomposition

Large language models (LLMs) often achieve high performance in native language identification (NLI) benchmarks by leveraging superficial contextual clues such as names, locations, and cultural stereotypes, rather than the underlying…

Computation and Language · Computer Science 2025-09-23 Ahmet Yavuz Uluslu , Tannon Kew , Tilia Ellendorff , Gerold Schneider , Rico Sennrich

Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation

Code vulnerability detection is crucial for ensuring the security and reliability of modern software systems. Recently, Large Language Models (LLMs) have shown promising capabilities in this domain. However, notable discrepancies in…

Software Engineering · Computer Science 2025-09-19 Zhihong Sun , Jia Li , Yao Wan , Chuanyi Li , Hongyu Zhang , Zhi Jin , Ge Li , Hong Liu , Chen Lyu , Songlin Hu

How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics

Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance. To address this,…

Computation and Language · Computer Science 2024-10-07 Adrian Cosma , Stefan Ruseti , Mihai Dascalu , Cornelia Caragea

Can string kernels pass the test of time in Native Language Identification?

We describe a machine learning approach for the 2017 shared task on Native Language Identification (NLI). The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character…

Computation and Language · Computer Science 2017-08-07 Radu Tudor Ionescu , Marius Popescu

Native Language Identification using i-vector

The task of determining a speaker's native language based only on their speech in a second language is known as Native Language Identification or NLI. Due to its increasing applications in various domains of speech signal processing, this…

Computation and Language · Computer Science 2018-11-15 Ahmed Nazim Uddin , Md Ashequr Rahman , Md. Rafidul Islam , Mohammad Ariful Haque

XStacking: Explanation-Guided Stacked Ensemble Learning

Ensemble Machine Learning (EML) techniques, especially stacking, have been shown to improve predictive performance by combining multiple base models. However, they are often criticized for their lack of interpretability. In this paper, we…

Machine Learning · Computer Science 2025-09-16 Moncef Garouani , Ayah Barhrhouj , Olivier Teste

EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code

Automated detection of software vulnerabilities is critical for enhancing security, yet existing methods often struggle with the complexity and diversity of modern codebases. In this paper, we introduce EnStack, a novel ensemble stacking…

Software Engineering · Computer Science 2024-11-26 Shahriyar Zaman Ridoy , Md. Shazzad Hossain Shaon , Alfredo Cuzzocrea , Mst Shapna Akter

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

Large language models (LLMs) are increasingly applied in multilingual contexts, yet their capacity for consistent, logically grounded alignment across languages remains underexplored. We present a controlled evaluation framework for…

Computation and Language · Computer Science 2025-08-21 Samir Abdaljalil , Erchin Serpedin , Khalid Qaraqe , Hasan Kurban

Analyzing Compositionality-Sensitivity of NLI Models

Success in natural language inference (NLI) should require a model to understand both lexical and compositional semantics. However, through adversarial evaluation, we find that several state-of-the-art models with diverse architectures are…

Computation and Language · Computer Science 2018-11-20 Yixin Nie , Yicheng Wang , Mohit Bansal

A Comparative Survey of Recent Natural Language Interfaces for Databases

Over the last few years natural language interfaces (NLI) for databases have gained significant traction both in academia and industry. These systems use very different approaches as described in recent survey papers. However, these systems…

Databases · Computer Science 2019-09-05 Katrin Affolter , Kurt Stockinger , Abraham Bernstein

Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

Transliteration has emerged as a promising means to bridge the gap between various languages in multilingual NLP, showing promising results especially for languages using non-Latin scripts. We investigate the degree to which shared script,…

Computation and Language · Computer Science 2026-03-25 Haeji Jung , Jinju Kim , Kyungjin Kim , Youjeong Roh , David R. Mortensen

Improving Natural Language Inference with a Pretrained Parser

We introduce a novel approach to incorporate syntax into natural language inference (NLI) models. Our method uses contextual token-level vector representations from a pretrained dependency parser. Like other contextual embedders, our method…

Computation and Language · Computer Science 2019-09-19 Deric Pang , Lucy H. Lin , Noah A. Smith

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…

Computation and Language · Computer Science 2015-12-31 Wenpeng Yin , Hinrich Schütze

Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark

Recent studies show that crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts. Models utilizing these superficial clues gain mirage advantages on the in-domain testing set,…

Computation and Language · Computer Science 2020-10-16 Guanhua Zhang , Bing Bai , Jian Liang , Kun Bai , Conghui Zhu , Tiejun Zhao