Related papers: Developing Open Data Models for Linguistic Field D…

200 papers

The Data Web refers to the vast and rapidly increasing quantity of scientific, corporate, government and crowd-sourced data published in the form of Linked Open Data, which encourages the uniform representation of heterogeneous data items…

The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are…

Populations and Evolution · Quantitative Biology 2017-02-08 Andrew F. Magee, Michael R. May, Brian R. Moore

Qualitative data analysis is labor-intensive, yet the privacy risks associated with commercial Large Language Models (LLMs) often preclude their use in sensitive research. To address this, we introduce ChatQDA, an on-device framework…

Human-Computer Interaction · Computer Science 2026-02-23 Tung T. Ngo, Dai Nguyen Van, Anh-Minh Nguyen, Phuong-Anh Do, Anh Nguyen-Quoc

Large language models (LLMs) are increasingly deployed on edge devices under strict computation and quantization constraints, yet their security implications remain unclear. We study query-based knowledge extraction from quantized…

Cryptography and Security · Computer Science 2026-03-26 Ao Ding, Hongzong Li, Zi Liang, Zhanpeng Shi, Shuxin Zhuang, Shiqin Tang, Rong Feng, Ping Lu

Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhaustive, and time-consuming process. This paper presents a novel model that guides a linguist during the fieldwork and…

Computation and Language · Computer Science 2024-12-17 Aso Mahmudi, Borja Herce, Demian Inostroza Amestica, Andreas Scherbakov, Eduard Hovy, Ekaterina Vylomova

Large Language Models (LLMs) have rapidly increased in size and apparent capabilities in the last three years, but their training data is largely English text. There is growing interest in multilingual LLMs, and various efforts are striving…

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks due to large training datasets and powerful transformer architecture. However, the reliability of responses from LLMs remains a question.…

Computation and Language · Computer Science 2025-02-26 Tiejin Chen, Xiaoou Liu, Longchao Da, Jia Chen, Vagelis Papalexakis, Hua Wei

Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and…

Recent digitisation efforts in natural history museums have produced large volumes of collection data, yet their scale and scientific complexity often hinder public access and understanding. Conventional data management tools, such as…

Human-Computer Interaction · Computer Science 2026-03-12 Yiyuan Wang, Andrew Johnston, Zoë Sadokierski, Rhiannon Stephens, Shane T. Ahyong

Analytics on structured data is a mature field with many successful methods. However, most real world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable…

The usage and amount of information available on the internet have increased over the past decade. This digitization leads to the need for automated answering systems to extract fruitful information from redundant and transitional knowledge…

Computation and Language · Computer Science 2022-02-03 Hariom A. Pandya, Brijesh S. Bhatt

Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along…

The growing number of languages considered in multilingual NLP, including new datasets and tasks, poses challenges regarding properly and accurately reporting which languages are used and how. For example, datasets often use different…

Computation and Language · Computer Science 2026-03-03 Wessel Poelman, Yiyi Chen, Miryam de Lhoneux

Recent progress in NLP is driven by pretrained models leveraging massive datasets and has predominantly benefited the world's political and economic superpowers. Technologically underserved languages are left behind because they lack such…

Computation and Language · Computer Science 2022-03-21 Clarissa Forbes, Farhan Samir, Bruce Harold Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg

Large Language Models are designed to understand complex human language. Yet understanding animal language has long intrigued researchers striving to bridge the communication gap between humans and other species. This research paper…

Computation and Language · Computer Science 2023-06-13 Anees Aslam

The rapid development of large language models (LLMs) is redefining the landscape of human-computer interaction, and their integration into various user-service applications is becoming increasingly prevalent. However, transmitting user…

Computation and Language · Computer Science 2025-02-20 Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei

Natural Language Processing (NLP) is today a very active field of research and innovation. Many applications, however, need large sets of data for supervised learning, suitably labelled for training purposes. This includes applications for…

Computation and Language · Computer Science 2021-02-23 ElMehdi Boujou, Hamza Chataoui, Abdellah El Mekki, Saad Benjelloun, Ikram Chairi, Ismail Berrada

`Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions - audio, video and/or physiological recordings - or it may be textual. The added…

Computation and Language · Computer Science 2007-05-23 Steven Bird, Mark Liberman

`Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added…

Computation and Language · Computer Science 2007-05-23 Steven Bird, Mark Liberman

This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers. UQA is generated by translating the Stanford Question Answering Dataset…

Computation and Language · Computer Science 2024-07-24 Samee Arif, Sualeha Farid, Awais Athar, Agha Ali Raza