Related papers: Data Curation APIs

200 papers

Over the past years, there have been many efforts to curate and increase the added value of the raw data. Data curation has been defined as activities and processes an analyst undertakes to transform the raw data into contextualized data and…

Information Retrieval · Computer Science 2020-07-20 Alireza Tabebordbar

Data curation - the process of discovering, integrating, and cleaning data - is one of the oldest, hardest, yet inevitable data management problems. Despite decades of efforts from both researchers and practitioners, it is still one of the…

Databases · Computer Science 2019-03-26 Saravanan Thirumuruganathan, Nan Tang, Mourad Ouzzani, AnHai Doan

Social media platforms have empowered the democratization of the pulse of people in the modern era. Due to its immense popularity and high usage, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of…

Social and Information Networks · Computer Science 2020-02-24 Kushal Vaghani

Effective data-driven biomedical discovery requires data curation: a time-consuming process of finding, organizing, distilling, integrating, interpreting, annotating, and validating diverse information into a structured form suitable for…

This report provides practical guidance to teams designing or developing AI-enabled systems for how to promote trustworthiness during the data curation phase of development. In this report, the authors first define data, the data curation…

Data curation is the process of making a dataset fit-for-use and archiveable. It is critical to data-intensive science because it makes complex data pipelines possible, makes studies reproducible, and makes data (re)usable. Yet the…

AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How…

Human-Computer Interaction · Computer Science 2024-02-23 Tzu-Sheng Kuo, Aaron Halfaker, Zirui Cheng, Jiwoo Kim, Meng-Hsin Wu, Tongshuang Wu, Kenneth Holstein, Haiyi Zhu

Using APIs to develop software applications is the norm. APIs help developers to build applications faster as they do not need to reinvent the wheel. It is therefore important for developers to understand the APIs that they plan to use.…

Software Engineering · Computer Science 2023-04-06 Ferdian Thung, Kisub Kim, Ting Zhang, Ivana Clairine Irsan, Ratnadira Widyasari, Zhou Yang, David Lo

In code review, generating structured and relevant comments is crucial for identifying code issues and facilitating accurate code changes that ensure an efficient code review process. Well-crafted comments not only streamline the code…

Software Engineering · Computer Science 2025-02-06 Oussama Ben Sghaier, Martin Weyssow, Houari Sahraoui

Mainstream knowledge management researchers generally agree that knowledge extracted from unstructured and semi-structured data has become imperative for organizational strategic decision making. In this research, we develop a…

Information Retrieval · Computer Science 2020-07-15 Gerald Onwujekwe, Kweku-Muata Osei-Bryson, Nnatubemugo Ngwum

In the evolving landscape of clinical informatics, the integration and utilization of software tools developed through governmental funding represent a pivotal advancement in research and application. However, the dispersion of these tools…

Digital Libraries · Computer Science 2024-03-28 Jeremy R. Harper

Curated databases have become important sources of information across scientific disciplines, and due to the manual work of experts, often become important reference works. Features such as provenance tracking, archiving, and data citation…

Programming Languages · Computer Science 2021-07-20 Simon Fowler, Simon D. Harding, Joanna Sharman, James Cheney

Big data analysis has become an active area of study with the growth of machine learning techniques. To properly analyze data, it is important to maintain high-quality data. Thus, research on data cleaning is also important. It is difficult…

Databases · Computer Science 2019-10-25 Toshiyuki Shimizu, Hiroki Omori, Masatoshi Yoshikawa

As the volume of publicly available data continues to grow, researchers face the challenge of limited diversity in benchmarking machine learning tasks. Although thousands of datasets are available in public repositories, the sheer abundance…

Information Retrieval · Computer Science 2025-02-25 Mara Graziani, Malina Molnar, Irina Espejo Morales, Joris Cadow-Gossweiler, Teodoro Laino

Data stream algorithms tackle operations on high-volume sequences of read-once data items. Data stream scenarios include inherently real-time systems like sensor networks and financial markets. They also arise in purely-computational…

Data Structures and Algorithms · Computer Science 2024-03-04 Matthew Andres Moreno, Santiago Rodriguez Papa, Emily Dolson

Big data refers to large and complex data sets that, under existing approaches, exceed the capacity and capability of current compute platforms, systems software, analytical tools and human understanding. Numerous lessons on the scalability…

The application of AI tools to the legal field feels natural: large legal document collections could be used with specialized AI to improve workflow efficiency for lawyers and ameliorate the "justice gap" for underserved clients. However,…

Computation and Language · Computer Science 2025-04-03 Allison Koenecke, Jed Stiglitz, David Mimno, Matthew Wilkens

Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of…

Social and Information Networks · Computer Science 2012-07-03 Derek Greene, Gavin Sheridan, Barry Smyth, Pádraig Cunningham

Many questions in computational social science rely on datasets assembled from heterogeneous online sources, a process that is often labor-intensive, costly, and difficult to reproduce. Recent advances in large language models enable…

Computation and Language · Computer Science 2026-01-07 Mengyi Sun

Improving data quality in unstructured documents is a long-standing challenge. Unstructured data, especially in textual form, inherently lacks defined semantics, which poses significant challenges for effective processing and for ensuring…

Databases · Computer Science 2025-02-26 Besat Kassaie, Frank Wm. Tompa