Related papers: Towards Explaining STEM Document Classification us…

200 papers

In this paper, we investigate mathematical content representations suitable for the automated classification of and the similarity search in STEM documents using standard machine learning algorithms: the Latent Dirichlet Allocation (LDA)…

Information Retrieval · Computer Science 2021-10-11 Michal Růžička, Petr Sojka

In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content. We demonstrate this by using sets of documents, sections, and…

Digital Libraries · Computer Science 2020-05-25 Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp

In the field of machine learning, data understanding is the practice of getting initial insights in unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning…

Databases · Computer Science 2018-06-14 Markus Schröder, Christian Jilek, Jörn Hees, Andreas Dengel

In this article we report on an initial exploration to assess the viability of using the general large language models (LLMs), recently made public, to classify mathematical documents. Automated classification would be useful from the…

Information Retrieval · Computer Science 2024-06-18 Patrick D. F. Ion, Stephen M. Watt

Advances in large language models (LLMs) have spurred research into enhancing their reasoning capabilities, particularly in math-rich STEM (Science, Technology, Engineering, and Mathematics) documents. While LLMs can generate equations or…

Computation and Language · Computer Science 2025-06-03 Jiaru Zou, Qing Wang, Pratyush Thakur, Nickvash Kani

Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents like legal briefs,…

Computation and Language · Computer Science 2017-09-05 Muhammad Mahbubur Rahman, Tim Finin

Forms are a widespread type of template-based document used in a great variety of fields including, among others, administration, medicine, finance, or insurance. The automatic extraction of the information included in these documents is…

Computation and Language · Computer Science 2021-12-15 María Villota, César Domínguez, Jónathan Heras, Eloy Mata, Vico Pascual

Companies regularly spend millions of dollars producing electronically-stored documents in legal matters. Recently, parties on both sides of the 'legal aisle' are accepting the use of machine learning techniques like text classification to…

Information Retrieval · Computer Science 2019-12-23 Christian J. Mahoney, Jianping Zhang, Nathaniel Huber-Fliflet, Peter Gronvall, Haozhen Zhao

US corporations regularly spend millions of dollars reviewing electronically-stored documents in legal matters. Recently, attorneys apply text classification to efficiently cull massive volumes of data to identify responsive documents for…

Information Retrieval · Computer Science 2023-11-16 Christian Mahoney, Peter Gronvall, Nathaniel Huber-Fliflet, Jianping Zhang

Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However,…

Digital Libraries · Computer Science 2019-06-28 Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Karmer, Bela Gipp

Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a…

Computation and Language · Computer Science 2024-07-11 Nilay Patel, Rahul Saha, Jeffrey Flanigan

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

Entity retrieval is the task of finding entities such as people or products in response to a query, based solely on the textual documents they are associated with. Recent semantic entity retrieval algorithms represent queries and experts in…

Information Retrieval · Computer Science 2017-07-26 Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…

Machine Learning · Computer Science 2022-11-16 Chaitanya Chadha, Vandit Gupta, Deepak Gupta, Ashish Khanna

This master thesis describes an algorithm for automated categorization of scientific documents using deep learning techniques and compares the results to the results of existing classification algorithms. As an additional goal a reusable…

Information Retrieval · Computer Science 2017-06-20 Thomas Krause

Automatic legal text classification systems have been proposed in the literature to address knowledge extraction from judgments and detect their aspects. However, most of these systems are black boxes even when their models are…

In the artificial intelligence area, one of the ultimate goals is to make computers understand human language and offer assistance. In order to achieve this ideal, researchers of computer science have put forward a lot of models and…

Computation and Language · Computer Science 2015-12-07 Mengyun Cao, Jiao Tian, Dezhi Cheng, Jin Liu, Xiaoping Sun

The scientific literature is growing faster than ever. Finding an expert in a particular scientific domain has never been as hard as today because of the increasing amount of publications and because of the ever growing diversity of…

Information Retrieval · Computer Science 2020-04-09 Robin Brochier, Antoine Gourru, Adrien Guille, Julien Velcin

The amount of information stored in the form of documents on the internet has been increasing rapidly. Thus it has become a necessity to organize and maintain these documents in an optimum manner. Text classification algorithms study the…

Computation and Language · Computer Science 2022-02-22 Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi

Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh