Related papers: PubSqueezer: A Text-Mining Web Tool to Transform U…
Highly specific datasets of scientific literature are important for both research and education. However, it is difficult to build such datasets at scale. A common approach is to build these datasets reductively by applying topic modeling…
The majority of big data is unstructured and of this majority the largest chunk is text. While data mining techniques are well developed and standardized for structured, numerical data, the realm of unstructured data is still largely…
Nowadays, with the booming development of the Internet, people benefit from its convenience due to its open and sharing nature. A large volume of natural language texts is being generated by users in various forms, such as search queries,…
Transitive text mining, also named Swanson Linking (SL) after its primary and principal researcher, tries to establish meaningful links between literature sets which are virtually disjoint in the sense that each does not mention the main…
With an exponentially growing number of scientific papers published each year, advanced tools for exploring and discovering publications of interest are becoming indispensable. To empower users beyond a simple keyword search provided e.g.…
Scientific research is highly structured and some of that structure is reflected in research reports. Traditional scientific research reports are yielding to interactive documents which expose their internal structure and are richly linked…
Keeping track of the ever-increasing body of scientific literature is an escalating challenge. We present PubTree, a hierarchical search tool that efficiently searches the PubMed/MEDLINE dataset based upon a decision tree constructed using…
It is now commonplace to observe that we are facing a deluge of online information. Researchers have of course long acknowledged the potential value of this information since digital traces make it possible to directly observe, describe and…
We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on…
Biomedical research yields a wealth of information, much of which is only accessible through the literature. Consequently, literature search is an essential tool for building on prior knowledge in clinical and biomedical research. Although…
The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and…
Tables in scientific papers contain a wealth of valuable knowledge for the scientific enterprise. To help the many of us who frequently consult this type of knowledge, we present Tab2Know, a new end-to-end system to build a Knowledge Base…
A new clinical literature search engine, called CupQ, is presented. It aims to help clinicians stay updated with medical knowledge. Although PubMed is currently one of the most widely used digital libraries for biomedical information, it…
The explosion of scientific literature has made the efficient and accurate extraction of structured data a critical component for advancing scientific knowledge and supporting evidence-based decision-making. However, existing tools often…
Collections of research article data harvested from the web have become common recently since they are important resources for experimenting on tasks such as named entity recognition, text summarization, or keyword generation. In fact,…
The growing volume of scientific literature makes it challenging for scientists to move from a list of papers to a synthesized understanding of a topic. Because of the constant influx of new papers on a daily basis, even if a scientist…
Since the beginning of the COVID pandemic, there have been around 700,000 scientific papers published on the subject. A human researcher cannot possibly get acquainted with such a huge text corpus -- and therefore developing AI-based tools to…
The Open Access movement in scientific publishing and search engines like Google Scholar have made scientific articles more broadly accessible. During the last decade, the availability of scientific papers in full text has become more and…
Identifying critical research within the growing body of academic work is an intrinsic aspect of conducting quality research. Systematic review processes used in evidence-based medicine formalise this as a procedure that must be followed in…
Scientific information expresses human understanding of nature. This knowledge is largely disseminated in different forms of text, including scientific papers, news articles, and discourse among people on social media. While important for…