Related papers: Popularity Driven Data Integration
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…
Large-scale registries have collected vast amounts of data, which has enabled investigators to efficiently conduct studies of observational data. Common practice is for investigators to use all data meeting the inclusion criteria of their…
Reuse of data in new contexts beyond the purposes for which it was originally collected has contributed to technological innovation and reduced the consent burden on data subjects. One of the legal mechanisms that makes such reuse possible…
Big data presents potential but unresolved value as a source for analysis and inference. However, selection bias, present in many of these datasets, needs to be accounted for so that appropriate inferences can be made on the target…
We present Populous, a tool for gathering content with which to populate an ontology. Domain experts need to add content that is often repetitive in form, without having to tackle the underlying ontological representation. Populous…
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…
Data only generates value for a few organizations with expertise and resources to make data shareable, discoverable, and easy to integrate. Sharing data that is easy to discover and integrate is hard because data owners lack information…
Data is a precious resource in today's society, and is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in…
Sharing and reusing research data can effectively reduce redundant efforts in data collection and curation, especially for small labs and research teams conducting human-centered system research, and enhance the replicability of evaluation…
Collective intelligence, which aggregates shared information from large crowds, is often negatively impacted by unreliable information sources with low-quality data. This becomes a barrier to the effective use of collective…
Popularity is often included in experimental evaluation to provide a reference performance for a recommendation task. To understand how the popularity baseline is defined and evaluated, we sample 12 papers from top-tier conferences including…
With the increasing amount of data and use of computation in science, software has become an important component in many different domains. Computing is now being used more often and in more aspects of scientific work including data…
Social scientists have long sought to understand why certain people, items, or options become more popular than others. One seemingly intuitive theory is that inherent value drives popularity. An alternative theory claims that popularity is…
Informatics and technological advancements have triggered the generation of huge volumes of data with varied complexity in their management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making…
Retrievability measures the influence a retrieval system has on the access to information in a given collection of items. This measure can help in making an evaluation of the search system based on which insights can be drawn. In this…
Faced with over 100M open source projects, most empirical investigations select a subset. Most research papers in leading venues investigated filtering projects by some measure of popularity with explicit or implicit arguments that unpopular…
The advent of the Internet of Things will allow us to optimize equipment and resource usage, enabling increased efficiencies in automation and new, more cost-efficient business models. As tremendous growth opportunities emerge, so do…
Big Data may not be the solution many are looking for. The latest rise of Big Data methods and systems is partly due to the new abilities these techniques provide, partly to the simplicity of the software design and partly because the…
We present PLUTO (Public VaLUe Assessment TOol), a framework for assessing the public value of specific instances of data use. Grounded in the concept of data solidarity, PLUTO aims to empower diverse stakeholders, including regulatory…
Recommendation and ranking systems are known to suffer from popularity bias; the tendency of the algorithm to favor a few popular items while under-representing the majority of other items. Prior research has examined various approaches for…