Related papers: Minimalist Data Wrangling with Python

200 papers

In these lecture notes, a selection of frequently required statistical tools will be introduced and illustrated. They allow one to post-process data that stem from, e.g., large-scale numerical simulations (aka sequence of random experiments).…

Data Analysis, Statistics and Probability · Physics 2012-07-26 O. Melchert

A large amount of data is produced every second from modern information systems such as mobile devices, the world wide web, Internet of Things, social media, etc. Analysis and mining of this massive data requires a lot of advanced tools and…

Machine Learning · Computer Science 2020-01-13 Rising Odegua, Festus Ikpotokin

Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the…

Other Statistics · Statistics 2015-03-20 Ben Baumer

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

We describe how Python can be leveraged to streamline the curation, modelling and dissemination of drug discovery data as well as the development of innovative, freely available tools for the related scientific community. We look at various…

Other Computer Science · Computer Science 2016-07-05 Michał Nowotka, George Papadatos, Mark Davies, Nathan Dedman, Anne Hersey

This paper explores an innovative approach to teaching data wrangling skills to students through hands-on activities before transitioning to coding. Data wrangling, a critical aspect of data analysis, involves cleaning, transforming, and…

Human-Computer Interaction · Computer Science 2025-03-24 Lucy D'Agostino McGowan

Python has become the prime language for application development in the Data Science and Machine Learning domains. However, data scientists are not necessarily experienced programmers. While Python lets them quickly implement their…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-24 Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet, Dario Torregrossa

Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases…

Cryptography and Security · Computer Science 2024-08-21 Judith Sáinz-Pardo Díaz, Álvaro López García

Surveys are an important research tool, providing unique measurements on subjective experiences such as sentiment and opinions that cannot be measured by other means. However, because survey data is collected from a self-selected group of…

Computation · Statistics 2023-07-14 Tal Sarig, Tal Galili, Roee Eilat

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the…

Machine Learning · Computer Science 2020-04-01 Sebastian Raschka, Joshua Patterson, Corey Nolet

The process of preparing potentially large and complex data sets for further analysis or manual examination is often called data wrangling. In classical warehousing environments, the steps in such a process have been carried out using…

The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the…

We describe an introductory data science course, entitled Introduction to Data Science, offered at the University of Illinois at Urbana-Champaign. The course introduced general programming concepts by using the Python programming language…

Other Statistics · Statistics 2016-04-27 Robert J. Brunner, Edward J. Kim

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science 2019-06-21 Canchen Li

Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even…

Information Retrieval · Computer Science 2021-05-11 Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian

Data minimisation is a privacy-enhancing principle considered as one of the pillars of personal data regulations. This principle dictates that personal data collected should be no more than necessary for the specific purpose consented by…

Cryptography and Security · Computer Science 2016-11-18 Thibaud Antignac, David Sands, Gerardo Schneider

Exploratory visual data analysis tools empower data analysts to efficiently and intuitively explore data insights throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational…

Human-Computer Interaction · Computer Science 2025-01-08 Yue Yu, Leixian Shen, Fei Long, Huamin Qu, Hao Chen

Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different…

Machine Learning · Computer Science 2014-08-26 Sibei Yang, Liangde Tao, Bingchen Gong

Data clustering is an approach to seek structure in sets of complex data, i.e., sets of "objects". The main objective is to identify groups of objects which are similar to each other, e.g., for classification. Here, an introduction to…

Data Analysis, Statistics and Probability · Physics 2016-02-17 Alexander K. Hartmann

Ambient air pollution is a pervasive issue with wide-ranging effects on human health, ecosystem vitality, and economic structures. Utilizing data on ambient air pollution concentrations, researchers can perform comprehensive analyses to…

Physics and Society · Physics 2024-03-07 Liam J Berrisford, Ronaldo Menezes