Related papers

Preprocessing: A Prerequisite for Discovering Patterns in WUM Process

Web log data is usually diverse and voluminous. This data must be assembled into a consistent, integrated and comprehensive view, in order to be used for pattern discovery. Without properly cleaning, transforming and structuring the data…

Databases · Computer Science · 2011-04-13 · C. Ramya, K S Shreedhara, G Kavitha

A Survey on Preprocessing Methods for Web Usage Data

The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users' accesses are recorded in web…

Information Retrieval · Computer Science · 2010-04-09 · V. Chitraa, Dr. Antony Selvdoss Davamani

An innovative data collection method to eliminate the preprocessing phase in web usage mining

The underlying data source for web usage mining (WUM) is commonly thought to be server logs. However, access log files provide quite limited data about the clients. Identifying sessions from this messy data takes a considerable effort, and…

Information Retrieval · Computer Science · 2025-01-09 · Ozkan Canay, Umit Kocabicak

A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data

The World Wide Web continues to grow at an amazing rate in both the size and complexity of Web sites and is well on its way to being the main reservoir of information and data. Due to this increase in growth and complexity of the WWW, web site…

Databases · Computer Science · 2015-09-03 · Zahid Ansari, Mohammad Fazle Azeem, A. Vinaya Babu, Waseem Ahmed

An Efficient Preprocessing Methodology for Discovering Patterns and Clustering of Web Users using a Dynamic ART1 Neural Network

In this paper, a complete preprocessing methodology for discovering patterns in web usage mining process to improve the quality of data by reducing the quantity of data has been proposed. A dynamic ART1 neural network clustering algorithm…

Neural and Evolutionary Computing · Computer Science · 2011-09-07 · C. Ramya, G. Kavitha

Web Usage mining framework for Data Cleaning and IP address Identification

The World Wide Web is the most widely known information source that is easily available and searchable. It consists of billions of interconnected documents. Web pages are authored by millions of people. Accesses made by various users to pages…

Databases · Computer Science · 2014-08-26 · Priyanka Verma, Nishtha Kesswani

Preprocessing Methods and Pipelines of Data Mining: An Overview

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science · 2019-06-21 · Canchen Li

Turning Logs into Lumber: Preprocessing Tasks in Process Mining

Event logs are invaluable for conducting process mining projects, offering insights into process improvement and data-driven decision-making. However, data quality issues affect the correctness and trustworthiness of these insights, making…

Databases · Computer Science · 2023-10-09 · Ying Liu, Vinicius Stein Dani, Iris Beerepoot, Xixi Lu

Discovering More Accurate Frequent Web Usage Patterns

Web usage mining is a type of web mining, which exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web users. As in classical data mining, data preparation and pattern discovery are the…

Databases · Computer Science · 2008-12-18 · Murat Ali Bayir, Ismail Hakki Toroslu, Ahmet Cosar, Guven Fidan

Data Mining to Measure and Improve the Success of Web Sites

For many companies, competitiveness in e-commerce requires a successful presence on the web. Web sites are used to establish the company's image, to promote and sell goods and to provide customer support. The success of a web site affects…

Machine Learning · Computer Science · 2007-05-23 · Myra Spiliopoulou, Carsten Pohle

Preprocessing is All You Need: Boosting the Performance of Log Parsers With a General Preprocessing Framework

Log parsing has been a long-studied area in software engineering due to its importance in identifying dynamic variables and constructing log templates. Prior work has proposed many statistic-based log parsers (e.g., Drain), which are highly…

Software Engineering · Computer Science · 2024-12-09 · Qiaolin Qin, Roozbeh Aghili, Heng Li, Ettore Merlo

Mining Frequent Patterns in Process Models

Process mining has emerged as a way to analyze the behavior of an organization by extracting knowledge from event logs and by offering techniques to discover, monitor and enhance real processes. In the discovery of process models,…

Artificial Intelligence · Computer Science · 2017-10-17 · David Chapela-Campa, Manuel Mucientes, Manuel Lama

Extracting and Pre-Processing Event Logs

Event data is the basis for all process mining analysis. Most process mining techniques assume their input to be an event log. However, event data is rarely recorded in an event log format, but has to be extracted from raw data. Event log…

Data Structures and Algorithms · Computer Science · 2022-11-09 · Dirk Fahland

Predictive modeling and anomaly detection in large-scale web portals through the CAWAL framework

This study presents an approach that uses session and page view data collected through the CAWAL framework, enriched through specialized processes, for advanced predictive modeling and anomaly detection in web usage mining (WUM)…

Machine Learning · Computer Science · 2025-02-04 · Ozkan Canay, Umit Kocabicak

Web Log Data Analysis by Enhanced Fuzzy C Means Clustering

The World Wide Web is a huge repository of information, and the volume of information increases tremendously every day. The number of users is also increasing day by day. To reduce users' browsing time, a lot of research has taken place. Web…

Information Retrieval · Computer Science · 2014-05-22 · V. Chitraa, Antony Selvadoss Thanamani

Web Usage Mining: Pattern Discovery and Forecasting

Web usage mining: automatic discovery of patterns in clickstreams and associated data collected or generated as a result of user interactions with one or more Web sites. This paper describes web usage mining for our college log files to…

Databases · Computer Science · 2013-10-10 · Dhanamma Jagli, Sangeeta Oswal

Discovering Redundant Activities in Event Logs for the Simplification of Process Models

Process mining acts as a valuable tool to analyse the behaviour of an organisation by offering techniques to discover, monitor and enhance real processes. The key to process mining is to discover understandable process models. However,…

Information Retrieval · Computer Science · 2021-04-23 · Qifan Chen, Yang Lu, Simon Poon

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web…

Computation and Language · Computer Science · 2023-09-12 · Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files

Many services today massively and continuously produce log files of different and varying formats. These logs are important since they contain information about the application activities, which is necessary for improvements by analyzing…

Information Retrieval · Computer Science · 2023-04-11 · Igor Cherepanov, Jonathan Geraldi Joewono, Arjan Kuijper, Jörn Kohlhammer

A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science · 2021-09-16 · Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy