Kumarjit Pathak
Industrial experimentation requires both factor screening to identify critical variables and response optimization to find optimal operating conditions. Traditional approaches treat these as separate phases, necessitating costly sequential…
K-Nearest Neighbors (KNN) is one of the most used ML classifiers. However, if we observe closely, standard distance-weighted KNN and relative variants assume all 'k' neighbors are equally reliable. In heterogeneous feature space, this…
In machine learning and data mining, Cluster analysis is one of the most widely used unsupervised learning technique. Philosophy of this algorithm is to find similar data items and group them together based on any distance function in…
In statistical modelling the biggest threat is concept drift which makes the model gradually showing deteriorating performance over time. There are state of the art methodologies to detect the impact of concept drift, however general…
To predict the employee attrition beforehand and to enable management to take individualized preventive action. Using Ensemble classification modeling techniques and Linear Regression. Model could predict over 91% accurate employee…
Customer Satisfaction is the most important factors in the industry irrespective of domain. Key Driver Analysis is a common practice in data science to help the business to evaluate the same. Understanding key features, which influence the…
Classification is one of the widely used analytical techniques in data science domain across different business to associate a pattern which contribute to the occurrence of certain event which is predicted with some likelihood. This Paper…
High volume of data, perceived as either challenge or opportunity. Deep learning architecture demands high volume of data to effectively back propagate and train the weights without bias. At the same time, large volume of data demands…
In regression modelling approach, the main step is to fit the regression line as close as possible to the target variable. In this process most algorithms try to fit all of the data in a single line and hence fitting all parts of target…