Related papers: Data Augmentation for Sparse Multidimensional Lear…
Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often…
Learning performance data, such as correct or incorrect responses to questions in Intelligent Tutoring Systems (ITSs) is crucial for tracking and assessing the learners' progress and mastery of knowledge. However, the issue of data…
Learner performance data collected by Intelligent Tutoring Systems (ITSs), such as responses to questions, is essential for modeling and predicting learners' knowledge states. However, missing responses due to skips or incomplete attempts…
Generative data augmentation, which scales datasets by obtaining fake labeled examples from a trained conditional generative model, boosts classification performance in various learning tasks including (semi-)supervised learning, few-shot…
Tutoring is an effective instructional method for enhancing student learning, yet its success relies on the skill and experience of the tutors. This reliance presents challenges for the widespread implementation of tutoring, particularly in…
Crowdsourcing provides an efficient label collection schema for supervised machine learning. However, to control annotation cost, each instance in the crowdsourced data is typically annotated by a small number of annotators. This creates a…
Data sparsity is one of the key challenges associated with model development in Natural Language Understanding (NLU) for conversational agents. The challenge is made more complex by the demand for high quality annotated utterances commonly…
Knowledge Tracing (KT), which aims to model student knowledge level and predict their performance, is one of the most important applications of user modeling. Modern KT approaches model and maintain an up-to-date state of student knowledge…
Producing large complex simulation datasets can often be a time and resource consuming task. Especially when these experiments are very expensive, it is becoming more reasonable to generate synthetic data for downstream tasks. Recently,…
Dynamic tensor data are becoming prevalent in numerous applications. Existing tensor clustering methods either fail to account for the dynamic nature of the data, or are inapplicable to a general-order tensor. Also there is often a gap…
In recent years, the rapid growth in technology has increased the opportunity for longitudinal human behavioral studies. Rich multimodal data, from wearables like Fitbit, online social networks, mobile phones etc. can be collected in…
Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable…
How can we predict missing values in multi-dimensional data (or tensors) more accurately? The task of tensor completion is crucial in many applications such as personalized recommendation, image and video restoration, and link prediction in…
Tensor train (TT) decomposition, a powerful tool for analyzing multidimensional data, exhibits superior performance in many machine learning tasks. However, existing methods for TT decomposition either suffer from noise overfitting, or…
Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity. State-of-the-art generative language models have been shown to provide significant…
Data limitation is one of the most common issues in training machine learning classifiers for medical applications. Due to ethical concerns and data privacy, the number of people that can be recruited to such experiments is generally…
Tucker decomposition is the cornerstone of modern machine learning on tensorial data analysis, which have attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging…
We focus on generative AI for a type of data that still represent one of the most prevalent form of data: tabular data. Our paper introduces two key contributions: a new powerful class of forest-based models fit for such tasks and a simple…
We consider tensor factorizations based on sparse measurements of the components of relatively high rank tensors. The measurements are designed in a way that the underlying graph of interactions is a random graph. The setup will be useful…
The use of deep learning for database optimization has gained significant traction, offering improvements in indexing, cardinality estimation, and query optimization. However, acquiring high-quality training data remains a significant…