Related papers: On-Device Document Classification using multimodal…

Handheld Video Document Scanning: A Robust On-Device Model for Multi-Page Document Scanning

Document capture applications on smartphones have emerged as popular tools for digitizing documents. For many individuals, capturing documents with their smartphones is more convenient than using dedicated photocopiers or scanners, even if…

Computer Vision and Pattern Recognition · Computer Science 2024-11-04 Curtis Wigington

Deep learning pipeline for image classification on mobile phones

This article proposes and documents a machine-learning framework and tutorial for classifying images using mobile phones. Compared to computers, the performance of deep learning models degrades when deployed on a mobile phone and…

Image and Video Processing · Electrical Eng. & Systems 2022-06-02 Muhammad Muneeb, Samuel F. Feng, Andreas Henschel

On-Device Tag Generation for Unstructured Text

With the overwhelming transition to smart phones, storing important information in the form of unstructured text has become habitual to users of mobile devices. From grocery lists to drafts of emails and important speeches, users store a…

Computation and Language · Computer Science 2022-01-03 Manish Chugani, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre

Guess What's on my Screen? Clustering Smartphone Screenshots with Active Learning

A significant proportion of individuals' daily activities is experienced through digital devices. Smartphones in particular have become one of the preferred interfaces for content consumption and social interaction. Identifying the content…

Computer Vision and Pattern Recognition · Computer Science 2019-01-11 Agnese Chiatti, Dolzodmaa Davaasuren, Nilam Ram, Prasenjit Mitra, Byron Reeves, Thomas Robinson

Document classification methods

Information in different fields that is collected by users requires appropriate management and organization so that it is structured in a standard way and can be retrieved quickly and easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh

Adaptive Beam Search to Enhance On-device Abstractive Summarization

We receive several essential updates on our smartphones in the form of SMS, documents, voice messages, etc. that get buried beneath the clutter of content. We often do not realize the key information without going through the full content.…

Computation and Language · Computer Science 2022-02-08 Harichandana B S S, Sumit Kumar

A Multi-Modal Multilingual Benchmark for Document Image Classification

Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms, emails, and other such documents. We show that…

Computation and Language · Computer Science 2023-10-26 Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

Multimodal OCR: Parse Anything from Documents

We present Multimodal OCR (MOCR), a document parsing paradigm that jointly parses text and graphics into unified textual representations. Unlike conventional OCR systems that focus on text recognition and leave graphical regions as cropped…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Handong Zheng, Yumeng Li, Kaile Zhang, Liang Xin, Guangwei Zhao, Hao Liu, Jiayu Chen, Jie Lou, Qi Fu, Rui Yang, Shuo Jiang, Weijian Luo, Weijie Su, Weijun Zhang, Xingyu Zhu, Yabin Li, Yiwei ma, Yu Chen, Yuqiu Ji, Zhaohui Yu, Guang Yang, Colin Zhang, Lei Zhang, Yuliang Liu, Xiang Bai

On-Device Information Extraction from Screenshots in form of tags

We propose a method to make mobile screenshots easily searchable. In this paper, we present the workflow in which we: 1) preprocessed a collection of screenshots, 2) identified script present in image, 3) extracted unstructured text from…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Sumit Kumar, Gopi Ramena, Manoj Goyal, Debi Mohanty, Ankur Agarwal, Benu Changmai, Sukumar Moharana

Efficient Classification of Long Documents Using Transformers

Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…

Computation and Language · Computer Science 2022-03-23 Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation

Modern recommender systems operate in a fully server-based fashion. To cater to millions of users, frequent model maintenance and high-speed processing of concurrent user requests are required, which comes at the cost of a huge…

Information Retrieval · Computer Science 2022-04-26 Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Guandong Xu, Nguyen Quoc Viet Hung

Unifying Multimodal Retrieval via Document Screenshot Embedding

In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process…

Information Retrieval · Computer Science 2024-12-03 Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task

Information extraction from copy-heavy documents, characterized by massive volumes of structurally similar content, represents a critical yet understudied challenge in enterprise document processing. We present a systematic framework that…

Computation and Language · Computer Science 2025-10-14 Zilong Wang, Xiaoyu Shen

Exploring Light-Weight Object Recognition for Real-Time Document Detection

Object Recognition and Document Skew Estimation have come a long way in terms of performance and efficiency. New models follow one of two directions: improving performance using larger models, and improving efficiency using smaller models.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Lucas Wojcik, Luiz Coelho, Roger Granada, David Menotti

Enabling On-Device Learning via Experience Replay with Efficient Dataset Condensation

Upon deployment to edge devices, it is often desirable for a model to further learn from streaming data to improve accuracy. However, extracting representative features from such data is challenging because it is typically unlabeled,…

Machine Learning · Computer Science 2024-05-28 Gelei Xu, Ningzhi Tang, Jun Xia, Wei Jin, Yiyu Shi

Identity documents recognition and detection using semantic segmentation with convolutional neural network

Object recognition and detection are well-studied problems with a developed set of almost standard solutions. Identity documents recognition, classification, detection, and localization are the tasks required in a number of applications,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Mykola Kozlenko, Volodymyr Sendetskyi, Oleksiy Simkiv, Nazar Savchenko, Andy Bosyi

On-Device Model Fine-Tuning with Label Correction in Recommender Systems

To meet the practical requirements of low latency, low cost, and good privacy in online intelligent services, more and more deep learning models are offloaded from the cloud to mobile devices. To further deal with cross-device data…

Information Retrieval · Computer Science 2022-11-03 Yucheng Ding, Chaoyue Niu, Fan Wu, Shaojie Tang, Chengfei Lyu, Guihai Chen

Deep Learning for Technical Document Classification

In large technology companies, the requirements for managing and organizing technical documents created by engineers and managers have increased dramatically in recent years, which has led to a higher demand for more scalable, accurate, and…

Machine Learning · Computer Science 2025-10-31 Shuo Jiang, Jie Hu, Christopher L. Magee, Jianxi Luo

MIDV-2019: Challenges of the modern mobile-based document OCR

Recognition of identity documents using mobile devices has become a topic of a wide range of computer vision research. The portfolio of methods and algorithms for solving such tasks as face detection, document detection and rectification,…

Computer Vision and Pattern Recognition · Computer Science 2020-02-12 Konstantin Bulatov, Daniil Matalov, Vladimir V. Arlazarov

Mobile Cloud Computing: A Review on Smartphone Augmentation Approaches

Smartphones have recently gained significant popularity in heavy mobile processing while users are increasing their expectations toward rich computing experience. However, resource limitations and current mobile computing advancements…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-20 Saeid Abolfazli, Zohreh Sanaei, Abdullah Gani