Related papers: On-Device Document Classification using multimodal…
Document capture applications on smartphones have emerged as popular tools for digitizing documents. For many individuals, capturing documents with their smartphones is more convenient than using dedicated photocopiers or scanners, even if…
This article proposes and documents a machine-learning framework and tutorial for classifying images using mobile phones. Compared to computers, the performance of deep learning model performance degrades when deployed on a mobile phone and…
With the overwhelming transition to smart phones, storing important information in the form of unstructured text has become habitual to users of mobile devices. From grocery lists to drafts of emails and important speeches, users store a…
A significant proportion of individuals' daily activities is experienced through digital devices. Smartphones in particular have become one of the preferred interfaces for content consumption and social interaction. Identifying the content…
Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…
We receive several essential updates on our smartphones in the form of SMS, documents, voice messages, etc. that get buried beneath the clutter of content. We often do not realize the key information without going through the full content.…
Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms, emails, and other such documents. We show that…
We present Multimodal OCR (MOCR), a document parsing paradigm that jointly parses text and graphics into unified textual representations. Unlike conventional OCR systems that focus on text recognition and leave graphical regions as cropped…
We propose a method to make mobile screenshots easily searchable. In this paper, we present the workflow in which we: 1) preprocessed a collection of screenshots, 2) identified script presentin image, 3) extracted unstructured text from…
Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…
Modern recommender systems operate in a fully server-based fashion. To cater to millions of users, the frequent model maintaining and the high-speed processing for concurrent user requests are required, which comes at the cost of a huge…
In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process…
Information extraction from copy-heavy documents, characterized by massive volumes of structurally similar content, represents a critical yet understudied challenge in enterprise document processing. We present a systematic framework that…
Object Recognition and Document Skew Estimation have come a long way in terms of performance and efficiency. New models follow one of two directions: improving performance using larger models, and improving efficiency using smaller models.…
Upon deployment to edge devices, it is often desirable for a model to further learn from streaming data to improve accuracy. However, extracting representative features from such data is challenging because it is typically unlabeled,…
Object recognition and detection are well-studied problems with a developed set of almost standard solutions. Identity documents recognition, classification, detection, and localization are the tasks required in a number of applications,…
To meet the practical requirements of low latency, low cost, and good privacy in online intelligent services, more and more deep learning models are offloaded from the cloud to mobile devices. To further deal with cross-device data…
In large technology companies, the requirements for managing and organizing technical documents created by engineers and managers have increased dramatically in recent years, which has led to a higher demand for more scalable, accurate, and…
Recognition of identity documents using mobile devices has become a topic of a wide range of computer vision research. The portfolio of methods and algorithms for solving such tasks as face detection, document detection and rectification,…
Smartphones have recently gained significant popularity in heavy mobile processing while users are increasing their expectations toward rich computing experience. However, resource limitations and current mobile computing advancements…