R. Manmatha — Scifaro

R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding

Visual agent models for automating human activities on Graphical User Interfaces (GUIs) have emerged as a promising research direction, driven by advances in large Vision Language Models (VLMs). A critical challenge in GUI automation is the…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Joonhyung Park , Peng Tang , Sagnik Das , Srikar Appalaraju , Kunwar Yashraj Singh , R. Manmatha , Shabnam Ghadar

The Amazon Nova Family of Models: Technical Report and Model Card

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of…

Artificial Intelligence · Computer Science 2025-06-17 Amazon AGI , Aaron Langford , Aayush Shah , Abhanshu Gupta , Abhimanyu Bhatter , Abhinav Goyal , Abhinav Mathur , Abhinav Mohanty , Abhishek Kumar , Abhishek Sethi , Abi Komma , Abner Pena , Achin Jain , Adam Kunysz , Adam Opyrchal , Adarsh Singh , Aditya Rawal , Adok Achar Budihal Prasad , Adrià de Gispert , Agnika Kumar , Aishwarya Aryamane , Ajay Nair , Akilan M , Akshaya Iyengar , Akshaya Vishnu Kudlu Shanbhogue , Alan He , Alessandra Cervone , Alex Loeb , Alex Zhang , Alexander Fu , Alexander Lisnichenko , Alexander Zhipa , Alexandros Potamianos , Ali Kebarighotbi , Aliakbar Daronkolaei , Alok Parmesh , Amanjot Kaur Samra , Ameen Khan , Amer Rez , Amir Saffari , Amit Agarwalla , Amit Jhindal , Amith Mamidala , Ammar Asmro , Amulya Ballakur , Anand Mishra , Anand Sridharan , Anastasiia Dubinina , Andre Lenz , Andreas Doerr , Andrew Keating , Andrew Leaver , Andrew Smith , Andrew Wirth , Andy Davey , Andy Rosenbaum , Andy Sohn , Angela Chan , Aniket Chakrabarti , Anil Ramakrishna , Anirban Roy , Anita Iyer , Anjali Narayan-Chen , Ankith Yennu , Anna Dabrowska , Anna Gawlowska , Anna Rumshisky , Anna Turek , Anoop Deoras , Anton Bezruchkin , Anup Prasad , Anupam Dewan , Anwith Kiran , Apoorv Gupta , Aram Galstyan , Aravind Manoharan , Arijit Biswas , Arindam Mandal , Arpit Gupta , Arsamkhan Pathan , Arun Nagarajan , Arushan Rajasekaram , Arvind Sundararajan , Ashwin Ganesan , Ashwin Swaminathan , Athanasios Mouchtaris , Audrey Champeau , Avik Ray , Ayush Jaiswal , Ayush Sharma , Bailey Keefer , Balamurugan Muthiah , Beatriz Leon-Millan , Ben Koopman , Ben Li , Benjamin Biggs , Benjamin Ott , Bhanu Vinzamuri , Bharath Venkatesh , Bhavana Ganesh , Bhoomit Vasani , Bill Byrne , Bill Hsu , Bincheng Wang , Blake King , Blazej Gorny , Bo Feng , Bo Zheng , Bodhisattwa Paul , Bofan Sun , Bofeng Luo , Bowen Chen , Bowen Xie , Boya Yu , Brendan Jugan , Brett Panosh , Brian Collins , Brian Thompson , Can Karakus , Can 
Liu , Carl Lambrecht , Carly Lin , Carolyn Wang , Carrie Yuan , Casey Loyda , Cezary Walczak , Chalapathi Choppa , Chandana Satya Prakash , Chankrisna Richy Meas , Charith Peris , Charles Recaido , Charlie Xu , Charul Sharma , Chase Kernan , Chayut Thanapirom , Chengwei Su , Chenhao Xu , Chenhao Yin , Chentao Ye , Chenyang Tao , Chethan Parameshwara , Ching-Yun Chang , Chong Li , Chris Hench , Chris Tran , Christophe Dupuy , Christopher Davis , Christopher DiPersio , Christos Christodoulopoulos , Christy Li , Chun Chen , Claudio Delli Bovi , Clement Chung , Cole Hawkins , Connor Harris , Corey Ropell , Cynthia He , DK Joo , Dae Yon Hwang , Dan Rosen , Daniel Elkind , Daniel Pressel , Daniel Zhang , Danielle Kimball , Daniil Sorokin , Dave Goodell , Davide Modolo , Dawei Zhu , Deepikaa Suresh , Deepti Ragha , Denis Filimonov , Denis Foo Kune , Denis Romasanta Rodriguez , Devamanyu Hazarika , Dhananjay Ram , Dhawal Parkar , Dhawal Patel , Dhwanil Desai , Dinesh Singh Rajput , Disha Sule , Diwakar Singh , Dmitriy Genzel , Dolly Goldenberg , Dongyi He , Dumitru Hanciu , Dushan Tharmal , Dzmitry Siankovich , Edi Cikovic , Edwin Abraham , Ekraam Sabir , Elliott Olson , Emmett Steven , Emre Barut , Eric Jackson , Ethan Wu , Evelyn Chen , Ezhilan Mahalingam , Fabian Triefenbach , Fan Yang , Fangyu Liu , Fanzi Wu , Faraz Tavakoli , Farhad Khozeimeh , Feiyang Niu , Felix Hieber , Feng Li , Firat Elbey , Florian Krebs , Florian Saupe , Florian Sprünken , Frank Fan , Furqan Khan , Gabriela De Vincenzo , Gagandeep Kang , George Ding , George He , George Yeung , Ghada Qaddoumi , Giannis Karamanolakis , Goeric Huybrechts , Gokul Maddali , Gonzalo Iglesias , Gordon McShane , Gozde Sahin , Guangtai Huang , Gukyeong Kwon , Gunnar A. 
Sigurdsson , Gurpreet Chadha , Gururaj Kosuru , Hagen Fuerstenau , Hah Hah , Haja Maideen , Hajime Hosokawa , Han Liu , Han-Kai Hsu , Hann Wang , Hao Li , Hao Yang , Haofeng Zhu , Haozheng Fan , Harman Singh , Harshavardhan Kaluvala , Hashim Saeed , He Xie , Helian Feng , Hendrix Luo , Hengzhi Pei , Henrik Nielsen , Hesam Ilati , Himanshu Patel , Hongshan Li , Hongzhou Lin , Hussain Raza , Ian Cullinan , Imre Kiss , Inbarasan Thangamani , Indrayani Fadnavis , Ionut Teodor Sorodoc , Irem Ertuerk , Iryna Yemialyanava , Ishan Soni , Ismail Jelal , Ivan Tse , Jack FitzGerald , Jack Zhao , Jackson Rothgeb , Jacky Lee , Jake Jung , Jakub Debski , Jakub Tomczak , James Jeun , James Sanders , Jason Crowley , Jay Lee , Jayakrishna Anvesh Paidy , Jayant Tiwari , Jean Farmer , Jeff Solinsky , Jenna Lau , Jeremy Savareese , Jerzy Zagorski , Ji Dai , Jiacheng , Gu , Jiahui Li , Jian , Zheng , Jianhua Lu , Jianhua Wang , Jiawei Dai , Jiawei Mo , Jiaxi Xu , Jie Liang , Jie Yang , Jim Logan , Jimit Majmudar , Jing Liu , Jinghong Miao , Jingru Yi , Jingyang Jin , Jiun-Yu Kao , Jixuan Wang , Jiyang Wang , Joe Pemberton , Joel Carlson , Joey Blundell , John Chin-Jew , John He , Jonathan Ho , Jonathan Hueser , Jonathan Lunt , Jooyoung Lee , Joshua Tan , Joyjit Chatterjee , Judith Gaspers , Jue Wang , Jun Fang , Jun Tang , Jun Wan , Jun Wu , Junlei Wang , Junyi Shi , Justin Chiu , Justin Satriano , Justin Yee , Jwala Dhamala , Jyoti Bansal , Kai Zhen , Kai-Wei Chang , Kaixiang Lin , Kalyan Raman , Kanthashree Mysore Sathyendra , Karabo Moroe , Karan Bhandarkar , Karan Kothari , Karolina Owczarzak , Karthick Gopalswamy , Karthick Ravi , Karthik Ramakrishnan , Karthika Arumugam , Kartik Mehta , Katarzyna Konczalska , Kavya Ravikumar , Ke Tran , Kechen Qin , Kelin Li , Kelvin Li , Ketan Kulkarni , Kevin Angelo Rodrigues , Keyur Patel , Khadige Abboud , Kiana Hajebi , Klaus Reiter , Kris Schultz , Krishna Anisetty , Krishna Kotnana , Kristen Li , Kruthi Channamallikarjuna , Krzysztof 
Jakubczyk , Kuba Pierewoj , Kunal Pal , Kunwar Srivastav , Kyle Bannerman , Lahari Poddar , Lakshmi Prasad , Larry Tseng , Laxmikant Naik , Leena Chennuru Vankadara , Lenon Minorics , Leo Liu , Leonard Lausen , Leonardo F. R. Ribeiro , Li Zhang , Lili Gehorsam , Ling Qi , Lisa Bauer , Lori Knapp , Lu Zeng , Lucas Tong , Lulu Wong , Luoxin Chen , Maciej Rudnicki , Mahdi Namazifar , Mahesh Jaliminche , Maira Ladeira Tanke , Manasi Gupta , Mandeep Ahlawat , Mani Khanuja , Mani Sundaram , Marcin Leyk , Mariusz Momotko , Markus Boese , Markus Dreyer , Markus Mueller , Mason Fu , Mateusz Górski , Mateusz Mastalerczyk , Matias Mora , Matt Johnson , Matt Scott , Matthew Wen , Max Barysau , Maya Boumerdassi , Maya Krishnan , Mayank Gupta , Mayank Hirani , Mayank Kulkarni , Meganathan Narayanasamy , Melanie Bradford , Melanie Gens , Melissa Burke , Meng Jin , Miao Chen , Michael Denkowski , Michael Heymel , Michael Krestyaninov , Michal Obirek , Michalina Wichorowska , Michał Miotk , Milosz Watroba , Mingyi Hong , Mingzhi Yu , Miranda Liu , Mohamed Gouda , Mohammad El-Shabani , Mohammad Ghavamzadeh , Mohit Bansal , Morteza Ziyadi , Nan Xia , Nathan Susanj , Nav Bhasin , Neha Goswami , Nehal Belgamwar , Nicolas Anastassacos , Nicolas Bergeron , Nidhi Jain , Nihal Jain , Niharika Chopparapu , Nik Xu , Nikko Strom , Nikolaos Malandrakis , Nimisha Mishra , Ninad Parkhi , Ninareh Mehrabi , Nishita Sant , Nishtha Gupta , Nitesh Sekhar , Nithin Rajeev , Nithish Raja Chidambaram , Nitish Dhar , Noor Bhagwagar , Noy Konforty , Omar Babu , Omid Razavi , Orchid Majumder , Osama Dar , Oscar Hsu , Pablo Kvitca , Pallavi Pandey , Parker Seegmiller , Patrick Lange , Paul Ferraro , Payal Motwani , Pegah Kharazmi , Pei Wang , Pengfei Liu , Peter Bradtke , Peter Götz , Peter Zhou , Pichao Wang , Piotr Poskart , Pooja Sonawane , Pradeep Natarajan , Pradyun Ramadorai , Pralam Shah , Prasad Nirantar , Prasanthi Chavali , Prashan Wanigasekara , Prashant Saraf , Prashun Dey , Pratyush Pant , 
Prerak Pradhan , Preyaa Patel , Priyanka Dadlani , Prudhvee Narasimha Sadha , Qi Dong , Qian Hu , Qiaozi , Gao , Qing Liu , Quinn Lam , Quynh Do , R. Manmatha , Rachel Willis , Rafael Liu , Rafal Ellert , Rafal Kalinski , Rafi Al Attrach , Ragha Prasad , Ragini Prasad , Raguvir Kunani , Rahul Gupta , Rahul Sharma , Rahul Tewari , Rajaganesh Baskaran , Rajan Singh , Rajiv Gupta , Rajiv Reddy , Rajshekhar Das , Rakesh Chada , Rakesh Vaideeswaran Mahesh , Ram Chandrasekaran , Ramesh Nallapati , Ran Xue , Rashmi Gangadharaiah , Ravi Rachakonda , Renxian Zhang , Rexhina Blloshmi , Rishabh Agrawal , Robert Enyedi , Robert Lowe , Robik Shrestha , Robinson Piramuthu , Rohail Asad , Rohan Khanna , Rohan Mukherjee , Rohit Mittal , Rohit Prasad , Rohith Mysore Vijaya Kumar , Ron Diamant , Ruchita Gupta , Ruiwen Li , Ruoying Li , Rushabh Fegade , Ruxu Zhang , Ryan Arbow , Ryan Chen , Ryan Gabbard , Ryan Hoium , Ryan King , Sabarishkumar Iyer , Sachal Malick , Sahar Movaghati , Sai Balakavi , Sai Jakka , Sai Kashyap Paruvelli , Sai Muralidhar Jayanthi , Saicharan Shriram Mujumdar , Sainyam Kapoor , Sajjad Beygi , Saket Dingliwal , Saleh Soltan , Sam Ricklin , Sam Tucker , Sameer Sinha , Samridhi Choudhary , Samson Tan , Samuel Broscheit , Samuel Schulter , Sanchit Agarwal , Sandeep Atluri , Sander Valstar , Sanjana Shankar , Sanyukta Sanyukta , Sarthak Khanna , Sarvpriye Khetrapal , Satish Janakiraman , Saumil Shah , Saurabh Akolkar , Saurabh Giri , Saurabh Khandelwal , Saurabh Pawar , Saurabh Sahu , Sean Huang , Sejun Ra , Senthilkumar Gopal , Sergei Dobroshinsky , Shadi Saba , Shamik Roy , Shamit Lal , Shankar Ananthakrishnan , Sharon Li , Shashwat Srijan , Shekhar Bhide , Sheng Long Tang , Sheng Zha , Shereen Oraby , Sherif Mostafa , Shiqi Li , Shishir Bharathi , Shivam Prakash , Shiyuan Huang , Shreya Yembarwar , Shreyas Pansare , Shreyas Subramanian , Shrijeet Joshi , Shuai Liu , Shuai Tang , Shubham Chandak , Shubham Garg , Shubham Katiyar , Shubham Mehta , Shubham 
Srivastav , Shuo Yang , Siddalingesha D S , Siddharth Choudhary , Siddharth Singh Senger , Simon Babb , Sina Moeini , Siqi Deng , Siva Loganathan , Slawomir Domagala , Sneha Narkar , Sneha Wadhwa , Songyang Zhang , Songyao Jiang , Sony Trenous , Soumajyoti Sarkar , Soumya Saha , Sourabh Reddy , Sourav Dokania , Spurthideepika Sandiri , Spyros Matsoukas , Sravan Bodapati , Sri Harsha Reddy Wdaru , Sridevi Yagati Venkateshdatta , Srikanth Ronanki , Srinivasan R Veeravanallur , Sriram Venkatapathy , Sriramprabhu Sankaraguru , Sruthi Gorantla , Sruthi Karuturi , Stefan Schroedl , Subendhu Rongali , Subhasis Kundu , Suhaila Shakiah , Sukriti Tiwari , Sumit Bharti , Sumita Sami , Sumith Mathew , Sunny Yu , Sunwoo Kim , Suraj Bajirao Malode , Susana Cumplido Riel , Swapnil Palod , Swastik Roy , Syed Furqhan , Tagyoung Chung , Takuma Yoshitani , Taojiannan Yang , Tejaswi Chillakura , Tejwant Bajwa , Temi Lajumoke , Thanh Tran , Thomas Gueudre , Thomas Jung , Tianhui Li , Tim Seemman , Timothy Leffel , Tingting Xiang , Tirth Patel , Tobias Domhan , Tobias Falke , Toby Guo , Tom Li , Tomasz Horszczaruk , Tomasz Jedynak , Tushar Kulkarni , Tyst Marin , Tytus Metrycki , Tzu-Yen Wang , Umang Jain , Upendra Singh , Utkarsh Chirimar , Vaibhav Gupta , Vanshil Shah , Varad Deshpande , Varad Gunjal , Varsha Srikeshava , Varsha Vivek , Varun Bharadwaj , Varun Gangal , Varun Kumar , Venkatesh Elango , Vicente Ordonez , Victor Soto , Vignesh Radhakrishnan , Vihang Patel , Vikram Singh , Vinay Varma Kolanuvada , Vinayshekhar Bannihatti Kumar , Vincent Auvray , Vincent Cartillier , Vincent Ponzo , Violet Peng , Vishal Khandelwal , Vishal Naik , Vishvesh Sahasrabudhe , Vitaliy Korolev , Vivek Gokuladas , Vivek Madan , Vivek Subramanian , Volkan Cevher , Vrinda Gupta , Wael Hamza , Wei Zhang , Weitong Ruan , Weiwei Cheng , Wen Zhang , Wenbo Zhao , Wenyan Yao , Wenzhuo Ouyang , Wesley Dashner , William Campbell , William Lin , Willian Martin , Wyatt Pearson , Xiang Jiang , Xiangxing Lu , 
Xiangyang Shi , Xianwen Peng , Xiaofeng Gao , Xiaoge Jiang , Xiaohan Fei , Xiaohui Wang , Xiaozhou Joey Zhou , Xin Feng , Xinyan Zhao , Xinyao Wang , Xinyu Li , Xu Zhang , Xuan Wang , Xuandi Fu , Xueling Yuan , Xuning Wang , Yadunandana Rao , Yair Tavizon , Yan Rossiytsev , Yanbei Chen , Yang Liu , Yang Zou , Yangsook Park , Yannick Versley , Yanyan Zhang , Yash Patel , Yen-Cheng Lu , Yi Pan , Yi-Hsiang , Lai , Yichen Hu , Yida Wang , Yiheng Zhou , Yilin Xiang , Ying Shi , Ying Wang , Yishai Galatzer , Yongxin Wang , Yorick Shen , Yuchen Sun , Yudi Purwatama , Yue , Wu , Yue Gu , Yuechun Wang , Yujun Zeng , Yuncong Chen , Yunke Zhou , Yusheng Xie , Yvon Guy , Zbigniew Ambrozinski , Zhaowei Cai , Zhen Zhang , Zheng Wang , Zhenghui Jin , Zhewei Zhao , Zhiheng Li , Zhiheng Luo , Zhikang Zhang , Zhilin Fang , Zhiqi Bu , Zhiyuan Wang , Zhizhong Li , Zijian Wang , Zimeng , Qiu , Zishi Li

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

In recent years, notable advancements have been made in the domain of visual document understanding, with the prevailing architecture comprising a cascade of vision and language models. The text component can either be extracted explicitly…

Computer Vision and Pattern Recognition · Computer Science 2025-03-27 Ofir Abramovich , Niv Nayman , Sharon Fogel , Inbal Lavi , Ron Litman , Shahar Tsiper , Royee Tichauer , Srikar Appalaraju , Shai Mazor , R. Manmatha

Efficient Scaling of Diffusion Transformers for Text-to-Image Generation

We empirically study the scaling properties of various Diffusion Transformers (DiTs) for text-to-image generation by performing extensive and rigorous ablations, including training scaled DiTs ranging from 0.3B up to 8B parameters on…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Hao Li , Shamit Lal , Zhiheng Li , Yusheng Xie , Ying Wang , Yang Zou , Orchid Majumder , R. Manmatha , Zhuowen Tu , Stefano Ermon , Stefano Soatto , Ashwin Swaminathan

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance generalizability of small VDU models by…

Computer Vision and Pattern Recognition · Computer Science 2024-10-07 Sungnyun Kim , Haofu Liao , Srikar Appalaraju , Peng Tang , Zhuowen Tu , Ravi Kumar Satzoda , R. Manmatha , Vijay Mahadevan , Stefano Soatto

NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality

We study the capability of Video-Language (VidL) models in understanding compositions between objects, attributes, actions and their relations. Composition understanding becomes particularly challenging for video data since the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Chaofan Tao , Gukyeong Kwon , Varad Gunjal , Hao Yang , Zhaowei Cai , Yonatan Dukler , Ashwin Swaminathan , R. Manmatha , Colin Jon Taylor , Stefano Soatto

Mixed-Query Transformer: A Unified Image Segmentation Architecture

Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Pei Wang , Zhaowei Cai , Hao Yang , Ashwin Swaminathan , R. Manmatha , Stefano Soatto

On the Scalability of Diffusion-based Text-to-Image Generation

Scaling up model and data size has been quite successful for the evolution of LLMs. However, the scaling law for the diffusion based text-to-image (T2I) models is not fully explored. It is also unclear how to efficiently scale the model for…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Hao Li , Yang Zou , Ying Wang , Orchid Majumder , Yusheng Xie , R. Manmatha , Ashwin Swaminathan , Zhuowen Tu , Stefano Ermon , Stefano Soatto

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-16 Peng Tang , Pengkai Zhu , Tian Li , Srikar Appalaraju , Vijay Mahadevan , R. Manmatha

Multiple-Question Multiple-Answer Text-VQA

We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models. The text-VQA task requires a model to answer a question by understanding multi-modal content: text (typically from…

Computer Vision and Pattern Recognition · Computer Science 2023-11-16 Peng Tang , Srikar Appalaraju , R. Manmatha , Yusheng Xie , Vijay Mahadevan

DocTr: Document Transformer for Structured Information Extraction in Documents

We present a new formulation for structured information extraction (SIE) from visually rich documents. It aims to address the limitations of existing IOB tagging or graph-based formulations, which are either overly reliant on the correct…

Computer Vision and Pattern Recognition · Computer Science 2023-07-18 Haofu Liao , Aruni RoyChowdhury , Weijian Li , Ankan Bansal , Yuting Zhang , Zhuowen Tu , Ravi Kumar Satzoda , R. Manmatha , Vijay Mahadevan

DocFormerv2: Local Features for Document Understanding

We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU). The VDU domain entails understanding documents (beyond mere OCR predictions) e.g., extracting information from a form, VQA for documents and other…

Computer Vision and Pattern Recognition · Computer Science 2023-06-05 Srikar Appalaraju , Peng Tang , Qi Dong , Nishant Sankaran , Yichu Zhou , R. Manmatha

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Jiang Liu , Hui Ding , Zhaowei Cai , Yuting Zhang , Ravi Kumar Satzoda , Vijay Mahadevan , R. Manmatha

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

Learning to segment images purely by relying on the image-text alignment from web data can lead to sub-optimal performance due to noise in the data. The noise comes from the samples where the associated text does not correlate with the…

Computer Vision and Pattern Recognition · Computer Science 2023-02-08 Yash Patel , Yusheng Xie , Yi Zhu , Srikar Appalaraju , R. Manmatha

YORO -- Lightweight End to End Visual Grounding

We present YORO - a multi-modal transformer encoder-only architecture for the Visual Grounding (VG) task. This task involves localizing, in an image, an object referred via natural language. Unlike the recent trend in the literature of…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Chih-Hui Ho , Srikar Appalaraju , Bhavan Jasani , R. Manmatha , Nuno Vasconcelos

GLASS: Global to Local Attention for Scene-Text Spotting

In recent years, the dominant paradigm for text spotting is to combine the tasks of text detection and recognition into a single end-to-end framework. Under this paradigm, both tasks are accomplished by operating over a shared global…

Computer Vision and Pattern Recognition · Computer Science 2022-08-09 Roi Ronen , Shahar Tsiper , Oron Anschel , Inbal Lavi , Amir Markovitz , R. Manmatha

Searching for Apparel Products from Images in the Wild

In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits and their followers are often inspired to buy similar…

Computer Vision and Pattern Recognition · Computer Science 2022-04-11 Son Tran , Ming Du , Sampath Chanda , R. Manmatha , Cj Taylor

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the…

Computer Vision and Pattern Recognition · Computer Science 2022-02-15 Yair Kittenplon , Inbal Lavi , Sharon Fogel , Yarin Bar , R. Manmatha , Pietro Perona

LaTr: Layout-Aware Transformer for Scene-Text VQA

We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact…

Computer Vision and Pattern Recognition · Computer Science 2021-12-28 Ali Furkan Biten , Ron Litman , Yusheng Xie , Srikar Appalaraju , R. Manmatha

DocFormer: End-to-End Transformer for Document Understanding

We present DocFormer -- a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts etc.) and…

Computer Vision and Pattern Recognition · Computer Science 2021-09-21 Srikar Appalaraju , Bhavan Jasani , Bhargava Urala Kota , Yusheng Xie , R. Manmatha