Related papers: Boostlet.js: Image processing plugins for the web …
Text-driven video generation witnesses rapid progress. However, merely using text prompts is not enough to depict the desired subject appearance that accurately aligns with users' intents, especially for customized content creation. In this…
Image compositing is one of the most fundamental steps in creative workflows. It involves taking objects/parts of several images to create a new image, called a composite. Currently, this process is done manually by creating accurate masks…
This work presents Insert Anything, a unified framework for reference-based image insertion that seamlessly integrates objects from reference images into target scenes under flexible, user-specified control guidance. Instead of training…
Due to the availability of increasingly large amounts of visual data, there is a growing need for tools that can help users find relevant images. While existing tools can perform image retrieval based on similarity or metadata, they fall…
With the emergence of diffusion models and rapid development in image processing, it has become effortless to generate fancy images in tasks such as style transfer and image editing. However, these impressive image processing approaches…
This is to present a text image classifier device that identifies textual content in images and then categorizes each image into one of four predefined categories, including Invoice, Form, Letter, or Report. The device supports a gallery…
We introduce a lightweight network to improve descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of keypoints as the input, and uses an MLP-based self-boosting stage and a…
In recent years, numerous ideas have emerged for designing a mutually reinforcing mechanism or extra stages for the image fusion task, ignoring the inevitable gaps between different vision tasks and the computational burden. We argue that…
Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework…
Image Fusion, a technique which combines complimentary information from different images of the same scene so that the fused image is more suitable for segmentation, feature extraction, object recognition and Human Visual System. In this…
This work investigates how the traditional image classification pipelines can be extended into a deep architecture, inspired by recent successes of deep neural networks. We propose a deep boosting framework based on layer-by-layer joint…
Digital face manipulation has become a popular and fascinating way to touch images with the prevalence of smartphones and social networks. With a wide variety of user preferences, facial expressions, and accessories, a general and flexible…
In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to…
Custom animated visualizations of large, complex datasets are helpful across many domains, but they are hard to develop. Much of the difficulty arises from maintaining visualization state across many animated graphical elements that may…
Image processing holds immense potential for societal benefit, yet its full potential is often accessible only to tech-savvy experts. Bridging this knowledge gap and providing accessible tools for users of all backgrounds remains an…
Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still…
Methods: We have developed a software suite (DataSet Tracker) for real-time analysis designed to run on computers, smartphones, and smart glasses hardware and suitable for resource-constrained, on-the-fly computing in microscopes without…
Cross-browser compatibility testing is concerned with identifying perceptible differences in the way a Web page is rendered across different browsers or configurations thereof. Existing automated cross-browser compatibility testing methods…
Recent advances in multimodal learning have achieved remarkable success across diverse vision-language tasks. However, such progress heavily relies on large-scale image-text datasets, making training costly and inefficient. Prior efforts in…
We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference…