Related papers: PixelGen: Rethinking Embedded Camera Systems
Pixel diffusion generates images directly in pixel space, avoiding the VAE artifacts and representational bottlenecks of two-stage latent diffusion. Recent JiT further simplifies pixel diffusion with x-prediction, where the model predicts…
Recent agentic language models increasingly need to interact with real-world environments that contain tightly intertwined visual and textual information, often through raw camera pixels rather than separately processed images and tokenized…
Image sensors hold a pivotal role in society due to their ability to capture vast amounts of information. Traditionally, image sensors are opaque due to light absorption in both the pixels and the read-out electronics that are stacked on…
Lensless imaging seeks to replace/remove the lens in a conventional imaging system. The earliest cameras were in fact lensless, relying on long exposure times to form images on the other end of a small aperture in a darkened room/container…
Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that…
A scanning pixel camera is a novel low-cost, low-power sensor that is not diffraction limited. It produces data as a sequence of samples extracted from various parts of the scene during the course of a scan. It can provide very detailed…
Event cameras capture the world at high time resolution and with minimal bandwidth requirements. However, event streams, which only encode changes in brightness, do not contain sufficient scene information to support a wide variety of…
Lensless imaging is an important and challenging problem. One notable solution to lensless imaging is a single pixel camera which benefits from ideas central to compressive sampling. However, traditional single pixel cameras require many…
By replacing the lens with a thin optical element, lensless imaging enables new applications and solutions beyond those supported by traditional camera design and post-processing, e.g. compact and lightweight form factors and visual…
Polarization imaging captures the polarization state of light, revealing information invisible to the human eye yet valuable in domains such as biomedical diagnostics, autonomous driving, and remote sensing. However, conventional…
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time,…
Depth imaging is vital for many emerging technologies with applications in augmented reality, robotics, gesture detection, and facial recognition. These applications, however, demand compact and low-power systems beyond the capabilities of…
This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other…
Event cameras are a new type of sensor that differs from traditional cameras. Each pixel is triggered asynchronously by an event, namely a change in the brightness incident on the pixel. If the increment or decrement…
Over the past two decades, mobile imaging has experienced a profound transformation, with cell phones rapidly eclipsing all other forms of digital photography in popularity. Today's cell phones are equipped with a diverse range of imaging…
This report presents PixelBytes, an approach for unified multimodal representation learning. Drawing inspiration from sequence models like Image Transformers, PixelCNN, and Mamba-Bytes, we explore integrating text, audio, action-state, and…
As conventional frame-based cameras suffer from high energy consumption and latency, several new types of image sensors have been devised, with some of them exploiting the sparsity of natural images in some transform domains. Instead of…
Modern computer vision has moved beyond the domain of internet photo collections and into the physical world, guiding camera-equipped robots and autonomous cars through unstructured environments. To enable these embodied agents to interact…
Polarization imaging is a technique that creates a pixel map of the polarization state in a scene. Although invisible to the human eye, polarization can assist various sensing and computer vision tasks. Existing polarization cameras use…
Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture…