Related papers: Parametric Graph-based Separable Transforms for Vi…
In many state-of-the-art compression systems, signal transformation is an integral part of the encoding and decoding process, where transforms provide compact representations for the signals of interest. This paper introduces a class of…
Current video coding standards, including H.264/AVC, HEVC, and VVC, employ discrete cosine transform (DCT), discrete sine transform (DST), and secondary to Karhunen-Loeve transforms (KLTs) decorrelate the intra-prediction residuals.…
Recent video codecs with multiple separable transforms can achieve significant coding gains using asymmetric trigonometric transforms (DCTs and DSTs), because they can exploit diverse statistics of residual block signals. However, they add…
Discrete trigonometric transforms (DTTs), such as the DCT-2 and the DST-7, are widely used in video codecs for their balance between coding performance and computational efficiency. In contrast, data-dependent transforms, such as the…
Modern compression systems use linear transformations in their encoding and decoding processes, with transforms providing compact signal representations. While multiple data-dependent transforms for image/video coding can adapt to diverse…
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored…
In this paper, we propose a new graph-based transform and illustrate its potential application to signal compression. Our approach relies on the careful design of a graph that optimizes the overall rate-distortion performance through an…
Transformer-based models have recently shown success in representation learning on graph-structured data beyond natural language processing and computer vision. However, the success is limited to small-scale graphs due to the drawbacks of…
The discrete cosine transform (DCT) is a relevant tool in signal processing applications, mainly known for its good decorrelation properties. Current image and video coding standards -- such as JPEG and HEVC -- adopt the DCT as a…
Data-dependent transforms are increasingly being incorporated into next-generation video coding systems such as AVM, a codec under development by the Alliance for Open Media (AOM), and VVC. To circumvent the computational complexities…
Video tokenization procedure is critical for a wide range of video processing tasks. Most existing approaches directly transform video into fixed-grid and patch-wise tokens, which exhibit limited versatility. Spatially, uniformly allocating…
For the last few decades, the application of signal-adaptive transform coding to video compression has been stymied by the large computational complexity of matrix-based solutions. In this paper, we propose a novel parametric approach to…
Linear block transform coding remains a fundamental component of image and video compression. Although the Discrete Cosine Transform (DCT) is widely employed in all current compression standards, its sub-optimality has sparked ongoing…
Video compression has been investigated by means of analysis-synthesis, and more particularly by means of inpainting. The first part of our approach has been to develop the inpainting of DCT coefficients in an image. This has shown good…
Most codec designs rely on the mean squared error (MSE) as a fidelity metric in rate-distortion optimization, which allows to choose the optimal parameters in the transform domain but may fail to reflect perceptual quality. Alternative…
Data-dependent secondary transforms, which aim to decorrelate coefficients of a separable primary transform, can improve residual coding efficiency; however, their deployment is often constrained by computational complexity. Recent video…
Discrete transforms play an important role in many signal processing applications, and low-complexity alternatives for classical transforms became popular in recent years. Particularly, the discrete cosine transform (DCT) has proven to be…
Well-trained generative neural networks (GNN) are very efficient at compressing visual information for static images in their learned parameters but not as efficient as inter- and intra-prediction for most video content. However, for…
Generating longer textual sequences when conditioned on the visual information is an interesting problem to explore. The challenge here proliferate over the standard vision conditioned sentence-level generation (e.g., image or video…
In graph signal processing (GSP), prior information on the dependencies in the signal is collected in a graph which is then used when processing or analyzing the signal. Blind source separation (BSS) techniques have been developed and…