AYUSH DOGRA, APOORAV SHARMA, BHAWNA GOYAL, VINAY KUKREJA, and RENU VIG
Image matting is the process of isolating an object from its background, a fundamental problem in computer vision and image processing. More technically, it estimates the opacity (alpha) of each pixel in an image with respect to an underlying background. The goal is to extract the object’s alpha channel, which can be used for compositing or other image editing tasks, while preserving fine details such as hair, fur, and transparent or translucent surfaces.
Image matting algorithms use various cues, such as color, texture, and gradient, to estimate the alpha (transparency) value for each pixel. The resulting alpha matte can be used to composite the foreground objects onto a new background or to remove the background altogether. Image matting has numerous applications in image and video editing, including object segmentation, image compositing, and virtual reality. Image matting has gained significant attention in recent years due to its various industry-based applications.1
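The per-pixel opacity described above is commonly formalized with the compositing equation. The symbols F_p, B_p, and α_p (foreground color, background color, and alpha at pixel p) are notation introduced here for illustration and do not appear in the original text:

```latex
I_p = \alpha_p F_p + (1 - \alpha_p) B_p, \qquad \alpha_p \in [0, 1]
```

For a three-channel color image this gives three equations per pixel but seven unknowns (three for F_p, three for B_p, and one for α_p), which is why matting algorithms rely on additional cues or user input.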
Several methodologies have been developed for image matting, including:
Sampling-based matting. This method involves sampling pixels in the foreground and background regions and using these samples to estimate the alpha value for each pixel. This method is computationally intensive and can take a long time to process large images.1
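Closed-form matting. This method assumes that the foreground and background colors vary smoothly within small local windows, which allows them to be eliminated from the problem and leaves a quadratic cost function in alpha alone. The alpha matte is then computed by solving a system of linear equations that links the foreground, background, and unknown areas.3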
Trimap-based matting. This method involves creating a trimap, which is a rough estimate of the foreground and background regions. The trimap consists of three regions: the foreground, the background, and the unknown region. The unknown region is then estimated using the color and texture information from the foreground and background regions.7
Spectral matting. This technique estimates the alpha matte from a spectral decomposition of the image. The foreground and background areas are then extracted using the estimated alpha matte.
Optimization-based matting. This method formulates image matting as an optimization problem in which an objective function is minimized over the alpha values. Because the true alpha values are unknown for the pixels being estimated, the objective typically penalizes disagreement with the observed colors and with the known foreground and background regions. This method can be time-consuming, but can produce high-quality results.
Learning-based matting. This method trains a machine learning model on a set of labeled images to estimate the alpha value for each pixel of the input image. It can produce high-quality results and is computationally efficient once the model has been trained.5, 6
Deep learning-based matting. This method trains a deep neural network to estimate the alpha value for each pixel. It can produce high-quality results and is computationally efficient once the network has been trained.
KNN matting. To estimate the alpha value of a pixel, this method selects its K nearest neighbors in the image. The neighbors are usually chosen according to how closely their color and texture resemble those of the pixel being estimated.4
Bayesian matting. This method determines an image’s alpha matte using a probabilistic approach. Given the observed image and the estimated foreground and background colors, Bayesian inference is used to determine the posterior distribution of the alpha matte.4
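Several of the methods listed above, sampling-based matting in particular, need to turn one foreground color sample and one background color sample into an alpha value for an unknown pixel. A minimal sketch of that step is given below, assuming the pixel color lies near the line between the two samples in color space. The function name alpha_from_samples and the eps parameter are hypothetical and chosen for illustration; real methods evaluate many sample pairs and add further costs.

```python
import numpy as np

def alpha_from_samples(pixel, fg, bg, eps=1e-8):
    """Estimate alpha by projecting the observed color onto the line
    that joins one background sample (alpha = 0) and one foreground
    sample (alpha = 1) in color space."""
    pixel, fg, bg = (np.asarray(c, dtype=float) for c in (pixel, fg, bg))
    direction = fg - bg
    # eps guards against division by zero when the two samples coincide.
    alpha = np.dot(pixel - bg, direction) / (np.dot(direction, direction) + eps)
    return float(np.clip(alpha, 0.0, 1.0))

# Illustrative input: a 30/70 mix of pure red over pure blue.
fg = np.array([1.0, 0.0, 0.0])
bg = np.array([0.0, 0.0, 1.0])
mixed = 0.3 * fg + 0.7 * bg
print(alpha_from_samples(mixed, fg, bg))
```

For the synthetic mix above the estimate comes out very close to 0.3. The estimate degrades when the foreground and background samples have similar colors, since the projection is then poorly conditioned.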
There are various image matting methodologies, each with its own benefits and drawbacks. The choice of approach depends on the application and the user’s needs. While some techniques are computationally demanding, others require a sizable training set of labeled images. Deep learning-based approaches are becoming more common because they deliver high-quality results while remaining computationally efficient. The table provides a general subjective assessment of various matting algorithms.
Furthermore, for objective assessment of the matting algorithms, the following parameters are used:
Mean square error (MSE). MSE is the average squared difference between the estimated alpha matte and the ground truth matte.
Connectivity error (CE). CE measures the degree to which the estimated alpha matte preserves the connectivity of the foreground object.
KNN error (KE). Using a K-nearest neighbor method, KE calculates the difference between the estimated alpha matte and the ground truth matte.
Gradient error (GE). The gradient difference between the estimated alpha matte and the ground truth matte is measured by GE.
Boundary recall (BR). The BR metric assesses how well the foreground object’s borders are correctly captured by the approximated alpha matte.
F-measure. The F-measure is a combined metric that takes into account both the precision and the recall of the estimated alpha matte.
Boundary displacement error (BDE). BDE calculates the difference between the estimated alpha matte’s boundaries and the ground truth matte’s boundaries.
The performance of a matting algorithm cannot be fully captured by a single metric, so several measurements are used together to offer a more thorough evaluation. The characteristics of the images being processed and the requirements of the application should also be taken into account when selecting the assessment measures.
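Two of the metrics listed above can be sketched in a few lines. The code below is a simplified illustration, not the reference implementation of any benchmark: the function names, the optional unknown mask, and the use of finite differences for the gradient (published benchmarks often use Gaussian derivative filters instead) are all assumptions made here.

```python
import numpy as np

def mse(alpha_pred, alpha_true, unknown=None):
    """Mean squared error between two alpha mattes with values in [0, 1].
    If a boolean `unknown` mask is given, only those pixels are scored."""
    pred = np.asarray(alpha_pred, dtype=float)
    true = np.asarray(alpha_true, dtype=float)
    sq = (pred - true) ** 2
    if unknown is not None:
        sq = sq[np.asarray(unknown, dtype=bool)]
    return float(sq.mean())

def gradient_error(alpha_pred, alpha_true):
    """Sum of squared differences between gradient magnitudes,
    using simple finite differences (an assumption of this sketch)."""
    gy_p, gx_p = np.gradient(np.asarray(alpha_pred, dtype=float))
    gy_t, gx_t = np.gradient(np.asarray(alpha_true, dtype=float))
    mag_p = np.hypot(gx_p, gy_p)
    mag_t = np.hypot(gx_t, gy_t)
    return float(((mag_p - mag_t) ** 2).sum())

# Illustrative 4x4 ground truth: left half background, right half foreground.
truth = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
estimate = truth.copy()
estimate[0, 0] = 0.5  # one wrong pixel

print(mse(estimate, truth))
print(gradient_error(estimate, truth))
```

Restricting the score to the unknown region of the trimap is a common choice, because the known foreground and background pixels are trivially correct and would otherwise dilute the error.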
In this article, we will stick to the most widely discussed and used methodology in the literature: Trimap-based matting (see Fig. 1).
According to the diagram, the input image (I) is the image for which the matte is created. Possible inputs include a digital image, a photograph, or a video frame. A trimap is an image that divides the original image into foreground, background, and an undefined region (also known as the “unknown” region). The foreground is the portion of the image that we want to keep, the background is the portion that we want to eliminate, and the unknown region is the portion for which we want to estimate the alpha values. Typically, the trimap is created either manually by the user or automatically using segmentation techniques. The scribble image provides additional guidance to the algorithm on the location of the foreground and background regions.
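One common way to create a trimap automatically is to start from a rough binary segmentation mask and mark a band around its boundary as unknown. The sketch below illustrates this under stated assumptions: the function names, the radius parameter, and the 0/128/255 encoding for background, unknown, and foreground are conventions chosen here, not details taken from the article.

```python
import numpy as np

def dilate(mask, radius):
    """Binary dilation with a (2*radius+1) square window,
    built by OR-ing shifted copies of the zero-padded mask."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def trimap_from_mask(mask, radius=2):
    """Turn a binary foreground mask into a trimap.
    Assumed encoding: 0 = background, 128 = unknown, 255 = foreground."""
    mask = np.asarray(mask, dtype=bool)
    # Erosion is the complement of dilating the complement. Pixels outside
    # the image are treated as foreground here, so the mask is not eroded
    # where it touches the image border.
    sure_fg = ~dilate(~mask, radius)
    maybe_fg = dilate(mask, radius)
    trimap = np.full(mask.shape, 128, dtype=np.uint8)
    trimap[sure_fg] = 255
    trimap[~maybe_fg] = 0
    return trimap

# Illustrative input: a 5x5 square object inside a 9x9 image.
mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True
print(trimap_from_mask(mask, radius=1))
```

A larger radius gives the matting algorithm more room to recover fine structures such as hair, at the cost of a larger region to estimate.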
The matting algorithm uses the input image and the trimap or scribbles to produce an alpha matte (α), which gives the transparency of each pixel in the input image. The alpha matte assigns values to the foreground, background, and intermediate (“unknown”) regions of the input image.
The output image (I') is then created by using the alpha matte to composite the input image’s foreground and background areas. The result can be a composite image that seamlessly combines the foreground with a background, or a cutout of the foreground object.
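The compositing step described above amounts to a per-pixel weighted blend. The following sketch assumes floating-point images of shape (height, width, 3) and an alpha matte of shape (height, width) with values in [0, 1]; the function name composite is hypothetical, and the foreground colors are assumed to be already available.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a foreground over a background using an alpha matte:
    output = alpha * foreground + (1 - alpha) * background."""
    fg = np.asarray(foreground, dtype=float)
    bg = np.asarray(background, dtype=float)
    # Add a trailing axis so the matte broadcasts over the color channels.
    a = np.asarray(alpha, dtype=float)[..., np.newaxis]
    return a * fg + (1.0 - a) * bg

# Illustrative 2x2 example: white foreground over a black background.
fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
alpha = np.array([[0.0, 0.25],
                  [0.5, 1.0]])
out = composite(fg, bg, alpha)
print(out[..., 0])
```

With a white foreground and a black background, each output channel simply reproduces the alpha matte. Replacing the background array with a new image yields the composite, while keeping the alpha matte as a fourth channel yields a cutout.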
Applications
Entertainment, advertising, and e-commerce are just a few of the many sectors in which image matting may be used. In the entertainment sector, it is frequently used for special effects and compositing in films and television programs. By removing characters or objects from their surroundings and placing them on a new background, image matting is essential to creating realistic and credible visual effects.
Image matting is also popular in advertising and online shopping. By removing the backdrop and creating a clean, polished look, it helps improve the visual appeal of products in these sectors. For instance, when selling apparel online, image matting may be used to remove the backdrop from an image of clothing, producing a version of the item with a transparent background that can then be placed on various backgrounds to promote the goods.
For compositing and special effects, image matting is widely used in the film and television industries. It enables convincing visual effects, such as the substitution of a genuine background for a green screen or the placement of an object in a new location. In post-production, image matting is frequently used to remove undesirable characters or elements from a scene.1, 2
In the advertising sector, image matting is used to produce professional and aesthetically pleasing product photographs. For instance, product photos for jewelry sales should be visually appealing and emphasize the piece’s intricacies. The backdrop may be removed using image matting to obtain a clear image of the jewelry, which can then be placed on various backgrounds to highlight its beauty.
Image matting is frequently used by e-commerce companies to display their goods. For instance, when selling garments online, image matting may be used to produce images of a clothing item with a transparent background, which can then be placed on various backdrops to promote the goods. As a result, customers can see the apparel item more clearly and make better purchasing decisions.1
Experiments and results
We have experimented with five different techniques to obtain matting results. The dataset used is shown in Figure 2 with the corresponding trimaps, and the results are shown in Figure 3.
Looking at the results, it can be seen that, apart from Gaussian matting, all techniques provide fairly good results. The differences appear in the fine details. These matte images are used to form composite images with different backgrounds (see Fig. 4).
Recent advancements
Current developments in image matting have mostly concentrated on improving the accuracy and efficiency of the method. One noteworthy development is the application of deep learning methods, such as convolutional neural networks (CNNs), to image matting. It has been demonstrated that deep learning-based techniques can produce results that are both accurate and fast.
Another recent advance is the development of new algorithms that use extra information, such as depth or color information, to improve the quality of the matting results. For instance, some researchers have suggested using depth data to improve the accuracy of the alpha mattes, while others have suggested using color data to improve the estimates of the foreground and background areas.
Lastly, efforts have been made to create interactive image matting systems that enable users to directly alter the alpha matte. To achieve high-quality results, these methods often combine user guidance with automated algorithms. In general, the area of image matting is still developing as researchers look into new methods and algorithms to increase the accuracy and efficiency of the procedure.1, 7
REFERENCES
1. J. Boda and D. Pandya, Proc. IEEE ICCSP 2018, 0765–0770 (Apr. 2018).
2. Y. Aksoy, T. Ozan Aydin, and M. Pollefeys, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 29–37 (2017).
3. A. Levin, D. Lischinski, and Y. Weiss, IEEE Trans. Pattern Anal. Mach. Intell., 30, 2, 228–242 (2008).
4. Y. Y. Chuang, B. Curless, D. H. Salesin, and R. Szeliski, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2, II–II (2001).
5. N. Xu, B. Price, S. Cohen, and T. Huang, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 5576–5584 (2017).
6. H. Guo, W. Liu, and J. Cai, IEEE Trans. Image Process., 28, 6, 3086–3099 (2019).
7. T. Ruzic, R. Ranftl, and V. Koltun, Proc. ECCV, 101–117, Springer (2018).
Dr. Ayush Dogra is an assistant professor-Senior Grade at Chitkara University Institute of Engineering and Technology in Chitkara, Punjab, India. Apoorav Sharma works as a research scholar at the UIET, Panjab University in Chandigarh. Dr. Bhawna Goyal is an assistant professor in the UCRD and ECE department at Chandigarh University in Punjab, India. Professor Vinay Kukreja is a Professor at Chitkara University Institute of Engineering and Technology in Chitkara, Punjab, India. Professor Renu Vig is the current Vice Chancellor of Panjab University in Chandigarh. E-mail: [email protected].