Saimunur Rahman1,2, Piotr Koniusz1,3, Lei Wang2, Luping Zhou4, Peyman Moghadam1,5, Changming Sun1
1CSIRO Data61, 2University of Wollongong, 3Australian National University, 4University of Sydney, 5Queensland University of Technology
Australia
Understanding the partial correlation (a 3D toy case). Unlike the ordinary covariance (pairwise correlation of, say, $\mathbf{x}$ and $\mathbf{y}$ corresponding to channels), the partial correlation between variables $\mathbf{x}$ and $\mathbf{y}$ removes the influence of the confounding variable $\mathbf{z}$. Let the number of samples $n\!=\!3$ and channels $d\!=\!3$. For the 3D case, $\mathbf{x}$ and $\mathbf{y}$ are projected onto a plane perpendicular to $\mathbf{z}$. Then $\rho_{xy}\!=\!\cos{\varphi_{xy}}$ (and $\rho_{xz}$ and $\rho_{yz}$ can be computed by analogy). Projected "residuals" $\mathbf{r}_\mathbf{x}$ and $\mathbf{r}_\mathbf{y}$ are computed as indicated in the plot, ${\mathbf{w}'\!}_x\!=\!\arg\min_{\mathbf{w}}\!\sum_{i=1}^3(x_i\!-\!\mathbf{w}^\top\mathbf{z}_i)^2$ where $\mathbf{z}_i\!=\![z_i, 1]^\top$ (and ${\mathbf{w}'\!}_y$ is computed by analogy). The green box: for $d\!>\!3$, the computation of partial correlation requires covariance inversion.
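The toy case above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the function names `partial_corr_residuals` and `partial_corr_precision`, the synthetic data, and the choice of 500 samples (instead of the caption's $n\!=\!3$) are all assumptions made here so that the estimates are stable. It computes the partial correlation of $\mathbf{x}$ and $\mathbf{y}$ given $\mathbf{z}$ in two ways: as the cosine between least-squares residuals, and from the inverse of the sample covariance matrix.

```python
import numpy as np

def partial_corr_residuals(x, y, z):
    """Partial correlation of x and y given z, via least-squares residuals."""
    # Each regressor is z_i = [z_i, 1]^T, i.e. slope plus intercept.
    Z = np.column_stack([z, np.ones_like(z)])
    w_x, *_ = np.linalg.lstsq(Z, x, rcond=None)
    w_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r_x = x - Z @ w_x  # part of x not explained by z
    r_y = y - Z @ w_y  # part of y not explained by z
    # Cosine of the angle between the two residual vectors.
    return float(r_x @ r_y / (np.linalg.norm(r_x) * np.linalg.norm(r_y)))

def partial_corr_precision(X):
    """All pairwise partial correlations from the inverse covariance."""
    P = np.linalg.inv(np.cov(X, rowvar=False))  # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Synthetic data (an assumption for illustration): z drives both x and y.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

pairwise = float(np.corrcoef(x, y)[0, 1])
partial_res = partial_corr_residuals(x, y, z)
partial_prec = float(partial_corr_precision(np.column_stack([x, y, z]))[0, 1])

print(f"pairwise correlation:            {pairwise:.3f}")
print(f"partial correlation (residuals): {partial_res:.3f}")
print(f"partial correlation (precision): {partial_prec:.3f}")
```

In this synthetic setting the pairwise correlation is large because both variables follow $\mathbf{z}$, while the partial correlation is close to zero. The two partial-correlation routes agree up to floating-point error, which is why the precision matrix is the natural object once more than three channels are involved.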
Visual representation based on the covariance matrix has demonstrated its efficacy for image classification by characterising the pairwise correlation of different channels in convolutional feature maps. However, pairwise correlation becomes misleading when another channel correlates with both channels of interest, resulting in the "confounding" effect. In this case, "partial correlation", which removes the confounding effect, should be estimated instead. Nevertheless, reliably estimating partial correlation requires solving a symmetric positive definite matrix optimisation, known as sparse inverse covariance estimation (SICE). How to incorporate this process into a CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of a CNN. To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during the forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNNs. Computationally, our model can be effectively trained on a GPU and works well with the large number of channels of advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.
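For orientation, SICE is commonly posed as an $\ell_1$-penalised log-determinant problem over symmetric positive definite matrices. The block below states that standard form together with the usual conversion from a precision matrix to partial correlations. The symbols $\boldsymbol{\Sigma}$ (sample covariance), $\mathbf{S}$ (estimated inverse covariance) and $\lambda$ (sparsity weight) are notation assumed here; the exact objective and notation used in the paper may differ.

```latex
% Standard SICE objective (notation assumed for illustration)
\mathbf{S}^{*} = \arg\min_{\mathbf{S} \succ 0}\;
  \operatorname{tr}(\boldsymbol{\Sigma}\mathbf{S})
  - \log\det(\mathbf{S})
  + \lambda \lVert \mathbf{S} \rVert_{1}

% Partial correlation between channels i and j from the precision matrix
\rho_{ij \mid \text{rest}} = -\frac{S_{ij}}{\sqrt{S_{ii}\, S_{jj}}}
```

A larger $\lambda$ pushes more off-diagonal entries of $\mathbf{S}$ to zero, which is the sparsity that the abstract refers to.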
Proposed iterative sparse inverse covariance estimation (iSICE) method in a CNN pipeline.
For an explanation of the notation and Algorithm 1, please read the paper.
For further details, please read the paper.
Visualisation of learned convolutional feature maps from the Cars dataset with ResNeXt-101. From left: input image, GAP, iSQRT-COV (covariance matrix), precision matrix and iSICE. The colour in the heatmap ranges from blue (cold) to red (hot). The feature maps suggest that with iSICE, the model focuses well on the key parts of the car to extract features for classification. GAP (global average pooling) overly focuses on the entire foreground, iSQRT-COV focuses poorly, while iSICE lets us control the degree of ‘focus’ by controlling sparsity.
@InProceedings{isice_cvpr,
  author    = {Rahman, Saimunur and Koniusz, Piotr and Wang, Lei and Zhou, Luping and Moghadam, Peyman and Sun, Changming},
  title     = {Learning Partial Correlation based Deep Visual Representation for Image Classification},
  booktitle = {IEEE/CVF Int. Conf. on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023}
}
For any questions, please contact saimun.rahman@data61.csiro.au.