Understanding partial correlation (a 3D toy case). Unlike the ordinary covariance (the pairwise correlation of, say, $\mathbf{x}$ and $\mathbf{y}$, which correspond to channels), the partial correlation between variables $\mathbf{x}$ and $\mathbf{y}$ removes the influence of the confounding variable $\mathbf{z}$. Let the number of samples be $n\!=\!3$ and the number of channels be $d\!=\!3$. In this 3D case, $\mathbf{x}$ and $\mathbf{y}$ are projected onto a plane perpendicular to $\mathbf{z}$. Then $\rho_{xy}\!=\!\cos{\varphi_{xy}}$ (and $\rho_{xz}$ and $\rho_{yz}$ can be computed by analogy). The projected "residuals" $\mathbf{r}_\mathbf{x}$ and $\mathbf{r}_\mathbf{y}$ are computed as indicated in the plot, with ${\mathbf{w}'\!}_x\!=\!\arg\min_{\mathbf{w}}\sum_{i=1}^3(x_i\!-\!\mathbf{w}^\top\mathbf{z}_i)^2$, where $\mathbf{z}_i\!=\![z_i, 1]^\top$ (and ${\mathbf{w}'\!}_y$ is computed by analogy). The green box: for $d\!>\!3$, computing the partial correlation requires inverting the covariance matrix.
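The residual construction in the caption can be sketched in a few lines of plain Python. This is only an illustration, not code from the paper: the data values and the helper names (`residuals`, `partial_corr`, and so on) are made up, and six samples are used instead of three so that the residuals are not degenerate. Each of $\mathbf{x}$ and $\mathbf{y}$ is regressed on $[z_i, 1]^\top$ by least squares, and the partial correlation is the cosine of the angle between the two residual vectors. The result is checked against the standard closed form for a single confounder, $(\rho_{xy}-\rho_{xz}\rho_{yz})/\sqrt{(1-\rho_{xz}^2)(1-\rho_{yz}^2)}$.

```python
import math

def mean(v):
    return sum(v) / len(v)

def residuals(target, z):
    """Residuals of the least-squares fit of target on [z, 1]."""
    mz, mt = mean(z), mean(target)
    var_z = sum((zi - mz) ** 2 for zi in z)
    slope = sum((zi - mz) * (ti - mt) for zi, ti in zip(z, target)) / var_z
    intercept = mt - slope * mz
    return [ti - (slope * zi + intercept) for zi, ti in zip(z, target)]

def cosine(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))

def pearson(a, b):
    """Ordinary pairwise correlation: cosine of the centred vectors."""
    ma, mb = mean(a), mean(b)
    return cosine([ai - ma for ai in a], [bi - mb for bi in b])

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    return cosine(residuals(x, z), residuals(y, z))

# Hypothetical toy data: x and y are both driven by z, plus small noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y = [0.9, 2.2, 2.8, 4.1, 5.2, 5.8]

rho_xy = pearson(x, y)                 # high, because z drives both
rho_xy_given_z = partial_corr(x, y, z)  # much weaker once z is removed

# Closed form for a single confounder, for comparison.
rho_xz, rho_yz = pearson(x, z), pearson(y, z)
closed_form = (rho_xy - rho_xz * rho_yz) / math.sqrt(
    (1 - rho_xz ** 2) * (1 - rho_yz ** 2)
)

print(rho_xy, rho_xy_given_z, closed_form)
```

In this toy data the pairwise correlation is large only because both variables follow $\mathbf{z}$; the residual-based value and the closed form agree up to floating-point error.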

Proposed idea

Visual representation based on the covariance matrix has demonstrated its efficacy for image classification by characterising the pairwise correlation of different channels in convolutional feature maps. However, pairwise correlation becomes misleading once another channel correlates with both channels of interest, resulting in the "confounding" effect. In this case, "partial correlation", which removes the confounding effect, should be estimated instead. Nevertheless, reliably estimating partial correlation requires solving a symmetric positive definite matrix optimisation problem, known as sparse inverse covariance estimation (SICE). How to incorporate this process into a CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of the CNN. To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during the forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNNs. Computationally, our model can be trained effectively on a GPU and works well with the large number of channels of advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.
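To make the matrix optimisation above more concrete, here is a minimal NumPy sketch of a generic proximal-gradient solver for the usual SICE (graphical lasso) objective, $\max_{\mathbf{S}\succ 0}\ \log\det\mathbf{S}-\mathrm{tr}(\boldsymbol{\Sigma}\mathbf{S})-\lambda\|\mathbf{S}\|_1$. It is not the iSICE layer from the paper: the objective form, the function names (`sice_sketch`, `partial_corr_from_precision`), the step size, the iteration count and the toy covariance are all assumptions made for illustration, and the sketch uses an explicit `np.linalg.inv` with no backward pass. The second helper uses the standard identity that the partial correlation of channels $i$ and $j$ given all others is $-S_{ij}/\sqrt{S_{ii}S_{jj}}$.

```python
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def sice_sketch(sigma, lam, step=0.2, iters=1000):
    """Approximately maximise log det(S) - tr(sigma @ S) - lam * ||S||_1.

    Illustrative proximal-gradient loop; NOT the paper's iSICE algorithm.
    """
    s = np.diag(1.0 / np.diag(sigma))  # positive definite starting point
    for _ in range(iters):
        grad = np.linalg.inv(s) - sigma  # gradient of the smooth part
        t = step
        while True:
            cand = soft_threshold(s + t * grad, t * lam)
            cand = 0.5 * (cand + cand.T)  # keep the iterate symmetric
            if np.linalg.eigvalsh(cand).min() > 1e-8:
                break  # candidate is positive definite, accept it
            t *= 0.5  # otherwise shrink the step and retry
        s = cand
    return s

def partial_corr_from_precision(s):
    """Partial correlations from a precision (inverse covariance) matrix."""
    d = np.sqrt(np.diag(s))
    pc = -s / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Hypothetical 3-channel covariance: channels 0 and 2 are correlated
# only through channel 1.
sigma = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])

s_dense = sice_sketch(sigma, lam=0.0)   # no sparsity: plain inverse
s_sparse = sice_sketch(sigma, lam=0.1)  # l1 penalty shrinks entries
pc_dense = partial_corr_from_precision(s_dense)

print(np.round(s_dense, 3))
print(np.round(s_sparse, 3))
print(np.round(pc_dense, 3))
```

With `lam=0.0` the loop recovers the plain inverse covariance, in which channels 0 and 2 have essentially zero partial correlation even though their pairwise covariance is 0.25. Increasing `lam` trades fidelity to the inverse for smaller, sparser entries.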

Proposed iterative sparse inverse covariance estimation (iSICE) method in a CNN pipeline.

For an explanation of the notation and Algorithm 1, please read the paper.

For further details, please read the paper.

For an explanation of the notation and Algorithm 1, please read the paper.

Visualisation of learned convolutional feature maps from the Cars dataset with ResNeXt-101. From left: input image, GAP, iSQRT-COV (covariance matrix), precision matrix and iSICE. The colour in the heatmap ranges from blue (cold) to red (hot). The feature maps suggest that with iSICE, the model focuses well on the key parts of the car when extracting features for classification. GAP (global average pooling) focuses too broadly on the entire foreground and iSQRT-COV focuses poorly, while iSICE lets us control the degree of ‘focus’ by controlling sparsity.

Main paper

arXiv preprint

View PDF

Source code

GitHub code repository

View repository

Poster

Poster presentation

View PDF

@InProceedings{isice_cvpr,
  author = {Rahman, Saimunur and Koniusz, Piotr and Wang, Lei and Zhou, Luping and Moghadam, Peyman and Sun, Changming},
  title = {Learning Partial Correlation based Deep Visual Representation for Image Classification},
  booktitle = {IEEE/CVF Int. Conf. on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2023}
}

For any questions, please contact saimun.rahman@data61.csiro.au.
