# Similarity Measurement in Image Retrieval


In image retrieval and other similarity-based tasks such as person re-identification, we need to compute the similarity (or distance) between the query image and each database image. We can then rank the database images by their similarity to the query image. In this post, I want to briefly introduce two measures widely used in image retrieval tasks.

# The Euclidean distance

The Euclidean distance is straightforward. Suppose \(x\) and \(y\) are two feature vectors in \(\mathbf{R}^n\); then the Euclidean distance between the two vectors is:

\[\begin{equation}\begin{aligned} d_{euclid} &= {\Vert x - y \Vert}_2 \\ &= \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \\ &= \sqrt{ {\Vert x \Vert}^2 + {\Vert y \Vert}^2 - 2x\cdot y }\\ \end{aligned}\end{equation}\]

If the Euclidean distance between the feature vectors of images A and B is smaller than that between images A and C, then we may conclude that image B is more similar to A than image C is.
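As a minimal sketch of the formula above, the following plain-Python snippet computes the distance in both the direct and the expanded form. The function names and the three toy feature vectors are made up for illustration; real features would come from a feature extractor and have many more dimensions.

```python
import math

def euclidean_distance(x, y):
    """Direct form: square root of the summed squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def euclidean_distance_expanded(x, y):
    """Expanded form: sqrt(||x||^2 + ||y||^2 - 2 x.y)."""
    xx = sum(xi * xi for xi in x)
    yy = sum(yi * yi for yi in y)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    # Clamp at zero: rounding can make the argument slightly negative.
    return math.sqrt(max(xx + yy - 2.0 * xy, 0.0))

# Hypothetical 3-D feature vectors for images A, B, and C.
a = [1.0, 2.0, 2.0]
b = [1.0, 2.0, 3.0]
c = [4.0, 6.0, 2.0]

d_ab = euclidean_distance(a, b)  # 1.0
d_ac = euclidean_distance(a, c)  # 5.0
print(d_ab < d_ac)               # True: B is closer to A than C is
```

The expanded form is handy when the squared norms of the database vectors can be computed once and reused across many queries.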

# The cosine similarity

Cosine similarity is another commonly used measure. For vectors \(x\) and \(y\), it is defined as:

\[\begin{equation} s = \frac{x\cdot y}{\Vert x \Vert \Vert y \Vert}\ , \end{equation}\]

which is the cosine of the angle \(\theta\) between the vectors \(x\) and \(y\). Here is a plot that shows this.

Where does the above equation come from? It can be derived from the law of cosines. Applied to the triangle whose sides are \(x\), \(y\), and \(x - y\), that law gives:

\[\begin{equation} \cos(\theta) = \frac{ {\Vert x \Vert}^2 +{\Vert y \Vert}^2 - {\Vert x - y\Vert}^2 }{2\Vert x \Vert \cdot \Vert y \Vert}\ . \end{equation}\]

We also have the following equality:

\[\begin{equation}\begin{aligned} {\Vert x - y\Vert}^2 &= {d_{euclid}}^2 \\ &= {\Vert x \Vert}^2 +{\Vert y \Vert}^2 - 2x\cdot y \end{aligned}\end{equation}\]

Combining the two equations, the squared norms in the numerator cancel, and we finally get

\[\begin{equation} \cos(\theta) = \frac{x\cdot y}{\Vert x \Vert \cdot \Vert y \Vert} \end{equation}\]
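The derivation can be checked numerically. The sketch below computes the cosine once from the dot-product definition and once from the law of cosines; both function names and the two example vectors are illustrative, and the code assumes neither vector is the zero vector.

```python
import math

def cosine_similarity(x, y):
    """Dot-product form: x.y / (||x|| ||y||). Assumes non-zero vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)

def cosine_from_law_of_cosines(x, y):
    """Law-of-cosines form: (||x||^2 + ||y||^2 - ||x - y||^2) / (2 ||x|| ||y||)."""
    xx = sum(xi * xi for xi in x)
    yy = sum(yi * yi for yi in y)
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return (xx + yy - d2) / (2.0 * math.sqrt(xx) * math.sqrt(yy))

# Two hypothetical 2-D vectors, both of length 5.
x = [3.0, 4.0]
y = [4.0, 3.0]

s_dot = cosine_similarity(x, y)           # 24 / 25
s_law = cosine_from_law_of_cosines(x, y)  # (25 + 25 - 2) / 50
print(math.isclose(s_dot, s_law))         # True
```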

# Cosine distance and its relation to Euclidean distance

In image retrieval, the feature vectors are often \(L_2\)-normalized to unit length, so that \(\Vert x \Vert = \Vert y \Vert = 1\). In this case, the Euclidean distance between two vectors \(x\) and \(y\) becomes:

\[\begin{equation}\begin{aligned} d_{euclid} &= \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \\ &= \sqrt{ {\Vert x \Vert}^2 + {\Vert y \Vert}^2 - 2x\cdot y }\\ &= \sqrt{ 2 - 2x\cdot y }\\ &= \sqrt{ 2(1 - x\cdot y) }\\ \end{aligned}\end{equation}\]

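For unit vectors, the dot product equals the cosine similarity: \(x\cdot y = \cos(\theta)\). In image retrieval, the feature vector elements are often non-negative, in which case \(\cos(\theta)\) lies in the range \([0, 1]\). We can then define the cosine distance as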

\[\begin{equation} d_{cosine}(x, y) = 1 - x\cdot y \ . \end{equation}\]

Now it is easy to see that

\[\begin{equation} d_{euclid}(x, y) = \sqrt{ 2d_{cosine}(x, y) }\ . \end{equation}\]

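So these two measures are closely related: for unit vectors, the Euclidean distance is a monotonically increasing function of the cosine distance, so both produce the same ranking of the database images. In this case, you can choose either of the two measures to assess the similarity between two images.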
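To make the relation concrete, here is a small sketch that \(L_2\)-normalizes a query and three database features, checks \(d_{euclid} = \sqrt{2d_{cosine}}\) for each pair, and confirms that both distances give the same ranking. The helper names, the image keys, and the feature values are all hypothetical.

```python
import math

def l2_normalize(x):
    """Scale x to unit length. Assumes x is not the zero vector."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]

def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine_distance(x, y):
    """1 - x.y; equals 1 - cos(theta) only when x and y are unit vectors."""
    return 1.0 - sum(xi * yi for xi, yi in zip(x, y))

# Hypothetical non-negative features, normalized to unit length.
query = l2_normalize([1.0, 2.0, 2.0])
database = {
    "img_a": l2_normalize([2.0, 4.0, 4.1]),
    "img_b": l2_normalize([3.0, 0.5, 1.0]),
    "img_c": l2_normalize([0.0, 1.0, 5.0]),
}

# Check d_euclid = sqrt(2 * d_cosine) for every database image.
relation_holds = all(
    math.isclose(
        euclidean_distance(query, feat),
        math.sqrt(2.0 * cosine_distance(query, feat)),
        rel_tol=1e-9,
        abs_tol=1e-9,
    )
    for feat in database.values()
)

# Rank the database by each distance, most similar first.
rank_euclid = sorted(database, key=lambda k: euclidean_distance(query, database[k]))
rank_cosine = sorted(database, key=lambda k: cosine_distance(query, database[k]))

print(relation_holds)              # True
print(rank_euclid == rank_cosine)  # True
```

Without the normalization step the two rankings can differ, since the Euclidean distance then also depends on the vector lengths.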