Combination of Multiple Global Descriptors for Image Retrieval

This repository contains a TensorFlow 2.2 implementation of an image ranking model based on the following research papers:

In Defense of the Triplet Loss for Person Re-Identification - by Alexander Hermans, Lucas Beyer, and Bastian Leibe
Hard-aware point-to-set deep metric for person re-identification - by Rui Yu, Zhiyong Dou, Song Bai, Zhaoxiang Zhang, Yongchao Xu, and Xiang Bai
Combination of Multiple Global Descriptors for Image Retrieval - by HeeJae Jun, Byungsoo Ko, Youngjoon Kim, Insik Kim, and Jongtack Kim

Two models are provided:

net.ml.ImagesSimilarityComputer - a simple reference model, somewhat similar to the model used in In Defense of the Triplet Loss for Person Re-Identification
net.ml.CGDImagesSimilarityComputer - a model using a combination of global descriptors, based on the architecture from Combination of Multiple Global Descriptors for Image Retrieval (CGD)
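
To make the "combination of global descriptors" idea concrete, here is a minimal pure-Python sketch of the three pooling operations used in the CGD paper (SPoC, MAC, GeM) and their concatenation. It is an illustration only and is not taken from net.ml: the function names are hypothetical, the feature map is assumed to be a list of channels holding non-negative (post-ReLU) activations, and the per-descriptor fully connected layer from the paper is omitted.

```python
import math


def generalized_mean(values, p):
    """Generalized mean of non-negative values; p=1 gives the arithmetic mean."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def global_descriptors(feature_map, gem_p=3.0):
    """Pool each channel three ways.

    feature_map: list of channels, each a flat list of non-negative activations.
    gem_p: GeM exponent; 3.0 is an assumed value, not read from this repository.
    """
    spoc = [generalized_mean(channel, 1.0) for channel in feature_map]  # average pooling
    mac = [max(channel) for channel in feature_map]  # max pooling
    gem = [generalized_mean(channel, gem_p) for channel in feature_map]
    return spoc, mac, gem


def combined_descriptor(feature_map, gem_p=3.0):
    """Normalize each descriptor, concatenate them, and normalize the result."""
    parts = [l2_normalize(d) for d in global_descriptors(feature_map, gem_p)]
    return l2_normalize([value for part in parts for value in part])
```

For any channel the three values are ordered SPoC <= GeM <= MAC, which is why GeM is often described as interpolating between average and max pooling.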

Two types of losses are provided:

batch hard triplet loss - as described in In Defense of the Triplet Loss for Person Re-Identification
hard aware point to set loss - as described in Hard-aware point-to-set deep metric for person re-identification
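
As a rough illustration of the first loss, the sketch below computes a batch hard triplet loss in plain Python: for every anchor it takes the farthest same-category embedding and the closest different-category embedding in the batch. The function names and the margin value of 0.2 are assumptions made for this example; the repository's TensorFlow implementation may differ in details such as the margin, the distance function, or the handling of zero distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean over anchors of max(0, margin + hardest_positive - hardest_negative)."""
    losses = []
    for i, anchor in enumerate(embeddings):
        # Distances to other samples of the same category
        positives = [
            euclidean(anchor, other)
            for j, other in enumerate(embeddings)
            if j != i and labels[j] == labels[i]
        ]
        # Distances to samples of every other category
        negatives = [
            euclidean(anchor, other)
            for j, other in enumerate(embeddings)
            if labels[j] != labels[i]
        ]
        if not positives or not negatives:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, margin + max(positives) - min(negatives)))
    return sum(losses) / len(losses) if losses else 0.0
```

With well-separated categories every term is clipped to zero, so the loss only receives contributions from anchors whose hardest positive is not at least `margin` closer than their hardest negative.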

Implementation details

There are some differences between the official CGD architecture and our implementation. The authors of the CGD paper modify the backbone ResNet-50 network so that there is no downsampling between stage 3 and stage 4, resulting in higher-resolution outputs from the base network. We don't modify the base network in any way.

We include a script for training and evaluation on Stanford University's Cars 196 Dataset. While the CGD paper crops the exact car locations from raw images and then resizes the results to a fixed size, we instead first pad raw images to squares and then resize them to a fixed size.
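
The pad-to-square step can be sketched as follows. This is a hypothetical illustration, not the repository's preprocessing code: it assumes the padding is split evenly between the two sides and filled with zeros, and it represents an image as a plain list of pixel rows. The subsequent resize to a fixed size is left out.

```python
def square_padding(height, width):
    """Return (top, bottom, left, right) padding that makes the image square."""
    size = max(height, width)
    vertical = size - height
    horizontal = size - width
    top = vertical // 2
    left = horizontal // 2
    return top, vertical - top, left, horizontal - left


def pad_to_square(image, fill=0):
    """Pad a 2D image (list of rows) with `fill` so that height equals width."""
    height, width = len(image), len(image[0])
    top, bottom, left, right = square_padding(height, width)
    padded_width = width + left + right
    rows = [[fill] * padded_width for _ in range(top)]
    rows += [[fill] * left + list(row) + [fill] * right for row in image]
    rows += [[fill] * padded_width for _ in range(bottom)]
    return rows
```

Padding before resizing preserves the aspect ratio of the car, at the cost of spending some of the input resolution on padding.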

Results

Results are based on Stanford University's Cars 196 Dataset.

Somewhat differently from the results reported in Combination of Multiple Global Descriptors for Image Retrieval, the best results were obtained using a model with a SPoC (sum pooling of convolutional features) head only. Adding MAC and GeM heads brings accuracy down by ~5%.

k	Recall at k
1	0.717
2	0.814
4	0.880
8	0.931
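
For readers unfamiliar with the metric, the sketch below shows the usual definition of recall at k in image retrieval: a query counts as a hit if at least one of its k nearest neighbours (excluding the query itself) belongs to the same category. The function is a hypothetical pure-Python illustration using Euclidean distance; whether the repository's analysis command computes the metric in exactly this way is an assumption.

```python
import math


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with a same-category image among their top k matches."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other image by distance to the query, closest first
        ranked = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: math.dist(query, embeddings[j]),
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / len(embeddings)
```

Because a larger k can only add candidates, recall at k never decreases as k grows, which matches the trend in the table above.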

The image below shows representative ranking performance. Each row starts with a query image, marked with a blue dot, followed by the top 8 ranked images for that query. Images in the same category as the query image are marked with a green dot. The validation set contains about 8,000 images, with about 80 images per category on average.

How to run

This project can be run in a Docker container. Building the Docker container is a two-stage process:

building the app_base container (./docker/app_base.Dockerfile)
building the app container (./docker/app.Dockerfile)

You can build the containers manually with docker, but invoke tasks are also provided:

invoke docker.build-app-base-container - builds the base container. It is based on the tensorflow/tensorflow:2.2.0-gpu container, downloads weights for the base network, and installs Python requirements
invoke docker.build-app-container - builds the app container on top of the app-base container created with the command above; creates the user, paths, and environment variables, and mounts the code

You can then start the container manually, or with the provided invoke docker.run command, which takes care of mounting paths for the data volume. invoke docker.run asks for a password so it can execute sudo chmod on the data volume path, giving the docker container write permissions on the host system. This is necessary because the user inside the docker container isn't root. It's easy to modify the code to change this behaviour if you don't need to access any outputs on the host system.

Once inside the container, the following key invoke commands are available:

analysis.analyze-model-performance - Analyze model performance
ml.train - Train model
visualize.visualize-data - Visualize data
visualize.visualize-predictions-on-batches - Visualize image similarity ranking predictions on a few batches of data
visualize.visualize-predictions-on-dataset - Visualize image similarity ranking predictions on a few

Most commands accept a --config-path argument that points to a configuration file. A sample configuration file is provided at ./config.yaml.

How to extend

Should you want to use this code to train and predict on data other than Cars 196, you would need to:

provide your own data loaders for training and analyzing - please refer to net.data.Cars196TrainingLoopDataLoader and net.data.Cars196AnalysisDataLoader for sample implementations
provide a yaml configuration file pointing to paths with your data - please refer to config.yaml for the expected format

Honorable mentions

In addition to the research papers listed above, the following works were consulted while making this project. Thanks go to:

Olivier Moindrot for his post on triplet loss, which helped me troubleshoot a problem with training divergence when the distance between two embeddings was 0
leftthomas for his PyTorch implementation of CGD, which I consulted when implementing the descriptors

PuchatekwSzortach/combination_of_multiple_global_descriptors_for_image_retrieval
