🌮 TACOSS: Learning transferable land cover semantics for open-vocabulary interactions with remote sensing images.
Valerie Zermatten, Javiera Castillo-Navarro, Diego Marcos, Devis Tuia
This repository proposes Text As supervision for COntrastive Semantic Segmentation (TACOSS), an open-vocabulary semantic segmentation model for remote sensing images. TACOSS leverages the common-sense knowledge captured by language models and interprets images at the pixel level, attributing semantics to each pixel and removing the constraint of a fixed set of land cover labels.
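To make the mapping from pixels to free-form labels concrete, here is a minimal sketch of open-vocabulary pixel classification with text embeddings. It illustrates the general idea only, not TACOSS's actual implementation: the pixel features below are random placeholders, and the SentenceBERT checkpoint name is just an example.

```python
# Sketch: score every pixel embedding against free-form label texts and take
# the argmax. Pixel features here are random stand-ins for a visual backbone.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

labels = ["building", "deciduous forest", "water surface", "agricultural land"]

# 1. Embed the label texts with a (frozen) language model.
text_encoder = SentenceTransformer("all-MiniLM-L6-v2")    # example checkpoint
text_emb = torch.tensor(text_encoder.encode(labels))      # (K, D)
text_emb = F.normalize(text_emb, dim=-1)

# 2. Placeholder pixel embeddings projected into the same D-dimensional space.
H, W, D = 64, 64, text_emb.shape[-1]
pixel_emb = F.normalize(torch.randn(H, W, D), dim=-1)

# 3. Cosine similarity between every pixel and every label, then argmax:
#    swapping or extending `labels` changes the output classes, no retraining.
similarity = torch.einsum("hwd,kd->hwk", pixel_emb, text_emb)  # (H, W, K)
prediction = similarity.argmax(dim=-1)                         # (H, W)
print(labels[prediction[0, 0]])   # label assigned to the top-left pixel
```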
This project aims not only to simplify the map creation process but also to bridge the gap between complex remote sensing technology and user-friendly applications, eventually making advanced mapping tools accessible to everyone.
Interested in trying it out?
First, install the necessary dependencies and download the models and data.
The required Python packages are listed in `environment.yml`, which can be used to build a conda environment.
conda env create --file environment.yml
conda activate tacoss
Alternatively, use the provided `Dockerfile`.
To try out TACOSS, please download the model weights and label embeddings available on Zenodo.
First, clone this repository, then copy the model weight files into the `/output` folder and the label embeddings into the `/data` folder.
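After these steps, the working tree should look roughly as follows (the repository root name is illustrative; only `output` and `data` are named in the instructions above):

```
tacoss/
├── output/    # model weight files from Zenodo
├── data/      # label embeddings from Zenodo
└── config/    # experiment configuration files (see below)
```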
- The FLAIR dataset is available on the IGN website (FLAIR challenge).
- The TLM aerial images can be downloaded from the swissIMAGE 10cm website.
- The TLM annotations can be downloaded as the 'Bodenbedeckung' shapefile from the swissTLM3D website.
- The TLM dataset as used in this repository can be provided on request by contacting the authors.
Several configuration files are provided in the `config` folder.
To launch experiments based on the existing configuration files, use the following command line:
python main.py --cfg <config_name>
# Train the SegFormer baseline model:
python main.py --cfg segformer-base
# Train the DeepLabv3+ baseline model:
python main.py --cfg dlv-base
# Train TACOSS with the SegFormer visual backbone and the SentenceBERT text encoder:
python main.py --cfg segformer-bcos-sbert-des-eda
# Train TACOSS with the DeepLabv3+ backbone and the CLIP text encoder:
python main.py --cfg dlv-bcos-clip-name
Experiments with the CLIPSeg model require a specific dataset class for training and inference, since CLIPSeg is trained as a binary segmentation task with a binary cross-entropy loss. To train and evaluate CLIPSeg, use the `CLIPSeg` folder and the `CLIPSegFinetune.py` script.
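As a rough illustration of that binary formulation (a minimal sketch with assumed shapes and names, not the repository's actual training loop):

```python
# Sketch: each text prompt yields one foreground/background mask that is
# supervised with binary cross-entropy, rather than one multi-class map.
import torch
import torch.nn.functional as F

batch, h, w = 4, 64, 64
logits = torch.randn(batch, 1, h, w)                    # per-prompt mask logits
target = torch.randint(0, 2, (batch, 1, h, w)).float()  # 1 = pixel matches prompt
loss = F.binary_cross_entropy_with_logits(logits, target)
print(loss.item())
```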
Qualitative performance of TACOSS on the FLAIR dataset:
Qualitative performance of TACOSS on the TLM dataset (in a transfer setting):
Fig. 1: Aerial view · Fig. 2: TLM labels · Fig. 3: TACOSS predictions

More examples can be found in the associated publication [under review].
This project proposes the development of remote-sensing-specific vision-language models to facilitate interactions with RS images. Our work is a proof of principle; to become more broadly usable, TACOSS requires several improvements:
- Extend TACOSS to more geographical regions, sensors, and spatial resolutions. Currently, the model is trained only on high-resolution (30cm) RGB images.
- Improve fine-tuning of TACOSS from a few land cover labels to a larger label set and more diverse descriptions of land cover.
- Improve open-vocabulary capabilities of TACOSS.
If you are interested in contributing to one of the aforementioned points or working on a similar project and wish to collaborate, please reach out to ECEO.
For code-related contributions, suggestions or inquiries, please open a GitHub issue.
We acknowledge the following code repositories that helped in building TACOSS:
Thank you! Other smaller sources are mentioned in the relevant code sections.