This project is a deep learning project that generates natural-language descriptions (captions) of videos and images. It combines natural language processing (NLP) and computer vision (CV).
The project uses the MSR-VTT dataset. The link to the dataset is msr-vtt.
The dataset contains 10,000 videos in total, split into training, validation, and test sets. Each video has 20 captions, and the videos are grouped into 20 categories.
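Because every video has several captions, training code usually starts by grouping captions per video and recording each video's split and category. The sketch below assumes the annotations follow the layout of MSR-VTT's `videodatainfo.json` (a `"videos"` list with `video_id`, `split`, `category`, and a `"sentences"` list with `video_id`, `caption`); those field names and the helper names `load_annotations`, `index_msrvtt`, and `summarize` are assumptions, so check them against the actual files. The `sample` dict is a toy stand-in, not real dataset content.

```python
import json
from collections import Counter, defaultdict


def load_annotations(path):
    """Load an annotation file (assumed videodatainfo.json-style layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def index_msrvtt(annotations):
    """Group captions by video and attach each video's split and category."""
    captions = defaultdict(list)
    for sent in annotations["sentences"]:
        captions[sent["video_id"]].append(sent["caption"])

    videos = {}
    for v in annotations["videos"]:
        videos[v["video_id"]] = {
            "split": v["split"],
            "category": v["category"],
            # .get avoids creating empty entries in the defaultdict
            "captions": captions.get(v["video_id"], []),
        }
    return videos


def summarize(videos):
    """Count videos per split and the number of distinct categories."""
    split_counts = Counter(v["split"] for v in videos.values())
    num_categories = len({v["category"] for v in videos.values()})
    return split_counts, num_categories


# Toy sample in the assumed layout (hypothetical values, for illustration only).
sample = {
    "videos": [
        {"video_id": "video0", "split": "train", "category": 9},
        {"video_id": "video1", "split": "validate", "category": 16},
    ],
    "sentences": [
        {"video_id": "video0", "caption": "a man is cooking"},
        {"video_id": "video0", "caption": "someone prepares food"},
        {"video_id": "video1", "caption": "a car drives on a road"},
    ],
}

videos = index_msrvtt(sample)
split_counts, num_categories = summarize(videos)
print(split_counts, num_categories)
```

On the full dataset, `summarize` is a quick sanity check that the split sizes and the 20 categories match what the loader expects.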
The model architecture follows the S2VT paper (Venugopalan et al., "Sequence to Sequence - Video to Text", ICCV 2015).
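S2VT uses a stack of two LSTMs over one shared time axis. During encoding, the first LSTM reads per-frame visual features while the second receives a padding (zero) word; during decoding, the first LSTM receives padding (zero) frames while the second receives the previous word and predicts the next one. The following is a minimal NumPy forward pass with greedy decoding that illustrates this input schedule. It assumes precomputed per-frame CNN features; the class names, layer sizes, and token ids are hypothetical, and the weights are random and untrained, so the decoded ids carry no meaning. It is a sketch of the idea, not this project's implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Plain LSTM cell; the four gate pre-activations are stacked as [i, f, o, g]."""

    def __init__(self, input_size, hidden_size, rng):
        scale = 1.0 / np.sqrt(hidden_size)
        self.W = rng.uniform(-scale, scale, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        H = self.hidden_size
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # new cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c


class S2VTSketch:
    """Hypothetical two-layer S2VT-style model with random (untrained) weights."""

    def __init__(self, feat_dim, hidden, vocab_size, embed_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.feat_dim, self.hidden, self.embed_dim = feat_dim, hidden, embed_dim
        self.embed = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        self.lstm1 = LSTMCell(feat_dim, hidden, rng)               # visual layer
        self.lstm2 = LSTMCell(hidden + embed_dim, hidden, rng)     # language layer
        self.W_out = rng.normal(0.0, 0.1, (vocab_size, hidden))
        self.b_out = np.zeros(vocab_size)

    def greedy_decode(self, frames, bos_id, eos_id, max_len=20):
        h1, c1 = np.zeros(self.hidden), np.zeros(self.hidden)
        h2, c2 = np.zeros(self.hidden), np.zeros(self.hidden)
        pad_word = np.zeros(self.embed_dim)
        pad_frame = np.zeros(self.feat_dim)

        # Encoding stage: layer 1 reads frames, layer 2 gets a padding word.
        for x in frames:
            h1, c1 = self.lstm1.step(x, h1, c1)
            h2, c2 = self.lstm2.step(np.concatenate([h1, pad_word]), h2, c2)

        # Decoding stage: layer 1 gets padding frames, layer 2 gets the previous word.
        word, output = bos_id, []
        for _ in range(max_len):
            h1, c1 = self.lstm1.step(pad_frame, h1, c1)
            h2, c2 = self.lstm2.step(np.concatenate([h1, self.embed[word]]), h2, c2)
            logits = self.W_out @ h2 + self.b_out
            word = int(np.argmax(logits))   # greedy choice of the next word id
            if word == eos_id:
                break
            output.append(word)
        return output


# Usage with made-up sizes and random "frame features".
model = S2VTSketch(feat_dim=16, hidden=32, vocab_size=50, embed_dim=8)
frames = np.random.default_rng(1).normal(size=(10, 16))
word_ids = model.greedy_decode(frames, bos_id=1, eos_id=2, max_len=20)
print(word_ids)
```

Sharing one time axis between the two stages is what lets a single stacked LSTM handle variable-length videos and variable-length captions; a trained model would replace the random weights and map the ids back to words with a vocabulary.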