This repository contains the Video Analysis (VIA) Framework, a collection of Google Cloud services that you can use to transcribe video.
The repository also contains an extended version of the VIA Framework, which adds Elasticsearch and a web interface that you can use to search for words and phrases within your videos.
It can:
- Process video files uploaded to Cloud Storage.
- Enrich the processed video files with the Google Cloud Video Intelligence API.
- Write the enriched data to BigQuery.
- With the extended version, add the enriched data to an Elasticsearch index and provide a user interface for searching words and phrases.
The life of a video file with the VIA:
- A video file is uploaded to Cloud Storage
- The Cloud Function is triggered when the object is created (google.storage.object.finalize)
- The Cloud Function sends a long running job request to the Video Intelligence API
- The Video Intelligence API starts processing the video file
- The Cloud Function then sends the job ID from Video Intelligence API with additional metadata to Cloud Pub/Sub
- The Cloud Dataflow job enriches the data
- Cloud Dataflow then writes the data to Google Cloud BigQuery
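For orientation, here is a minimal sketch of the trigger step, assuming the Python client libraries. The repository's actual Cloud Function is written in Node.js, and the function name, selected features, and message fields below are illustrative assumptions rather than the repo's exact code:

```python
# Illustrative sketch only: the repo's Cloud Function is Node.js; names,
# features, and message fields here are assumptions.
import json
from google.cloud import pubsub_v1, videointelligence

def on_video_uploaded(bucket, name, project_id, topic_name):
    # Start a long-running Video Intelligence job for the uploaded file.
    video_client = videointelligence.VideoIntelligenceServiceClient()
    operation = video_client.annotate_video(
        request={
            "input_uri": f"gs://{bucket}/{name}",
            "features": [
                videointelligence.Feature.SPEECH_TRANSCRIPTION,
                videointelligence.Feature.LABEL_DETECTION,
            ],
            "video_context": {
                "speech_transcription_config": {"language_code": "en-US"}
            },
        }
    )
    # Publish the job reference plus metadata so the Dataflow pipeline can
    # pick up the results later.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_name)
    message = {
        "operation_name": operation.operation.name,
        "gcs_uri": f"gs://{bucket}/{name}",
    }
    publisher.publish(topic_path, json.dumps(message).encode("utf-8")).result()
```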
Extended Version: The life of a video file with the VIA:
Scroll to the bottom for instructions on how to install the extended version.
- A video file is uploaded to Cloud Storage
- The Cloud Function is triggered when the object is created (google.storage.object.finalize)
- The Cloud Function sends a long running job request to the Video Intelligence API
- The Video Intelligence API starts processing the video file
- The Cloud Function then sends the job ID from Video Intelligence API with additional metadata to Cloud Pub/Sub
- The Cloud Dataflow job enriches the data
- Cloud Dataflow then writes the data to Google Cloud BigQuery
- The next step in the pipeline writes the data to an Elasticsearch index
- The data is now ready to be searched with Elasticsearch
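As a rough illustration of that last step, this is how a single enriched record could be written to an Elasticsearch index with the Python elasticsearch client. The index name, field names, and connection details are assumptions, not the pipeline's actual schema:

```python
# Illustrative only: index name, field names, and connection details are
# assumptions, not the pipeline's actual schema.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["https://your-elasticsearch-host:9243"],
    http_auth=("elastic", "your-password"),
)

doc = {
    "gcs_uri": "gs://your-video-bucket/example.mp4",
    "transcript": "welcome to the quarterly review",
    "start_time": 12.5,
    "end_time": 15.0,
}
es.index(index="videos", body=doc)  # the record is now searchable
```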
- Create a storage bucket for the Dataflow staging files
gsutil mb gs://[BUCKET_NAME]/
- Through the Google Cloud Console, create a folder named tmp in the newly created bucket for the Dataflow staging files
- Create a storage bucket for uploaded video files
gsutil mb gs://[BUCKET_NAME]/
- Create a BigQuery Dataset
bq mk [YOUR_BIG_QUERY_DATASET_NAME]
- Create Cloud Pub/Sub Topic
gcloud pubsub topics create [YOUR_TOPIC_NAME]
- Enable Cloud Dataflow API
gcloud services enable dataflow.googleapis.com
- Enable Cloud Video Intelligence API
gcloud services enable videointelligence.googleapis.com
- Deploy the Google Cloud Function
- In the cloned repo, go to the via-longrun-job-func directory and deploy the following Cloud Function.
gcloud functions deploy viaLongRunJobFunc --region=us-central1 --stage-bucket=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME] --runtime=nodejs8 --trigger-event=google.storage.object.finalize --trigger-resource=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME]
- Deploy the Cloud Dataflow Pipeline
- Make sure you are running Python 3.7 (python3 --version should report something like Python 3.7.8).
- In the cloned repo, go to the via-longrun-job-dataflow directory and deploy the Cloud Dataflow pipeline. Run the commands below to deploy the Dataflow job.
# macOS/Linux
python3 -m venv env
source env/bin/activate
pip3 install 'apache-beam[gcp]'
pip3 install dateparser
- The Dataflow job will create the BigQuery table you specify in the parameters below.
- Please wait, as the deployment might take a few minutes to complete.
python3 vialongrunjobdataflow.py --project=[YOUR_PROJECT_ID] --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp --output_bigquery=[DATASET_NAME].[TABLE_NAME] --requirements_file="requirements.txt" --region=[GOOGLE_CLOUD_REGION]
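If you want to see the overall shape of what the pipeline does before deploying it, the stripped-down Apache Beam sketch below reads job messages from Pub/Sub, converts each message into rows, and streams them into BigQuery. It is not the actual vialongrunjobdataflow.py, and the field names and BigQuery schema are assumptions:

```python
# Not the actual vialongrunjobdataflow.py: a stripped-down sketch of the
# pipeline's shape. Field names and the BigQuery schema are assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def to_rows(message_bytes):
    job = json.loads(message_bytes.decode("utf-8"))
    # The real pipeline fetches and flattens the Video Intelligence results
    # for this job; here the metadata is simply passed through.
    yield {"gcs_uri": job.get("gcs_uri"), "operation_name": job.get("operation_name")}

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadJobs" >> beam.io.ReadFromPubSub(
            topic="projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME]")
        | "ToRows" >> beam.FlatMap(to_rows)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "[YOUR_PROJECT_ID]:[DATASET_NAME].[TABLE_NAME]",
            schema="gcs_uri:STRING,operation_name:STRING",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```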
The extended version of the VIA Framework requires a working Elasticsearch installation; for more information, see Managed Elasticsearch on Google Cloud.
- Create a storage bucket for the Dataflow staging files
gsutil mb gs://[BUCKET_NAME]/
- Through the Google Cloud Console, create a folder named tmp in the newly created bucket for the Dataflow staging files
- Create a storage bucket for uploaded video files
gsutil mb gs://[BUCKET_NAME]/
- Create a BigQuery Dataset
bq mk [YOUR_BIG_QUERY_DATASET_NAME]
- Create Cloud Pub/Sub Topic
gcloud pubsub topics create [YOUR_TOPIC_NAME]
- Enable Cloud Dataflow API
gcloud services enable dataflow.googleapis.com
- Enable Cloud Video Intelligence API
gcloud services enable videointelligence.googleapis.com
- Deploy the Google Cloud Function
- In the cloned repo, go to the via-longrun-job-func directory and deploy the following Cloud Function.
gcloud functions deploy viaLongRunJobFunc --region=us-central1 --stage-bucket=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME] --runtime=nodejs8 --trigger-event=google.storage.object.finalize --trigger-resource=[YOUR_UPLOADED_VIDEO_FILES_BUCKET_NAME]
- Deploy the Cloud Dataflow Pipeline
- Make sure you are running Python 3.7 (python3 --version should report something like Python 3.7.8).
- In the cloned repo, go to the via-longrun-job-dataflow-extended directory and deploy the Cloud Dataflow pipeline.
- Edit the pipeline to include your Elasticsearch settings on line 100.
- Run the commands below to deploy the Dataflow job.
# macOS/Linux
python3 -m venv env
source env/bin/activate
pip3 install 'apache-beam[gcp]'
pip3 install dateparser
pip3 install elasticsearch
- The Dataflow job will create the BigQuery table you specify in the parameters below.
- Please wait, as the deployment might take a few minutes to complete.
python3 viaextendedlongrunjobdataflow.py --project=[YOUR_PROJECT_ID] --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp --output_bigquery=[DATASET_NAME].[TABLE_NAME] --requirements_file="requirements.txt" --region=[GOOGLE_CLOUD_REGION]
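For the Elasticsearch step, a common pattern is to push rows to the index from a Beam DoFn. The sketch below is an assumption about how the extended pipeline might do this around the settings you edited on line 100; the host, credentials, and index name are placeholders:

```python
# An assumption about the design, not the repo's exact code: pushing rows to
# Elasticsearch from a Beam DoFn. Host, credentials, and index are placeholders.
import apache_beam as beam
from elasticsearch import Elasticsearch

class IndexToElasticsearch(beam.DoFn):
    def __init__(self, hosts, user, password, index):
        self.hosts = hosts
        self.user = user
        self.password = password
        self.index = index

    def setup(self):
        # One client per worker, created when the worker starts.
        self.client = Elasticsearch(self.hosts, http_auth=(self.user, self.password))

    def process(self, row):
        self.client.index(index=self.index, body=row)
        yield row  # pass the row through so it can also be written to BigQuery
```

In the pipeline this would be applied with something like beam.ParDo(IndexToElasticsearch(...)) between the enrichment step and the BigQuery write.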
- Deploy Search Interface
- In the cloned repo, go to the via-web/src directory. Edit the Settings.js file to include your Elasticsearch parameters.
- Run the commands below in the via-web directory to build and deploy the search interface.
npm run build
gcloud app deploy
- The Search Interface requires Google Cloud Identity-Aware Proxy (IAP).
- Browse to the newly created App Engine service URL.
- To search for an exact phrase, enter your text string in quotes, for example: "video analysis"
- To search for multiple words, enter your words separated by spaces, for example: video analysis
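Under the hood, those two search modes roughly correspond to a match_phrase query and a match query in Elasticsearch. The sketch below is illustrative only; the index and field names are assumptions about how the interface queries Elasticsearch:

```python
# Illustrative only: index and field names are assumptions about how the
# interface queries Elasticsearch.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["https://your-elasticsearch-host:9243"],
    http_auth=("elastic", "your-password"),
)

# "video analysis"  -> match the exact phrase
phrase_hits = es.search(
    index="videos",
    body={"query": {"match_phrase": {"transcript": "video analysis"}}},
)

# video analysis    -> match either word
word_hits = es.search(
    index="videos",
    body={"query": {"match": {"transcript": "video analysis"}}},
)
```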
This is not an officially supported Google product.