From f22dc738c2c5666f59f82bb0d22ed0acba5c605d Mon Sep 17 00:00:00 2001
From: shafin
Date: Mon, 2 Dec 2024 17:36:18 -0800
Subject: [PATCH] Update deeptrio metrics page

PiperOrigin-RevId: 702138503
---
 docs/metrics-deeptrio.md | 93 +++++++++++++++++++++++-----------------
 1 file changed, 53 insertions(+), 40 deletions(-)

diff --git a/docs/metrics-deeptrio.md b/docs/metrics-deeptrio.md
index 0afcb9d0..a2a5f306 100644
--- a/docs/metrics-deeptrio.md
+++ b/docs/metrics-deeptrio.md
@@ -2,6 +2,18 @@
 
 ## WGS (Illumina)
 
+## Setup
+
+The runtime and accuracy reported in this page are generated using
+`n2-standard-96` GCP instances which has the following configuration:
+
+```bash
+GCP instance type: n2-standard-96
+CPUs: 96-core (vCPU)
+Memory: 384GiB
+GPUs: 0
+```
+
 ### Runtime
 
 Runtime is on HG002/HG003/HG004 (all chromosomes).
@@ -9,15 +21,15 @@ Reported runtime is an average of 5 runs.
 
 Stage | Wall time (minutes)
 -------------------------------- | -----------------
-make_examples | 381m27.76s
-call_variants: HG002 | 376m44.92s
-call_variants: HG003 | 379m55.40s
-call_variants: HG004 | 380m27.95s
-postprocess_variants (parallel) | 45m24.88s; 47m0.02s; 47m46.29s
-vcf_stats_report(optional):HG002 | 9m20.03s
-vcf_stats_report(optional):HG003 | 9m29.88s
-vcf_stats_report(optional):HG003 | 9m29.88s
-total | 1576m56.29s (26h16m56.29s)
+make_examples | 172m53.87s
+call_variants: HG002 | 269m26.55s
+call_variants: HG003 | 268m2.29s
+call_variants: HG004 | 270m22.72s
+postprocess_variants (parallel) | 34m12.36s; 35m4.75s; 35m8.14s
+vcf_stats_report(optional):HG002 | 6m36.58s
+vcf_stats_report(optional):HG003 | 6m39.92s
+vcf_stats_report(optional):HG003 | 6m40.64s
+total | 1028m3.08s (17h08m3.08s)
 
 ### Accuracy
 
@@ -47,13 +59,13 @@ truth), which was held out while training.
 | SNP | 71445 | 214 | 48 | 0.997014 | 0.999329 | 0.99817 |
 
 * See VCF stats report (for all chromosomes)
- - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG002.output.visual_report.html)
- - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG003.output.visual_report.html)
- - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG004.output.visual_report.html)
+ - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG002.output.visual_report.html)
+ - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG003.output.visual_report.html)
+ - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG004.output.visual_report.html)
 
 ## PacBio (HiFi)
 
-In v1.7.0, we introduced read haplotagging in DeepTrio PacBio. You no longer
+Read haplotagging in DeepTrio PacBio is on by default. You no longer
 need to run DeepVariant->WhatsHap->DeepTrio, and can just run DeepTrio once.
 
 ### Runtime
@@ -63,20 +75,20 @@ Reported runtime is an average of 5 runs.
 
 Stage | Wall time (minutes)
 -------------------------------- | -------------------
-make_examples | 50m35.96s+621m56.74s
-call_variants: HG002 | 364m39.93s
-call_variants: HG003 | 368m0.84s
-call_variants: HG004 | 372m44.77s
-postprocess_variants (parallel) | 58m52.92s; 66m36.57s; 67m35.91s
-vcf_stats_report(optional):HG002 | 9m33.72s
-vcf_stats_report(optional):HG003 | 9m48.13s
-vcf_stats_report(optional):HG003 | 10m1.22s
-total | 1858m53.78s (30h58m53.78s)
+make_examples | 16m48.88s+288m15.08s
+call_variants: HG002 | 279m5.76s
+call_variants: HG003 | 274m47.90s
+call_variants: HG004 | 283m37.89s
+postprocess_variants (parallel) | 44m12.28s; 51m39.02s; 51m52.66s
+vcf_stats_report(optional):HG002 | 6m49.94s
+vcf_stats_report(optional):HG003 | 6m53.24s
+vcf_stats_report(optional):HG003 | 7m19.57s
+total | 1206m35.85s (20h6m35.85s)
 
 * See VCF stats report (for all chromosomes)
- - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG002.output.visual_report.html)
- - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG003.output.visual_report.html)
- - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG004.output.visual_report.html)
+ - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG002.output.visual_report.html)
+ - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG003.output.visual_report.html)
+ - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG004.output.visual_report.html)
 
 ### Accuracy
 
@@ -96,6 +108,7 @@ truth), which was held out while training.
 | ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
 | INDEL | 10577 | 51 | 77 | 0.995201 | 0.993089 | 0.994144 |
 | SNP | 70143 | 23 | 35 | 0.999672 | 0.999502 | 0.999587 |
+
 #### HG004:
 
 | Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
@@ -112,15 +125,15 @@ Reported runtime is an average of 5 runs.
 
 Stage | Wall time (minutes)
 -------------------------------- | --------------
-make_examples | 15m6.77s
-call_variants: HG002 | 5m16.13s
-call_variants: HG003 | 5m18.83s
-call_variants: HG004 | 5m19.09s
-postprocess_variants (parallel) | 0m51.70s; 0m52.27s; 0m53.73s
-vcf_stats_report(optional):HG002 | 0m7.84s
-vcf_stats_report(optional):HG003 | 0m8.01s
-vcf_stats_report(optional):HG003 | 0m10.00s
-total | 32m20.47s
+make_examples | 7m11.47s
+call_variants: HG002 | 3m49.25s
+call_variants: HG003 | 3m53.32s
+call_variants: HG004 | 3m52.68s
+postprocess_variants (parallel) | 0m40.52s; 0m42.09s; 0m42.30s
+vcf_stats_report(optional):HG002 | 0m5.65s
+vcf_stats_report(optional):HG003 | 0m5.69s
+vcf_stats_report(optional):HG003 | 0m7.15s
+total | 20m6.26s
 
 ### Accuracy
 
@@ -150,14 +163,14 @@ truth), which was held out while training.
 | SNP | 676 | 3 | 0 | 0.995582 | 1.0 | 0.997786 |
 
 * See VCF stats report (for all chromosomes)
- - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG002.output.visual_report.html)
- - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG003.output.visual_report.html)
- - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG004.output.visual_report.html)
+ - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG002.output.visual_report.html)
+ - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG003.output.visual_report.html)
+ - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG004.output.visual_report.html)
 
 ## How to reproduce the metrics on this page
 
 For simplicity and consistency, we report runtime with a
-[CPU instance with 64 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform)
+[CPU instance with 96 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform)
 For bigger datasets (WGS and PACBIO), we used bigger disk size (900G).
 This is NOT the fastest or cheapest configuration.
 
@@ -166,7 +179,7 @@ Use `gcloud compute ssh` to log in to the newly created instance.
 Download and run any of the following case study scripts:
 
 ```
-curl -O https://raw.githubusercontent.com/google/deepvariant/r1.7/scripts/inference_deeptrio.sh
+curl -O https://raw.githubusercontent.com/google/deepvariant/r1.8/scripts/inference_deeptrio.sh
 
 # WGS
 bash inference_deeptrio.sh --model_preset WGS
@@ -184,4 +197,4 @@ DeepTrio.
 The runtime numbers reported above are the average of 5 runs each. The accuracy
 metrics come from the hap.py summary.csv output file. The runs are deterministic
 so all 5 runs produced the same output.
-[CPU instance with 64 CPUs]: deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
+[CPU instance with 96 CPUs]: deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
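
The `total` rows in the runtime tables give each wall time twice, once in minutes and once in hours, for example `1028m3.08s (17h08m3.08s)`. A minimal sketch for re-deriving the hour form from the minute form is shown below. It is not part of the patch or of the DeepTrio scripts, and the helper name `to_hours` is hypothetical. The seconds field is copied through as text to avoid float rounding.

```python
import re

# Matches wall-time strings of the form "<minutes>m<seconds>s", e.g. "1028m3.08s".
WALL_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_hours(wall_time: str) -> str:
    """Rewrite "<minutes>m<seconds>s" as "<hours>h<minutes>m<seconds>s".

    Hypothetical helper for checking the tables; not a DeepTrio API.
    """
    match = WALL_TIME.match(wall_time)
    if match is None:
        raise ValueError(f"unrecognized wall time: {wall_time!r}")
    hours, minutes = divmod(int(match.group(1)), 60)
    # The seconds field is copied through unchanged, so no rounding occurs.
    return f"{hours}h{minutes}m{match.group(2)}s"


# New "total" values from the updated WGS and PacBio tables.
for total in ("1028m3.08s", "1206m35.85s"):
    print(total, "->", to_hours(total))
```

The sketch does not zero-pad the minutes field, so it yields `17h8m3.08s` where the WGS table writes `17h08m3.08s`. The two forms denote the same duration.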