Commit

Update deeptrio metrics page
PiperOrigin-RevId: 702138503
kishwarshafin authored and copybara-github committed Dec 3, 2024
1 parent 094e51a commit f22dc73
Showing 1 changed file with 53 additions and 40 deletions.
docs/metrics-deeptrio.md (93 changes: 53 additions & 40 deletions)
@@ -2,22 +2,34 @@

## WGS (Illumina)

+## Setup
+
+The runtime and accuracy reported on this page were generated using
+`n2-standard-96` GCP instances, which have the following configuration:
+
+```bash
+GCP instance type: n2-standard-96
+CPUs: 96-core (vCPU)
+Memory: 384GiB
+GPUs: 0
+```
+
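Before timing a run, you may want to confirm that a VM matches the configuration above. The following is a minimal sketch that prints the machine in the same format. It assumes a Linux guest because it reads `/proc/meminfo`, and it treats a missing `nvidia-smi` as "no GPUs". `describe_machine` is a hypothetical helper and is not part of DeepVariant. The guest usually reports somewhat less than the nominal 384GiB because the kernel reserves some memory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the current Linux machine in the same
# format as the setup block (vCPUs, memory in GiB, GPU count).
describe_machine() {
  local mem_kib gpus
  # MemTotal in /proc/meminfo is reported in KiB.
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # Assume no GPUs unless nvidia-smi is installed and lists some.
  gpus=0
  if command -v nvidia-smi >/dev/null 2>&1; then
    gpus=$(nvidia-smi --list-gpus 2>/dev/null | wc -l)
  fi
  echo "CPUs: $(nproc)"
  echo "Memory: $(( mem_kib / 1024 / 1024 ))GiB"
  echo "GPUs: ${gpus}"
}

describe_machine
```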
### Runtime

Runtime is measured on HG002/HG003/HG004 (all chromosomes).
Reported runtime is an average of 5 runs.

Stage | Wall time
-------------------------------- | -----------------
-make_examples | 381m27.76s
-call_variants: HG002 | 376m44.92s
-call_variants: HG003 | 379m55.40s
-call_variants: HG004 | 380m27.95s
-postprocess_variants (parallel) | 45m24.88s; 47m0.02s; 47m46.29s
-vcf_stats_report(optional):HG002 | 9m20.03s
-vcf_stats_report(optional):HG003 | 9m29.88s
-vcf_stats_report(optional):HG004 | 9m29.88s
-total | 1576m56.29s (26h16m56.29s)
+make_examples | 172m53.87s
+call_variants: HG002 | 269m26.55s
+call_variants: HG003 | 268m2.29s
+call_variants: HG004 | 270m22.72s
+postprocess_variants (parallel) | 34m12.36s; 35m4.75s; 35m8.14s
+vcf_stats_report(optional):HG002 | 6m36.58s
+vcf_stats_report(optional):HG003 | 6m39.92s
+vcf_stats_report(optional):HG004 | 6m40.64s
+total | 1028m3.08s (17h08m3.08s)

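The `total` rows give wall time in minutes and seconds, with an hours/minutes/seconds equivalent in parentheses. Each figure is described as an average of 5 runs. The sketch below shows that arithmetic in bash with `awk`. The function names are hypothetical, and the `<minutes>m<seconds>s` input format is inferred from the tables. Minutes are printed without zero-padding, so the table's `17h08m3.08s` comes out as `17h8m3.08s`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "<minutes>m<seconds>s" wall-time format
# used in the runtime tables (e.g. "1028m3.08s").

# Convert "XmY.YYs" to seconds.
walltime_to_seconds() {
  local t="$1"
  local min="${t%%m*}"   # text before the first "m"
  local sec="${t#*m}"    # text after the first "m"
  sec="${sec%s}"         # drop the trailing "s"
  awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.2f\n", m * 60 + s }'
}

# Average any number of "XmY.YYs" values (e.g. 5 runs), printed as "XmY.YYs".
average_walltimes() {
  local t secs=()
  for t in "$@"; do
    secs+=("$(walltime_to_seconds "$t")")
  done
  printf '%s\n' "${secs[@]}" | awk '
    { sum += $1; n++ }
    END {
      avg = sum / n
      m = int(avg / 60)
      printf "%dm%.2fs\n", m, avg - m * 60
    }'
}

# Re-express "XmY.YYs" as "XhYmZ.ZZs", as in the parentheses of the total rows.
walltime_to_hms() {
  awk -v t="$(walltime_to_seconds "$1")" 'BEGIN {
    h = int(t / 3600)
    m = int((t - h * 3600) / 60)
    printf "%dh%dm%.2fs\n", h, m, t - h * 3600 - m * 60
  }'
}

walltime_to_hms 1028m3.08s   # prints 17h8m3.08s
```

To average the five per-run wall times of a stage, pass them all to `average_walltimes`. With no arguments, the function divides by zero, so callers should pass at least one value.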
### Accuracy

@@ -47,13 +59,13 @@ truth), which was held out while training.
| SNP | 71445 | 214 | 48 | 0.997014 | 0.999329 | 0.99817 |

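The following worked check uses the SNP row shown above (TRUTH.TP = 71445, TRUTH.FN = 214, QUERY.FP = 48) to show how the metric columns relate. hap.py computes precision from QUERY.TP, which this table does not show. The sketch substitutes TRUTH.TP for QUERY.TP. That reproduces this row's precision, but the substitution need not hold for every row.

```latex
\begin{aligned}
\text{Recall} &= \frac{TP}{TP + FN} = \frac{71445}{71445 + 214} = \frac{71445}{71659} \approx 0.997014 \\
\text{Precision} &\approx \frac{TP}{TP + FP} = \frac{71445}{71445 + 48} = \frac{71445}{71493} \approx 0.999329 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     \approx \frac{2 \times 0.999329 \times 0.997014}{0.999329 + 0.997014} \approx 0.99817
\end{aligned}
```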
* See VCF stats report (for all chromosomes)
-  - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG002.output.visual_report.html)
-  - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG003.output.visual_report.html)
-  - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WGS/HG004.output.visual_report.html)
+  - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG002.output.visual_report.html)
+  - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG003.output.visual_report.html)
+  - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WGS/HG004.output.visual_report.html)

## PacBio (HiFi)

-In v1.7.0, we introduced read haplotagging in DeepTrio PacBio. You no longer
+Read haplotagging in DeepTrio PacBio is on by default. You no longer
need to run DeepVariant->WhatsHap->DeepTrio, and can just run DeepTrio once.

### Runtime
@@ -63,20 +75,20 @@ Reported runtime is an average of 5 runs.

Stage | Wall time
-------------------------------- | -------------------
-make_examples | 50m35.96s+621m56.74s
-call_variants: HG002 | 364m39.93s
-call_variants: HG003 | 368m0.84s
-call_variants: HG004 | 372m44.77s
-postprocess_variants (parallel) | 58m52.92s; 66m36.57s; 67m35.91s
-vcf_stats_report(optional):HG002 | 9m33.72s
-vcf_stats_report(optional):HG003 | 9m48.13s
-vcf_stats_report(optional):HG004 | 10m1.22s
-total | 1858m53.78s (30h58m53.78s)
+make_examples | 16m48.88s+288m15.08s
+call_variants: HG002 | 279m5.76s
+call_variants: HG003 | 274m47.90s
+call_variants: HG004 | 283m37.89s
+postprocess_variants (parallel) | 44m12.28s; 51m39.02s; 51m52.66s
+vcf_stats_report(optional):HG002 | 6m49.94s
+vcf_stats_report(optional):HG003 | 6m53.24s
+vcf_stats_report(optional):HG004 | 7m19.57s
+total | 1206m35.85s (20h6m35.85s)

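The PacBio `make_examples` entry is written as two wall times joined by `+`. This page does not say what the two parts correspond to. The sketch below only shows how to add them into a single figure in the same `XmY.YYs` format. `sum_walltimes` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add up wall times written as "XmY.YYs+XmY.YYs+..."
# and print the sum in the same "XmY.YYs" format.
sum_walltimes() {
  local parts part
  IFS='+' read -r -a parts <<< "$1"
  for part in "${parts[@]}"; do
    printf '%s\n' "$part"
  done | awk '{
    # Split "XmY.YYs" into minutes (f[1]) and seconds (f[2]).
    split($0, f, "m")
    sub(/s$/, "", f[2])
    total += f[1] * 60 + f[2]
  }
  END {
    m = int(total / 60)
    printf "%dm%.2fs\n", m, total - m * 60
  }'
}

sum_walltimes 16m48.88s+288m15.08s   # prints 305m3.96s
```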
* See VCF stats report (for all chromosomes)
-  - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG002.output.visual_report.html)
-  - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG003.output.visual_report.html)
-  - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/PACBIO/HG004.output.visual_report.html)
+  - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG002.output.visual_report.html)
+  - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG003.output.visual_report.html)
+  - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/PACBIO/HG004.output.visual_report.html)

### Accuracy

@@ -96,6 +108,7 @@ truth), which was held out while training.
| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |
| INDEL | 10577 | 51 | 77 | 0.995201 | 0.993089 | 0.994144 |
| SNP | 70143 | 23 | 35 | 0.999672 | 0.999502 | 0.999587 |
+
#### HG004:

| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
@@ -112,15 +125,15 @@ Reported runtime is an average of 5 runs.

Stage | Wall time
-------------------------------- | --------------
-make_examples | 15m6.77s
-call_variants: HG002 | 5m16.13s
-call_variants: HG003 | 5m18.83s
-call_variants: HG004 | 5m19.09s
-postprocess_variants (parallel) | 0m51.70s; 0m52.27s; 0m53.73s
-vcf_stats_report(optional):HG002 | 0m7.84s
-vcf_stats_report(optional):HG003 | 0m8.01s
-vcf_stats_report(optional):HG004 | 0m10.00s
-total | 32m20.47s
+make_examples | 7m11.47s
+call_variants: HG002 | 3m49.25s
+call_variants: HG003 | 3m53.32s
+call_variants: HG004 | 3m52.68s
+postprocess_variants (parallel) | 0m40.52s; 0m42.09s; 0m42.30s
+vcf_stats_report(optional):HG002 | 0m5.65s
+vcf_stats_report(optional):HG003 | 0m5.69s
+vcf_stats_report(optional):HG004 | 0m7.15s
+total | 20m6.26s

### Accuracy

@@ -150,14 +163,14 @@ truth), which was held out while training.
| SNP | 676 | 3 | 0 | 0.995582 | 1.0 | 0.997786 |

* See VCF stats report (for all chromosomes)
-  - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG002.output.visual_report.html)
-  - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG003.output.visual_report.html)
-  - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.7.0/WES/HG004.output.visual_report.html)
+  - [HG002](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG002.output.visual_report.html)
+  - [HG003](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG003.output.visual_report.html)
+  - [HG004](https://storage.googleapis.com/deepvariant/visual_reports/DeepTrio/1.8.0/WES/HG004.output.visual_report.html)

## How to reproduce the metrics on this page

For simplicity and consistency, we report runtime with a
-[CPU instance with 64 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform).
+[CPU instance with 96 CPUs](deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform).
For the larger datasets (WGS and PacBio), we used a larger disk (900G).
This is NOT the fastest or cheapest configuration.

@@ -166,7 +179,7 @@ Use `gcloud compute ssh` to log in to the newly created instance.
Download and run any of the following case study scripts:

```
-curl -O https://raw.githubusercontent.com/google/deepvariant/r1.7/scripts/inference_deeptrio.sh
+curl -O https://raw.githubusercontent.com/google/deepvariant/r1.8/scripts/inference_deeptrio.sh
# WGS
bash inference_deeptrio.sh --model_preset WGS
@@ -184,4 +197,4 @@ DeepTrio. The runtime numbers reported above are the average of 5 runs each.
The accuracy metrics come from the hap.py summary.csv output file.
The runs are deterministic, so all 5 runs produced the same output.

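The accuracy tables list, per variant type, the hap.py columns TRUTH.TP, TRUTH.FN, QUERY.FP, METRIC.Recall, METRIC.Precision, and METRIC.F1_Score. The following minimal sketch pulls those columns out of a hap.py `summary.csv` and looks each one up by its header name. It assumes the reported rows are the ones whose `Filter` column is `PASS`, since this page does not say which filter rows it uses. `print_accuracy_table` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the accuracy-table columns from a hap.py
# summary.csv as a markdown table, keeping only rows whose Filter is PASS
# (an assumption about which rows this page reports).
print_accuracy_table() {
  awk -F',' '
    NR == 1 {
      # Remember the column index of every header name.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |"
      print "| ----- | -------- | -------- | -------- | ------------- | ---------------- | --------------- |"
      next
    }
    $(col["Filter"]) == "PASS" {
      printf "| %s | %s | %s | %s | %s | %s | %s |\n",
        $(col["Type"]), $(col["TRUTH.TP"]), $(col["TRUTH.FN"]),
        $(col["QUERY.FP"]), $(col["METRIC.Recall"]),
        $(col["METRIC.Precision"]), $(col["METRIC.F1_Score"])
    }
  ' "$1"
}
```

Call it with the path of the summary file, for example `print_accuracy_table <prefix>.summary.csv`. The prefix is whatever was passed to hap.py's `-o` option.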
-[CPU instance with 64 CPUs]: deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
+[CPU instance with 96 CPUs]: deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform
