Commit

Update DeepTrio case-studies
PiperOrigin-RevId: 700145069
kishwarshafin authored and copybara-github committed Nov 26, 2024
1 parent d34ae33 commit 87764b2
Showing 3 changed files with 40 additions and 43 deletions.
38 changes: 19 additions & 19 deletions docs/deeptrio-pacbio-case-study.md
@@ -85,7 +85,7 @@ is run as a separate command.
mkdir -p output
mkdir -p output/intermediate_results_dir

-BIN_VERSION="1.7.0"
+BIN_VERSION="1.8.0"

sudo apt -y update
sudo apt-get -y install docker.io
@@ -221,13 +221,13 @@ As a result we should get the following output:
```bash
Checking: /output/HG002_trio_merged.vcf.gz
Family: [HG003 + HG004] -> [HG002]
-222 non-pass records were skipped
-Concordance HG002: F:166005/169476 (97.95%) M:166074/168579 (98.51%) F+M:159317/164363 (96.93%)
+188 non-pass records were skipped
+Concordance HG002: F:166225/169750 (97.92%) M:166415/168977 (98.48%) F+M:159575/164659 (96.91%)
Sample HG002 has less than 99.0 concordance with both parents. Check for incorrect pedigree or sample mislabelling.
-0/188247 (0.00%) records did not conform to expected call ploidy
-176481/188247 (93.75%) records were variant in at least 1 family member and checked for Mendelian constraints
-10169/176481 (5.76%) records had indeterminate consistency status due to incomplete calls
-6610/176481 (3.75%) records contained a violation of Mendelian constraints
+0/188437 (0.00%) records did not conform to expected call ploidy
+176829/188437 (93.84%) records were variant in at least 1 family member and checked for Mendelian constraints
+10143/176829 (5.74%) records had indeterminate consistency status due to incomplete calls
+6722/176829 (3.80%) records contained a violation of Mendelian constraints
```
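
The percentages in the Mendelian summary follow directly from the record counts. A minimal sketch in Python that recomputes them, assuming the counts are copied by hand from the updated PacBio output (the run with 188437 records). The helper name `pct` is hypothetical and not part of any tool used in the case study:

```python
def pct(numerator: int, denominator: int) -> str:
    """Format a ratio as a percentage with two decimals, as in the summary."""
    return f"{100.0 * numerator / denominator:.2f}%"


# Counts copied by hand from the updated PacBio summary (188437 records).
total_records = 188437
checked = 176829        # variant in at least 1 family member
indeterminate = 10143   # indeterminate status due to incomplete calls
violations = 6722       # violations of Mendelian constraints

checked_pct = pct(checked, total_records)
indeterminate_pct = pct(indeterminate, checked)
violations_pct = pct(violations, checked)
print(checked_pct, indeterminate_pct, violations_pct)
```

Note that the last two percentages use the number of checked records as the denominator, not the total record count.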

### Benchmark variant calls against 4.2.1 truth set with hap.py
@@ -289,22 +289,22 @@ sudo docker run \
```
Benchmarking Summary for HG002:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
-INDEL ALL 11256 11215 41 23348 85 11580 30 50 0.996357 0.992777 0.495974 0.994564 NaN NaN 1.561710 2.133416
-INDEL PASS 11256 11215 41 23348 85 11580 30 50 0.996357 0.992777 0.495974 0.994564 NaN NaN 1.561710 2.133416
-SNP ALL 71333 71303 30 108157 20 36757 16 4 0.999579 0.999720 0.339849 0.999650 2.314904 1.745105 1.715978 1.773270
-SNP PASS 71333 71303 30 108157 20 36757 16 4 0.999579 0.999720 0.339849 0.999650 2.314904 1.745105 1.715978 1.773270
+INDEL ALL 11256 11213 43 23405 84 11635 32 45 0.996180 0.992863 0.497116 0.994519 NaN NaN 1.561710 2.151675
+INDEL PASS 11256 11213 43 23405 84 11635 32 45 0.996180 0.992863 0.497116 0.994519 NaN NaN 1.561710 2.151675
+SNP ALL 71333 71305 28 108561 21 37160 14 7 0.999607 0.999706 0.342296 0.999657 2.314904 1.742256 1.715978 1.772847
+SNP PASS 71333 71305 28 108561 21 37160 14 7 0.999607 0.999706 0.342296 0.999657 2.314904 1.742256 1.715978 1.772847
Benchmarking Summary for HG003:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
-INDEL ALL 10628 10575 53 23766 78 12623 33 44 0.995013 0.993000 0.531137 0.994006 NaN NaN 1.748961 2.326587
-INDEL PASS 10628 10575 53 23766 78 12623 33 44 0.995013 0.993000 0.531137 0.994006 NaN NaN 1.748961 2.326587
-SNP ALL 70166 70145 21 117124 35 46895 11 10 0.999701 0.999502 0.400388 0.999601 2.296566 1.579731 1.883951 1.689079
-SNP PASS 70166 70145 21 117124 35 46895 11 10 0.999701 0.999502 0.400388 0.999601 2.296566 1.579731 1.883951 1.689079
+INDEL ALL 10628 10577 51 23776 77 12634 33 43 0.995201 0.993089 0.531376 0.994144 NaN NaN 1.748961 2.332224
+INDEL PASS 10628 10577 51 23776 77 12634 33 43 0.995201 0.993089 0.531376 0.994144 NaN NaN 1.748961 2.332224
+SNP ALL 70166 70143 23 117125 35 46898 13 9 0.999672 0.999502 0.400410 0.999587 2.296566 1.57963 1.883951 1.685873
+SNP PASS 70166 70143 23 117125 35 46898 13 9 0.999672 0.999502 0.400410 0.999587 2.296566 1.57963 1.883951 1.685873
Benchmarking Summary for HG004:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
-INDEL ALL 11000 10957 43 24219 60 12690 25 30 0.996091 0.994796 0.523969 0.995443 NaN NaN 1.792709 2.345610
-INDEL PASS 11000 10957 43 24219 60 12690 25 30 0.996091 0.994796 0.523969 0.995443 NaN NaN 1.792709 2.345610
-SNP ALL 71659 71621 38 116803 28 45069 10 10 0.999470 0.999610 0.385855 0.999540 2.310073 1.63293 1.878340 1.630435
-SNP PASS 71659 71621 38 116803 28 45069 10 10 0.999470 0.999610 0.385855 0.999540 2.310073 1.63293 1.878340 1.630435
+INDEL ALL 11000 10954 46 24235 70 12701 29 36 0.995818 0.993931 0.524077 0.994874 NaN NaN 1.792709 2.351344
+INDEL PASS 11000 10954 46 24235 70 12701 29 36 0.995818 0.993931 0.524077 0.994874 NaN NaN 1.792709 2.351344
+SNP ALL 71659 71617 42 116988 22 45260 11 7 0.999414 0.999693 0.386877 0.999554 2.310073 1.633809 1.878340 1.626369
+SNP PASS 71659 71617 42 116988 22 45260 11 7 0.999414 0.999693 0.386877 0.999554 2.310073 1.633809 1.878340 1.626369
```
7 changes: 2 additions & 5 deletions docs/deeptrio-quick-start.md
@@ -32,7 +32,7 @@ documentation on how to build.
### Get Docker image

```bash
-BIN_VERSION="1.7.0"
+BIN_VERSION="1.8.0"

sudo apt -y update
sudo apt-get -y install docker.io
@@ -174,17 +174,14 @@ HG002.g.vcf.gz
HG002.g.vcf.gz.tbi
HG002.output.vcf.gz
HG002.output.vcf.gz.tbi
-HG002.output.visual_report.html
HG003.g.vcf.gz
HG003.g.vcf.gz.tbi
HG003.output.vcf.gz
HG003.output.vcf.gz.tbi
-HG003.output.visual_report.html
HG004.g.vcf.gz
HG004.g.vcf.gz.tbi
HG004.output.vcf.gz
HG004.output.vcf.gz.tbi
-HG004.output.visual_report.html
intermediate_results_dir
```

@@ -341,7 +338,7 @@ INDEL PASS 2 2 0 2 0 0
[BAM]: http://genome.sph.umich.edu/wiki/BAM
[BWA]: https://academic.oup.com/bioinformatics/article/25/14/1754/225615/Fast-and-accurate-short-read-alignment-with
[docker build]: https://docs.docker.com/engine/reference/commandline/build/
-[Dockerfile]: https://github.com/google/deepvariant/blob/r1.7/Dockerfile.deeptrio
+[Dockerfile]: https://github.com/google/deepvariant/blob/r1.8/Dockerfile.deeptrio
[FASTA]: https://en.wikipedia.org/wiki/FASTA_format
[VCF]: https://samtools.github.io/hts-specs/VCFv4.3.pdf
[run_deeptrio.py]: ../scripts/run_deeptrio.py
38 changes: 19 additions & 19 deletions docs/deeptrio-wgs-case-study.md
@@ -82,7 +82,7 @@ command.
mkdir -p output
mkdir -p output/intermediate_results_dir

-BIN_VERSION="1.7.0"
+BIN_VERSION="1.8.0"

sudo docker pull google/deepvariant:deeptrio-"${BIN_VERSION}"

@@ -211,13 +211,13 @@ As a result we should get the following output:
```bash
Checking: /output/HG002_trio_merged.vcf.gz
Family: [HG003 + HG004] -> [HG002]
-95 non-pass records were skipped
-Concordance HG002: F:137908/139703 (98.72%) M:137988/139909 (98.63%) F+M:134596/137968 (97.56%)
+86 non-pass records were skipped
+Concordance HG002: F:138004/139790 (98.72%) M:138049/139959 (98.64%) F+M:134711/138044 (97.59%)
Sample HG002 has less than 99.0 concordance with both parents. Check for incorrect pedigree or sample mislabelling.
-0/146013 (0.00%) records did not conform to expected call ploidy
-143704/146013 (98.42%) records were variant in at least 1 family member and checked for Mendelian constraints
-5066/143704 (3.53%) records had indeterminate consistency status due to incomplete calls
-3886/143704 (2.70%) records contained a violation of Mendelian constraints
+0/146134 (0.00%) records did not conform to expected call ploidy
+143783/146134 (98.39%) records were variant in at least 1 family member and checked for Mendelian constraints
+5082/143783 (3.53%) records had indeterminate consistency status due to incomplete calls
+3842/143783 (2.67%) records contained a violation of Mendelian constraints
```

### Perform analysis with hap.py against 4.2.1 truth set
@@ -279,22 +279,22 @@ sudo docker run \
```
Benchmarking Summary for HG002:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
-INDEL ALL 11256 11208 48 21239 13 9586 7 4 0.995736 0.998884 0.451340 0.997308 NaN NaN 1.561710 2.047281
-INDEL PASS 11256 11208 48 21239 13 9586 7 4 0.995736 0.998884 0.451340 0.997308 NaN NaN 1.561710 2.047281
-SNP ALL 71333 71087 246 88976 42 17795 5 4 0.996551 0.999410 0.199998 0.997979 2.314904 2.029984 1.715978 1.716560
-SNP PASS 71333 71087 246 88976 42 17795 5 4 0.996551 0.999410 0.199998 0.997979 2.314904 2.029984 1.715978 1.716560
+INDEL ALL 11256 11208 48 21232 13 9579 7 4 0.995736 0.998884 0.451159 0.997308 NaN NaN 1.561710 2.044750
+INDEL PASS 11256 11208 48 21232 13 9579 7 4 0.995736 0.998884 0.451159 0.997308 NaN NaN 1.561710 2.044750
+SNP ALL 71333 71088 245 89034 41 17853 4 3 0.996565 0.999424 0.200519 0.997993 2.314904 2.026055 1.715978 1.717178
+SNP PASS 71333 71088 245 89034 41 17853 4 3 0.996565 0.999424 0.200519 0.997993 2.314904 2.026055 1.715978 1.717178
Benchmarking Summary for HG003:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
-INDEL ALL 10628 10584 44 21028 20 9969 13 6 0.995860 0.998192 0.474082 0.997024 NaN NaN 1.748961 2.197401
-INDEL PASS 10628 10584 44 21028 20 9969 13 6 0.995860 0.998192 0.474082 0.997024 NaN NaN 1.748961 2.197401
-SNP ALL 70166 69975 191 85299 55 15231 15 4 0.997278 0.999215 0.178560 0.998246 2.296566 2.064978 1.883951 1.845348
-SNP PASS 70166 69975 191 85299 55 15231 15 4 0.997278 0.999215 0.178560 0.998246 2.296566 2.064978 1.883951 1.845348
+INDEL ALL 10628 10578 50 21055 24 9997 17 6 0.995295 0.997830 0.474804 0.996561 NaN NaN 1.748961 2.209131
+INDEL PASS 10628 10578 50 21055 24 9997 17 6 0.995295 0.997830 0.474804 0.996561 NaN NaN 1.748961 2.209131
+SNP ALL 70166 69977 189 85399 64 15325 17 8 0.997306 0.999087 0.179452 0.998196 2.296566 2.061752 1.883951 1.846595
+SNP PASS 70166 69977 189 85399 64 15325 17 8 0.997306 0.999087 0.179452 0.998196 2.296566 2.061752 1.883951 1.846595
Benchmarking Summary for HG004:
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
-INDEL ALL 11000 10945 55 21426 27 9969 22 4 0.995000 0.997643 0.465276 0.996320 NaN NaN 1.792709 2.279678
-INDEL PASS 11000 10945 55 21426 27 9969 22 4 0.995000 0.997643 0.465276 0.996320 NaN NaN 1.792709 2.279678
-SNP ALL 71659 71446 213 86406 52 14858 9 4 0.997028 0.999273 0.171956 0.998149 2.310073 2.064306 1.878340 1.735500
-SNP PASS 71659 71446 213 86406 52 14858 9 4 0.997028 0.999273 0.171956 0.998149 2.310073 2.064306 1.878340 1.735500
+INDEL ALL 11000 10949 51 21433 23 9975 16 5 0.995364 0.997993 0.465404 0.996676 NaN NaN 1.792709 2.280107
+INDEL PASS 11000 10949 51 21433 23 9975 16 5 0.995364 0.997993 0.465404 0.996676 NaN NaN 1.792709 2.280107
+SNP ALL 71659 71445 214 86523 48 14980 8 3 0.997014 0.999329 0.173133 0.998170 2.310073 2.064759 1.878340 1.737322
+SNP PASS 71659 71445 214 86523 48 14980 8 3 0.997014 0.999329 0.173133 0.998170 2.310073 2.064759 1.878340 1.737322
```
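
The metric columns in the hap.py summary can be recomputed from the count columns, which is a quick way to sanity-check a copied row. The sketch below is illustrative only: the function name `happy_metrics` is hypothetical, and it assumes that query true positives can be approximated as `QUERY.TOTAL - QUERY.UNK - QUERY.FP`, since the summary shown here has no `QUERY.TP` column. Under that assumption the values agree with the updated HG002 SNP row of the WGS run.

```python
def happy_metrics(truth_tp, truth_fn, query_total, query_fp, query_unk):
    """Recompute recall, precision, Frac_NA and F1 from summary counts.

    Assumption (not confirmed by the source): query true positives are
    approximated as QUERY.TOTAL - QUERY.UNK - QUERY.FP.
    """
    recall = truth_tp / (truth_tp + truth_fn)
    assessed = query_total - query_unk  # query calls that are not UNK
    precision = (assessed - query_fp) / assessed
    frac_na = query_unk / query_total
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, precision, frac_na, f1


# HG002 SNP row from the updated WGS summary (counts copied by hand).
recall, precision, frac_na, f1 = happy_metrics(
    truth_tp=71088, truth_fn=245, query_total=89034, query_fp=41, query_unk=17853
)
print(f"{recall:.6f} {precision:.6f} {frac_na:.6f} {f1:.6f}")
```

F1 is the harmonic mean of precision and recall, so a small drop in either metric moves it by roughly half that amount when both are close to 1.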
