Merge branch 'ar/prepare-0-4-2-release' into 'master'
Prepare 0.4.2 release

See merge request machine-learning/modkit!238
ArtRand committed Dec 21, 2024
2 parents d62c99b + b6cec1a commit 10d99bc
Showing 40 changed files with 689 additions and 103 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v0.4.2]
### Adds
- [entropy] Entropy can now be calculated with multiple motifs and multiple modified primary bases.
- [adjust-mods, call-mods] Retain or remove base modification calls based on whether they match a sequence motif in the basecall sequence.
- [bedmethyl] Add command to merge bedMethyl files.
- [dmr] Add strand to DMR output.


## [v0.4.1]
### Adds
- [docs] Fix documentation links
104 changes: 100 additions & 4 deletions book/src/advanced_usage.md
@@ -64,6 +64,7 @@ Commands:
localize Investigate patterns of base modifications, by aggregating
pileup counts "localized" around genomic features of interest
stats Calculate base modification levels over entire regions
bedmethyl Utilities to work with bedMethyl files
help Print this message or the help of the given subcommand(s)
Options:
@@ -337,9 +338,6 @@ Arguments:
one of `-` or `stdin` to specify a stream from standard output
Options:
--log-filepath <LOG_FILEPATH>
Output debug logs to file at this path
--ignore <IGNORE>
Modified base code to ignore/remove, see
https://samtools.github.io/hts-specs/SAMtags.pdf for details on the
@@ -438,11 +436,31 @@ Options:
when estimating the filter threshold (i.e. ignore soft-clipped, and
inserted bases)
--motif <MOTIF> <MOTIF>
Filter out any base modification call that isn't part of a basecall
sequence motif. This argument can be passed multiple times. Format is
<motif_sequence> <offset>. For example the argument to match CpG
dinucleotides is `--motif CG 0`, or to match CG[5mC]G the argument
would be `--motif CGCG 2`. Single bases can be used as motifs to keep
only base modification calls for a specific primary base, for example
`--motif C 0`
--cpg
Shorthand for --motif CG 0
--discard-motifs
Discard base modification calls that match the provided motifs
(instead of keeping them)
--suppress-progress
Hide the progress bar
-h, --help
Print help (see a summary with '-h')
Logging:
--log-filepath <LOG_FILEPATH>
Output debug logs to file at this path
```
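
The `<motif_sequence> <offset>` format described in the help text above can be illustrated with a short Python sketch. It is not modkit's implementation: the function names `motif_positions` and `filter_calls` are hypothetical, matching is exact-string only, and strand handling is ignored. It only shows how an offset selects one base inside each motif occurrence, and how a `--discard-motifs`-style switch inverts the filter.

```python
def motif_positions(seq, motif, offset):
    """Return the indices in `seq` of the base at `offset` within each
    occurrence of `motif` (exact matching; overlapping hits allowed)."""
    hits = set()
    start = seq.find(motif)
    while start != -1:
        hits.add(start + offset)
        start = seq.find(motif, start + 1)  # step by one to catch overlaps
    return hits


def filter_calls(seq, call_positions, motifs, discard=False):
    """Keep (or, with discard=True, remove) calls that fall on a motif base.

    `motifs` is a list of (motif_sequence, offset) pairs, mirroring repeated
    `--motif` arguments. This is an illustrative helper, not a modkit API.
    """
    in_motif = set()
    for motif, offset in motifs:
        in_motif |= motif_positions(seq, motif, offset)
    if discard:
        return [p for p in call_positions if p not in in_motif]
    return [p for p in call_positions if p in in_motif]


seq = "ACGTCCGCA"
calls = [1, 4, 5, 7]  # hypothetical call positions, one per C in `seq`
cpg_only = filter_calls(seq, calls, [("CG", 0)])
non_cpg = filter_calls(seq, calls, [("CG", 0)], discard=True)
```

Here `("CG", 0)` plays the role of `--motif CG 0`, and a single-base motif such as `("C", 0)` keeps every call on that primary base.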

## update-tags
@@ -851,6 +869,20 @@ Options:
using this flag will keep only base modification calls in the first 4
and last 8 bases
--motif <MOTIF> <MOTIF>
Filter out any base modification call that isn't part of a basecall
sequence motif. This argument can be passed multiple times. Format is
<motif_sequence> <offset>. For example the argument to match CpG
dinucleotides is `--motif CG 0`, or to match CG[5mC]G the argument
would be `--motif CGCG 2`
--cpg
Shorthand for --motif CG 0
--discard-motifs
Discard base modification calls that match the provided motifs
(instead of keeping them)
--output-sam
Output SAM format instead of BAM
@@ -1263,7 +1295,10 @@ Options:
Respect soft masking in the reference FASTA
--motif <MOTIF> <MOTIF>
Motif to use for entropy calculation, default will be CpG
Motif to use for entropy calculation, multiple motifs can be used by
repeating this option. When multiple motifs are used that specify
different modified primary bases, all modification possibilities will
be used in the calculation
--cpg
Use CpG motifs. Shorthand for --motif CG 0 --combine-strands
@@ -2372,3 +2407,64 @@ Options:
-h, --help
Print help
```

## bedmethyl merge
```text
Perform an outer join on two or more bedMethyl files, summing their counts for
records that overlap
Usage: modkit bedmethyl merge [OPTIONS] --out-bed <OUT_BED> --genome-sizes <GENOME_SIZES> [IN_BEDMETHYL] [IN_BEDMETHYL]...
Arguments:
[IN_BEDMETHYL] [IN_BEDMETHYL]...
Input bedMethyl table(s). Should be bgzip-compressed and have an
associated Tabix index. The tabix index will be assumed to be
$this_file.tbi
Options:
-o, --out-bed <OUT_BED>
Specify the output file to write the results table
-g, --genome-sizes <GENOME_SIZES>
TSV of genome sizes, should be <chrom>\t<size_in_bp>
--force
Force overwrite the output file
--with-header
Output a header with the bedMethyl
--mixed-delim
Output bedMethyl where the delimiter of columns past column 10 are
space-delimited instead of tab-delimited. This option can be useful
for some browsers and parsers that don't expect the extra columns of
the bedMethyl format
--chunk-size <CHUNK_SIZE>
Chunk size for how many start..end regions for each chromosome to
read. Larger values will lead to faster merging at the expense of
memory usage, while smaller values will be slower with lower memory
usage. This option will only impact large bedmethyl files
-i, --interval-size <INTERVAL_SIZE>
Interval chunk size in base pairs to process concurrently. Smaller
interval chunk sizes will use less memory but incur more overhead
[default: 100000]
--log-filepath <LOG_FILEPATH>
Specify a file to write debug logs to
-t, --threads <THREADS>
Number of threads to use
[default: 4]
--io-threads <IO_THREADS>
Number of tabix/bgzf threads to use
[default: 2]
-h, --help
Print help (see a summary with '-h')
```
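
The "outer join, summing counts" behaviour described in the `bedmethyl merge` help text can be sketched in Python as follows. This is an assumption-laden illustration, not the command's implementation: records are reduced to a hypothetical 7-tuple `(chrom, start, end, mod_code, strand, n_valid, n_mod)`, records are treated as overlapping only when their key fields are identical, and the real command reads bgzip-compressed, tabix-indexed files rather than in-memory lists.

```python
from collections import defaultdict


def merge_bedmethyl(tables):
    """Outer-join records from several tables, summing counts per key.

    Each record is a hypothetical simplified tuple:
    (chrom, start, end, mod_code, strand, n_valid, n_mod).
    Returns sorted tuples with a recomputed percent-modified value appended.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [n_valid, n_mod]
    for table in tables:
        for chrom, start, end, code, strand, n_valid, n_mod in table:
            acc = totals[(chrom, start, end, code, strand)]
            acc[0] += n_valid
            acc[1] += n_mod

    merged = []
    for key in sorted(totals):
        n_valid, n_mod = totals[key]
        # Percent modified is recomputed from the summed counts.
        pct = 100.0 * n_mod / n_valid if n_valid else 0.0
        merged.append((*key, n_valid, n_mod, pct))
    return merged


sample_a = [("chr1", 10, 11, "m", "+", 10, 5)]
sample_b = [("chr1", 10, 11, "m", "+", 10, 10),
            ("chr1", 20, 21, "m", "+", 4, 1)]
merged = merge_bedmethyl([sample_a, sample_b])
```

A record present in only one input (the `chr1:20-21` row here) still appears in the output, which is what makes the join "outer".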
2 changes: 1 addition & 1 deletion docs/404.html
@@ -92,7 +92,7 @@

<nav id="sidebar" class="sidebar" aria-label="Table of contents">
<div class="sidebar-scrollbox">
<ol class="chapter"><li class="chapter-item expanded "><a href="quick_start.html"><strong aria-hidden="true">1.</strong> Quick Start guides</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_bedmethyl.html"><strong aria-hidden="true">1.1.</strong> Constructing bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_pileup_hemi.html"><strong aria-hidden="true">1.2.</strong> Make hemi-methylation bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_adjust.html"><strong aria-hidden="true">1.3.</strong> Updating and adjusting MM tags</a></li><li class="chapter-item expanded "><a href="intro_sample_probs.html"><strong aria-hidden="true">1.4.</strong> Inspecting base modification probabilities</a></li><li class="chapter-item expanded "><a href="intro_summary.html"><strong aria-hidden="true">1.5.</strong> Summarizing a modBAM</a></li><li class="chapter-item expanded "><a href="intro_stats.html"><strong aria-hidden="true">1.6.</strong> Calculating modification statistics in regions</a></li><li class="chapter-item expanded "><a href="intro_call_mods.html"><strong aria-hidden="true">1.7.</strong> Calling mods in a modBAM</a></li><li class="chapter-item expanded "><a href="intro_edge_filter.html"><strong aria-hidden="true">1.8.</strong> Removing modification calls at the ends of reads</a></li><li class="chapter-item expanded "><a href="intro_repair.html"><strong aria-hidden="true">1.9.</strong> Repair MM/ML tags on trimmed reads</a></li><li class="chapter-item expanded "><a href="intro_motif.html"><strong aria-hidden="true">1.10.</strong> Working with sequence motifs</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_motif_bed.html"><strong aria-hidden="true">1.10.1.</strong> Making a motif BED file</a></li><li class="chapter-item expanded "><a href="intro_find_motifs.html"><strong aria-hidden="true">1.10.2.</strong> Find highly modified motif sequences</a></li><li 
class="chapter-item expanded "><a href="evaluate_motif.html"><strong aria-hidden="true">1.10.3.</strong> Evaluate and refine a table of known motifs</a></li></ol></li><li class="chapter-item expanded "><a href="intro_extract.html"><strong aria-hidden="true">1.11.</strong> Extracting read information to a table</a></li><li class="chapter-item expanded "><a href="intro_localize.html"><strong aria-hidden="true">1.12.</strong> Investigating patterns with localise</a></li><li class="chapter-item expanded "><a href="intro_dmr.html"><strong aria-hidden="true">1.13.</strong> Perform differential methylation scoring</a></li><li class="chapter-item expanded "><a href="intro_validate.html"><strong aria-hidden="true">1.14.</strong> Validate ground truth results</a></li><li class="chapter-item expanded "><a href="intro_entropy.html"><strong aria-hidden="true">1.15.</strong> Calculating methylation entropy</a></li><li class="chapter-item expanded "><a href="intro_include_bed.html"><strong aria-hidden="true">1.16.</strong> Narrow output to specific positions</a></li></ol></li><li class="chapter-item expanded "><a href="advanced_usage.html"><strong aria-hidden="true">2.</strong> Extended subcommand help</a></li><li class="chapter-item expanded "><a href="troubleshooting.html"><strong aria-hidden="true">3.</strong> Troubleshooting</a></li><li class="chapter-item expanded "><a href="faq.html"><strong aria-hidden="true">4.</strong> Frequently asked questions</a></li><li class="chapter-item expanded "><a href="limitations.html"><strong aria-hidden="true">5.</strong> Current limitations</a></li><li class="chapter-item expanded "><a href="perf_considerations.html"><strong aria-hidden="true">6.</strong> Performance considerations</a></li><li class="chapter-item expanded "><a href="algo_details.html"><strong aria-hidden="true">7.</strong> Algorithm details</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering.html"><strong aria-hidden="true">7.1.</strong> 
Pass/fail base modification calls</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering_details.html"><strong aria-hidden="true">7.1.1.</strong> Threshold examples</a></li><li class="chapter-item expanded "><a href="filtering_numeric_details.html"><strong aria-hidden="true">7.1.2.</strong> Numeric details</a></li></ol></li><li class="chapter-item expanded "><a href="dmr_scoring_details.html"><strong aria-hidden="true">7.2.</strong> DMR model and scoring details</a></li><li class="chapter-item expanded "><a href="collapse.html"><strong aria-hidden="true">7.3.</strong> Ignoring and combining calls</a></li></ol></li></ol>
<ol class="chapter"><li class="chapter-item expanded "><a href="quick_start.html"><strong aria-hidden="true">1.</strong> Quick Start guides</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_pileup.html"><strong aria-hidden="true">1.1.</strong> Constructing bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_pileup_hemi.html"><strong aria-hidden="true">1.2.</strong> Make hemi-methylation bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_adjust.html"><strong aria-hidden="true">1.3.</strong> Updating and adjusting MM tags</a></li><li class="chapter-item expanded "><a href="intro_sample_probs.html"><strong aria-hidden="true">1.4.</strong> Inspecting base modification probabilities</a></li><li class="chapter-item expanded "><a href="intro_summary.html"><strong aria-hidden="true">1.5.</strong> Summarizing a modBAM</a></li><li class="chapter-item expanded "><a href="intro_stats.html"><strong aria-hidden="true">1.6.</strong> Calculating modification statistics in regions</a></li><li class="chapter-item expanded "><a href="intro_call_mods.html"><strong aria-hidden="true">1.7.</strong> Calling mods in a modBAM</a></li><li class="chapter-item expanded "><a href="intro_edge_filter.html"><strong aria-hidden="true">1.8.</strong> Removing modification calls at the ends of reads</a></li><li class="chapter-item expanded "><a href="intro_repair.html"><strong aria-hidden="true">1.9.</strong> Repair MM/ML tags on trimmed reads</a></li><li class="chapter-item expanded "><a href="intro_motif.html"><strong aria-hidden="true">1.10.</strong> Working with sequence motifs</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_motif_bed.html"><strong aria-hidden="true">1.10.1.</strong> Making a motif BED file</a></li><li class="chapter-item expanded "><a href="intro_find_motifs.html"><strong aria-hidden="true">1.10.2.</strong> Find highly modified motif sequences</a></li><li 
class="chapter-item expanded "><a href="evaluate_motif.html"><strong aria-hidden="true">1.10.3.</strong> Evaluate and refine a table of known motifs</a></li></ol></li><li class="chapter-item expanded "><a href="intro_extract.html"><strong aria-hidden="true">1.11.</strong> Extracting read information to a table</a></li><li class="chapter-item expanded "><a href="intro_localize.html"><strong aria-hidden="true">1.12.</strong> Investigating patterns with localise</a></li><li class="chapter-item expanded "><a href="intro_dmr.html"><strong aria-hidden="true">1.13.</strong> Perform differential methylation scoring</a></li><li class="chapter-item expanded "><a href="intro_validate.html"><strong aria-hidden="true">1.14.</strong> Validate ground truth results</a></li><li class="chapter-item expanded "><a href="intro_entropy.html"><strong aria-hidden="true">1.15.</strong> Calculating methylation entropy</a></li><li class="chapter-item expanded "><a href="intro_include_bed.html"><strong aria-hidden="true">1.16.</strong> Narrow output to specific positions</a></li><li class="chapter-item expanded "><a href="intro_bedmethyl_merge.html"><strong aria-hidden="true">1.17.</strong> Merge multiple bedMethyl files</a></li></ol></li><li class="chapter-item expanded "><a href="advanced_usage.html"><strong aria-hidden="true">2.</strong> Extended subcommand help</a></li><li class="chapter-item expanded "><a href="troubleshooting.html"><strong aria-hidden="true">3.</strong> Troubleshooting</a></li><li class="chapter-item expanded "><a href="faq.html"><strong aria-hidden="true">4.</strong> Frequently asked questions</a></li><li class="chapter-item expanded "><a href="limitations.html"><strong aria-hidden="true">5.</strong> Current limitations</a></li><li class="chapter-item expanded "><a href="perf_considerations.html"><strong aria-hidden="true">6.</strong> Performance considerations</a></li><li class="chapter-item expanded "><a href="algo_details.html"><strong aria-hidden="true">7.</strong> 
Algorithm details</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering.html"><strong aria-hidden="true">7.1.</strong> Pass/fail base modification calls</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering_details.html"><strong aria-hidden="true">7.1.1.</strong> Threshold examples</a></li><li class="chapter-item expanded "><a href="filtering_numeric_details.html"><strong aria-hidden="true">7.1.2.</strong> Numeric details</a></li></ol></li><li class="chapter-item expanded "><a href="dmr_scoring_details.html"><strong aria-hidden="true">7.2.</strong> DMR model and scoring details</a></li><li class="chapter-item expanded "><a href="collapse.html"><strong aria-hidden="true">7.3.</strong> Ignoring and combining calls</a></li></ol></li></ol>
</div>
<div id="sidebar-resize-handle" class="sidebar-resize-handle">
<div class="sidebar-resize-indicator"></div>