Commit
Update to use koza split command to produce separate node files for each included taxon
kevinschaper committed Oct 5, 2024
1 parent 1a19b7e commit 77547dd
Showing 4 changed files with 11 additions and 7 deletions.
1 change: 1 addition & 0 deletions Makefile
@@ -74,6 +74,7 @@ download:
 .PHONY: run
 run: download
 	$(RUN) ingest transform
+	$(RUN) koza split output/ncbi_gene_nodes.tsv in_taxon --remove-prefixes --output-dir output/by_taxon
 	$(RUN) python scripts/generate-report.py
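
For readers unfamiliar with the new step, here is a rough sketch of what splitting a node file by `in_taxon` amounts to. It is not koza's implementation: the function name, the output file naming, and the way the CURIE prefix is dropped are all assumptions made for illustration, and it holds all rows in memory, which a real tool may avoid.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_tsv_by_column(src, column, output_dir, remove_prefixes=False):
    """Write one TSV per distinct value of `column`; return the paths written.

    Hypothetical helper for illustration only, not part of koza.
    """
    src = Path(src)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by the value found in the chosen column.
    with src.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames
        groups = defaultdict(list)
        for row in reader:
            groups[row[column]].append(row)

    written = []
    for value, rows in groups.items():
        # Assumed naming: "NCBITaxon:9606" becomes "9606" when prefixes are
        # removed, otherwise the colon is replaced to keep the name file-safe.
        label = value.split(":", 1)[-1] if remove_prefixes else value.replace(":", "_")
        path = out_dir / f"{label}_{src.name}"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Calling `split_tsv_by_column("output/ncbi_gene_nodes.tsv", "in_taxon", "output/by_taxon", remove_prefixes=True)` would, under these assumptions, leave one file per taxon in `output/by_taxon`.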


10 changes: 5 additions & 5 deletions poetry.lock

Generated file; diff not rendered.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -9,7 +9,7 @@ readme = "README.md"
 [tool.poetry.dependencies]
 python = "^3.10"
 importlib-metadata = ">=4.8.0"
-koza = ">=0.6.0"
+koza = "0.7.1"
 kghub-downloader = ">=0.3.8"
 kgx = ">=2.4.0"
 biolink-model = "^4.2.0"
5 changes: 4 additions & 1 deletion src/ncbi_gene/transform.yaml
@@ -44,10 +44,13 @@ columns:

 header: 0

+# This likely will need adjusting based on the included taxa below
+min_node_count: 2500000
+
 filters:
   - inclusion: "include"
     column: "tax_id"
-    filter_code: "in"
+    filter_code: "in_exact"
     value:
       - "10036"
       - "10042"
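
The switch from `in` to `in_exact` matters if, as assumed here, `in` also accepts a cell that merely contains one of the listed values as a substring, while `in_exact` requires the whole cell to equal a listed value. The commit itself does not spell out these semantics, so the two helper functions below are hypothetical stand-ins rather than koza's code.

```python
def matches_in(cell, allowed):
    """Loose match (assumed behaviour): the cell equals or contains an allowed value."""
    return any(value in cell for value in allowed)


def matches_in_exact(cell, allowed):
    """Strict match: the cell must equal one of the allowed values."""
    return cell in allowed


allowed = ["10036", "10042"]
tax_ids = ["10036", "100361", "9606"]  # "100361" is a made-up ID containing "10036"

loose = [t for t in tax_ids if matches_in(t, allowed)]
exact = [t for t in tax_ids if matches_in_exact(t, allowed)]

print(loose)  # ['10036', '100361']
print(exact)  # ['10036']
```

Under that assumption, a loose filter could let rows for unrelated taxa slip through whenever their numeric ID happens to embed a listed one.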
