MsAlEhR/KmerTokenizer

KmerTokenizer

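KmerTokenizer is a Python package for k-mer tokenization: it splits sequences, such as the DNA string in the usage example, into substrings of length k (k-mers).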

Installation

Install the package directly from GitHub with pip:

pip install git+https://github.com/MsAlEhR/KmerTokenizer.git

Usage

from KmerTokenizer import KmerTokenizer
import torch

seq_list = ["ATTTTTTTTTTTCCCCCCCCCCCGGGGGGGGATCGATGC"]

# Create a tokenizer for overlapping 6-mers
tokenizer = KmerTokenizer(kmerlen=6, overlapping=True, maxlen=4096)

# Tokenize the sequence
tokenized_output = tokenizer.kmer_tokenize(seq_list)

# Convert the tokenized output to a tensor
inputs = torch.tensor(tokenized_output)
print(inputs)
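
To make the kmerlen and overlapping options more concrete, here is a minimal pure-Python sketch of how a sequence can be split into k-mers. The helper split_kmers is hypothetical and is not part of the KmerTokenizer API; the package's own behavior (vocabulary, padding, handling of maxlen and of trailing bases) may differ. The sketch assumes that overlapping k-mers use a stride of 1, that non-overlapping k-mers use a stride of k, and that a trailing fragment shorter than k is dropped.

```python
def split_kmers(seq, k, overlapping=True):
    """Split seq into k-mers.

    Hypothetical helper for illustration only; not KmerTokenizer's code.
    Assumes stride 1 when overlapping, stride k otherwise, and drops any
    trailing fragment shorter than k.
    """
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


seq = "ATCGATGC"

# Overlapping 6-mers: the window slides one base at a time.
print(split_kmers(seq, k=6, overlapping=True))
# ['ATCGAT', 'TCGATG', 'CGATGC']

# Non-overlapping 6-mers: the window jumps k bases; the leftover "GC" is dropped.
print(split_kmers(seq, k=6, overlapping=False))
# ['ATCGAT']
```

Overlapping tokenization yields roughly one token per base, while non-overlapping tokenization yields roughly one token per k bases, so the choice directly affects how many tokens a sequence produces.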
