Test on rice genome scaffolding #5

Open
Zhiliang-Zhang opened this issue Dec 7, 2021 · 5 comments

Comments

@Zhiliang-Zhang
Author

Hi,

Thanks for your tool!

I have a question about the chromosome number. The rice genome has 12 chromosomes, but pin_hic output only 8 large scaffolds (> 1 Mb) when run with "pin_hic_it -i 3 -x contig.fa.fai -r contig.fa -O pin_hic SRR7700701.bam". How can I improve this?

grep '>' scaffolds_final.fa | awk '{if($2 >1000000) print}'

u000000275 62,135,349
u000000282 24,833,875
u000000283 158,939,322
u000000285 30,366,990
u000000298 25,607,870
c000000363 31,856,834
u000000364 34,524,769
u000000368 26,884,419

Rice_Chr_number | Length
1 | 45,032,876
2 | 37,283,135
3 | 39,945,903
4 | 37,431,192
5 | 31,410,158
6 | 31,926,638
7 | 30,886,039
8 | 30,753,037
9 | 25,132,334
10 | 25,658,200
11 | 35,297,587
12 | 26,949,631

Could you give me some advice? Thanks!

Zhiliang
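
A quick sanity check on the numbers above, as a minimal sketch rather than anything pin_hic itself does: a scaffold longer than the longest reference chromosome has to contain sequence from more than one chromosome, assuming the chromosome lengths in the table apply to this assembly. The function name `oversized_scaffolds` is hypothetical; the lengths are copied from the post.

```python
# Scaffold lengths (bp) as listed in the post.
scaffolds = {
    "u000000275": 62_135_349,
    "u000000282": 24_833_875,
    "u000000283": 158_939_322,
    "u000000285": 30_366_990,
    "u000000298": 25_607_870,
    "c000000363": 31_856_834,
    "u000000364": 34_524_769,
    "u000000368": 26_884_419,
}

# Rice chromosome lengths (bp) from the table in the post.
chromosomes = {
    1: 45_032_876, 2: 37_283_135, 3: 39_945_903, 4: 37_431_192,
    5: 31_410_158, 6: 31_926_638, 7: 30_886_039, 8: 30_753_037,
    9: 25_132_334, 10: 25_658_200, 11: 35_297_587, 12: 26_949_631,
}


def oversized_scaffolds(scaffolds, chromosomes):
    """Return scaffolds longer than the longest reference chromosome."""
    longest = max(chromosomes.values())
    return {name: length for name, length in scaffolds.items() if length > longest}


suspects = oversized_scaffolds(scaffolds, chromosomes)
longest = max(chromosomes.values())
for name, length in sorted(suspects.items(), key=lambda item: -item[1]):
    # Ratio to the longest chromosome gives a rough lower bound on how much was merged.
    print(f"{name}\t{length:,}\t{length / longest:.1f}x the longest chromosome")
```

This only flags scaffolds that are too long. A misjoin between two short chromosome fragments, or an inversion, would not show up here and needs an alignment or a Hi-C contact map to detect.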

@Zhiliang-Zhang changed the title from "Test on" to "Test on rice genome scaffolding" Dec 7, 2021
@Zhiliang-Zhang
Author

0000000

@dfguan
Owner

dfguan commented Dec 7, 2021

Hi Zhiliang, thanks for trying pin_hic. I guess pin_hic has misjoined some chromosomes. There are several ways to correct this. The easiest, if you have a reference genome, is to align the scaffolds to it with minimap2 and correct them based on the alignments; my version of dot (a visualization tool) would be helpful for this. Another, more technical way is to run bam2hic/bam2cool from the pin_hic/util directory with scaffs.bk.sat and the BAM file as input, generate a file that can be viewed in HiGlass or Juicebox, and manually break the misjoins. If you run into any problems, please feel free to contact me. Best, Dengfeng.
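
For the minimap2 route, here is a rough sketch of how the alignments could be screened for candidate misjoins. It is not part of pin_hic or dot. It assumes the scaffolds were aligned to the reference with something like `minimap2 -x asm5 ref.fa scaffolds_final.fa > scaffolds_vs_ref.paf` (`ref.fa` and the output name are placeholders) and relies only on the standard PAF columns. The function names and the `min_mapq` and `min_share` thresholds are hypothetical and would need tuning.

```python
from collections import defaultdict


def chromosome_shares(paf_lines, min_mapq=20):
    """Sum aligned scaffold bases per (scaffold, reference chromosome).

    PAF columns used (0-based): 0 query name, 2 query start, 3 query end,
    5 target name, 11 mapping quality.
    """
    aligned = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        scaffold, start, end = fields[0], int(fields[2]), int(fields[3])
        chrom, mapq = fields[5], int(fields[11])
        if mapq >= min_mapq:
            aligned[scaffold][chrom] += end - start
    return aligned


def candidate_misjoins(aligned, min_share=0.2):
    """Return scaffolds whose aligned bases split across several chromosomes.

    A chromosome counts as a major target if it receives at least
    `min_share` of the scaffold's aligned bases (an assumed cutoff).
    """
    flagged = {}
    for scaffold, per_chrom in aligned.items():
        total = sum(per_chrom.values())
        if total == 0:
            continue
        major = sorted(c for c, bases in per_chrom.items() if bases / total >= min_share)
        if len(major) > 1:
            flagged[scaffold] = major
    return flagged


def screen_paf(path, min_mapq=20, min_share=0.2):
    """Convenience wrapper: read a PAF file and report candidate misjoins."""
    with open(path) as handle:
        return candidate_misjoins(chromosome_shares(handle, min_mapq), min_share)
```

Calling `screen_paf("scaffolds_vs_ref.paf")` would return a dict mapping each flagged scaffold to the reference chromosomes it mostly aligns to. Overlapping alignments are counted twice in this sketch, and inversions within one chromosome are not detected, so a dot plot or contact map is still needed to place the actual breakpoints.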

@dfguan
Owner

dfguan commented Dec 7, 2021

Sorry, I missed your circos plot. Yes, as expected, it does seem to contain several misjoins and large inversions, I think. Do you think they are real? Are the sequencing data and assembly publicly available? If so, I can try them myself and maybe update the algorithm. Dengfeng.

@Zhiliang-Zhang
Author

Zhiliang-Zhang commented Dec 8, 2021

Hi Dengfeng,
Sorry for the late reply!

I used the gap-free genome and its HiFi data from a recent paper. I tried pin_hic because I want to build a robust pipeline for rice genome assembly using HiFi (canu/hifiasm) and HiC (pin_hic) data. If you could help me solve the problem, I'd appreciate it. Please check your email ([email protected]); I have sent you the relevant data (genome/HiC link & contig…).

Zhiliang
