c
chang
V1
2022/12/31阅读:92主题:默认主题
比对
比对
首先是人类参考基因组和注释文件 在hisat2官网下载 https://daehwankimlab.github.io/hisat2/download/#h-sapiens
这里使用Ch38中的 genome_tran
下载后用FTP拖进服务器
使用hisat2进行比对
for i in 10 16 21
do
hisat2 --dta -t -x ./grch38/genome -1 ./RNA/fastp/SRR116186${i}_1.fq.gz -2 RNA/fastp/SRR116186${i}_2.fq.gz -S ./RNA/hisat2/SRR116186${i}.sam &
done
#结果如下
(rna) [tree116@st8-Horsea-12U 17:13:52 ~/RNA/h2test ]$ ls -lh
total 1.2G
-rw-rw-r-- 1 tree116 tree116 90M 12月 29 17:11 SRR11618610.sam
-rw-rw-r-- 1 tree116 tree116 481M 12月 29 17:11 SRR11618616.sam
-rw-rw-r-- 1 tree116 tree116 616M 12月 29 17:11 SRR11618621.sam
结果如下
Time loading forward index: 00:00:02
Time loading forward index: 00:00:02
Time loading forward index: 00:00:02
Time loading reference: 00:00:01
Time loading reference: 00:00:01
Time loading reference: 00:00:01
Multiseed full-index search: 00:00:09
131840 reads; of these:
131840 (100.00%) were paired; of these:
31540 (23.92%) aligned concordantly 0 times
97199 (73.72%) aligned concordantly exactly 1 time
3101 (2.35%) aligned concordantly >1 times
----
31540 pairs aligned concordantly 0 times; of these:
2189 (6.94%) aligned discordantly 1 time
----
29351 pairs aligned 0 times concordantly or discordantly; of these:
58702 mates make up the pairs; of these:
35646 (60.72%) aligned 0 times
20843 (35.51%) aligned exactly 1 time
2213 (3.77%) aligned >1 times
86.48% overall alignment rate
Time searching: 00:00:11
Overall time: 00:00:13
Multiseed full-index search: 00:00:44
640856 reads; of these:
640856 (100.00%) were paired; of these:
254355 (39.69%) aligned concordantly 0 times
344661 (53.78%) aligned concordantly exactly 1 time
41840 (6.53%) aligned concordantly >1 times
----
254355 pairs aligned concordantly 0 times; of these:
3482 (1.37%) aligned discordantly 1 time
----
250873 pairs aligned 0 times concordantly or discordantly; of these:
501746 mates make up the pairs; of these:
432723 (86.24%) aligned 0 times
52676 (10.50%) aligned exactly 1 time
16347 (3.26%) aligned >1 times
66.24% overall alignment rate
Time searching: 00:00:45
Overall time: 00:00:47
Multiseed full-index search: 00:00:53
824867 reads; of these:
824867 (100.00%) were paired; of these:
191323 (23.19%) aligned concordantly 0 times
587332 (71.20%) aligned concordantly exactly 1 time
46212 (5.60%) aligned concordantly >1 times
----
191323 pairs aligned concordantly 0 times; of these:
4757 (2.49%) aligned discordantly 1 time
----
186566 pairs aligned 0 times concordantly or discordantly; of these:
373132 mates make up the pairs; of these:
293021 (78.53%) aligned 0 times
66181 (17.74%) aligned exactly 1 time
13930 (3.73%) aligned >1 times
82.24% overall alignment rate
Time searching: 00:00:54
Overall time: 00:00:56
sam 转化成bam
#转换
for i in 10 16 21
do
samtools view SRR116186${i}.sam -b > SRR116186${i}.bam #转换
samtools sort SRR116186${i}.bam -o SRR116186${i}_sorted.bam #排序
samtools index SRR116186${i}_sorted.bam; #索引
done
查看单个结果比对后的总体质量 给出BAM文件的比对结果
samtools flagstat SRR11618610_sorted.bam
(rna) [tree116@st8-Horsea-12U 21:35:48 ~/RNA/fastp ]$ samtools flagstat SRR11618610_sorted.bam
288817 + 0 in total (QC-passed reads + QC-failed reads)
25137 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
253171 + 0 mapped (87.66% : N/A)
263680 + 0 paired in sequencing
131840 + 0 read1
131840 + 0 read2
200600 + 0 properly paired (76.08% : N/A)
207422 + 0 with itself and mate mapped
20612 + 0 singletons (7.82% : N/A)
2340 + 0 with mate mapped to a different chr
1979 + 0 with mate mapped to a different chr (mapQ>=5)
(rna) [tree116@st8-Horsea-12U 21:38:51 ~/RNA/fastp ]$ samtools flagstat SRR11618616_sorted.bam
1610889 + 0 in total (QC-passed reads + QC-failed reads)
329177 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
1178166 + 0 mapped (73.14% : N/A)
1281712 + 0 paired in sequencing
640856 + 0 read1
640856 + 0 read2
773002 + 0 properly paired (60.31% : N/A)
784814 + 0 with itself and mate mapped
64175 + 0 singletons (5.01% : N/A)
6486 + 0 with mate mapped to a different chr
5589 + 0 with mate mapped to a different chr (mapQ>=5)
(rna) [tree116@st8-Horsea-12U 21:43:24 ~/RNA/fastp ]$ samtools flagstat SRR11618621_sorted.bam
1983306 + 0 in total (QC-passed reads + QC-failed reads)
333572 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
1690285 + 0 mapped (85.23% : N/A)
1649734 + 0 paired in sequencing
824867 + 0 read1
824867 + 0 read2
1267088 + 0 properly paired (76.81% : N/A)
1285356 + 0 with itself and mate mapped
71357 + 0 singletons (4.33% : N/A)
6374 + 0 with mate mapped to a different chr
4009 + 0 with mate mapped to a different chr (mapQ>=5)
(rna) [tree116@st8-Horsea-12U 21:43:32 ~/RNA/fastp ]$
samtools sort -n test.bam -o sort_left.bam
samtools view sort_left.bam | head

samtools view SRR11618621_sorted.bam 1:1234-123456 | head

对BAM文件进行统计分析
①RSeQC 三个文件的统计分析如下图:
pip install RSeQC
bam_stat.py -i SRR11618610_sorted.bam



②使用bed文件作为参考 查看基因组覆盖率
需要提供bed文件,可以直接RSeQC的网站下载(本文统一使用hg38), https://sourceforge.net/projects/rseqc/postdownload或者可以用gtf转换 这里直接使用RSeQC网站下载: hg38_GENCODE_V42_Comprehensive.bed.gz
Tag_counts全部为0,暂时未找到问题处在哪里
作者介绍
c
chang
V1