
谢大飞
2023/05/02
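Piano_GSA (gene set analysis)
GSA (gene set analysis)
I ran GSA on the full set of differentially expressed genes from each stage-to-stage comparison, and the results included several GO terms related to floral differentiation.
I then extracted those terms and analyzed them with goatools to view the GO hierarchy graph, so that no related genes are missed.
(Note: the GSA analysis follows example code provided by my supervisor, so the code comments are left unchanged in their original form.)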
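Installing and loading the R packages
Tip: if the piano package is hard to install, try a few more times. If it installs but cannot be loaded with library(), tick piano directly in the Packages pane.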
BiocManager::install("piano")
BiocManager::install("snow")
BiocManager::install("snowfall")
BiocManager::install("edgeR")
library(piano)
library(snow)
library(snowfall)
library(edgeR)
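
Since the tip above suggests retrying the installation, it can help to reinstall only what is actually missing. The following is a minimal sketch, not part of the original workflow; the helper names `missing_packages()` and `install_missing()` are hypothetical, and it assumes BiocManager itself is already installed.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the missing packages through BiocManager.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    BiocManager::install(to_install)
  }
  invisible(to_install)
}

# Example call (left commented out so nothing is installed by accident):
# install_missing(c("piano", "snow", "snowfall", "edgeR"))
```

Running `missing_packages()` on its own is a quick way to see which of the four packages still failed to install after a retry.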

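Importing the required data
- A GO annotation table that maps genes to GO terms
- A list of differentially expressed genes that keeps only the gene ID, logFC, and PValue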

# load gene set collection (GSC) into R session
# only GO Biological terms that excluded "biological_process"
gsc_pecan_bio <- read.table("pecan_GO_BP.txt", sep="\t", header=TRUE)
# load GSC into piano format
gsc_pecan_bio <- loadGSC(gsc_pecan_bio)
# load gene level statistic (GLS) into R session
FMD_St1.gls <- read.table("FMD_St1_geneLevelStat.txt", sep="\t", header=TRUE)
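
To make the two input tables concrete, here is a minimal sketch with toy data. It assumes the GO table has two columns, gene ID first and gene set second, which is the layout `loadGSC` accepts for a data frame. The column names `gene` and `go_term`, the gene IDs, and the helper `check_inputs()` are made up for illustration; `RefTag`, `logFC`, and `PValue` follow the column names used in this post.

```r
# Toy gene set collection: first column gene IDs, second column gene sets.
toy_gsc <- data.frame(
  gene    = c("g1", "g2", "g2", "g3"),
  go_term = c("flower development", "flower development",
              "floral organ identity", "floral organ identity"),
  stringsAsFactors = FALSE
)

# Toy gene-level statistics with the three columns kept from the DE results.
toy_gls <- data.frame(
  RefTag = c("g1", "g2", "g3"),
  logFC  = c(1.8, -0.6, 2.3),
  PValue = c(0.001, 0.20, 0.0004),
  stringsAsFactors = FALSE
)

# Hypothetical helper: basic shape checks before handing the data to piano.
check_inputs <- function(gsc, gls) {
  stopifnot(ncol(gsc) == 2)
  stopifnot(all(c("RefTag", "logFC", "PValue") %in% names(gls)))
  stopifnot(all(gls$PValue >= 0 & gls$PValue <= 1))
  # Fraction of genes with statistics that also have a GO annotation.
  mean(gls$RefTag %in% gsc[[1]])
}

coverage <- check_inputs(toy_gsc, toy_gls)
```

A low `coverage` value on real data would suggest that the gene IDs in the two files do not use the same naming scheme.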

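GSA analysis
First, extract logFC and PValue from the differential-expression table into separate vectors named by gene ID, which is the format piano requires, and then run the GSA.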
# assign the piano objects
logfc_FMD1bio <- FMD_St1.gls$logFC
pvalue_FMD1bio <- FMD_St1.gls$PValue
names(pvalue_FMD1bio) <- names(logfc_FMD1bio) <- FMD_St1.gls$RefTag
# run the Gene Set Analysis (GSA)
gsaRes_FMD1bio <- runGSA(pvalue_FMD1bio, logfc_FMD1bio,
    geneSetStat="mean",
    adjMethod="fdr",
    gsc=gsc_pecan_bio,
    gsSizeLim=c(1,Inf),
    ncpus=2,
    verbose=TRUE)
dev.new()
nw_FMD1bio <- networkPlot(gsaRes_FMD1bio,
    class="distinct",
    direction="both",
    adjusted=FALSE,
    significance=0.005,
    geneSets=NULL,
    overlap=1,
    lay=4,
    label="names",
    cexLabel=1,
    ncharLabel=50,
    cexLegend=1,
    nodeSize=c(10,50),
    edgeWidth=c(2,20),
    edgeColor=NULL,
    # scoreColors=NULL)
    scoreColors=c("red3", "orangered", "blue", "lightblue"))
nw_FMD1bio$geneSets
write.table(nw_FMD1bio$geneSets, file="FMD1_GSAresult_b.txt", sep="\t", quote=FALSE)
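
The extraction step at the start of this section relies on both vectors carrying the gene IDs as names. The following is a minimal sketch on toy data, assuming you want to drop rows with missing values or duplicated IDs first; the helper `make_gls_vectors()` and this filtering policy are assumptions, not part of the original workflow.

```r
# Hypothetical helper: build named p-value and logFC vectors from a GLS table.
make_gls_vectors <- function(gls) {
  # Keep rows with complete statistics and the first occurrence of each gene ID.
  keep <- !is.na(gls$logFC) & !is.na(gls$PValue) & !duplicated(gls$RefTag)
  gls <- gls[keep, ]
  pvalue <- gls$PValue
  logfc <- gls$logFC
  names(pvalue) <- names(logfc) <- gls$RefTag
  list(pvalue = pvalue, logfc = logfc)
}

# Toy table with one duplicated ID (g2) and one missing logFC (g3).
toy <- data.frame(
  RefTag = c("g1", "g2", "g2", "g3"),
  logFC  = c(1.8, -0.6, -0.5, NA),
  PValue = c(0.001, 0.20, 0.30, 0.04),
  stringsAsFactors = FALSE
)
vecs <- make_gls_vectors(toy)
```

On real data, `vecs$pvalue` and `vecs$logfc` could then be passed as the first two arguments of `runGSA`, in the same positions as `pvalue_FMD1bio` and `logfc_FMD1bio`.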

Draw the GSA network plot and export it as a TIFF file, which makes later editing in Adobe Illustrator (AI) easier.
# export GSA network plot to Tiff file
tiff(filename="FMD1_GSAresult_b.tiff",
    width=7, height=6, units="in", pointsize=10,
    compression="lzw", bg="white", res=600)
# plot the network of gene set analysis
networkPlot(gsaRes_FMD1bio,
    class="distinct",
    direction="both",
    adjusted=FALSE,
    significance=0.005,
    geneSets=NULL,
    overlap=1,
    lay=4,
    # label="names",
    label="numbers",
    # cexLabel=1,
    cexLabel=1.4,
    ncharLabel=50,
    cexLegend=1,
    nodeSize=c(10,50),
    edgeWidth=c(2,20),
    edgeColor=NULL,
    # scoreColors=NULL)
    scoreColors=c("red3", "orangered", "blue", "lightblue"))
dev.off()
save.image()
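
If the plotting call fails between `tiff()` and `dev.off()`, the graphics device stays open and the file may be left incomplete. The following is a minimal sketch of one way to guard against that, using the same device settings as above; the helper `save_tiff()` is hypothetical, and the placeholder plot stands in for the `networkPlot()` call.

```r
# Hypothetical helper: open a TIFF device, draw, and always close the device.
save_tiff <- function(filename, draw, width = 7, height = 6, res = 600) {
  tiff(filename = filename, width = width, height = height, units = "in",
       pointsize = 10, compression = "lzw", bg = "white", res = res)
  # on.exit() runs even if draw() stops with an error.
  on.exit(dev.off(), add = TRUE)
  draw()
  invisible(filename)
}

# Stand-in plot written to a temporary file; skipped if TIFF is unsupported.
out_file <- file.path(tempdir(), "example_plot.tiff")
if (capabilities("tiff")) {
  save_tiff(out_file, function() plot(1:10, main = "placeholder"), res = 150)
}
```

In this workflow, `draw` would be a function that wraps the `networkPlot(gsaRes_FMD1bio, ...)` call with the arguments shown above.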
