
Odin


2022/12/31

Can Python draw beautiful complex heatmaps too?

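WeChat official account: Computational Epigenetics

Focused on bioinformatics and computational epigenetics. For questions or suggestions, please leave a message on the official account.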

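Researchers who often plot in R are probably familiar with ComplexHeatmap. The heatmaps it draws are both professional and beautiful.

Unfortunately, Python has never had a package that draws good-looking complex heatmaps. We also need heatmaps when doing machine learning or processing big data in Python, and switching back and forth between Python and R is tedious and inefficient.

Today I'll introduce PyComplexHeatmap, a package that produces figures similar to R's ComplexHeatmap directly in Python. Let's go straight to the code and figures below: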

1. Import packages

import os,sys
import numpy as np
import pandas as pd
import PyComplexHeatmap
from PyComplexHeatmap import *
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 120
plt.rcParams['savefig.dpi'] = 300

2. A quick example

# Generate example dataset: 10 samples with categorical (AB, CD, EF) and numeric (F) annotations
df = pd.DataFrame(['AAAA1'] * 5 + ['BBBBB2'] * 5, columns=['AB'])
df['CD'] = ['C'] * 3 + ['D'] * 3 + ['G'] * 4
df['EF'] = ['E'] * 6 + ['F'] * 2 + ['H'] * 2
df['F'] = np.random.normal(0, 1, 10)
df.index = ['sample' + str(i) for i in range(1, df.shape[0] + 1)]
# Per-sample data for the boxplot, barplot and scatterplot annotations
df_box = pd.DataFrame(np.random.randn(10, 4), columns=['Gene' + str(i) for i in range(1, 5)])
df_box.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['TMB1', 'TMB2'])
df_bar.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_scatter = pd.DataFrame(np.random.uniform(0, 10, 10), columns=['Scatter'])
df_scatter.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
# Heatmap matrix: 50 features x 10 samples, with one missing value
df_heatmap = pd.DataFrame(np.random.randn(50, 10), columns=['sample' + str(i) for i in range(1, 11)])
df_heatmap.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]
df_heatmap.iloc[1, 2] = np.nan

plt.figure(figsize=(6, 12))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB,add_text=True),axis=1,
                           CD=anno_simple(df.CD, colors={'C': 'red', 'D': 'yellow', 'G': 'green'},add_text=True),
                           Exp=anno_boxplot(df_box, cmap='turbo'),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar),
                           )
cm = ClusterMapPlotter(data=df_heatmap, top_annotation=row_ha, col_split=2, row_split=3, col_split_gap=0.5,
                     row_split_gap=1,col_dendrogram=False,plot=True,
                     tree_kws={'col_cmap': 'Set1', 'row_cmap': 'Dark2'})
plt.savefig("example1_heatmap.pdf", bbox_inches='tight')
plt.show()

3. Plotting annotations

3.1 Plot only the row/column annotations

plt.figure(figsize=(6, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                            AB=anno_simple(df.AB,add_text=True,legend=True), axis=1,
                            CD=anno_simple(df.CD, colors={'C': 'red', 'D': 'gray', 'G': 'yellow'},
                                           add_text=True,legend=True),
                            Exp=anno_boxplot(df_box, cmap='turbo',legend=True),
                            Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar,legend=True),
                           plot=True,legend=True,legend_gap=5
                            )
plt.savefig("col_annotation.pdf", bbox_inches='tight')
plt.show()

anno_label:

anno_label adds text labels to an annotation. The parameter merge controls whether adjacent labels with the same text are merged into one; if merge is not True, a label is drawn for every column.
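To make "merge adjacent labels with the same text" concrete, here is a minimal pure-Python sketch of the grouping step. The helper merge_adjacent_labels is hypothetical and is not PyComplexHeatmap code. It collapses each run of identical neighbouring labels into one (label, start, end) span, so a plotter could draw the text once per span instead of once per column.

```python
from itertools import groupby


def merge_adjacent_labels(labels):
    """Collapse runs of identical adjacent labels into (label, start, end) spans.

    Hypothetical helper illustrating the idea behind merge=True; `end` is
    exclusive. Equal labels that are NOT adjacent stay in separate spans.
    """
    spans = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # size of this run of identical labels
        spans.append((label, start, start + length))
        start += length
    return spans


labels = ['AAAA1'] * 5 + ['BBBBB2'] * 5  # same values as df.AB above
print(merge_adjacent_labels(labels))
# [('AAAA1', 0, 5), ('BBBBB2', 5, 10)]
```

Because only neighbouring labels are merged, clustering that reorders the columns can split one category into several spans.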

anno_simple:

anno_simple draws a simple annotation. Its cmap can be either categorical (Set1, Dark2, tab10, etc.) or continuous (jet, turbo, parula). The parameter add_text controls whether text is drawn on the annotation. If color and fontsize are not specified in text_kws, they are determined automatically: for example, text on a dark background is drawn in white, and otherwise in black. To change the text color, pass text_kws={'color': your_color}, for example:

plt.figure(figsize=(5, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                            AB=anno_simple(df.AB,add_text=True,legend=True,text_kws={'color':'gold'}), axis=1,
                            CD=anno_simple(df.CD,add_text=True,legend=True,text_kws={'color':'purple'}),
                            Exp=anno_boxplot(df_box, cmap='turbo',legend=True),
                            Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar,legend=True),
                           plot=True,legend=True,legend_gap=5)
plt.show()
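The automatic choice between black and white text can be sketched as below. This is an assumption-labelled illustration, not PyComplexHeatmap's actual rule. It computes the relative luminance of the background with the standard sRGB/WCAG formula and picks whichever of black or white gives the higher contrast ratio. The hypothetical resolve_text_color mirrors the behaviour described above: an explicit text_kws['color'] wins, and otherwise the color is chosen automatically.

```python
def _linearize(channel):
    # Convert an sRGB channel in [0, 1] to linear light (sRGB transfer function)
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def text_color_for_background(hex_color):
    """Return 'black' or 'white', whichever contrasts more with hex_color."""
    hex_color = hex_color.lstrip('#')
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    luminance = 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)
    # WCAG contrast ratio against pure black (L = 0) and pure white (L = 1)
    contrast_black = (luminance + 0.05) / 0.05
    contrast_white = 1.05 / (luminance + 0.05)
    return 'black' if contrast_black >= contrast_white else 'white'


def resolve_text_color(hex_color, text_kws=None):
    """Hypothetical helper: an explicit text_kws['color'] wins, else choose automatically."""
    text_kws = text_kws or {}
    return text_kws.get('color', text_color_for_background(hex_color))


print(resolve_text_color('#000080'))                     # dark navy -> white
print(resolve_text_color('#ffff00'))                     # bright yellow -> black
print(resolve_text_color('#000080', {'color': 'gold'}))  # user override -> gold
```

Comparing the two contrast ratios avoids a hand-tuned luminance cutoff. The crossover falls at a luminance of about 0.18.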

To add annotations quickly, all you need is a DataFrame.

If df is given, every column of df is treated as a separate anno_simple annotation.

plt.figure(figsize=(3, 3))
row_ha = HeatmapAnnotation(df=df,plot=True,legend=True)
plt.show()
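"Each column is a separate annotation" implies that each column gets its own, independent value-to-color mapping and legend. The sketch below illustrates that idea for categorical columns only. The helper per_column_color_maps and the PALETTE list (the first five ColorBrewer Set1 colors) are assumptions for illustration, not the library's internals. Numeric columns such as F would presumably use a continuous colormap instead and are not handled here.

```python
from itertools import cycle

# Assumed palette: the first five ColorBrewer Set1 colors
PALETTE = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']


def per_column_color_maps(columns):
    """Build an independent {value: color} mapping for each categorical column.

    columns: dict of column name -> list of values (a stand-in for a DataFrame).
    Values are colored in first-seen order; the palette restarts for every
    column and wraps around if a column has more categories than colors.
    """
    maps = {}
    for name, values in columns.items():
        unique = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
        maps[name] = dict(zip(unique, cycle(PALETTE)))
    return maps


columns = {
    'AB': ['AAAA1'] * 5 + ['BBBBB2'] * 5,
    'CD': ['C'] * 3 + ['D'] * 3 + ['G'] * 4,
}
for name, mapping in per_column_color_maps(columns).items():
    print(name, mapping)
```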

3.2 Plot the figure and legend separately

Sometimes you may want to plot only the figure without its legend, or to save the legend to a separate PDF. You can do this by passing plot_legend=False, and then drawing the legend in another PDF with row_ha.plot_legends().

plt.figure(figsize=(6, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                            AB=anno_simple(df.AB,add_text=True,legend=True), axis=1,
                            CD=anno_simple(df.CD,add_text=True,legend=True),
                            Exp=anno_boxplot(df_box, cmap='turbo',legend=True),
                            Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar,legend=True),
                           plot=True,legend=True,plot_legend=False,
                           legend_gap=5
                            )
plt.savefig("col_annotation.pdf", bbox_inches='tight')
plt.show()

plt.figure()
row_ha.plot_legends()
plt.savefig("legend.pdf",bbox_inches='tight')
plt.show()
No ax was provided, using plt.gca()

4. Plotting a clustermap with annotations

PyComplexHeatmap provides an example dataset. Let's download it and visualize it.

!wget https://github.com/DingWB/PyComplexHeatmap/raw/main/data/influence_of_snp_on_beta.pickle
import pickle
# Load the example data downloaded above
with open("influence_of_snp_on_beta.pickle", 'rb') as f:
    data = pickle.load(f)
beta, snp, df_row, df_col, col_colors_dict, row_colors_dict = data
# beta is the DNA methylation beta-value matrix; df_row and df_col are the row and column annotations, respectively;
# col_colors_dict and row_colors_dict hold the colors for those annotations
print(beta.iloc[:,list(range(5))].head(5))
print(df_row.head(5))
print(df_col.head(5))
# Randomly subsample 2000 rows (probes) and keep snp and df_row aligned with them
beta = beta.sample(2000)
snp = snp.loc[beta.index.tolist()]
df_row = df_row.loc[beta.index.tolist()]
                 204875570030_R01C02  204875570030_R04C01  \
cg30848532_TC21  0.525089             0.419515              
cg30147375_BC21  0.803776             0.585928              
cg46239718_BC21  0.443958             0.517514              
cg36100119_BC21  0.351977             0.528846              
cg42738582_BC21  0.783958             0.724901              

                 204875570030_R05C01  204875570030_R06C01  204875570035_R05C02  
cg30848532_TC21  0.483276             0.460750             0.390317             
cg30147375_BC21  0.510269             0.831463             0.550146             
cg46239718_BC21  0.535909             0.450167             0.564107             
cg36100119_BC21  0.524896             0.374422             0.551200             
cg42738582_BC21  0.802178             0.848621             0.850481             
                   chr  Target  CpG  ExtensionBase ProbeDesign CON  mapFlag  \
cg30848532_TC21  chr12  1       1    0              II          C   16        
cg30147375_BC21  chr11  0       0    0              II          C   0         
cg46239718_BC21  chr8   1       1    0              II          C   0         
cg36100119_BC21  chr19  1       1    0              II          C   16        
cg42738582_BC21  chr5   0       0    0              II          C   16        

                                        Group  \
cg30848532_TC21  Suboptimal hybridization       
cg30147375_BC21  No Effect                      
cg46239718_BC21  Artificial low meth. reading   
cg36100119_BC21  Suboptimal hybridization       
cg42738582_BC21  Suboptimal hybridization       

                                                  Type  
cg30848532_TC21  1-1-0-CG-GG-II-C-16-GA-chr12-79760438  
cg30147375_BC21  0-0-0-ca-ac-II-C-0-AG-chr11-109557651  
cg46239718_BC21  1-1-0-cg-gt-II-C-0-GA-chr8-117860829   
cg36100119_BC21  1-1-0-CG-GG-II-C-16-GA-chr19-5877949   
cg42738582_BC21  0-0-0-AA-AA-II-C-16-AG-chr5-122031379  
                       Strain              Tissue     Sex
204875570030_R01C02  MOLF_EiJ  Frontal Lobe Brain  Female
204875570030_R04C01  CAST_EiJ  Frontal Lobe Brain  Male  
204875570030_R05C01  CAST_EiJ  Frontal Lobe Brain  Female
204875570030_R06C01  MOLF_EiJ  Frontal Lobe Brain  Male  
204875570035_R05C02  CAST_EiJ  Liver               Male  
row_ha = HeatmapAnnotation(Target=anno_simple(df_row.Target,colors=row_colors_dict['Target'],rasterized=True),
                               Group=anno_simple(df_row.Group,colors=row_colors_dict['Group'],rasterized=True),
                               axis=0)
col_ha= HeatmapAnnotation(label=anno_label(df_col.Strain,merge=True,rotation=15),
                          Strain=anno_simple(df_col.Strain,add_text=True),
                          Tissue=df_col.Tissue,Sex=df_col.Sex,axis=1)  # or: df=df_col.loc[:,['Strain','Tissue','Sex']]
plt.figure(figsize=(6, 10))
cm = ClusterMapPlotter(data=beta, top_annotation=col_ha, left_annotation=row_ha,
                     show_rownames=False,show_colnames=False,
                     row_dendrogram=False,col_dendrogram=False,
                     row_split=df_row.loc[:, ['Target', 'Group']],
                     col_split=df_col['Strain'],cmap='parula',
                     rasterized=True,row_split_gap=1,legend=True,
                     tree_kws={'col_cmap':'Set1'})
plt.savefig("clustermap.pdf", bbox_inches='tight')
plt.show()

Key features:

Users can split the columns and rows into multiple subgroups with row_split and col_split. Each of them can be a number, a pandas DataFrame, or a pandas Series.
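When a Series or DataFrame is passed, rows that share the same label, or the same combination of labels across several columns as with df_row[['Target', 'Group']], end up in the same subgroup. Here is a minimal pure-Python sketch of that grouping. The helper split_by_labels and the toy probe IDs are hypothetical, and the sketch makes no claim about how PyComplexHeatmap orders the resulting groups. The integer form of row_split/col_split is not covered.

```python
def split_by_labels(index, *label_columns):
    """Group row names by the tuple of their labels across one or more columns.

    index: row names, e.g. probe IDs.
    label_columns: one or more label lists aligned with index (a Series is one
        column; a DataFrame such as df_row[['Target', 'Group']] is several).
    Returns {label tuple: [row names]} with groups in first-seen order.
    """
    for column in label_columns:
        if len(column) != len(index):
            raise ValueError("every label column must be aligned with index")
    groups = {}
    for i, name in enumerate(index):
        key = tuple(column[i] for column in label_columns)
        groups.setdefault(key, []).append(name)
    return groups


# Toy data (made up for illustration)
rows = ['cg1', 'cg2', 'cg3', 'cg4']
target = [1, 0, 1, 0]
group = ['No Effect', 'Suboptimal hybridization', 'No Effect', 'No Effect']
for key, members in split_by_labels(rows, target, group).items():
    print(key, members)
```

Using a tuple key means a two-column split produces one subgroup per observed (Target, Group) combination rather than a full Cartesian product.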

5. Compose multiple heatmaps horizontally or vertically

row_ha = HeatmapAnnotation(Target=anno_simple(df_row.Target, colors=row_colors_dict['Target'], rasterized=True),
                               Group=anno_simple(df_row.Group, colors=row_colors_dict['Group'], rasterized=True),
                               axis=0)
col_ha = HeatmapAnnotation(label=anno_label(df_col.Strain, merge=True, rotation=15),
                           Strain=anno_simple(df_col.Strain, add_text=True),
                           Tissue=df_col.Tissue, Sex=df_col.Sex,
                           axis=1)  # df=df_col.loc[:,['Strain','Tissue','Sex']]

cm1 = ClusterMapPlotter(data=beta, top_annotation=col_ha, left_annotation=row_ha,
                       show_rownames=False, show_colnames=False,
                       row_dendrogram=False, col_dendrogram=False,
                       row_split=df_row.loc[:, ['Target', 'Group']],
                       col_split=df_col['Strain'], cmap='parula',
                       rasterized=True, row_split_gap=1, legend=True,
                       plot=False, label='beta',
                       tree_kws={'col_cmap': 'Set1'})

cm2 = ClusterMapPlotter(data=snp, top_annotation=col_ha, left_annotation=row_ha,
                        show_rownames=False, show_colnames=False,
                        row_dendrogram=False, col_dendrogram=False,
                        col_cluster_method='ward',row_cluster_method='ward',
                        col_cluster_metric='jaccard',row_cluster_metric='jaccard',
                        row_split=df_row.loc[:, ['Target', 'Group']],
                        col_split=df_col['Strain'],
                        rasterized=True, row_split_gap=1, legend=True,
                        plot=False, cmap='Greys', label='SNP',
                        tree_kws={'col_cmap': 'Set1'})

cmlist=[cm1,cm2]

plt.figure(figsize=(10,12))
composite(cmlist=cmlist, main=1,legendpad=0,legend_y=0.8)
plt.savefig("beta_snp.pdf", bbox_inches='tight')
plt.show()

I hope this article is helpful! Scan the QR code at the end of the article, or search for and follow the Computational Epigenetics official account.
