V1

2023/04/10阅读：22主题：极简黑

# R--制作Table1

Let's start!!!

## 案例

Using the COVID dataset (covid_20210908_rmph.rData, see Appendix A.4), numerically and visually examine the continuous variable hospitals_per_100k (number of hospitals per 100,000 persons) and the categorical variable CensusRegionName.

``#读取数据load("covid_20210908_rmph.rData")``
``#Task1:par(mfrow = c(1,2))hist(covid\$hospitals_per_100k, main = "hospitals_per_100k")barplot(table(covid\$CensusRegionName), main = "CensusRegionName")``

Using the United Nations Human Development Data (unhdd2020.rmph.rData, see Appendix A.2), create a “Table 1” of descriptive statistics (mean and standard deviation for continuous variables, frequency and proportion for categorical variables), overall and by Human Development Index group (hdi_group).

``````hdi: Human Development Index (HDI)
life: Life expectancy at birth (years)
educ_expected: Expected years of schooling (years)
gii: Gender Inequality Index
urban: Urban population (%)
``````

``#Task2:load("unhdd2020.rmph.rData")library(gtsummary)str(unhdd)unhdd %>% select(hdi, life, educ_expected, gii, urban, hdi_group) %>%   drop_na() %>% #选择完整病例  tbl_summary(    # The "by" variable    by = hdi_group,    statistic = list(all_continuous()  ~ "{mean} ({sd})",                     all_categorical() ~ "{n}    ({p}%)"),    digits = list(all_continuous()  ~ c(2, 2),                  all_categorical() ~ c(0, 1)),    type = list(hdi      ~ "continuous",                life ~ "continuous",                educ_expected   ~ "continuous",                gii ~ "continuous",                urban ~ "continuous"),    label  = list(hdi      ~ "Hdi",                  life ~ "Life",                  educ_expected   ~ "Educ_expected",                  urban ~ "Urban")  ) %>%  modify_header(    label = "**Variable**",    # The following adds the % to the column total label    # <br> is the location of a line break    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"  ) %>%  modify_caption("Participant characteristics, by hdi_group") %>%  bold_labels()  %>%  # Include an "overall" column  add_overall(    last = FALSE,    # The ** make it bold    col_label = "**All participants**<br>N = {N}"  )``

### 参数设置

statistic: The default for continuous variables is the median and IQR. The default for categorical variables is the frequency and proportion. Below, this option is used to instead compute the mean and standard deviation for continuous variables (and the default for categorical variables is coded explicitly).#连续变量的默认值是中位数和IQR。分类变量的默认值是频率和比例。下面，此选项用于计算连续变量的平均值和标准差（分类变量的默认值是显式编码的）
digits: tbl_summary() guesses the number of digits to which to round. Use this option to set the number yourself. Since two statistics are specified in the statistic option, two numbers are specified here, one for each statistic. #tbl_summary（）猜测要舍入到的位数。使用此选项可以自行设置数字。由于在statistics选项中指定了两个统计信息，因此此处指定两个数字，每个统计信息一个。
type: tbl_summary() guesses whether a variable is continuous or categorical based on its distribution. Use this option to specify the types yourself, in particular if you need to override the default. #tbl_summary（）根据变量的分布来猜测变量是连续的还是分类的。使用此选项可以自己指定类型，特别是在需要覆盖默认值的情况下。
label: The default row labels are the variable names or labels (if the dataset has been labeled, for example, using the Hmisc library label() function). Use this option to change the row headers. #默认的行标签是变量名或标签（如果数据集已被标记，例如，使用Hmisc库标签（）函数）。使用此选项可以更改行标题。
modify_header: The default column header is “Characteristic”. Use this option to change the column header. Surrounding text by ** results in a bold font. #默认列标题为“Characteristic”。使用此选项可以更改列标题。用将文本包围会产生粗体。
modify_caption: Use this option to add a table caption.#使用此选项可以添加表格标题。
bold_labels: Use this option to display the row labels in a bold font.#使用此选项可以用粗体显示行标签。

Using the COVID dataset (covid_20210908_rmph.rData, see Appendix A.4), create a “Table 1” of descriptive statistics, overall and by the number of hospitals per 100,000 persons (hospitals_per_100k) (use a median split to create a binary version of this “by” variable). Describe the following variables in this table:

``````pop.usafacts: County population
cases.usafacts.20210908: COVID-19 cumulative cases as of 2021-09-08
deaths.usafacts.20210908: COVID-19 cumulative deaths as of 2021-09-08
MedianAge2010: median age of county in 2010
CensusRegionName: name of census region
``````

``load("covid_20210908_rmph.rData")library(gtsummary)covid <- covid %>% mutate(by = if_else(hospitals_per_100k > median(hospitals_per_100k), ">median", "<=median"))covid %>% select(pop.usafacts, cases.usafacts.20210908, deaths.usafacts.20210908, MedianAge2010, CensusRegionName, by) %>%   drop_na() %>% #选择完整病例  tbl_summary(    # Use the median split variable as the "by" variable    by = by,    statistic = list(all_continuous()  ~ "{median} ({p25}, {p75})",                     all_categorical() ~ "{n}    ({p}%)"),    digits = list(all_continuous()  ~ c(2, 2),                  all_categorical() ~ c(0, 1))  ) %>%  modify_header(    label = "**Variable**",    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"  ) %>%  modify_caption("Participant characteristics, by SBP") %>%  bold_labels() %>%   add_overall(last = FALSE,              col_label = "**All participants**<br>N = {N}")``

## 输出表格

``Table1 <- covid %>% select(pop.usafacts, cases.usafacts.20210908, deaths.usafacts.20210908, MedianAge2010, CensusRegionName, by) %>%   drop_na() %>% #选择完整病例  tbl_summary(    # Use the median split variable as the "by" variable    by = by,    statistic = list(all_continuous()  ~ "{median} ({p25}, {p75})",                     all_categorical() ~ "{n}    ({p}%)"),    digits = list(all_continuous()  ~ c(2, 2),                  all_categorical() ~ c(0, 1))  ) %>%  modify_header(    label = "**Variable**",    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"  ) %>%  modify_caption("Participant characteristics, by By") %>%  bold_labels() %>%   add_overall(last = FALSE,              col_label = "**All participants**<br>N = {N}")Table1 %>%   as_flex_table() %>%   flextable::save_as_docx(path = "MyTable1.docx")``

Word上显示效果还是不错的。。。

## 添加P值

``covid %>% select(pop.usafacts, cases.usafacts.20210908, deaths.usafacts.20210908, MedianAge2010, CensusRegionName, by) %>%   drop_na() %>% #选择完整病例  tbl_summary(    # Use the median split variable as the "by" variable    by = by,    statistic = list(all_continuous()  ~ "{median} ({p25}, {p75})",                     all_categorical() ~ "{n}    ({p}%)"),    digits = list(all_continuous()  ~ c(2, 2),                  all_categorical() ~ c(0, 1))  ) %>%  add_p(    test = list(all_continuous()  ~ "wilcox.test",                all_categorical() ~ "chisq.test"),    pvalue_fun = function(x) style_pvalue(x, digits = 3)) %>% #添加P值  modify_header(    label = "**Variable**",    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"  ) %>%  modify_caption("Participant characteristics, by By") %>%  bold_labels() %>%   add_overall(last = FALSE,              col_label = "**All participants**<br>N = {N}")``

## 制作回归分析结果

``library(gtsummary)mod1 <- glm(response ~ trt + age + grade, trial, family = binomial)t1 <- tbl_regression(mod1, exponentiate = TRUE)library(survival)# build survival model tablet2 <- coxph(Surv(ttdeath, death) ~ trt + grade + age, trial) %>%  tbl_regression(exponentiate = TRUE)# merge tables tbl_merge_ex1 <-  tbl_merge(    tbls = list(t1, t2),    tab_spanner = c("**Tumor Response**", "**Time to Death**")  )tbl_merge_ex1``

OK!打完收工！

- END -

Python

V1