Traumatic Brain Injury and Depression: A Survival Analysis Study in R (Part 8)

February 24, 2025

Introduction

Welcome back to our exploration of survival analysis! We've journeyed through the intricacies of data preparation, exploration, and visualization. Now, in this eighth installment, we're shifting our focus to the crucial steps that ensure a smooth, efficient, and reproducible workflow as we move into building and refining our Cox regression models.

While these steps might seem like behind-the-scenes work, they are essential for setting the stage for impactful analysis. We'll be focusing on the foundational elements that underpin robust survival modeling, all while continuing to investigate our core research question: How do depression levels one year after a traumatic brain injury (TBI) influence long-term mortality?

This post will guide you through a series of essential tasks, providing you with the tools and techniques to create a well-oiled analytical machine:

5.1 Initial Setup and Library Loading

This section configures the R environment for this project. It installs and loads the necessary libraries using the pacman package, establishes a structured directory system for data organization, loads the preprocessed data for analysis, and customizes plot fonts. These steps lay a solid foundation for data analysis.

5.2 Customizing Plot Aesthetics

A custom theme is introduced to enhance plot aesthetics. This ensures consistency in titles, axes, legends, and text styling across all visualizations, improving interpretability.

5.3 Defining and Preparing Covariates

This section involves defining and organizing covariates for nested models in survival analysis. Three groups of covariates are specified: sociodemographic variables (e.g., age, sex, education, Medicaid status), mental health variables (e.g., mental health treatment history, suicide attempts, substance use), and functional variables (e.g., function factor score at year 1). These groups are combined into a comprehensive list of all proposed covariates for model building. Additionally, user-friendly labels are created for covariates and their factor levels to improve the clarity and interpretability of outputs such as tables and plots.

5.4 Univariate Cox Proportional Hazards Models

This section introduces univariate Cox proportional hazards models as an essential first step in our survival analysis. These models examine the individual effect of each covariate on the risk of the event of interest (i.e., death), providing initial insights into which factors—such as age at injury or depression level—significantly influence survival probability after moderate-to-severe traumatic brain injury (msTBI). Emphasis is placed on rounding and formatting results for clear interpretation. By isolating the impact of each variable, we identify promising predictors for subsequent multivariable modeling, laying the groundwork for a deeper understanding of the drivers of survival outcomes.

5.5 Multivariable Cox Proportional Hazards Models

To understand how multiple factors collectively impact survival following moderate-to-severe TBI, this section employs multivariable Cox proportional hazards models. Using a nested approach, we progressively incorporate sociodemographic, mental health, and functional variables, estimating their independent effects while controlling for potential confounders. Results are rounded, clearly formatted, and visualized with comparative tables. Hazard ratios (HRs) and their corresponding confidence intervals (CIs) are interpreted to highlight both statistical and clinical significance. This step-by-step process illuminates the sophisticated network of factors affecting survival and lays the groundwork for addressing multicollinearity in the upcoming blog post section.

5.1 Initial Setup and Library Loading

Initialization and Environment Setup

This script establishes the foundational environment for data analysis by loading essential R libraries, setting up a structured directory system for data management, loading preprocessed data, and configuring plot aesthetics. These steps ensure a reproducible, organized, and visually consistent workflow.

Library Management with pacman

The first step involves ensuring all necessary libraries are installed and loaded. The pacman package simplifies this process, providing a unified interface for library management. If pacman is not already installed, it can be installed using:

# Load the pacman package (install if necessary)
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

Subsequently, pacman::p_load is used to load the following libraries:

# Install and load prerequisite libraries
pacman::p_load(
  autoReg, broom, corrplot, extrafont, forestplot, ggsurvfit, grid,
  gridExtra, gt, gtsummary, here, patchwork, scales, showtext,
  survival, survminer, tidyverse
)

  • autoReg: Simplifies regression modeling workflows by automating the generation of comprehensive and customizable summary tables.

  • broom: Bridges the gap between statistical modeling and tidy data principles, converting model outputs into tidy data frames for seamless analysis and visualization.

  • corrplot: Enhances data exploration with elegant visualization tools for correlation matrices, enabling intuitive analysis of variable relationships.

  • extrafont: Enables the import and management of custom fonts, enhancing the visual appeal of plots.

  • forestplot: Creates publication-ready forest plots, enabling clear visualization of estimates and confidence intervals for regression and meta-analysis results.

  • ggsurvfit: Extends ggplot2 for survival analysis, providing tools to craft customizable Kaplan-Meier plots and survival curves with ease.

  • grid: Empowers advanced graphics manipulation by enabling precise layout design and control of graphical elements in R plots.

  • gridExtra: Facilitates the arrangement of multiple grid-based plots, streamlining the creation of composite visualizations and complex layouts.

  • gt: Creates highly customizable and presentation-ready tables.

  • gtsummary: Simplifies the creation of publication-ready tables for summarizing data and regression model results.

  • here: Enables robust file path management, ensuring project portability across different computing environments.

  • patchwork: Simplifies the creation of multi-panel plots by seamlessly combining ggplot2 visualizations into cohesive layouts.

  • scales: Provides functions for customizing plot scales, allowing for fine-grained control over plot aesthetics.

  • showtext: Enhances text rendering in R graphics by enabling the use of custom fonts, improving aesthetic appeal and readability.

  • survival: Provides comprehensive tools for survival analysis, supporting Kaplan-Meier curves, Cox proportional hazards models, and more.

  • survminer: Streamlines the visualization of survival analysis results, offering intuitive tools for creating and customizing survival curves and diagnostic plots.

  • tidyverse: A collection of R packages designed for data manipulation, exploration, and visualization, promoting a streamlined and consistent workflow.

Using pacman guarantees that all library dependencies are correctly managed, minimizing issues arising from missing or incompatible packages.

Directory Structure for Organized Data Management

To maintain a well-organized project, the following directory structure is established:

# Create the 'Data/Processed' subdirectory if not already accessible
data_processed_dir <- here("Data", "Processed")
if (!dir.exists(data_processed_dir)) {
  dir.create(data_processed_dir, recursive = TRUE)
}

# Create the 'Output/Tables' subdirectory if not already available
tables_dir <- here("Output", "Tables")
if (!dir.exists(tables_dir)) {
  dir.create(tables_dir, recursive = TRUE)
}

# Create the 'Output/Plots/Missingness' subdirectory if not already available
missingness_plots_dir <- here("Output", "Plots", "Missingness")
if (!dir.exists(missingness_plots_dir)) {
  dir.create(missingness_plots_dir, recursive = TRUE)
}

# Create the 'Output/Plots/Schoenfeld_Residuals' subdirectory if not already available
schoenfeld_residuals_plots_dir <- here("Output", "Plots", "Schoenfeld_Residuals")
if (!dir.exists(schoenfeld_residuals_plots_dir)) {
  dir.create(schoenfeld_residuals_plots_dir, recursive = TRUE)
}

# Create the 'Output/Plots/Martingale_Residuals' subdirectory if not already available
martingale_residuals_plots_dir <- here("Output", "Plots", "Martingale_Residuals")
if (!dir.exists(martingale_residuals_plots_dir)) {
  dir.create(martingale_residuals_plots_dir, recursive = TRUE)
}

# Create the 'Output/Plots/Multicollinearity' subdirectory if not already available
multicollinearity_plots_dir <- here("Output", "Plots", "Multicollinearity")
if (!dir.exists(multicollinearity_plots_dir)) {
  dir.create(multicollinearity_plots_dir, recursive = TRUE)
}

# Create the 'Output/Plots/Survival_Curves' subdirectory if not already available
survival_curves_plots_dir <- here("Output", "Plots", "Survival_Curves")
if (!dir.exists(survival_curves_plots_dir)) {
  dir.create(survival_curves_plots_dir, recursive = TRUE)
}

# Create the 'Output/Plots/Forest_Plots' subdirectory if not already available
forest_plots_dir <- here("Output", "Plots", "Forest_Plots")
if (!dir.exists(forest_plots_dir)) {
  dir.create(forest_plots_dir, recursive = TRUE)
}

  • Data/Processed: This directory stores the cleaned, transformed, and analysis-ready datasets.

  • Output/Tables: This directory stores the generated summary tables and regression results, providing a centralized location for easily accessible tabular outputs.

  • Output/Plots/Missingness: This directory stores visualizations of missing data patterns, aiding in the assessment and handling of missingness in datasets.

  • Output/Plots/Schoenfeld_Residuals: This directory stores Schoenfeld residual plots, enabling diagnostics for the proportional hazards assumption in Cox regression models.

  • Output/Plots/Martingale_Residuals: This directory stores Martingale residual plots, facilitating the evaluation of model fit and identification of outliers or nonlinearity.

  • Output/Plots/Multicollinearity: This directory stores visualizations of multicollinearity diagnostics, including correlation matrices and VIF results, to assess predictor relationships.

  • Output/Plots/Survival_Curves: This directory stores survival curve plots, illustrating survival probabilities over time for different covariates or groups.

  • Output/Plots/Forest_Plots: This directory stores forest plots, providing a clear summary of hazard ratios and confidence intervals for regression model covariates.

The here package is employed to dynamically generate these directories. The dir.create function, with recursive = TRUE, ensures that all necessary subdirectories are created. This approach guarantees consistent file path construction across different operating systems, enhancing project reproducibility.
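
For readers who prefer less repetition, the same directories can be created in a single pass. The sketch below is equivalent to the block above, though the individual path variables (e.g., tables_dir) would still need to be assigned for later use:

# Equivalent, more compact approach: create any missing directories in one loop
dir_paths <- list(
  here("Data", "Processed"),
  here("Output", "Tables"),
  here("Output", "Plots", "Missingness"),
  here("Output", "Plots", "Schoenfeld_Residuals"),
  here("Output", "Plots", "Martingale_Residuals"),
  here("Output", "Plots", "Multicollinearity"),
  here("Output", "Plots", "Survival_Curves"),
  here("Output", "Plots", "Forest_Plots")
)
invisible(lapply(dir_paths, function(p) {
  if (!dir.exists(p)) dir.create(p, recursive = TRUE)
}))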

Loading Preprocessed Data

The core dataset, which has undergone prior cleaning and transformation and includes only the covariates selected for the Cox regression models, is loaded using the readRDS function:

# Load the .rds files
complete_cases_select <- readRDS(file.path(data_processed_dir, "complete_cases_select_final.rds"))

The .rds format is efficient for storing and retrieving R objects, preserving data structure and metadata. This step ensures that the analysis-ready data are available for subsequent steps. It's recommended to verify that the data frame has loaded correctly using functions like str() or head().
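
For example:

# Quick sanity checks on the loaded data
str(complete_cases_select)   # Variable names, types, and factor levels
head(complete_cases_select)  # Preview the first few rows
dim(complete_cases_select)   # Confirm the expected number of rows and columns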

Customizing Plot Aesthetics with Imported Fonts

To enhance the visual appeal and consistency of plots, system fonts are imported using:

# Import fonts from Font Book
loadfonts(device = "all", quiet = TRUE)

This makes a wider range of fonts available in R graphics, enabling greater control over plot aesthetics and improving the overall quality of the visualizations.

Conclusion

By executing these initial steps, we establish a reproducible, organized, and visually consistent foundation for the subsequent data analysis, ensuring the integrity and efficiency of the workflow.

5.2 Customizing Plot Aesthetics

Establishing a Unified Visual Style with Custom Plotting Preferences

This section focuses on establishing a consistent and visually appealing aesthetic for all plots generated throughout the analysis. By defining a custom theme, we aim to enhance the readability, professionalism, and overall impact of our visualizations. Predefining these plotting preferences streamlines the process of creating multiple plots and ensures a cohesive visual identity across the entire project.

Crafting a Custom Theme

The foundation of our visual style is a custom theme, designed to optimize the presentation of our data. This theme, named customization, dictates the appearance of various plot elements, from titles and legends to axis labels and text. The customization theme is defined as follows:

customization <- theme(
  title = element_text(family = "Proxima Nova", face = "bold", size = 10),
  legend.text = element_text(family = "Proxima Nova", size = 9.5),
  axis.title.x = element_text(family = "Proxima Nova", face = "bold", size = 12, margin = margin(t = 10)),
  axis.title.y = element_text(family = "Proxima Nova", face = "bold", size = 12, margin = margin(r = 10)),
  axis.text = element_text(family = "Proxima Nova", size = 10),
  text = element_text(family = "Proxima Nova"),
  legend.position = "top"
)

Dissecting the Theme's Components

Each element within the customization theme plays a specific role in shaping the overall look and feel of the plots. The title, for instance, is styled using element_text to be rendered in a bold, 10-point "Proxima Nova" font. This choice of font, known for its clean and modern aesthetic, ensures that the title is both prominent and visually appealing. Because the title element is the parent of the plot, axis, and legend titles, the legend title is likewise set in bold "Proxima Nova" at size 10, while the legend text itself uses a slightly smaller font size of 9.5. This subtle difference in size creates a visual hierarchy within the legend, making it easier to distinguish between the title and the individual items.

Axis titles, both x and y, are also rendered in bold "Proxima Nova," but with a font size of 12. A margin is added around the axis titles—10 points of spacing above the x-axis title and 10 points to the right of the y-axis title—to enhance readability and prevent them from overlapping with other plot elements. The axis text, which includes the tick labels, is styled using a 10-point "Proxima Nova" font, ensuring consistency while maintaining a clear visual distinction from the axis titles.

The text element within the theme sets the default font for all other text elements in the plot to "Proxima Nova," guaranteeing a uniform appearance across the entire visualization. Finally, the legend.position is set to "top," positioning the legend above the plot. This placement helps to minimize visual clutter within the plot area, especially when dealing with complex visualizations or plots with a wide aspect ratio.

Advantages of a Unified Visual Style

Employing a custom theme offers several key advantages. First, it ensures consistency across all visualizations, creating a cohesive and professional look that is particularly important for reports, presentations, and publications. Second, the carefully chosen font sizes, spacing, and positioning enhance the readability of the plots, making them easier to interpret for a wider audience. Finally, defining the theme once streamlines the plotting process, eliminating the need to repeatedly specify styling options for each individual plot.

Applying the Custom Theme

Integrating this custom theme into your plots is straightforward. Simply add + customization to the end of your ggplot code:

my_plot <- ggplot(data, aes(x, y)) +
  geom_point() +
  labs(title = "Example Plot", x = "X-Axis", y = "Y-Axis") +
  customization

This simple addition ensures that the plot adheres to the predefined styling preferences, guaranteeing a consistent and visually appealing presentation.

Practical Considerations

When working with custom fonts like "Proxima Nova," ensure that the font is installed on your system and properly registered in R using extrafont::loadfonts(). Additionally, consider your target audience when selecting font sizes; for instance, slightly larger font sizes might be necessary for presentations to ensure visibility from a distance. If the legend interferes with the plot data in certain instances, consider adjusting its position dynamically using options like legend.position = "bottom" or legend.position = "right" as needed.
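
For instance, one defensive pattern, sketched here with a hypothetical plot_font variable, is to confirm that the font is registered before building the theme:

# Fall back to a generic family if "Proxima Nova" is not registered
# (plot_font is an illustrative helper, not part of the original code)
plot_font <- if ("Proxima Nova" %in% extrafont::fonts()) {
  "Proxima Nova"
} else {
  message("Proxima Nova not registered; using 'sans' instead.")
  "sans"
}
# The theme can then use element_text(family = plot_font, ...) throughout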

Conclusion

By establishing a custom theme, we've laid the groundwork for creating professional, visually consistent, and highly readable plots throughout the analysis. This proactive approach not only elevates the aesthetic quality of our visualizations but also streamlines the plotting process, making it a valuable investment in the overall workflow. This attention to visual detail ensures that the data are communicated effectively and engagingly.

5.3 Defining and Preparing Covariates

Introduction

This section outlines the selection, grouping, and labeling of key measures (covariates) used in our statistical models to analyze survival following traumatic brain injury (TBI). Properly defining covariates is crucial for a structured, interpretable analysis aligned with our research objectives.

Rationale for Covariate Definition

Covariates are variables that are hypothesized to influence the outcome of interest: survival time after TBI. Careful covariate selection and definition ensure:

  • Clarity and Consistency: Standardized definitions are used throughout the analysis.

  • Simplified Model Building: Related variables are grouped for efficient testing.

  • Enhanced Interpretability: Results are presented using clear, descriptive labels.

Strategic Covariate Grouping

To facilitate a stepwise approach to model building, covariates are organized into three logical groups based on their conceptual relationship to survival:

  • Sociodemographic Variables: Characteristics reflecting social or demographic factors, including age_at_injury, sex, education_level_at_injury, and rehab_payor_primary_type (as a proxy for Medicaid status).

  • Mental Health Variables: Indicators of psychological well-being, such as mental_health_tx_hx, suicide_attempt_hx, and problematic_substance_use_at_injury.

  • Functional Variables: Measures of functional status post-injury, represented by func_score_at_year_1_q5 (functional ability factor score one year post-injury, divided into quintiles).

The following R code defines these groups:

# Define the covariate groups
all_proposed_sociodemographic_vars <- c("age_at_injury",
                                        "sex",
                                        "education_level_at_injury", 
                                        "rehab_payor_primary_type")

all_proposed_mental_health_covariates <- c("mental_health_tx_hx",
                                           "suicide_attempt_hx",
                                           "problematic_substance_use_at_injury")

all_proposed_functional_vars <- "func_score_at_year_1_q5"

# Combine the covariate groups in the desired order
all_proposed_covariates <- c("depression_level_at_year_1",
                             all_proposed_sociodemographic_vars,
                             all_proposed_mental_health_covariates,
                             all_proposed_functional_vars)
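
Before moving on, it can be worth confirming that every proposed covariate actually exists in the analysis dataset. A minimal check, assuming the complete_cases_select data frame loaded earlier:

# Identify any proposed covariates missing from the dataset
# (an empty result means all variables are present)
setdiff(all_proposed_covariates, colnames(complete_cases_select))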

Enhancing Interpretability with Descriptive Labels

To improve the readability of statistical outputs, we replace technical variable names with descriptive labels. For example, func_score_at_year_1_q5 becomes "Function Factor Score at Year 1 Quintiles."

These mappings are defined in R as follows:

# Define the preferred variable labels
var_name_mapping <- list(
  depression_level_at_year_1 = "Depression Level at Year 1",
  age_at_injury = "Age at Injury",
  sex = "Sex",
  education_level_at_injury = "Educational Attainment at Injury",
  rehab_payor_primary_type = "Medicaid Status",
  mental_health_tx_hx = "History of Mental Health Treatment",
  suicide_attempt_hx = "History of Suicide Attempt",
  problematic_substance_use_at_injury = "Problematic Substance Use at Injury",
  func_score_at_year_1_q5 = "Function Factor Score at Year 1 Quintiles",
  GLOBAL = "GLOBAL"
)

These labels will be consistently applied across all tables, figures, and statistical summaries.
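
The section overview also mentions labels for factor levels. The exact recoding depends on how each variable is stored, but a hedged sketch using forcats (loaded with the tidyverse) might look like the following, where the original level codes are hypothetical:

# Hypothetical example: relabel the levels of a factor for presentation.
# The level codes "M" and "F" are illustrative, not taken from the actual data.
complete_cases_select <- complete_cases_select |>
  mutate(sex = fct_recode(sex, "Male" = "M", "Female" = "F"))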

Best Practices for Covariate Management

  • Maintain Consistency: Employ the same labels throughout the analysis for clarity.

  • Align with Research Goals: Organize covariates to reflect your hypotheses and model building strategy.

  • Iterate and Refine: Review initial summaries and visualizations to ensure data quality and completeness before proceeding.

Next Steps

With covariates clearly defined and labeled, we can now explore their individual and combined influence on survival outcomes.

5.4 Univariate Cox Proportional Hazards Models

Introduction

This section details the construction of univariate Cox proportional hazards models—a crucial initial step in our survival analysis. By analyzing each covariate individually, we assess its independent association with survival time before proceeding to more complex multivariable models.

Rationale for a Univariate Approach

  1. Isolate Potential Predictors: Identify individual covariates that may influence survival.

  2. Prioritize Significant Variables: Focus on covariates with statistically significant associations for further investigation.

  3. Validate Data Formatting: Ensure that each variable meets model assumptions and is appropriately formatted before inclusion in multivariable models.

Understanding the Cox Proportional Hazards Model

The Cox model estimates the hazard ratio (HR), which quantifies the effect of a covariate on the risk of the event of interest (e.g., mortality). An HR greater than 1 indicates an increased risk, while an HR less than 1 indicates a decreased risk, associated with a one-unit increase in a continuous covariate or with membership in a category relative to its reference level.
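
Formally, the Cox model specifies the hazard at time t for an individual with covariates x_1, ..., x_p as

\[
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_p x_p),
\]

where h_0(t) is an unspecified baseline hazard shared by all individuals, and the HR for a one-unit increase in x_j, holding the other covariates fixed, is exp(β_j).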

In this analysis, time_to_event_in_years represents survival time, and event_status indicates whether the event occurred (1 = event, 0 = censored).

Implementing Univariate Models in R

The following R code rounds and formats the numeric components of a Cox model summary to two decimal places for improved readability:

# Function to round and format the Cox model summary results
round_cox_summary <- function(cox_model) {
  # Get the summary of the Cox model
  summary_cox <- summary(cox_model)
  
  # Round the numeric values in the summary to 2 decimal places
  summary_cox$coefficients <- round(summary_cox$coefficients, 2)
  if (!is.null(summary_cox$conf.int)) {
    summary_cox$conf.int <- round(summary_cox$conf.int, 2)
  }
  summary_cox$logtest <- round(summary_cox$logtest, 2)
  summary_cox$waldtest <- round(summary_cox$waldtest, 2)
  summary_cox$sctest <- round(summary_cox$sctest, 2)
  if (!is.null(summary_cox$rsq)) {
    summary_cox$rsq <- round(summary_cox$rsq, 2)
  }
  
  # Return the modified summary
  return(summary_cox)
}
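
As a quick illustration of the helper in isolation (example_model is a throwaway name, and the dataset and covariates are those defined earlier):

# Fit a single Cox model and print its rounded summary
example_model <- coxph(Surv(time_to_event_in_years, event_status) ~ age_at_injury,
                       data = complete_cases_select)
round_cox_summary(example_model)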

The following R code performs univariate Cox regression for a list of covariates, validates input data, handles errors, and saves the rounded summary of each model to the global environment:

# Function to perform univariate Cox regression modeling
univariate_cox_models <- function(variables, data) {
  for (var in variables) {
    # Print the variable being processed for debugging
    cat("Processing variable:", var, "\n")
    
    # Check if the required columns are present in the data
    if (!all(c("time_to_event_in_years", "event_status") %in% colnames(data))) {
      stop("Data does not contain required columns: 'time_to_event_in_years' and 'event_status'")
    }
    
    # Check if the data columns are of the correct type
    if (!is.numeric(data$time_to_event_in_years)) {
      stop("The 'time_to_event_in_years' column must be numeric")
    }
    if (!is.numeric(data$event_status)) {
      stop("The 'event_status' column must be numeric")
    }
    
    # Print the formula being used for debugging
    formula_str <- paste('Surv(time_to_event_in_years, event_status) ~', var)
    cat("Using formula:", formula_str, "\n")
    
    # Construct the Cox regression model using each covariate
    cox_model <- tryCatch(
      {
        coxph(as.formula(formula_str), data = data)
      },
      error = function(e) {
        cat("Error in Cox model for variable", var, ":", e$message, "\n")
        return(NULL)
      }
    )
    
    # Check if the model was successfully created
    if (is.null(cox_model)) {
      next
    }
    
    # Print the summary for variable
    cat("Summary for variable:", var, "\n")
    
    # Print the rounded summary of the model
    rounded_summary <- round_cox_summary(cox_model)
    print(rounded_summary)
    
    # Save the rounded summary to the R environment
    assign(paste0("rounded_summary_", var), rounded_summary, envir = .GlobalEnv)
  }
}

The following R code executes univariate Cox models for groups of covariates related to sociodemographics, mental health, and functional variables using the complete-case dataset:

# Run univariate Cox models for each group of covariates
univariate_cox_models(all_proposed_sociodemographic_vars, complete_cases_select)
univariate_cox_models(all_proposed_mental_health_covariates, complete_cases_select)
univariate_cox_models(all_proposed_functional_vars, complete_cases_select)
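
Note that the loop above covers only the three covariate groups. If you also want the univariate association for the primary exposure, the same function accepts a single variable name (a suggested addition, not part of the original grouping):

# Optionally, fit the univariate model for the primary exposure as well
univariate_cox_models("depression_level_at_year_1", complete_cases_select)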

Code Functionality Breakdown

  • round_cox_summary(): This helper function formats the Cox model output by rounding numeric values to two decimal places for improved readability.

  • univariate_cox_models(): This function iterates through a list of covariates, fitting a separate Cox model for each using the formula Surv(time_to_event_in_years, event_status) ~ covariate.

  • Error Handling: A tryCatch block gracefully handles potential errors during model fitting, preventing the entire loop from halting.

Interpreting the Output

The script outputs the following for each covariate:

  • Hazard Ratio (HR): The estimated effect size of the covariate on survival.

  • Confidence Interval (CI): The likely range of the true HR.

  • p-value: Indicates the statistical significance of the association (e.g., p < 0.05 suggests a significant association).

Key Considerations

  • Data Type Verification: Ensure covariates are appropriately formatted (e.g., categorical, continuous) before model fitting.

  • Beyond Statistical Significance: A significant p-value does not automatically imply clinical or practical importance; consider the magnitude of the HR and its real-world implications.

  • Result Archiving: The assign() function saves each model's output to the global environment for future reference or further analysis, as illustrated below.
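
For example, after the loop runs, the saved summary for age at injury can be retrieved under the name the function constructed:

# Retrieve a saved rounded summary by its constructed name
print(rounded_summary_age_at_injury)

# Equivalently, build the name programmatically with get()
get(paste0("rounded_summary_", "age_at_injury"))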

Moving Forward

The results of these univariate models help to inform the selection of promising covariates for inclusion in multivariable models. In the next section, we will construct nested multivariable models, strategically adding groups of covariates to refine our analysis and gain a more comprehensive understanding of survival.

By beginning with univariate models, we establish a crucial foundation for understanding the individual relationships between covariates and survival outcomes, setting the stage for more sophisticated analyses.

5.5 Multivariable Cox Proportional Hazards Models

Introduction

This section advances our analysis from individual covariate effects to a more comprehensive multivariable approach using Cox regression. By incorporating multiple covariates into a single model, we can estimate their independent contributions to survival while controlling for the influence of other factors. We will employ a nested modeling strategy to systematically assess the impact of adding successive groups of variables.

Advantages of Multivariable Modeling

  1. Confounding Control: Adjusts for potential confounding factors, isolating the unique effect of each covariate.

  2. Realistic Representation: Creates models that better reflect the complex interplay of factors influencing survival outcomes.

  3. Interaction Exploration: Allows for the investigation of how groups of covariates jointly influence survival.

Our nested approach progressively adds covariate groups, enabling us to evaluate their incremental contribution to the model's explanatory power.

Constructing the Nested Model

The following R code builds our nested multivariable Cox regression models. Each model adds a new group of covariates to the preceding one:

# Fit Model 1: Baseline Model with Depression Level at Year 1
model_1 <- coxph(Surv(time_to_event_in_years, event_status) ~ depression_level_at_year_1, data = complete_cases_select)

# Fit Model 2: Adding Sociodemographic Covariates
model_2_formula <- as.formula(paste('Surv(time_to_event_in_years, event_status) ~', 
                                    paste(c('depression_level_at_year_1',
                                            all_proposed_sociodemographic_vars),
                                            collapse = " + ")))
model_2 <- coxph(model_2_formula, data = complete_cases_select)

# Fit Model 3: Adding Mental Health Covariates
model_3_formula <- as.formula(paste('Surv(time_to_event_in_years, event_status) ~', 
                                    paste(c('depression_level_at_year_1',
                                      all_proposed_sociodemographic_vars, 
                                      all_proposed_mental_health_covariates),
                                      collapse = " + ")))
model_3 <- coxph(model_3_formula, data = complete_cases_select)

# Fit Model 4: Adding Functional Covariates
model_4_formula <- as.formula(paste('Surv(time_to_event_in_years, event_status) ~', 
                                    paste(c('depression_level_at_year_1',
                                      all_proposed_sociodemographic_vars,
                                      all_proposed_mental_health_covariates,
                                      all_proposed_functional_vars), 
                                      collapse = " + ")))
model_4 <- coxph(model_4_formula, data = complete_cases_select)

Summarizing and Formatting Model Results

We reuse the round_cox_summary() function defined in Section 5.4 to neatly format the model summaries for improved readability:

# Summarize and round results for each model
rounded_summary_model_1 <- round_cox_summary(model_1)
rounded_summary_model_2 <- round_cox_summary(model_2)
rounded_summary_model_3 <- round_cox_summary(model_3)
rounded_summary_model_4 <- round_cox_summary(model_4)

# Print the rounded summaries
print(rounded_summary_model_1)
print(rounded_summary_model_2)
print(rounded_summary_model_3)
print(rounded_summary_model_4)

Visualizing and Comparing Models with gtsummary

To facilitate model comparison, we present a combined table of all models using the gtsummary package:

# Create gtsummary tables for each model, applying labels and formatting
tbl_model_1 <- tbl_regression(model_1, exponentiate = TRUE, label = var_name_mapping, pvalue_fun = purrr::partial(style_pvalue, digits = 2)) |>
  modify_header(p.value ~ "**_p_**") |>
  bold_labels() |>
  bold_p()

tbl_model_2 <- tbl_regression(model_2, exponentiate = TRUE, label = var_name_mapping, pvalue_fun = purrr::partial(style_pvalue, digits = 2)) |>
  modify_header(p.value ~ "**_p_**") |>
  bold_labels() |>
  bold_p()

tbl_model_3 <- tbl_regression(model_3, exponentiate = TRUE, label = var_name_mapping, pvalue_fun = purrr::partial(style_pvalue, digits = 2)) |>
  modify_header(p.value ~ "**_p_**") |>
  bold_labels() |>
  bold_p()

tbl_model_4 <- tbl_regression(model_4, exponentiate = TRUE, label = var_name_mapping, pvalue_fun = purrr::partial(style_pvalue, digits = 2)) |>
  modify_header(p.value ~ "**_p_**") |>
  bold_labels() |>
  bold_p()

# Merge the tables into a single, comparative summary
tbl_nested_models <- tbl_merge(
  tbls = list(tbl_model_1, tbl_model_2, tbl_model_3, tbl_model_4),
  tab_spanner = c("**Model 1**", "**Model 2**", "**Model 3**", "**Model 4**")
)

# Convert the merged table and adjust the font size
tbl_nested_models <- tbl_nested_models |>
  as_gt() |>
  gt::tab_options(table.font.size = 'small')

# Save the table as an image
gtsave(tbl_nested_models, file.path(tables_dir, "table_nested_models.png"))

Interpreting the Multivariable Model Results

  • Hazard Ratios (HRs): Each covariate's HR now represents its independent effect on survival after adjusting for all other variables in the model.

  • Confidence Intervals (CIs): Narrow CIs indicate more precise estimates. CIs that include 1 suggest that the covariate is not statistically significant in the presence of other variables.

  • Assessing Incremental Contributions: Comparing nested models reveals how much each group of covariates improves the model's ability to explain survival variation (see the sketch below).
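
One way to formalize these comparisons, assuming the four models fitted above, is a sequential likelihood ratio test:

# Compare successive nested models with likelihood ratio tests
anova(model_1, model_2, model_3, model_4)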

Key Considerations

  • Model Building Strategy: Starting with a key variable and incrementally adding complexity allows for a more nuanced understanding of covariate relationships.

  • Clinical and Statistical Significance: Statistical significance should be interpreted alongside practical implications and clinical relevance.

  • Model Diagnostics: Always verify that the proportional hazards assumption is met for each model (addressed in detail in a later section; a one-line preview follows this list).
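
As a brief preview of those diagnostics, the survival package's cox.zph() function tests the proportional hazards assumption for a fitted model:

# Preview: test the proportional hazards assumption for the full model
# (diagnostics are covered in detail in a later section)
cox.zph(model_4)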

Next Steps

Having constructed and interpreted our nested models, we now have a more comprehensive understanding of how different factors collectively influence survival after TBI. The next blog post will address multicollinearity, ensuring that our predictors are not excessively correlated, which can compromise model stability and interpretation.

By systematically building and carefully interpreting multivariable Cox models, we gain valuable and actionable insights into the web of factors driving survival outcomes in TBI research.

Conclusion

This post has provided a solid foundation for a comprehensive survival analysis within the TBIMS project. We began by establishing a reproducible R environment and carefully defining our covariates. We then systematically analyzed factors influencing survival after moderate-to-severe TBI, progressing from univariate to multivariable Cox proportional hazards models. This process allowed us to isolate the effects of individual predictors and understand their complex interplay, revealing insights with both statistical and clinical significance.

Reflecting on Our Progress: Key Accomplishments

Let's take a moment to reflect on the key steps we've conquered in this installment:

  1. Streamlining Our R Environment: We mastered the initial setup, efficiently loading essential libraries and establishing a structured project directory. This might seem like a small detail, but it's the bedrock of a smooth and reproducible workflow, saving us time and headaches down the road.

  2. Elevating Visual Consistency: We crafted a custom visual theme, ensuring that all of our plots will have a unified, professional look. This enhances the clarity and impact of our visualizations, making them more effective tools for communication.

  3. Defining and Refining Our Covariates: We thoughtfully defined our covariates of interest, creating both comprehensive and focused lists for different stages of analysis. We also assigned clear, descriptive labels, ensuring that our variables are easily understood and consistently represented.

  4. Building Univariate and Multivariable Models: We took our first steps into modeling, constructing both univariate and multivariable Cox regression models. This allowed us to explore the individual and combined effects of our predictors on survival, setting the stage for identifying key risk factors. We used a nested modeling approach to strategically evaluate the specific impact of groups of variables, such as our mental health-related variables.

Why This Matters: Building a Bridge from Data to Knowledge

These steps are far more than just technical procedures. They are about building a trustworthy bridge between our raw data and the insights we seek to uncover. We've ensured that:

  • Our Results Will Be Reliable: Thoughtful preparation, from organizing our environment to carefully defining our variables, minimizes errors and enhances the clarity and accuracy of our findings.

  • Our Workflow is Transparent and Reproducible: Clear documentation, a structured project directory, and well-commented code make it possible for others (or our future selves) to easily replicate and validate our analysis.

  • Our Findings Will Be Actionable: By building a solid foundation, we're setting the stage for uncovering patterns and relationships that can ultimately translate into real-world applications, potentially improving the lives of individuals with TBI.

Looking Ahead: Fine-Tuning Our Models and Unveiling the Story

Moving forward, our next steps will involve rigorous model validation. We will examine crucial assumptions underlying Cox regression, including the presence of multicollinearity and the validity of the proportional hazards assumption. Martingale residual diagnostics will be employed to assess model fit, and key findings will be illustrated through the visualization of survival curves. This thorough validation process is essential to ensuring the robustness and interpretability of our results as we continue to refine our understanding of post-TBI survival.
