SNPLinkage is modular enough to use directly dataframes of correlation matrices and chromosomic positions specified by the user, e.g. for visualizing RNASeq data. The user can compute a correlation matrix or any kind of pair-wise similarity matrix independently and then use SNPLinkage to build and arrange easily customizable ggplot2 objects.
The user can specify the correlations he wants to visualize as a
dataframe to the ggplot_ld function. The column names must
follow the following pattern: SNP_A and SNP_B
for the two variables in relation, and R2 for the
correlation value.
  library(snplinkage)
#> Loading required package: GWASTools
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
  # example rnaseq data matrix, 20 variables of 20 patients
  m_rna = matrix(runif(20 ^ 2), nrow = 20)
  # pair-wise correlation matrix
  m_ld = cor(m_rna) ^ 2
  # keep only upper triangle and reshape to data frame
  m_ld[lower.tri(m_ld, diag = TRUE)] = NA
  df_ld = reshape2::melt(m_ld) |> na.omit()
  # rename for SNPLinkage
  names(df_ld) = c('SNP_A', 'SNP_B', 'R2')
  # visualize with ggplot_ld
  gg_ld = ggplot_ld(df_ld)
  gg_ldSimilarly, the user can specify a dataframe to the
ggplot_snp_pos function. The dataframe is assumed to be in
the same order as the correlation dataframe, and the column name
position is required.
  # let's imagine the 20 variables came from 3 physically close regions
  positions = c(runif(7, 31e6, 31.5e6), runif(6, 32e6, 32.5e6),
                runif(7, 33e6, 33.5e6)) |> sort()
  # build the dataframe
  df_snp_pos = data.frame(position = positions)
  # minimal call
  gg_snp_pos = ggplot_snp_pos(df_snp_pos)Optionally, one can specify the labels_colname parameter
to give the name of a column that will have the labels to display.
We then arrange the plots with the gtable_ld_grobs
function. One needs to specify in the labels_colname
parameter if the chromosomic positions plot was built with labels or
not. The title parameter is also required.
  l_ggs = list(snp_pos = gg_snp_pos, ld = gg_ld)
  gt_ld = gtable_ld_grobs(l_ggs, labels_colname = TRUE,
                          title = 'RNASeq correlations')
  grid::grid.draw(gt_ld)Finally we add the variables’ associations to our outcome of
interest. The ggplot_associations uses as input a dataframe
and accepts a parameter pvalue_colname to specify which
column holds the association values, by default ‘pvalues’. It also
requires a labels_colname parameter to specify the column
holding the labels, and a column named chromosome. The
linked_area parameter will affect how the associations are
plotted and it is recommended to be used in combination with the
diamonds parameter of ggplot_ld (i.e. TRUE for
small number of variables, approximately less than 40).
Additionally, the n_labels parameter controls the number
of highest association labels displayed (be default 10, the behavior can
be disabled by setting labels_colname to NULL), and the
nudge parameter will affect how the labels are displayed
(passed to geom_label_repel function of ‘ggrepel’
package).
  # let's imagine the middle region, HLA-B, is more associated with the outcome
  pvalues = c(runif(7, 1e-3, 1e-2), runif(6, 1e-8, 1e-6), runif(7, 1e-3, 1e-2))
  log10_pvals = -log10(pvalues)
  # we can reuse the df_snp_pos object
  df_snp_pos$pvalues = log10_pvals
  
  # add the chromosome column
  df_snp_pos$chromosome = 6
  gg_assocs = ggplot_associations(df_snp_pos, labels_colname = 'label',
                                  linked_area = TRUE, nudge = c(0, 0.5),
                                  n_labels = 12)We then arrange the plots with the gtable_ld_grobs
function as previously. We need to call the ggplot_snp_pos
function with the upper_subset parameter set to TRUE for it
to connect to the upper graph.
  gg_pos_biplot = ggplot_snp_pos(df_snp_pos, labels_colname = 'label',
                                 upper_subset = TRUE)
  # let's also say the middle region HLA-B is particularly correlated
  df_ld$R2[df_ld$SNP_A %in% 8:13 & df_ld$SNP_B %in% 8:13] = runif(15, 0.7, 0.9)
  gg_ld = ggplot_ld(df_ld)
  l_ggs = list(pos = gg_pos_biplot, ld = gg_ld, pval = gg_assocs)
  gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE)We can extract a title and remove the horizontal axis text as follows.
  library(ggplot2)
  gg_assocs <- gg_assocs + theme(axis.text.x = element_blank())
  title <- gg_assocs$labels$x %>% gsub(' (Mbp)', '', ., fixed = TRUE) %>%
    paste('-', nrow(df_snp_pos), 'SNPs')
  gg_assocs <- gg_assocs + labs(title = title, x = NULL)
  
  l_ggs$pval = gg_assocs
  gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE)
  grid::grid.draw(gt_ld)Let’s say we want to change the color of the associations area. We first need to identify which layer it corresponds to:
  gg_assocs$layers
#> [[1]]
#> geom_area: na.rm = FALSE, orientation = NA, outline.type = upper
#> stat_align: na.rm = FALSE, orientation = NA
#> position_stack 
#> 
#> [[2]]
#> mapping: xend = ~x 
#> geom_segment: arrow = NULL, arrow.fill = NULL, lineend = butt, linejoin = round, na.rm = FALSE
#> stat_identity: na.rm = FALSE
#> position_identity 
#> 
#> [[3]]
#> geom_label_repel: parse = FALSE, box.padding = 0.25, label.padding = 0.1, point.padding = 1e-06, label.r = 0.15, label.size = 0.1, min.segment.length = 0.5, arrow = NULL, na.rm = TRUE, force = 1, force_pull = 1, max.time = 0.5, max.iter = 10000, max.overlaps = 10, nudge_x = 0, nudge_y = 0.5, xlim = c(NA, NA), ylim = c(NA, NA), direction = both, seed = NA, verbose = FALSE
#> stat_identity: na.rm = TRUE
#> position_nudge_repel 
#> 
#> [[4]]
#> geom_point: na.rm = FALSE
#> stat_identity: na.rm = FALSE
#> position_identityThen we can change the color with:
And rebuild our object.
To change the lines and labels colors, parameters in the functions are available. You can either specify a single value or a vector of same length as your number of features.
  gg_pos_biplot = ggplot_snp_pos(df_snp_pos, labels_colname = 'label',
                                 upper_subset = TRUE, colors = '#101d6b')
  gg_assocs = ggplot_associations(df_snp_pos, labels_colname = 'label',
                                  linked_area = TRUE, nudge = c(0, 0.5),
                                  n_labels = 12, colors = '#101d6b')
  # extract title
  gg_assocs <- gg_assocs + theme(axis.text.x = element_blank())
  title <- gg_assocs$labels$x %>% gsub(' (Mbp)', '', ., fixed = TRUE) %>%
    paste('-', nrow(df_snp_pos), 'SNPs')
  gg_assocs <- gg_assocs + labs(title = title, x = NULL)
  # replace area color
  gg_assocs$layers[[1]]$aes_params$fill = "#0147ab"
  # rebuild
  l_ggs = list(pos = gg_pos_biplot, ld = gg_ld, pval = gg_assocs)
  gt_ld = gtable_ld_associations_combine(l_ggs, diamonds = TRUE)
  grid::grid.draw(gt_ld)In this dataset from the ‘gap’ package (Zhao, Kurt Hornik, and Ripley 2015), 206 SNPs from chromosome 5 (5q31) were measured from 129 Crohn’s disease patients and their 2 parents, totalling 387 samples.
  data('crohn')
  m_hla = crohn[, -(1:6)]
  m_ld = cor(m_hla) ^ 2
  # keep only upper triangle and reshape to data frame
  m_ld[lower.tri(m_ld, diag = TRUE)] = NA
  df_ld = reshape2::melt(m_ld) |> na.omit()
  # rename for SNPLinkage
  names(df_ld) = c('SNP_A', 'SNP_B', 'R2')
  # visualize with ggplot_ld
  gg_ld = ggplot_ld(df_ld)Compute p-values
  mlog10_pvals = chisq_pvalues(m_hla, crohn[, 'crohn'])
  df_pos = data.frame(probe_id = colnames(m_hla), pvalues = mlog10_pvals,
                      chromosome = 5)
  # if we don't have positions we can use byindex = TRUE
  gg_assocs = ggplot_associations(df_pos, byindex = TRUE, nudge = c(0, 0.5))Arrange with ‘cowplot’
Focus on most associated
  df_top_assocs = subset(df_pos, pvalues > quantile(pvalues, 0.9))
  gg_assocs = ggplot_associations(df_top_assocs, linked_area = TRUE,
                                  nudge = c(0, 0.5))
  df_ld = subset(df_ld, SNP_A %in% df_top_assocs$probe_id &
                        SNP_B %in% df_top_assocs$probe_id)
  gg_ld = ggplot_ld(df_ld)
   
  cowplot::plot_grid(gg_assocs, gg_ld, nrow = 2)  sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.1       snplinkage_1.2.0    GWASTools_1.50.0   
#> [4] Biobase_2.64.0      BiocGenerics_0.50.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.2.3               gdsfmt_1.40.0           httr2_1.0.1            
#>   [4] sandwich_3.1-0          biomaRt_2.60.1          rlang_1.1.4            
#>   [7] magrittr_2.0.3          compiler_4.4.1          RSQLite_2.3.7          
#>  [10] mgcv_1.9-1              reshape2_1.4.4          png_0.1-8              
#>  [13] vctrs_0.6.5             quantreg_5.98           stringr_1.5.1          
#>  [16] pkgconfig_2.0.3         shape_1.4.6.1           crayon_1.5.3           
#>  [19] SNPRelate_1.38.0        fastmap_1.2.0           backports_1.5.0        
#>  [22] dbplyr_2.5.0            XVector_0.44.0          labeling_0.4.3         
#>  [25] utf8_1.2.4              rmarkdown_2.28          UCSC.utils_1.0.0       
#>  [28] nloptr_2.1.1            MatrixModels_0.5-3      purrr_1.0.2            
#>  [31] bit_4.0.5               xfun_0.47               glmnet_4.1-8           
#>  [34] jomo_2.7-6              logistf_1.26.0          zlibbioc_1.50.0        
#>  [37] cachem_1.1.0            GenomeInfoDb_1.40.1     jsonlite_1.8.8         
#>  [40] progress_1.2.3          blob_1.2.4              highr_0.11             
#>  [43] pan_1.9                 parallel_4.4.1          broom_1.0.6            
#>  [46] prettyunits_1.2.0       R6_2.5.1                bslib_0.8.0            
#>  [49] stringi_1.8.4           boot_1.3-30             DNAcopy_1.78.0         
#>  [52] rpart_4.1.23            lmtest_0.9-40           jquerylib_0.1.4        
#>  [55] Rcpp_1.0.13             iterators_1.0.14        knitr_1.48             
#>  [58] zoo_1.8-12              IRanges_2.38.1          Matrix_1.7-0           
#>  [61] splines_4.4.1           nnet_7.3-19             tidyselect_1.2.1       
#>  [64] yaml_2.3.10             codetools_0.2-20        curl_5.2.2             
#>  [67] plyr_1.8.9              lattice_0.22-6          tibble_3.2.1           
#>  [70] withr_3.0.1             quantsmooth_1.70.0      KEGGREST_1.44.1        
#>  [73] evaluate_0.24.0         survival_3.6-4          BiocFileCache_2.12.0   
#>  [76] xml2_1.3.6              Biostrings_2.72.1       pillar_1.9.0           
#>  [79] filelock_1.0.3          mice_3.16.0             foreach_1.5.2          
#>  [82] stats4_4.4.1            generics_0.1.3          S4Vectors_0.42.1       
#>  [85] hms_1.1.3               munsell_0.5.1           scales_1.3.0           
#>  [88] minqa_1.2.8             GWASExactHW_1.2         glue_1.7.0             
#>  [91] tools_4.4.1             data.table_1.16.0       lme4_1.1-35.5          
#>  [94] SparseM_1.84-2          cowplot_1.1.3           grid_4.4.1             
#>  [97] tidyr_1.3.1             AnnotationDbi_1.66.0    colorspace_2.1-1       
#> [100] nlme_3.1-164            formula.tools_1.7.1     GenomeInfoDbData_1.2.12
#> [103] cli_3.6.3               rappdirs_0.3.3          fansi_1.0.6            
#> [106] dplyr_1.1.4             gtable_0.3.5            sass_0.4.9             
#> [109] digest_0.6.37           operator.tools_1.6.3    ggrepel_0.9.6          
#> [112] farver_2.1.2            memoise_2.0.1           htmltools_0.5.8.1      
#> [115] lifecycle_1.0.4         httr_1.4.7              mitml_0.4-5            
#> [118] bit64_4.0.5             MASS_7.3-60.2