Bootstrap cell type enrichment test for transcriptome data

ewce_expression_data takes a differential gene expression (DGE) results table and uses a bootstrap enrichment test to estimate the probability of cell type enrichment among the up- and down-regulated genes.

ewce_expression_data(
  sct_data,
  annotLevel = 1,
  tt,
  sortBy = "t",
  thresh = 250,
  reps = 100,
  ttSpecies = "mouse",
  sctSpecies = "mouse"
)

Arguments

sct_data	List generated using generate_celltype_data.
annotLevel	An integer indicating which level of `sct_data` to analyse (Default: 1).
tt	Differential expression table, e.g. the output of limma's topTable function. At minimum, one column must store a metric of increased/decreased expression (e.g. log fold change or the t-statistic for differential expression) and another must contain gene symbols.
sortBy	Column name of the metric in `tt` used to separate up- from down-regulated genes (Default: "t").
thresh	The number of top up-regulated and top down-regulated genes to include in each analysis (Default: 250).
reps	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
ttSpecies	Species from which the differential expression table was generated (Default: "mouse").
sctSpecies	Species from which `sct_data` was generated; no longer limited to "mouse" and "human" (Default: "mouse").
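How `sortBy` and `thresh` select the two gene lists can be illustrated with a minimal base-R sketch. It is not EWCE's internal code: it assumes gene symbols sit in a column named `gene` (a hypothetical name), that the `thresh` rows with the largest `sortBy` values form the up-regulated list and the `thresh` rows with the smallest values form the down-regulated list, and it skips the species conversion and gene filtering the real function performs. `select_up_down` is a hypothetical helper.

```r
# Hypothetical helper: split a DGE table into up- and down-regulated lists.
# Assumes a gene symbol column named `gene`; not EWCE's implementation.
select_up_down <- function(tt, sortBy = "t", thresh = 250, gene_col = "gene") {
  stopifnot(sortBy %in% colnames(tt), gene_col %in% colnames(tt))
  # Order rows from most increased to most decreased expression
  sorted <- tt[order(tt[[sortBy]], decreasing = TRUE), , drop = FALSE]
  n <- min(thresh, nrow(sorted))
  up <- head(sorted[[gene_col]], n)         # largest metric values
  down <- head(rev(sorted[[gene_col]]), n)  # smallest metric values
  list(up = up, down = down)
}

# Toy table with made-up values, for illustration only
toy_tt <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"),
  t = c(5.2, -3.1, 0.4, 2.8, -6.0, -0.2),
  stringsAsFactors = FALSE
)
lists <- select_up_down(toy_tt, sortBy = "t", thresh = 2)
lists$up    # "GeneA" "GeneD"
lists$down  # "GeneE" "GeneB"
```

Because the two lists come from opposite ends of the sorted table, they would overlap if `thresh` exceeded half the number of rows; the sketch does not guard against this.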

Value

A list containing five elements:

results: data frame in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean) associated with the enrichment of the stated cell type in the gene list. An additional column, *Direction*, records whether the result comes from the up- or down-regulated set.
hit.cells.up: vector containing the summed proportion of expression in each cell type for the up-regulated target list
hit.cells.down: vector containing the summed proportion of expression in each cell type for the down-regulated target list
bootstrap_data.up: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists (up-regulated analysis)
bootstrap_data.down: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists (down-regulated analysis)
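The elements above fit together roughly as in the following base-R sketch of a bootstrap enrichment test for one gene list. It is an illustration under stated assumptions, not EWCE's implementation: `spec` is a hypothetical genes x cell types matrix of expression proportions standing in for one level of `sct_data`, random lists are drawn from all genes in `spec` without gene size control, and `bootstrap_sketch` is a hypothetical helper. Here p is the fraction of random lists whose summed proportion reaches the target list's, fold change is the target value divided by the bootstrap mean, and sd_from_mean is the distance from that mean in bootstrap standard deviations.

```r
# Hypothetical sketch of a bootstrap enrichment test for ONE gene list.
# `spec` is a stand-in genes x cell types matrix of expression
# proportions; it is not the real structure of `sct_data`.
bootstrap_sketch <- function(spec, hits, reps = 100) {
  hits <- intersect(hits, rownames(spec))
  # Observed summed proportion per cell type (cf. hit.cells.*)
  hit_cells <- colSums(spec[hits, , drop = FALSE])
  # One row per random list of the same size (cf. bootstrap_data.*)
  bootstrap_data <- matrix(0, nrow = reps, ncol = ncol(spec),
                           dimnames = list(NULL, colnames(spec)))
  for (i in seq_len(reps)) {
    random_genes <- sample(rownames(spec), length(hits))
    bootstrap_data[i, ] <- colSums(spec[random_genes, , drop = FALSE])
  }
  boot_mean <- colMeans(bootstrap_data)
  boot_sd <- apply(bootstrap_data, 2, sd)
  results <- data.frame(
    CellType = colnames(spec),
    # Fraction of random lists at least as large as the target list
    p = colSums(sweep(bootstrap_data, 2, hit_cells, ">=")) / reps,
    fold_change = hit_cells / boot_mean,
    sd_from_mean = (hit_cells - boot_mean) / boot_sd,
    row.names = NULL
  )
  list(results = results, hit_cells = hit_cells,
       bootstrap_data = bootstrap_data)
}

# Toy input: 20 genes, the first 4 expressed only in cell type "A"
set.seed(1)
genes <- paste0("g", 1:20)
spec <- cbind(A = rep(c(1, 0), c(4, 16)), B = rep(c(0, 1), c(4, 16)))
rownames(spec) <- genes
out <- bootstrap_sketch(spec, hits = genes[1:4], reps = 200)
out$results
```

In this toy input the four hit genes are expressed only in cell type "A", so A should show a small p and a fold change well above 1, while B (observed sum 0) gets p = 1 because every random list reaches it.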

Examples

# Load the single cell data
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

# Set the parameters for the analysis
# Use 3 bootstrap lists for speed; for publishable analysis use >=10,000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
annotLevel <- 1 # Use level-1 cell type annotations (e.g. interneurons)

# Load the top table
tt_alzh <- ewceData::tt_alzh()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

tt_results <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh,
    annotLevel = annotLevel,
    thresh = thresh,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)
#> Returning 10,854 unique genes from the user-supplied bg.
#> Standardising CellTypeDataset
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Checking gene list inputs.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Standardising sct_data.
#> Converting gene list input to standardised human genes.
#> Running without gene size control.
#> 6 hit genes remain after filtering.
#> Computing summed proportions.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 2 significant cell type enrichment results @ q<0.05 : 
#>            CellType annotLevel p fold_change sd_from_mean q
#> 1 endothelial-mural          1 0    2.393329     7.415055 0
#> 2         microglia          1 0    2.545384     2.513614 0
#> Returning 10,854 unique genes from the user-supplied bg.
#> Standardising CellTypeDataset
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Checking gene list inputs.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Standardising sct_data.
#> Converting gene list input to standardised human genes.
#> Running without gene size control.
#> 5 hit genes remain after filtering.
#> Computing summed proportions.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 3 significant cell type enrichment results @ q<0.05 : 
#>        CellType annotLevel p fold_change sd_from_mean q
#> 1  pyramidal SS          1 0    2.397246    12.001535 0
#> 2  interneurons          1 0    1.932982     3.329294 0
#> 3 pyramidal CA1          1 0    1.407643     1.766998 0
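In the output above every reported p and q is 0. Assuming p is the fraction of `reps` random lists that reach the observed value (a count-based bootstrap, which is consistent with p = 0 appearing at all), a run with `reps <- 3` can only produce p-values of 0, 1/3, 2/3 or 1, and a p of 0 means only that none of three random lists matched the target list. The sketch below also assumes the q column is base R's Benjamini-Hochberg adjustment, `p.adjust(p, method = "BH")`, which the "Computing BH-corrected q-values" message suggests but does not confirm; the seven p-values are hypothetical, one per cell type.

```r
# Possible bootstrap p-values when reps = 3 (count-based, no +1 correction)
reps <- 3
possible_p <- (0:reps) / reps
possible_p

# Hypothetical p-values for 7 cell types from a reps = 3 run
p <- c(0, 0, 1/3, 2/3, 1, 1, 1)
# BH adjustment keeps p = 0 at q = 0, however few reps were used
q <- p.adjust(p, method = "BH")
q
```

Because a zero p-value stays zero after BH correction, q = 0 in the Examples reflects the very small `reps` rather than necessarily strong evidence; with `reps` >= 10,000 the smallest non-zero p-value drops to 1e-4.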