When convert_orths=TRUE, drop_uninformative_genes first removes genes that do not have 1:1 orthologs in output_species (human by default).

drop_uninformative_genes(
  exp,
  level2annot,
  DGE_method = "limma",
  min_variance_decile = NULL,
  adj_pval_thresh = 1e-05,
  convert_orths = FALSE,
  input_species = NULL,
  output_species = "human",
  non121_strategy = "drop_both_species",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  return_sce = FALSE,
  no_cores = 1,
  verbose = TRUE,
  ...
)

Arguments

exp

Expression matrix with gene names as rownames.

level2annot

Array of cell types, where each element corresponds, in order, to a column in the expression matrix.

DGE_method

Which method to use for the Differential Gene Expression (DGE) step.

min_variance_decile

If min_variance_decile!=NULL, calculates the variance of each gene's mean expression across `level2annot` (i.e. cell types), then removes any genes whose variance decile is below min_variance_decile (on a 0-1 scale).
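
To make the variance-decile filter concrete, here is a minimal base-R sketch. It assumes that deciles are assigned by ranking genes on the variance of their per-cell-type mean expression; the helper filter_by_variance_decile and the toy data are hypothetical and are not part of EWCE.

```r
# Hypothetical sketch of a variance-decile filter; NOT EWCE's implementation.
set.seed(1)

# Toy data: 6 genes x 8 cells, with one cell-type label per column.
exp <- matrix(rpois(48, lambda = 5), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8)))
level2annot <- rep(c("A", "B", "C", "D"), each = 2)

filter_by_variance_decile <- function(exp, level2annot, min_variance_decile) {
  # Mean expression of each gene within each cell type (genes x cell types).
  mean_exp <- t(apply(exp, 1, function(g) tapply(g, level2annot, mean)))
  # Variance of those cell-type means, one value per gene.
  vars <- apply(mean_exp, 1, stats::var)
  # Assumed decile assignment: rank-based, on a 0-1 scale (0.1, ..., 1.0).
  decile <- ceiling(rank(vars, ties.method = "first") / length(vars) * 10) / 10
  # Keep genes whose decile is at or above the threshold.
  exp[decile >= min_variance_decile, , drop = FALSE]
}

filtered <- filter_by_variance_decile(exp, level2annot, min_variance_decile = 0.5)
```

With six genes, ranks 1 to 6 map to deciles 0.2, 0.4, 0.5, 0.7, 0.9 and 1.0, so min_variance_decile = 0.5 keeps four genes. No model is fitted, which is why this kind of filter is much faster than a DGE test.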

adj_pval_thresh

Adjusted p-value threshold for the DGE step: a gene is kept only if its adjusted p-value for differential expression across level2annot (i.e. cell types) is below this value.

convert_orths

If input_species!=output_species and convert_orths=TRUE, will drop genes without 1:1 output_species orthologs and then convert exp gene names to those of output_species.

input_species

Which species the gene names in exp come from.

output_species

Which species' gene names to convert exp to.

non121_strategy

How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:

  • "drop_both_species" or "dbs" or 1 :
    Drop genes that have duplicate mappings in either the input_species or output_species
    (DEFAULT).

  • "drop_input_species" or "dis" or 2 :
    Only drop genes that have duplicate mappings in the input_species.

  • "drop_output_species" or "dos" or 3 :
    Only drop genes that have duplicate mappings in the output_species.

  • "keep_both_species" or "kbs" or 4 :
    Keep all genes regardless of whether they have duplicate mappings in either species.

  • "keep_popular" or "kp" or 5 :
    Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.

  • "sum","mean","median","min" or "max" :
    When gene_df (here, exp) is a matrix and gene_output="rownames", these options aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
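
The drop strategies and the "sum" aggregation can be illustrated on a toy mouse-to-human mapping table. This base-R sketch is hypothetical: the gene IDs are made up, it is not how orthogene or EWCE implement the mapping, and it assumes that a "duplicate mapping in the input_species" means an input gene that maps to more than one output gene (and likewise for output_species).

```r
# Hypothetical sketch of the non121_strategy ideas; NOT the orthogene/EWCE code.
map <- data.frame(
  input_gene  = c("m1", "m2", "m2", "m3", "m4", "m5"),
  output_gene = c("H1", "H2", "H3", "H4", "H4", "H5"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose value appears more than once in x.
is_dup <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

drop_input  <- map[!is_dup(map$input_gene), ]   # akin to "drop_input_species"
drop_output <- map[!is_dup(map$output_gene), ]  # akin to "drop_output_species"
drop_both   <- map[!is_dup(map$input_gene) & !is_dup(map$output_gene), ]  # "dbs"

# Toy mouse expression matrix (genes x cells).
exp <- matrix(1:10, nrow = 5,
              dimnames = list(c("m1", "m2", "m3", "m4", "m5"), c("c1", "c2")))

# Assumed "sum" behaviour: drop one-to-many input genes (m2), then sum the
# rows of input genes that share an output gene (m3 and m4 both map to H4).
exp_sub   <- exp[drop_input$input_gene, , drop = FALSE]
exp_human <- rowsum(exp_sub, group = drop_input$output_gene)
```

Here "drop_both_species" keeps only m1 and m5, whereas the "sum" path keeps H4 by adding the m3 and m4 rows, which is why aggregation retains more genes than strict 1:1 filtering.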

as_sparse

Convert exp to sparse matrix.

as_DelayedArray

Convert exp to DelayedArray for scalable processing.

return_sce

Whether to return the filtered results as a SingleCellExperiment (TRUE) or as an expression matrix (FALSE).

no_cores

Number of cores to parallelise across. Set to NULL to automatically optimise.

verbose

Print messages.

...

Additional arguments to be passed to the selected DGE method.

Value

exp: Filtered expression matrix with gene names as row names (or a SingleCellExperiment if return_sce=TRUE).

Details

drop_uninformative_genes then drops genes from a single-cell transcriptome (SCT) expression matrix if they do not vary significantly between any cell types. This decision is based on an ANOVA (implemented with limma when DGE_method="limma"): if the adjusted p-value of the F-statistic for variation among the level2annot cell types is not below adj_pval_thresh, the gene is dropped.
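
The F-test step can be approximated in base R as shown here. This is a swapped-in technique: it runs an ordinary one-way ANOVA per gene with stats::lm and stats::anova, whereas the limma method uses moderated statistics. The Benjamini-Hochberg adjustment (via p.adjust) and the simulated data are assumptions for illustration only.

```r
# Sketch: per-gene one-way ANOVA across cell types, BH-adjusted p-values.
# Uses ordinary stats::lm/anova, NOT limma's moderated F-test.
set.seed(42)
level2annot <- factor(rep(c("A", "B", "C"), each = 10))
n_genes <- 20
exp <- matrix(rnorm(n_genes * 30), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Make the first three genes strongly specific to cell type "B".
exp[1:3, level2annot == "B"] <- exp[1:3, level2annot == "B"] + 10

adj_pval_thresh <- 1e-05
# p-value of the F-statistic for the cell-type term, one per gene.
pvals <- apply(exp, 1, function(g) anova(lm(g ~ level2annot))[["Pr(>F)"]][1])
adj_p <- p.adjust(pvals, method = "BH")  # assumed multiple-testing correction
# Keep genes that vary significantly between cell types; drop the rest.
kept <- exp[adj_p < adj_pval_thresh, , drop = FALSE]
```

The strict default threshold (1e-05) means that only genes with strong cell-type differences, such as the three shifted genes here, survive; pure-noise genes are dropped.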

A very fast alternative to the DGE methods is filtering by min_variance_decile, which keeps only genes in the top variance deciles.

Examples

cortex_mrna <- ewceData::cortex_mrna()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Use only a subset of genes to keep the example quick
cortex_mrna$exp <- cortex_mrna$exp[1:300, ]
## Drop uninformative genes (orthologs are only converted if convert_orths = TRUE)
exp2_orth <- drop_uninformative_genes(
  exp = cortex_mrna$exp,
  level2annot = cortex_mrna$annot$level2class,
  input_species = "mouse"
)
#> Check 300Check 3005
#> + 1 core(s) assigned as workers ( 63 reserved).
#> Converting to sparse matrix.
#> Checking for non-expressed genes.
#> Checking for cells with no expressed genes.
#> DGE:: Limma...
#> 3 / 300 genes dropped @ DGE adj_pval_thresh < 1e-05
#> Time difference of 0.1156094 secs