check_ewce_genelist_inputs — check_ewce_genelist

check_ewce_genelist_inputs is used to check that the hits and bg gene lists passed to EWCE are set up correctly. It checks that both lists are of an appropriate length, that all hits genes are in bg, and that the species match; if the species differ, the lists are reduced to 1:1 orthologs.
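The membership and length checks described above can be illustrated with a small base-R sketch. This is not EWCE's implementation: `sketch_check_gene_lists`, its `sct_genes` argument, and the `min_hits = 4` threshold are hypothetical names and values chosen only for illustration.

```r
# Hypothetical helper, NOT part of EWCE: illustrates the kinds of checks
# described above on plain character vectors.
sketch_check_gene_lists <- function(hits, bg, sct_genes, min_hits = 4) {
    hits <- unique(as.character(hits))
    bg <- unique(as.character(bg))

    # Keep only genes that are measured in the single-cell dataset.
    hits <- hits[hits %in% sct_genes]
    bg <- bg[bg %in% sct_genes]

    # Every remaining hit gene should also be present in the background.
    missing_from_bg <- setdiff(hits, bg)
    if (length(missing_from_bg) > 0) {
        stop("Hit genes missing from bg: ",
             paste(missing_from_bg, collapse = ", "))
    }

    # Assumed minimum list length; the real threshold may differ.
    if (length(hits) < min_hits) {
        stop("Fewer than ", min_hits, " usable hit genes remain.")
    }

    list(hits = hits, bg = bg)
}

# Toy inputs; the gene symbols are used purely as labels.
sct_genes <- c("Apoe", "Gfap", "Aqp4", "Snap25", "Mbp", "Plp1")
checked <- sketch_check_gene_lists(
    hits = c("Apoe", "Gfap", "Aqp4", "Mbp", "NotMeasured"),
    bg = c("Apoe", "Gfap", "Aqp4", "Mbp", "Plp1", "Snap25"),
    sct_genes = sct_genes
)
```

In this toy run, `"NotMeasured"` is dropped because it is absent from `sct_genes`, and the returned list mirrors the `hits`/`bg` structure described under Value.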

check_ewce_genelist_inputs(
  sct_data,
  hits,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  output_species = "human",
  geneSizeControl = FALSE,
  standardise = FALSE,
  verbose = TRUE
)

Arguments

sct_data	List generated using generate_celltype_data.
hits	List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if `geneSizeControl=TRUE`.
bg	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
genelistSpecies	Species that `hits` genes came from (no longer limited to just "mouse" and "human").
sctSpecies	Species that `sct_data` came from (no longer limited to just "mouse" and "human").
output_species	Species to convert `sct_data` and `hits` to (Default: "human").
geneSizeControl	Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to `TRUE`, then `hits` must be from humans.
standardise	If `input_species==output_species`, should the genes be standardised using map_genes?
verbose	Print messages.
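When `bg=NULL`, the example output further down reports an "intersect" background built from the genes of both species after conversion to the output species. A minimal base-R sketch of that idea, assuming the background is simply the set intersection of the two converted gene lists (the variable names and toy genes are hypothetical, and EWCE's actual procedure may involve more steps):

```r
# Hypothetical toy data: genes from the sct_data species after conversion
# to the output species, and genes known for the gene-list species.
sct_genes_converted <- c("APOE", "GFAP", "AQP4", "MBP", "SNAP25")
genelist_species_genes <- c("APOE", "GFAP", "MBP", "PLP1")

# Assumed rule: the automatic background is the shared set of genes.
bg_sketch <- intersect(sct_genes_converted, genelist_species_genes)

# Hit genes outside the shared background cannot be tested.
hits_sketch <- c("APOE", "MBP", "PLP1")
hits_usable <- hits_sketch[hits_sketch %in% bg_sketch]
```

Here `PLP1` falls out of `hits_usable` because it is not in the shared background, which is why supplying a background that includes every hit gene matters.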

Value

A list containing

hits: Array of MGI/HGNC gene symbols containing the target gene list.
bg: Array of MGI/HGNC gene symbols containing the background gene list.

Examples

ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
example_genelist <- ewceData::example_genelist()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

# Called from "bootstrap_enrichment_test()" and "generate_bootstrap_plots()"
checkedLists <- EWCE::check_ewce_genelist_inputs(
    sct_data = ctd,
    hits = example_genelist,
    sctSpecies = "mouse",
    genelistSpecies = "human"
)
#> Checking gene list inputs.
#> Generating gene background for mouse x human ==> human
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: mmusculus
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: hsapiens
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: mmusculus
#> 1 organism identified from search: 10090
#> Gene table with 21,207 rows retrieved.
#> Returning all 21,207 genes from mmusculus.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: hsapiens
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from hsapiens.
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 21,207 genes extracted.
#> Converting mmusculus ==> hsapiens orthologs using: homologene
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: mmusculus
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: hsapiens
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in hsapiens.
#> Extracting genes from input_gene.
#> 17,355 genes extracted.
#> Extracting genes from ortholog_gene.
#> 17,355 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 131 genes that have multiple input_gene per ortholog_gene.
#> Dropping 498 genes that have multiple ortholog_gene per input_gene.
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    4,725 / 21,207 (22%)
#> Total genes remaining after convert_orthologs :
#>    16,482 / 21,207 (78%)
#> 
#> =========== REPORT SUMMARY ===========
#> 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
#> 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: hsapiens
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: hsapiens
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: hsapiens
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from hsapiens.
#> 
#> =========== REPORT SUMMARY ===========
#> 19,129 / 19,129 (100%) target_species genes remain after ortholog conversion.
#> 19,129 / 19,129 (100%) reference_species genes remain after ortholog conversion.
#> 16,482 intersect background genes used.
#> Standardising sct_data.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Converting gene list input to standardised human genes.
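The messages above about dropping genes with "multiple input_gene per ortholog_gene" and "multiple ortholog_gene per input_gene" describe a reduction to 1:1 orthologs. The following base-R sketch shows one way such a filter could work on a toy mapping table. The column names are taken from the messages; `keep_one_to_one` and the gene pairs are hypothetical, and this is not the code that EWCE or its dependencies run.

```r
# Toy mapping table; the gene pairs are illustrative, not real homologene output.
gene_map <- data.frame(
    input_gene = c("GeneA", "GeneB", "GeneC", "GeneC", "GeneD", "GeneE"),
    ortholog_gene = c("GENEA", "GENEB", "GENEC1", "GENEC2", "GENEDE", "GENEDE"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows where the mapping is 1:1 in both directions.
keep_one_to_one <- function(gene_map) {
    gene_map <- unique(gene_map)
    # Input genes that map to more than one ortholog.
    multi_ortholog <- gene_map$input_gene[duplicated(gene_map$input_gene)]
    # Orthologs that are mapped to by more than one input gene.
    multi_input <- gene_map$ortholog_gene[duplicated(gene_map$ortholog_gene)]
    keep <- !(gene_map$input_gene %in% multi_ortholog) &
        !(gene_map$ortholog_gene %in% multi_input)
    gene_map[keep, , drop = FALSE]
}

one_to_one <- keep_one_to_one(gene_map)
```

In the toy table, `GeneC` is dropped because it has two orthologs, and `GeneD` and `GeneE` are dropped because they share one ortholog, leaving only `GeneA` and `GeneB`.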