Generate CellTypeData (CTD) file — generate_celltype_data

generate_celltype_data takes gene expression data and cell type annotations and creates CellTypeData (CTD) files, which contain matrices of mean expression and specificity per cell type.

generate_celltype_data(
  exp,
  annotLevels,
  groupName,
  no_cores = 1,
  savePath = tempdir(),
  file_prefix = "ctd",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  normSpec = FALSE,
  convert_orths = FALSE,
  input_species = "mouse",
  output_species = "human",
  non121_strategy = "drop_both_species",
  force_new_file = TRUE,
  specificity_quantiles = TRUE,
  numberOfBins = 40,
  dendrograms = TRUE,
  return_ctd = FALSE,
  verbose = TRUE,
  ...
)

Arguments

exp	Numerical matrix with a row for each gene and a column for each cell. Row names are gene symbols. Column names are cell IDs, which can be cross-referenced against the annot data frame.
annotLevels	List with arrays of strings containing the cell type names associated with each column in `exp`.
groupName	A human-readable name for referring to the dataset. It appears in the saved file name (e.g. `ctd_allKImouse.rda` for `groupName = "allKImouse"`).
no_cores	Number of cores that should be used to speed up the computation. NOTE: Use `no_cores=1` when using this package on Windows.
savePath	Directory where the CTD file should be saved.
file_prefix	Prefix to add to saved CTD file name.
as_sparse	Convert `exp` to a sparse `Matrix`.
as_DelayedArray	Convert `exp` to `DelayedArray`.
normSpec	Boolean indicating whether specificity data should be transformed to a normal distribution by cell type, giving equivalent scores across all cell types.
convert_orths	If `input_species!=output_species` and `convert_orths=TRUE`, genes without 1:1 `output_species` orthologs are dropped and the remaining `exp` gene names are converted to those of `output_species`.
input_species	The species that the `exp` dataset comes from.
output_species	Species to convert `exp` to (Default: "human").
non121_strategy	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include:
	`"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT).
	`"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`.
	`"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`.
	`"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species.
	`"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
	`"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
force_new_file	If a file of the same name as the one being created already exists, overwrite it.
specificity_quantiles	Compute specificity quantiles. Recommended to set to `TRUE`.
numberOfBins	Number of quantile 'bins' to use (40 is recommended).
dendrograms	Add dendrogram plots.
return_ctd	Return the CTD object in a list along with the file name, instead of just the file name.
verbose	Print messages.
...	Additional arguments passed to convert_orthologs.
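
To make the specificity-related arguments (`specificity_quantiles`, `numberOfBins`) more concrete, here is a minimal base-R sketch on a toy matrix. It assumes that specificity is the share of a gene's summed mean expression that falls in each cell type, and that quantile bins are assigned within each cell type from ranks. Both are illustrative assumptions rather than the package's actual implementation, and `exp_toy`, `cell_types` and `bin_column` are hypothetical names.

```r
# Toy expression matrix: 4 genes x 6 cells (values are made up for illustration).
exp_toy <- matrix(
  c(10, 12,  0,  1,  0,  0,
     0,  0,  8,  9,  1,  0,
     5,  5,  5,  5,  5,  5,
     0,  1,  0,  0, 20, 22),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)
cell_types <- c("A", "A", "B", "B", "C", "C")

# Mean expression per cell type (genes x cell types).
mean_exp <- sapply(split(seq_along(cell_types), cell_types), function(idx) {
  rowMeans(exp_toy[, idx, drop = FALSE])
})

# Assumed definition: share of each gene's total mean expression per cell type,
# so every row sums to 1.
specificity <- mean_exp / rowSums(mean_exp)

# Hypothetical helper: assign each gene to one of `n_bins` rank-based bins
# within a single cell type (one column of the specificity matrix).
bin_column <- function(x, n_bins) {
  as.integer(cut(rank(x, ties.method = "first"),
                 breaks = n_bins, labels = FALSE))
}
spec_bins <- apply(specificity, 2, bin_column, n_bins = 2)
rownames(spec_bins) <- rownames(specificity)
```

With real data the bin count would correspond to `numberOfBins` (40 by default); two bins are used here only because the toy matrix has four genes.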

Value

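File names for the saved CellTypeData (CTD) files. If `return_ctd=TRUE`, a list containing the CTD object along with the file name is returned instead.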
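
Because the default return value is a file name rather than the data itself, the saved file has to be read back before use. The sketch below relies only on base R's `load()`; `read_rda_objects` is a hypothetical helper, and it makes no assumption about the name of the object stored inside the file.

```r
# Hypothetical helper: load an .rda file into a fresh environment and return
# its contents as a named list, so nothing in the workspace is overwritten.
read_rda_objects <- function(path) {
  env <- new.env()
  load(path, envir = env)
  as.list(env)
}

# Usage, assuming `fNames` holds the value returned by
# generate_celltype_data() with return_ctd = FALSE:
# objs <- read_rda_objects(unlist(fNames)[1])
# names(objs)  # name(s) of the object(s) stored in the file
```

Loading into a separate environment avoids silently replacing an existing object of the same name in the calling session.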

Examples

# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Use only a subset to keep the example quick
expData <- cortex_mrna$exp[1:100, ]
l1 <- cortex_mrna$annot$level1class
l2 <- cortex_mrna$annot$level2class
annotLevels <- list(l1 = l1, l2 = l2)
fNames_ALLCELLS <- EWCE::generate_celltype_data(
    exp = expData,
    annotLevels = annotLevels,
    groupName = "allKImouse"
)
#> +  1  core(s) assigned as workers ( 63  reserved).
#> Converting to sparse matrix.
#> + Calculating normalized mean expression.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> + Calculating normalized specificity.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Loading required namespace: ggdendro
#> + Saving results ==>  /tmp/Rtmp5DLuZU/ctd_allKImouse.rda
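
The example above depends on every element of `annotLevels` holding one cell type label per column of `exp`. A small pre-flight check along those lines could look like the following sketch; `check_annot_levels` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: confirm that each annotation level has exactly one
# label per column (cell) of the expression matrix.
check_annot_levels <- function(exp, annotLevels) {
  stopifnot(is.list(annotLevels), length(annotLevels) > 0)
  lens <- lengths(annotLevels)
  bad <- which(lens != ncol(exp))
  if (length(bad) > 0) {
    stop("annotLevels element(s) ", paste(bad, collapse = ", "),
         " do not have one entry per column of exp (", ncol(exp), ").")
  }
  invisible(TRUE)
}

# Usage with the objects from the example above:
# check_annot_levels(expData, annotLevels)
```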