Also checks whether any gene names contain "Sep", "Mar" or "Feb". Any matches should be inspected for signs that Excel has corrupted the gene names, for example by converting them to dates.
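As a rough illustration of that check, the sketch below flags names containing those month abbreviations using base R. It is not the package's own implementation; the `gene_names` vector and the `suspect` variable are hypothetical, made up for this example.

```r
# Hypothetical gene names; "1-Sep" and "2-Mar" mimic Excel date conversions
gene_names <- c("Sept1", "March2", "Actb", "1-Sep", "2-Mar", "Gapdh")

# Flag any name containing "Sep", "Mar" or "Feb" for manual review
suspect <- grepl("Sep|Mar|Feb", gene_names)

# Names that should be inspected by eye
gene_names[suspect]
```

A match is only a prompt for inspection: a flagged name may be a legitimate symbol rather than a corrupted one.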

fix_bad_mgi_symbols(
  exp,
  mrk_file_path = NULL,
  printAllBadSymbols = FALSE,
  as_sparse = TRUE,
  verbose = TRUE
)

Arguments

exp

An expression matrix where the rows are MGI symbols, or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object.

mrk_file_path

Path to the MRK_List2 file, which can be downloaded from www.informatics.jax.org/downloads/reports/index.html.

printAllBadSymbols

Output all the bad gene symbols to the console.

as_sparse

Convert exp to a sparse matrix.

verbose

Print messages.

Value

Returns the expression matrix with the rownames corrected and rows representing the same gene merged. If no corrections are necessary, the input expression matrix is returned unchanged. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was supplied, it is returned with the corrected expression matrix under counts.
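To make "rows representing the same gene merged" concrete, here is a minimal sketch using base R's `rowsum()`. It assumes that merging means summing counts across rows that share a corrected symbol, which this page does not state; the matrix `expr_mat` and the vector `corrected_symbols` are hypothetical.

```r
# Hypothetical 3 x 2 expression matrix; "GeneA_old" is an outdated
# symbol for "GeneA"
expr_mat <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA_old", "GeneA", "GeneB"), c("cell1", "cell2"))
)

# Hypothetical corrected symbol for each row, in row order
corrected_symbols <- c("GeneA", "GeneA", "GeneB")

# ASSUMPTION: rows sharing a symbol are merged by summing their counts
merged <- rowsum(expr_mat, group = corrected_symbols)
merged
```

With these inputs, the two GeneA rows collapse into one, so the result has one row per unique symbol.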

Examples

# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# take a subset for speed
cortex_mrna$exp <- cortex_mrna$exp[1:50, 1:5]
cortex_mrna$exp <- fix_bad_mgi_symbols(cortex_mrna$exp)
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
#> 5 rows do not have proper MGI symbols
#> 2310042E22Rik, BC005764, C130030K03Rik, Stmn1-rs1, Gm9846
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
#> 0 poorly annotated genes are replicates of existing genes. These are:
#>
#> Converting to sparse matrix.
#> 3 rows should have been corrected by checking synonyms.
#> 2 rows STILL do not have proper MGI symbols.