Workflow Utilities
spacedeconv_utils.Rmd
spacedeconv offers a variety of workflow helper functions that streamline the overall analysis process. In the following, we give an overview of the available functions.
preprocess
normalize
print_info
available_results
aggregate_results
addCustomAnnotation
annotate_spots
scale_cell_counts
subsetSCE
subsetSPE
library(spacedeconv)
## → checking spacedeconv environment and dependencies
1. preprocess
This function preprocesses single-cell or spatial datasets. It cuts off observations with low or high UMI counts, removes noisy expression and performs additional checks to streamline the deconvolution analysis. The function takes a SingleCellExperiment, AnnData or Seurat object and returns a processed SingleCellExperiment. Set the min_umi or max_umi parameters to improve data quality, and select the assay with the assay parameter. Additionally, mitochondrial genes can be removed by setting remove_mito = TRUE.
data("single_cell_data_3")
sce <- spacedeconv::preprocess(single_cell_data_3, min_umi = 500, assay = "counts", remove_mito = TRUE)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [65ms]
##
## ℹ Removing 8 observations with umi count below threshold
## ✔ Removed 8 observations with umi count below threshold [692ms]
##
## ℹ Removing 5862 variables with all zero expression
## ✔ Removed 5862 variables with all zero expression [685ms]
##
## ℹ Removing 13 mitochondria genes
## ✔ Removed 13 mitochondria genes [649ms]
##
## ℹ Removing duplicated genes
## ✔ Removed duplicated genes [51ms]
##
## ℹ Checking for ENSEMBL Identifiers
## ! Warning: ENSEMBL identifiers detected in gene names
## ℹ Consider using Gene Names for first-generation deconvolution tools
## ✔ Finished Preprocessing [7ms]
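The filtering steps reported in the log above can be illustrated with a small base-R sketch. It works on a plain count matrix (genes in rows, cells in columns) and applies the UMI thresholds, the zero-expression filter, mitochondrial gene removal and de-duplication in the order shown in the log. The helper filter_counts is hypothetical and not part of spacedeconv, and the "MT-" prefix for mitochondrial genes is an assumption that holds for human gene symbols.

```r
# Hypothetical base-R sketch of the preprocessing filters; not spacedeconv code.
# Toy count matrix: genes in rows, cells (observations) in columns.
counts <- matrix(
  c(10, 0, 5, 300,
     0, 0, 0,   0,
     3, 1, 2,  50,
     7, 0, 1, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "MT-CO1", "GENE4"), paste0("cell", 1:4))
)

filter_counts <- function(counts, min_umi = 0, max_umi = Inf, remove_mito = FALSE) {
  umi <- colSums(counts)                                    # total UMI count per cell
  counts <- counts[, umi >= min_umi & umi <= max_umi, drop = FALSE]
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]     # drop all-zero genes
  if (remove_mito) {
    # assumption: mitochondrial genes carry the human "MT-" prefix
    counts <- counts[!grepl("^MT-", rownames(counts)), , drop = FALSE]
  }
  counts[!duplicated(rownames(counts)), , drop = FALSE]     # drop duplicated genes
}

filtered <- filter_counts(counts, min_umi = 10, remove_mito = TRUE)
dim(filtered)  # 2 genes x 2 cells remain
```

Here cell2 (1 UMI) and cell3 (8 UMIs) fall below min_umi, GENE2 has no remaining expression and MT-CO1 is removed as a mitochondrial gene.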
2. normalize
You can scale and normalize your single-cell or spatial data by calling the normalize function. The method parameter selects the normalization, either cpm or logcpm, and the assay parameter selects the input assay. The normalized data is stored as an additional assay in the object.
sce <- spacedeconv::normalize(sce, method = "cpm", assay = "counts")
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [11ms]
##
## ℹ Normalizing using cpm
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.4 GiB
## ✔ Finished normalization using cpm [4.4s]
##
## ℹ Please note the normalization is stored in an additional assay
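CPM scaling divides the counts of each cell (or spot) by its total count and multiplies by one million, so every column sums to 10^6. The base-R sketch below shows this on a toy matrix. The helpers cpm and logcpm are hypothetical, and log2(CPM + 1) is only a common convention for logCPM, not necessarily the exact transform used by normalize.

```r
# Hypothetical base-R sketch of counts-per-million scaling; not spacedeconv code.
# Genes in rows, cells/spots in columns (matrix is filled column by column).
counts <- matrix(c(1, 3, 0, 6,
                   4, 0, 10, 0),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("cellA", "cellB")))

cpm <- function(counts) {
  # divide each column by its library size, then scale to one million
  sweep(counts, 2, colSums(counts), "/") * 1e6
}

# assumption: logCPM taken as log2(CPM + 1), a common convention
logcpm <- function(counts) log2(cpm(counts) + 1)

cpm_mat <- cpm(counts)
colSums(cpm_mat)  # each column sums to 1e6
```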
3. print_info
You can obtain additional information about your dataset, such as its assays, the number of genes and cells and the UMI count range, by calling print_info.
print_info(sce)
##
## ── Single Cell
## Assays: "counts" and "cpm"
## Genes: 23858
## → without expression: 0 (0%)
## Cells: 7978
## → without expression: 0 (0%)
## Umi count range: 447 - 74244
## ✔ Rownames set
## ✔ Colnames set
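The statistics printed above can be reproduced on a toy count matrix with the hypothetical helper below, which is a base-R illustration rather than the print_info source. In this sketch, a gene or cell counts as "without expression" when its total count is zero.

```r
# Hypothetical base-R sketch of dataset summary statistics; not spacedeconv code.
counts <- matrix(c(0, 2, 0,
                   0, 5, 1,
                   0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

summarize_counts <- function(counts) {
  umi <- colSums(counts)  # UMI count per cell
  list(
    genes = nrow(counts),
    genes_without_expression = sum(rowSums(counts) == 0),
    cells = ncol(counts),
    cells_without_expression = sum(umi == 0),
    umi_range = range(umi),
    rownames_set = !is.null(rownames(counts)),
    colnames_set = !is.null(colnames(counts))
  )
}

info <- summarize_counts(counts)
str(info)
```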
4. available_results
You can check which deconvolution results and additional annotations are available in your data by calling available_results. If many quantifications were performed, set the method parameter to the name of a deconvolution tool to filter the results further.
# "deconv" contains DWLS results
available_results(deconv)
## [1] "dwls_B.cells" "dwls_CAFs" "dwls_Cancer.Epithelial"
## [4] "dwls_Endothelial" "dwls_Myeloid" "dwls_Normal.Epithelial"
## [7] "dwls_Plasmablasts" "dwls_PVL" "dwls_T.cells"
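The result names above follow a <method>_<cell type> pattern, so restricting the results to one tool amounts to a prefix match. The sketch below mimics that filtering with grep. The helper filter_by_method and the epic_B.cells entry are hypothetical, and whether available_results matches names this way internally is an assumption.

```r
# Hypothetical sketch of filtering result names by method prefix.
# "epic_B.cells" is an invented entry from a second tool.
result_names <- c("in_tissue", "dwls_B.cells", "dwls_CAFs", "epic_B.cells")

filter_by_method <- function(names, method) {
  # keep only names that start with "<method>_"
  grep(paste0("^", method, "_"), names, value = TRUE)
}

filter_by_method(result_names, "dwls")
# returns c("dwls_B.cells", "dwls_CAFs")
```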
5. aggregate_results
You can aggregate fine-grained deconvolution results into a single value by passing a vector of deconvolution result names to the cell_types parameter. You can additionally set the name of the new column with name, and set remove = TRUE to drop the original fine-grained columns and keep only the aggregate.
aggregate_results(deconv, cell_types = c("dwls_Cancer.Epithelial", "dwls_Normal.Epithelial"), name = "dwls_Epithelial", remove = TRUE)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [4ms]
##
## ℹ Aggregating cell types
## ✔ Aggregated cell types [6ms]
##
## class: SpatialExperiment
## dim: 23542 1185
## metadata(0):
## assays(2): counts cpm
## rownames(23542): AL627309.1 AL627309.5 ... AC007325.4 AC007325.2
## rowData names(2): symbol ensembl
## colnames(1185): AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 ...
## TTGTTTCATTAGTCTA-1 TTGTTTGTGTAAATTC-1
## colData names(12): in_tissue array_row ... dwls_T.cells dwls_Epithelial
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
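Assuming that aggregation means summing the selected fractions per spot, the operation corresponds to the base-R sketch below, which uses a data frame in place of colData. The helper aggregate_columns and the fraction values are made up for illustration.

```r
# Hypothetical base-R sketch of aggregating result columns by summation.
spots <- data.frame(
  dwls_Cancer.Epithelial = c(0.50, 0.10),
  dwls_Normal.Epithelial = c(0.20, 0.30),
  dwls_T.cells           = c(0.30, 0.60),
  row.names = c("spot1", "spot2")
)

aggregate_columns <- function(df, cell_types, name, remove = FALSE) {
  df[[name]] <- rowSums(df[, cell_types, drop = FALSE])  # per-spot sum
  if (remove) {
    df <- df[setdiff(names(df), cell_types)]             # drop fine-grained columns
  }
  df
}

agg <- aggregate_columns(spots,
                         c("dwls_Cancer.Epithelial", "dwls_Normal.Epithelial"),
                         name = "dwls_Epithelial", remove = TRUE)
agg
```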
6. addCustomAnnotation
This function adds a custom annotation vector to a SpatialExperiment object and stores it in the column given by columnName.
# new_annotation is a vector containing a custom annotation for each spot
spe <- addCustomAnnotation(spe, columnName = "ManualAnnotation", values = new_annotation)
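Conceptually, a custom annotation is a new per-spot column, so the annotation vector needs exactly one value per spot, in spot order. The base-R sketch below uses a data frame in place of the SpatialExperiment colData. The helper add_annotation, the spot names and the labels are hypothetical, and the length check is an assumption about what a safe implementation should enforce.

```r
# Hypothetical base-R sketch of adding a per-spot annotation column.
spot_meta <- data.frame(in_tissue = c(1, 1, 0),
                        row.names = c("spot1", "spot2", "spot3"))

add_annotation <- function(meta, column_name, values) {
  # one annotation value is required for every spot
  if (length(values) != nrow(meta)) {
    stop("values must contain exactly one entry per spot")
  }
  meta[[column_name]] <- values
  meta
}

new_annotation <- c("tumor", "stroma", "background")
spot_meta <- add_annotation(spot_meta, "ManualAnnotation", new_annotation)
spot_meta
```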
7. annotate_spots
This function labels a specific subgroup of spots, for example to classify them as TRUE / FALSE. It takes a list of spot names: these spots receive value_pos, all other spots receive value_neg, and the result is stored in the column given by name.
# spots is a list of spot names.
spe <- annotate_spots(spe, spots, value_pos = TRUE, value_neg = FALSE, name = "customAnnotation")
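The labelling logic reduces to a membership test, as sketched below in base R. label_spots is a hypothetical helper, not the annotate_spots source; the barcodes are taken from the object printed earlier. The second call shows that, in this sketch, the two values need not be logical.

```r
# Hypothetical base-R sketch of TRUE / FALSE spot labelling.
all_spots <- c("AAACAATCTACTAGCA-1", "AAACACCAATAACTGC-1", "TTGTTTCATTAGTCTA-1")
spots <- c("AAACACCAATAACTGC-1")  # spots that should get the positive value

label_spots <- function(all_spots, spots, value_pos = TRUE, value_neg = FALSE) {
  # listed spots get value_pos, every other spot gets value_neg
  ifelse(all_spots %in% spots, value_pos, value_neg)
}

label_spots(all_spots, spots)
# returns c(FALSE, TRUE, FALSE)
label_spots(all_spots, spots, value_pos = "selected", value_neg = "other")
# returns c("other", "selected", "other")
```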
8. scale_cell_counts
Most deconvolution tools compute relative cell fractions for spots. If you know the number of cells in each spot, you can use this function to scale the relative values to absolute cell counts. The inputs are value, the name of the column to scale, and cell_counts, a vector of absolute cell counts for each spot. You can also set the name of the result column with resName.
# cell_counts_per_spot contains spot level absolute cell counts
spe_absolute <- scale_cell_counts(spe, value = "dwls_B.cells", cell_counts = cell_counts_per_spot, resName = "BCellsAbsolute")
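Assuming the deconvolution values are fractions of the cells in each spot, scaling to absolute numbers is an element-wise multiplication with the per-spot cell counts. The hypothetical helper scale_to_counts and the numbers below are for illustration only.

```r
# Hypothetical base-R sketch: relative fractions -> absolute cell counts.
scale_to_counts <- function(fractions, cell_counts) {
  # one absolute cell count is needed for every spot
  if (length(fractions) != length(cell_counts)) {
    stop("need one cell count per spot")
  }
  fractions * cell_counts
}

fractions <- c(0.25, 0.5, 0)   # relative B-cell fraction per spot (invented)
cell_counts <- c(8, 4, 5)      # absolute number of cells per spot (invented)
absolute <- scale_to_counts(fractions, cell_counts)
absolute
# returns c(2, 2, 0)
```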
9. subsetSCE
To reduce the resources needed for deconvolution, you can shrink your scRNA-seq reference by subsetting it. The function requires your input sce object and the name of the column containing the cell-type annotation (cell_type_col). Select the subsetting scenario with scenario, either "mirror" or "even". The mirror scenario keeps the same cell-type proportions as in the input data but reduces the overall number of cells. The even scenario selects the same number of cells for each cell type. Specify the number of cells you want after subsetting with the ncells parameter. If a cell type does not have enough cells to reach the number required by the scenario, set the notEnough parameter to "asis" to keep all of its remaining cells or to "remove" to drop the cell type completely.
subset <- subsetSCE(sce, cell_type_col = "celltype_major", scenario = "mirror", ncells = 500)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ℹ Set seed to 12345
## ✔ parameter OK [10ms]
##
## ℹ extracting up to 500 cells
## ✔ extracting up to 500 cells [98ms]
##
## ℹ extracted 501 cells
## ✔ extracted 501 cells [10ms]
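The two scenarios and the notEnough options can be illustrated with the base-R sketch below, which samples cell indices from a vector of cell-type labels. It is a hypothetical helper, not the subsetSCE implementation. Quotas are rounded per cell type, so in this sketch the total can in general differ slightly from ncells. The toy labels and counts are invented.

```r
# Hypothetical base-R sketch of "mirror" and "even" subsetting with the
# "asis" / "remove" fallbacks; not the spacedeconv implementation.
set.seed(12345)  # reproducible sampling

# toy reference: 600 T cells, 300 B cells, 100 myeloid cells (invented)
cell_types <- rep(c("T-cells", "B-cells", "Myeloid"), times = c(600, 300, 100))

subset_cells <- function(cell_types, ncells,
                         scenario = c("mirror", "even"),
                         notEnough = c("asis", "remove")) {
  scenario <- match.arg(scenario)
  notEnough <- match.arg(notEnough)
  tab <- table(cell_types)
  counts <- setNames(as.vector(tab), names(tab))  # cells available per type

  # number of cells requested per cell type
  quota <- if (scenario == "mirror") {
    round(ncells * counts / sum(counts))          # keep input proportions
  } else {
    setNames(rep(round(ncells / length(counts)), length(counts)), names(counts))
  }

  keep <- integer(0)
  for (ct in names(counts)) {
    idx <- which(cell_types == ct)
    n <- quota[[ct]]
    if (length(idx) < n) {
      if (notEnough == "remove") next  # drop this cell type entirely
      n <- length(idx)                 # "asis": keep every available cell
    }
    keep <- c(keep, idx[sample.int(length(idx), n)])
  }
  sort(keep)  # indices of the retained cells
}

mirror <- subset_cells(cell_types, ncells = 500, scenario = "mirror")
table(cell_types[mirror])  # B-cells 150, Myeloid 50, T-cells 300

even_asis <- subset_cells(cell_types, ncells = 600, scenario = "even", notEnough = "asis")
even_remove <- subset_cells(cell_types, ncells = 600, scenario = "even", notEnough = "remove")
```

With ncells = 600 the even scenario asks for 200 cells per type. Only 100 myeloid cells exist, so "asis" keeps all of them (500 cells in total), while "remove" drops the myeloid type (400 cells in total).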