singler package¶
- class singler.TrainedIntegratedReferences(ptr, ref_labels, ref_names)[source]¶
Bases:
objectIntegrated references, typically constructed by
train_integrated(). This is intended for advanced users only and should not be serialized.- __annotate_func__ = None¶
- __annotations_cache__ = {}¶
- __firstlineno__ = 13¶
- __static_attributes__ = ('_labels', '_names', '_ptr')¶
- class singler.TrainedSingleReference(ptr, full_data, full_label_codes, labels, features, markers)[source]¶
Bases:
objectA prebuilt reference object, typically created by
train_single(). This is intended for advanced users only and should not be serialized.- __annotate_func__ = None¶
- __annotations_cache__ = {}¶
- __firstlineno__ = 17¶
- __static_attributes__ = ('_features', '_full_data', '_full_label_codes', '_labels', '_markers', '_ptr')¶
- marker_subset(indices_only=False)[source]¶
- Parameters:
indices_only (
bool) – Whether to return the markers as indices intofeatures, or as a list of feature identifiers.- Return type:
ndarray|StringList- Returns:
If
indices_only = False, a list of feature identifiers for the markers.If
indices_only = True, a NumPy array containing the integer indices of features infeaturesthat were chosen as markers.
- property markers: dict[Any, dict[Any, StringList]]¶
Markers for every pairwise comparison between labels.
- num_markers()[source]¶
- Return type:
- Returns:
Number of markers to be used for classification. This is the same as the size of the array from
marker_subset().
- singler.aggregate_reference(ref_data, ref_labels, ref_features, num_centers=None, power=0.5, num_top=1000, rank=20, assay_type='logcounts', subset_row=None, check_missing=True, num_threads=1)[source]¶
Aggregate reference samples for a given label by using vector quantization to average their count profiles. The idea is to reduce the size of single-cell reference datasets so as to reduce the computation time of
train_single(). We perform k-means clustering for all cells in each label and aggregate all cells within each k-means cluster. (More specifically, the clustering is done on the principal components generated from the highly variable genes to better capture the structure within each label.) This yields one or more profiles per label, reducing the number of separate observations while preserving some level of intra-label heterogeneity.- Parameters:
ref_data (
Any) – Floating-point matrix of reference expression values, usually containing log-expression values. Alternatively, aSummarizedExperimentobject containing such a matrix.ref_labels (
Sequence) – Array of length equal to the number of columns inref_data, containing the labels for each cell.ref_features (
Sequence) – Sequence of identifiers for each feature, i.e., row inref_data.num_centers (
int|None) – Maximum number of aggregated profiles to produce for each label withcluster_kmeans(). IfNone, a suitable number of profiles is automatically chosen.power (
float) – Number between 0 and 1 indicating how much aggregation should be performed. Specifically, we set the number of clusters toX**powerwhereXis the number of cells assigned to that label. Ignored ifnum_centersis notNone.num_top (
int) – Number of highly variable genes to use for PCA prior to clustering, seechoose_highly_variable_genes().rank (
int) – Number of principal components to use during clustering, seerun_pca().assay_type (
int|str) – Integer or string specifying the assay ofref_datacontaining the relevant expression matrix, ifrefis aSummarizedExperimentobject.subset_row (
Sequence|None) – Array of row indices specifying the rows ofref_datato use for clustering. IfNone, no additional filtering is performed. Note that even ifsubset_rowis provided, aggregation is still performed on all genes.check_missing (
bool) – Whether to check for and remove rows with missing (NaN) values fromref_data.num_threads (
int) – Number of threads to use.
- Return type:
- Returns:
A
SummarizedExperimentcontaining the aggregated values in its first assay. The label for each aggregated profile is stored in the column data.
Examples
>>> # Mock up some log-expression data for a reference dataset. >>> import singler >>> ref = singler.mock_reference_data(num_replicates=50) >>> labels = ref.get_column_data()["label"] >>> import scranpy >>> ref = scranpy.normalize_rna_counts_se(ref) >>> >>> # Aggregation at different resolutions: >>> aggr0_5 = singler.aggregate_reference(ref, labels, ref.get_row_names(), power=0.5) >>> print(aggr0_5) >>> aggr0 = singler.aggregate_reference(ref, labels, ref.get_row_names(), power=0) # i.e., centroids only. >>> print(aggr0) >>> aggr1 = singler.aggregate_reference(ref, labels, ref.get_row_names(), power=1) # i.e., no aggregation. >>> print(aggr1)
- singler.annotate_integrated(test_data, ref_data, ref_labels, test_features=None, ref_features=None, test_assay_type=0, ref_assay_type='logcounts', test_check_missing=False, ref_check_missing=True, train_single_args={}, classify_single_args={}, train_integrated_args={}, classify_integrated_args={}, num_threads=1)[source]¶
Annotate a single-cell expression dataset based on the correlation of each cell to profiles in multiple labelled references. The results from each reference are then combined across references.
- Parameters:
test_data (
Any) –A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries may be expression values of any kind, only the ranking within each column is used.
Alternatively, a
SummarizedExperimentcontaining such a matrix in one of its assays.ref_data (
dict|Sequence|NamedList) –Sequence consisting of one or more of the following:
A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the
ref_dataargument intrain_single()).A
SummarizedExperimentobject containing such a matrix in its assays.
Alternatively, a dictionary where each value is a matrix-like object or
SummarizedExperimentand each key is the name of the reference.Alternatively, a (possibly named)
NamedListwhere each entry is a matrix-like object orSummarizedExperiment.ref_labels (
dict|Sequence|NamedList) –Sequence of the same length as
ref_data. Thei-th entry should be a sequence of length equal to the number of columns ofref_data[i], containing the label associated with each column.Alternatively, a dictionary where each value is a sequence of column labels and each key is the name of the reference.
Alternatively, a
NamedListwhere each entry is a sequence of column names. If theNamedListis named, the names should be the same as those ofref_data.test_features (
Sequence|None) – Sequence of length equal to the number of rows intest_data, containing the feature identifier for each row. AlternativelyNone, to use the row names of the experiment as features.ref_features (
dict|Sequence|NamedList|None) –Sequence of the same length as
ref_data. Thei-th entry should be a sequence of length equal to the number of rows ofref_data[i], containing the feature identifier associated with each row. Alternatively, thei-th entry may be set toNoneto use the row names of the experiment as features.Alternatively, a dictionary where each value is a sequence of feature names and each key is the name of the reference.
Alternatively, a
NamedListwhere each entry is a sequence of feature names. If theNamedListis named, the names should be the same as those ofref_data.If
None, the row names are used for all references, assumingref_dataonly containsSummarizedExperimentobjects.test_assay_type (
str|int) – Assay oftest_datacontaining the expression matrix, iftest_datais aSummarizedExperiment.test_check_missing (
bool) – Whether to check for and remove missing (i.e., NaN) values from the test dataset.ref_assay_type (
str|int) – Assay containing the expression matrix for any entry ofref_datathat is aSummarizedExperiment.ref_check_missing (
bool) – Whether to check for and remove missing (i.e., NaN) values from the reference datasets.train_single_args (
dict) – Further arguments to pass totrain_single().classify_single_args (
dict) – Further arguments to pass toclassify_single().train_integrated_args (
dict) – Further arguments to pass totrain_integrated().classify_integrated_args (
dict) – Further arguments to pass toclassify_integrated().num_threads (
int) – Number of threads to use for the various steps.
- Return type:
NamedList- Returns:
A
NamedListcontaining the following elements.”single”: a
NamedListofBiocFrameobjects, containing the per-reference classification results. Each entry is equivalent to runningannotate_single()fortest_dataon the corresponding reference separately. Ifref_datawas named, theNamedListwill also be named.A
BiocFramefromclassify_integrated(), containing the integrated results across references.
Examples
>>> # Mocking up data. >>> import singler >>> ref = singler.mock_reference_data(num_replicates=8) >>> ref1 = ref[:,[True, False] * int(ref.shape[1]/2)] >>> ref2 = ref[:,[False, True] * int(ref.shape[1]/2)] >>> >>> cd2 = ref2.get_column_data() >>> label2 = [l.lower() for l in cd2["label"]] # converting to lower-case for some variety. >>> cd2.set_column("label", label2, in_place=True) >>> ref2.set_column_data(cd2, in_place=True) >>> >>> import scranpy >>> ref1 = scranpy.normalize_rna_counts_se(ref1) >>> ref2 = scranpy.normalize_rna_counts_se(ref2) >>> >>> # Classifying within and across references. >>> test = singler.mock_test_data(ref) >>> full_res = singler.annotate_integrated( >>> test, >>> [ref1, ref2], >>> [ref1.get_column_data()["label"], ref2.get_column_data()["label"]] >>> ) >>> >>> print(full_res["single"][0]) # i.e., classification against ref1 >>> print(full_res["single"][1]) # i.e., classification against ref2 >>> print(full_res["integrated"]) # combined classification results
- singler.annotate_single(test_data, ref_data, ref_labels, test_features=None, ref_features=None, test_assay_type=0, ref_assay_type=0, test_check_missing=False, ref_check_missing=True, train_args={}, classify_args={}, num_threads=1)[source]¶
Annotate a single-cell expression dataset based on the correlation of each cell to profiles in a labelled reference.
- Parameters:
test_data (
Any) –A matrix-like object representing the test dataset, where rows are features and columns are samples (usually cells). Entries can be expression values of any kind; only the ranking within each column will be used.
Alternatively, a
SummarizedExperimentcontaining such a matrix in one of its assays.ref_data (
Any) –A matrix-like object representing the reference dataset, where rows are features and columns are samples. Entries should be expression values, usually log-transformed (see comments for the
refargument intrain_single()).Alternatively, a
SummarizedExperimentcontaining such a matrix in one of its assays.ref_labels (
Sequence) – Sequence of length equal to the number of columns ofref_data, containing the label associated with each column.test_features (
Sequence|None) – Sequence of length equal to the number of rows intest_data, containing the feature identifier for each row. AlternativelyNone, to use the row names of the experiment as features.ref_features (
Sequence|None) – Sequence of length equal to the number of rows ofref_data, containing the feature identifier for each row. AlternativelyNone, to use the row names of the experiment as features.test_assay_type (
str|int) – Assay containing the expression matrix, iftest_datais aSummarizedExperiment.ref_assay_type (
str|int) – Assay containing the expression matrix, ifref_datais aSummarizedExperiment.test_assay_type – Whether to remove rows with missing values from the test dataset.
ref_assay_type – Whether to remove rows with missing values from the reference dataset.
train_args (
dict) – Further arguments to pass totrain_single().classify_args (
dict) – Further arguments to pass toclassify_single().num_threads (
int) – Number of threads to use for the various steps.
- Return type:
- Returns:
A
BiocFrameof labelling results, seeclassify_single()for details.
Examples
>>> # Mocking up data with log-normalized expression values: >>> import singler >>> ref = singler.mock_reference_data() >>> test = singler.mock_test_data(ref) >>> >>> import scranpy >>> ref = scranpy.normalize_rna_counts_se(ref) >>> test = scranpy.normalize_rna_counts_se(test) >>> >>> # Running the classification: >>> pred = singler.annotate_single(test, ref, ref_labels=ref.get_column_data()["label"]) >>> print(pred) >>> import collections >>> print(collections.Counter(zip(pred["best"], test.get_column_data()["label"])))
- singler.classify_integrated(test_data, results, integrated_prebuilt, assay_type=0, quantile=0.8, use_fine_tune=True, fine_tune_threshold=0.05, num_threads=1)[source]¶
Integrate classification results across multiple references for a single test dataset.
- Parameters:
test_data (
Any) –A matrix-like object where each row is a feature and each column is a test sample (usually a cell), containing expression values. Normalized and/or transformed expression values are also acceptable as only the ranking is used within this function.
Alternatively, a
SummarizedExperimentcontaining such a matrix in one of its assays.results (
dict|Sequence|NamedList) –Sequence of
BiocFrameobjects. EachBiocFrameshould contain classification results generated by runningclassify_single()ontest_datawith one of the references. References should be in the same order as that used to constructintegrated_prebuilt.Alternatively, a dictionary where each value is a
BiocFrameand each key is the name of a reference. Names should be the same as those used to constructintegrated_prebuilt.Alternatively, a
NamedListwhere each entry is aBiocFrame. Names should be the same as those used to constructintegrated_prebuilt.integrated_prebuilt (
TrainedIntegratedReferences) – Integrated reference object, constructed withtrain_integrated().assay_type (
str|int) – Assay containing the expression matrix, iftest_datais aSummarizedExperiment.quantile (
float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.use_fine_tune (
bool) – Whether fine-tuning should be performed. This improves accuracy for distinguishing between references with similar best labels but requires more computational work.fine_tune_threshold (
float) – Maximum difference from the maximum correlation to use in fine-tuning. All references above this threshold are used for another round of fine-tuning.num_threads (
int) – Number of threads to use during classification.
- Return type:
- Returns:
A
BiocFramewith one row per column intest_data. It contains the following columns:best_label: list containing the assigned label in the best reference for each test sample.best_reference: an integer NumPy array containing the index of the best reference for each test sample.Each index refers to a position in
resultsorref_prebuiltintrain_integrated().
scores: a nestedBiocFramewhere each column corresponds to a reference. Each column is another nestedBiocFrameand contains the scores across labels for its corresponding reference.delta: a double-precision NumPy array containing the difference in scores between the best and second-best reference.
Examples
>>> # Mocking up data. >>> import singler >>> ref = singler.mock_reference_data(num_replicates=8) >>> ref1 = ref[:,[True, False] * int(ref.shape[1]/2)] >>> ref2 = ref[:,[False, True] * int(ref.shape[1]/2)] >>> >>> cd2 = ref2.get_column_data() >>> label2 = [l.lower() for l in cd2["label"]] # converting to lower-case for some variety. >>> cd2.set_column("label", label2, in_place=True) >>> ref2.set_column_data(cd2, in_place=True) >>> >>> import scranpy >>> ref1 = scranpy.normalize_rna_counts_se(ref1) >>> ref2 = scranpy.normalize_rna_counts_se(ref2) >>> >>> # Performing classification within each reference. >>> test = singler.mock_test_data(ref) >>> >>> built1 = singler.train_single(ref1, ref1.get_column_data()["label"], ref1.get_row_names()) >>> res1 = singler.classify_single(test, built1) >>> built2 = singler.train_single(ref2, ref2.get_column_data()["label"], ref2.get_row_names()) >>> res2 = singler.classify_single(test, built2) >>> >>> # Combining results across references. >>> in_built = singler.train_integrated(test.get_row_names(), [built1, built2]) >>> in_res = singler.classify_integrated(test, [res1, res2], in_built) >>> print(in_res)
- singler.classify_single(test_data, ref_prebuilt, assay_type=0, quantile=0.8, use_fine_tune=True, fine_tune_threshold=0.05, num_threads=1)[source]¶
Classify a test dataset against a reference by assigning labels from the latter to each column of the former using the SingleR algorithm.
- Parameters:
test_data (
Any) –A matrix-like object where each row is a feature and each column is a test sample (usually a single cell), containing expression values. Each entry may be an expression value of any kind; only the ranking within each column is used by this function.
Alternatively, a
SummarizedExperimentcontaining such a matrix in one of its assays.ref_prebuilt (
TrainedSingleReference) – A pre-built reference created withtrain_single().assay_type (
str|int) – Assay containing the expression matrix, iftest_datais aSummarizedExperiment.quantile (
float) – Quantile of the correlation distribution for computing the score for each label. Larger values increase sensitivity of matches at the expense of similarity to the average behavior of each label.use_fine_tune (
bool) – Whether fine-tuning should be performed. This improves accuracy for distinguishing between similar labels but requires more computational work.fine_tune_threshold (
float) – Maximum difference from the maximum correlation to use in fine-tuning. All labels above this threshold are used for another round of fine-tuning.num_threads (
int) – Number of threads to use during classification.
- Return type:
- Returns:
A
BiocFramewith one row per column oftest_data. This contains the following columns:best: a list containing the assigned label for each sample intest_data.scores: a nestedBiocFramewhere each column corresponds to a reference label and contains the score for that label across all test samples.delta: a double-precision NumPy array containing the difference in scores between the best and second-best label.
The metadata contains
markers, a list of the markers from each pairwise comparison between labels; andused, a list containing the union of markers from all comparisons.
Examples
>>> # Mocking up data. >>> import singler >>> ref = singler.mock_reference_data(num_replicates=8) >>> test = singler.mock_test_data(ref) >>> >>> import scranpy >>> ref = scranpy.normalize_rna_counts_se(ref) >>> >>> # Training and applying a classifier. >>> built = singler.train_single(ref, ref.get_column_data()["label"], ref.get_row_names()) >>> res = singler.classify_single(test, built) >>> print(res) >>> import collections >>> print(collections.Counter(zip(res["best"], test.get_column_data()["label"])))
- singler.get_classic_markers(ref_data, ref_labels, ref_features, assay_type='logcounts', check_missing=True, num_de=None, num_threads=1)[source]¶
Compute markers from a reference using the classic SingleR algorithm. This is typically done for reference datasets derived from replicated bulk transcriptomic experiments.
- Parameters:
A matrix-like object containing the log-normalized expression values of a reference dataset. Each column is a sample and each row is a feature.
Alternatively, this can be a
SummarizedExperimentcontaining a matrix-like object in one of its assays.Alternatively, a list of such matrices or
SummarizedExperimentobjects, typically for multiple batches of the same reference. It is assumed that different batches exhibit at least some overlap in theirref_featuresandref_labels.ref_labels (
Sequence|list[Sequence]) –If
ref_datais not a list,ref_labelsshould be a sequence of length equal to the number of columns ofref_data. Each entry should be a label (usually a string) for each column ofref_data.If
ref_datais a list,ref_labelsshould also be a list of the same length. Each entry should be a sequence of length equal to the number of columns of the corresponding entry ofref_dataand should contain the column labels.ref_features (
Sequence|list[Sequence]) –If
ref_datais not a list,ref_featuresshould be a sequence of length equal to the number of rows ofref_data. Each entry should be a feature name (usually a string) for each row.If
ref_datais a list,ref_featuresshould also be a list of the same length. Each entry should be a sequence of length equal to the number of rows of the corresponding entry ofrefand should containg the feature names.assay_type (
str|int) – Name or index of the assay of interest, ifrefis or containsSummarizedExperimentobjects.check_missing (
bool) – Whether to check for and remove rows with missing (NaN) values in the reference matrices. This can be set toFalseif it is known that no NaN values exist.num_de (
int|None) – Number of differentially expressed genes to use as markers for each pairwise comparison between labels. IfNone, an appropriate number of genes is automatically determined.num_threads (
int) – Number of threads to use for the calculations.
- Return type:
- Returns:
A dictionary of dictionary of lists containing the markers for each pairwise comparison between labels, i.e.,
markers[a][b]contains the upregulated markers for labelaover labelb.
Examples
>>> import singler >>> ref = singler.mock_reference_data() >>> import scranpy >>> ref = scranpy.normalize_rna_counts_se(ref) >>> classical = singler.get_classic_markers(ref, ref.get_column_data()["label"], ref.get_row_names()) >>> classical["A"]["B"]
- singler.mock_reference_data(num_labels=5, num_replicates=4, num_genes=1000, prop_markers=0.5)[source]¶
Mock up some reference data for the various examples in the singler package. The simulated data is very simple and should not be used for performance comparisons.
- Parameters:
- Return type:
- Returns:
A
SummarizedExperimentcontaining a count matrix and per-column labels.
Examples
>>> import singler >>> ref = singler.mock_reference_data() >>> print(ref)
- singler.mock_test_data(mock_ref, num_cells=100)[source]¶
Mock up some test data for the various examples in the singler package. The simulated data is very simple and should not be used for performance comparisons.
- Parameters:
mock_ref (
SummarizedExperiment) – Mock reference data generated bymock_reference_data().num_cells (
int) – Number of cells for which to generate test data.
- Return type:
- Returns:
A
SummarizedExperimentcontaining a count matrix and per-cell labels.
Examples
>>> import singler >>> ref = singler.mock_reference_data() >>> test = singler.mock_test_data(ref) >>> print(test)
- singler.number_of_classic_markers(num_labels)[source]¶
Compute the number of markers to detect for a given number of labels, using the classic SingleR marker detection algorithm.
Examples
>>> import singler >>> singler.number_of_classic_markers(15)
- singler.train_integrated(test_features, ref_prebuilt, warn_lost=True, num_threads=1)[source]¶
Build a set of integrated references for classification of a test dataset.
- Parameters:
test_features (
Sequence) – Sequence of features for the test dataset.ref_prebuilt (
dict|Sequence|NamedList) – List of prebuilt references, typically created by callingtrain_single().warn_lost (
bool) – Whether to emit a warning if the markers for each reference are not all present in all references.num_threads (
int) – Number of threads.
- Return type:
- Returns:
An integrated reference object, for classification with
classify_integrated().
Examples
>>> # Mocking up data. >>> import singler >>> ref = singler.mock_reference_data(num_replicates=8) >>> ref1 = ref[:,[True, False] * int(ref.shape[1]/2)] >>> ref2 = ref[:,[False, True] * int(ref.shape[1]/2)] >>> >>> cd2 = ref2.get_column_data() >>> label2 = [l.lower() for l in cd2["label"]] # converting to lower case for some variety. >>> cd2.set_column("label", label2, in_place=True) >>> ref2.set_column_data(cd2, in_place=True) >>> >>> import scranpy >>> ref1 = scranpy.normalize_rna_counts_se(ref1) >>> ref2 = scranpy.normalize_rna_counts_se(ref2) >>> >>> # Building a classifier for each reference. >>> test = singler.mock_test_data(ref) >>> built1 = singler.train_single(ref1, ref1.get_column_data()["label"], ref1.get_row_names()) >>> built2 = singler.train_single(ref2, ref2.get_column_data()["label"], ref2.get_row_names()) >>> >>> # Creating an integrated classifier across references. >>> in_built = singler.train_integrated(test.get_row_names(), {"first": built1, "second": built2}) >>> in_built.reference_labels >>> in_built.reference_names
- singler.train_single(ref_data, ref_labels, ref_features, test_features=None, assay_type='logcounts', restrict_to=None, check_missing=True, markers=None, marker_method='classic', num_de=None, hint_sce=True, marker_args={}, aggregate=False, aggregate_args={}, nn_parameters=<knncolle.vptree.VptreeParameters object>, num_threads=1)[source]¶
Build a single reference dataset in preparation for classification.
- Parameters:
ref_data (
Any) –A matrix-like object where rows are features, columns are reference profiles, and each entry is the expression value. If
markersis not provided, expression should be normalized and log-transformed in preparation for marker prioritization via differential expression analyses. Otherwise, any expression values are acceptable as only the ranking within each column is used.Alternatively, a
SummarizedExperimentcontaining such a matrix in one of its assays.ref_labels (
Sequence) – Sequence of labels for each reference profile, i.e., column inref_data.ref_features (
Sequence) – Sequence of identifiers for each feature, i.e., row inref_data.test_features (
Sequence|None) – Sequence of identifiers for each feature in the test dataset.assay_type (
str|int) – Assay containing the expression matrix, ifref_datais aSummarizedExperiment.check_missing (
bool) – Whether to check for and remove rows with missing (NaN) values fromref_data.restrict_to (
set|dict|None) – Subset of available features to restrict to. Only features inrestrict_towill be used in the reference building. IfNone, no restriction is performed.markers (
dict[Any,dict[Any,Sequence]] |None) – Upregulated markers for each pairwise comparison between labels. Specifically,markers[a][b]should be a sequence of features that are upregulated inacompared tob. All such features should be present infeatures, and all labels inlabelsshould have keys in the inner and outer dictionaries.marker_method (
Literal['classic','auc','cohens_d']) – Method to identify markers from each pairwise comparisons between labels inref_data. Ifclassic, we callget_classic_markers(). Ifaucorcohens_d, we callscore_markers(). Only used ifmarkersis not supplied.num_de (
int|None) – Number of differentially expressed genes to use as markers for each pairwise comparison between labels. IfNoneandmarker_method = "classic", an appropriate number of genes is determined byget_classic_markers(). Otherwise, it is set to 10. Only used ifmarkersis not supplied.hint_sce (
bool) – Whether to print a hint to changemarker_methodwhenref_datais aSingleCellExperiment. It will also suggest settingaggregate = Truefor greater efficiency whenref_datacontains 1000 cells or more.marker_args (
dict) – Further arguments to pass to the chosen marker detection method. Ifmarker_method = "classic", this isget_classic_markers(), otherwise it isscore_markers(). Only used ifmarkersis not supplied.aggregate (
bool) – Whether the reference dataset should be aggregated to pseudo-bulk samples for speed, seeaggregate_reference()for details.aggregate_args (
dict) – Further arguments to pass toaggregate_reference()whenaggregate = True.nn_parameters (
Parameters|None) – Algorithm for constructing the neighbor search index, used to compute scores during classification.num_threads (
int) – Number of threads to use for reference building.
- Return type:
- Returns:
The pre-built reference, ready for use in downstream methods like
classify_single().
Examples
>>> # Mocking up data. >>> import singler >>> ref = singler.mock_reference_data(num_replicates=8) >>> test = singler.mock_test_data(ref) >>> >>> import scranpy >>> ref = scranpy.normalize_rna_counts_se(ref) >>> >>> # Training a classifier. >>> built = singler.train_single(ref, ref.get_column_data()["label"], ref.get_row_names()) >>> built.num_labels >>> built.num_markers >>> built.labels >>> built.markers["A"]["B"] # markers for A over B. >>> len(built.marker_subset())