singlepp_loaders
Load pre-processed reference datasets for SingleR
This repository implements functions to load singlepp reference datasets for use in cell type annotation. Each reference dataset is pre-processed into a custom format that eliminates the need for ranking and marker detection. The aim is to avoid unnecessary work on underpowered client devices, e.g., for more responsive web applications. Briefly, a reference dataset is represented by three files corresponding to the following components:
Each label is an integer in [0, N), where N is the number of unique labels.
If y is the marker object, then y[i][j][k] should contain the k-th best marker gene that is upregulated in label i compared to label j. Marker genes should be reported as row indices of the expression matrix.
In practical usage, a reference dataset will also contain:
Check out some existing datasets for concrete examples.
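To make the marker layout described above concrete, here is a minimal standalone sketch of the nested y[i][j][k] structure. It does not use the singlepp_loaders API; the alias Markers and the helpers top_markers() and make_toy_markers() are hypothetical names introduced only for illustration, and the gene indices are made-up values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias: markers[i][j][k] is the row index of the k-th best
// marker gene that is upregulated in label i compared to label j.
using Markers = std::vector<std::vector<std::vector<int>>>;

// Return at most `top` of the best markers for label `i` versus label `j`.
std::vector<int> top_markers(const Markers& markers, std::size_t i, std::size_t j, std::size_t top) {
    assert(i < markers.size() && j < markers[i].size());
    const std::vector<int>& ranked = markers[i][j]; // ordered from best to worst
    const std::size_t n = std::min(top, ranked.size());
    return std::vector<int>(ranked.begin(), ranked.begin() + static_cast<std::ptrdiff_t>(n));
}

// Build a toy marker set for N = 3 labels. Self-comparisons (i == j) are
// left empty here, which is an assumption of this sketch.
Markers make_toy_markers() {
    Markers markers(3, std::vector<std::vector<int>>(3));
    markers[0][1] = {4, 7, 2};
    markers[0][2] = {4, 9};
    markers[1][0] = {1, 3};
    markers[1][2] = {3, 8, 0};
    markers[2][0] = {5};
    markers[2][1] = {5, 6};
    return markers;
}
```

With this toy data, top_markers(make_toy_markers(), 0, 1, 2) yields the row indices 4 and 7, i.e., the two best genes for label 0 against label 1.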
We can parse each component of our reference dataset from a text file, Gzip-compressed file or Zlib-compressed buffer.
We can verify the consistency of all components with the verify() function.
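As a rough illustration of what a consistency check across components might cover, the sketch below tests the constraints stated earlier: labels lie in [0, N) and marker entries are valid row indices. This is not the library's verify(), whose actual signature and checks should be taken from the singlepp_loaders documentation. The function is_consistent() is a hypothetical name, and taking N from the outer size of the marker object is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alias: markers[i][j][k] is a row index of the expression matrix.
using Markers = std::vector<std::vector<std::vector<int>>>;

// Hypothetical helper, not the library's verify(). Returns true if the labels
// and markers agree with each other and with the number of genes (rows).
bool is_consistent(std::size_t num_genes, const std::vector<int>& labels, const Markers& markers) {
    const std::size_t num_labels = markers.size(); // assumed to define N

    // Every label must lie in [0, N).
    for (int label : labels) {
        if (label < 0 || static_cast<std::size_t>(label) >= num_labels) {
            return false;
        }
    }

    for (const auto& per_label : markers) {
        // Each label needs one marker list per label it is compared against.
        if (per_label.size() != num_labels) {
            return false;
        }
        // Every marker must be a valid row index of the expression matrix.
        for (const auto& ranked : per_label) {
            for (int gene : ranked) {
                if (gene < 0 || static_cast<std::size_t>(gene) >= num_genes) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

Returning a bool keeps the sketch easy to test; a real check might instead throw an exception that names the offending component.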
These components are used in singlepp::train_single() to build a classifier that can be applied to a test dataset. More details can be found in the singlepp documentation.
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
Then you can link to singlepp_loaders to make the headers available during compilation:
To install the library, use:
By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSINGLEPP_LOADERS_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ (either directly or with Git submodules) and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt.