We revisit cytofkit from the command line. We will write a script that runs the successive steps and lets us tune the parameters. The starting point is the step-by-step documentation of cytofkit 1.11.3, available in Bioconductor 3.7.
Comments to come…
## Run with Commands (Step-by-Step)

This is a Markdown document. To follow it step by step, copy the code enclosed between the opening tag `` ```{r} `` and the closing tag `` ``` ``.

### Initialization

```{r}
# Libraries
library(flowCore)
library(cytofkit)

# Configure
fcsDir <- "C:/demo/PBMC8_30min"
prjName <- "cytofkit_1000evts"

# To build the channel list, inspect colnames(data_transformed) after the merge
# step below. The selection is required by the Cell Subset Detection step.
channels <- c("CD3(110:114)Dd<CD3>",
              "CD45(In115)Dd<CD45>",
              "CD4(Nd145)Dd<CD4>",
              "CD20(Sm147)Dd<CD20>",
              "CD33(Nd148)Dd<CD33>",
              "CD123(Eu151)Dd<CD123>",
              "CD14(Gd160)Dd<CD14>",
              "IgM(Yb171)Dd<IgM>",
              "HLA-DR(Yb174)Dd<HLA-DR>",
              "CD7(Yb176)Dd<CD7>")
```

### Pre-processing

```{r}
## Loading the FCS data:
## the dot is escaped so that only files ending in ".fcs" are matched
fcsFiles <- list.files(fcsDir, pattern = "\\.fcs$", full.names = TRUE)
## File names
fcsFiles
## Selected channels
channels
```

First, we read the files and build a single merged dataset: each file is read, compensated and transformed, and the files are then merged. Merging consists of sampling each file and concatenating the sampled data. The following code shows the read/compensate/transform process on a single file. The full process, merge included, is handled by `cytof_exprsMerge()` further down.

```{r, eval=FALSE}
## Extract the expression matrix with transformation
## (first file only, as an example)
data_transformed <- cytof_exprsExtract(fcsFile = fcsFiles[1],
                                       comp = FALSE,
                                       transformMethod = "cytofAsinh")
```

The single merged dataset is built with `cytof_exprsMerge()`.
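To make the sample-then-concatenate idea described above concrete, here is a minimal base R sketch. It is not cytofkit's implementation: the toy matrices, the `sample_ceil` helper and its `fixed_num` argument are hypothetical names introduced for this example, and the capping rule assumes the behaviour of `mergeMethod = "ceil"` (at most a fixed number of cells per file, drawn without replacement).

```{r}
# Illustrative only: toy matrices stand in for per-file expression data.
set.seed(42)  # make the sampling reproducible
marker_names <- c("m1", "m2", "m3")
toy_files <- list(
  fileA = matrix(rnorm(5000 * 3), ncol = 3, dimnames = list(NULL, marker_names)),
  fileB = matrix(rnorm(1200 * 3), ncol = 3, dimnames = list(NULL, marker_names))
)

# Hypothetical helper: keep at most fixed_num rows of a matrix,
# sampled without replacement and returned in their original order.
sample_ceil <- function(mat, fixed_num) {
  n_keep <- min(nrow(mat), fixed_num)
  mat[sort(sample.int(nrow(mat), n_keep)), , drop = FALSE]
}

# Sample each file, then concatenate the sampled rows.
sampled <- lapply(toy_files, sample_ceil, fixed_num = 2000)
merged <- do.call(rbind, sampled)

# Rows contributed by each file, and the merged dimensions
sapply(sampled, nrow)
dim(merged)
```

In this sketch the smaller file contributes all of its rows because it holds fewer than `fixed_num` events, so the merged matrix is not balanced across files.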
```{r}
## get transformed, combined exprs data
message("Extract expression data...")
##
## If analysing flow cytometry data, you can set comp to TRUE or
## provide a compensation matrix to apply compensation
## If you have multiple FCS files, expression can be extracted and combined
## To make the sampling reproducible, set the random seed (sampleSeed)
data_transformed <- cytof_exprsMerge(
  fcsFiles = fcsFiles,
  comp = FALSE,
  transformMethod = "cytofAsinh",
  mergeMethod = "ceil",
  fixedNum = 2000,
  sampleSeed = 42)
## change mergeMethod to apply a different combination strategy

## Take a look at the extracted expression matrix
## Total dimension after merging
cat(paste(dim(data_transformed), collapse = " x "), " data was extracted!\n")
## Top left of the matrix
head(data_transformed[, 1:3])
## Column names of the matrix
colnames(data_transformed)
```

### Cell Subset Detection

```{r, message=FALSE}
message("Run clustering and dimension reductions...")
## use clustering algorithm to detect cell subsets
## to speed up script checking, uncomment to keep only 100 cells
## (must run before the marker selection so that all objects stay aligned)
# data_transformed <- data_transformed[1:100, ]
## keep only selected markers
data_for_clustering <- data_transformed[, channels]
## Dimensions
dim(data_for_clustering)
##
## run PhenoGraph
## Rphenograph works directly on the high-dimensional xdata
## Rphenograph_k is the number of nearest neighbours (k)
## TODO: does Rphenograph use a seed?
cluster_PhenoGraph <- cytof_cluster(
  xdata = data_for_clustering,
  method = "Rphenograph",
  Rphenograph_k = 30)
##
## run ClusterX
## ClusterX clustering is based on the transformed ydata.
## First reduce the dimension of the data, then cluster.
## The following parameters can be tuned (default values shown):
## perplexity = 30, controls how many nearest neighbours are taken into
##   account when constructing the embedding in the low-dimensional space
## theta = 0.5, speed/accuracy trade-off (increase for less accuracy),
##   set to 0.0 for exact t-SNE
## max_iter = 1000, increasing iterations usually improves the separation
##   of the islands
data_transformed_tsne <- cytof_dimReduction(
  data = data_for_clustering,
  method = "tsne",
  tsneSeed = 123,
  max_iter = 5000)
## Now cluster on the 2D result of t-SNE
cluster_ClusterX <- cytof_cluster(
  ydata = data_transformed_tsne,
  method = "ClusterX")
```

```{r, eval=FALSE}
## run DensVM (takes a long time, skipped here)
## DensVM clustering is based on the transformed ydata and uses xdata to train the model.
## NOTE: xdata here is data_transformed (all extracted channels);
## pass data_for_clustering to restrict training to the selected markers.
cluster_DensVM <- cytof_cluster(
  xdata = data_transformed,
  ydata = data_transformed_tsne,
  method = "DensVM")
```

```{r, message=FALSE}
## run FlowSOM
## FlowSOM works directly on the high-dimensional xdata
## By default, the grid is 10 x 10
## FlowSOM_k is the number of meta-clusters computed from the nodes of the grid
## NOTE: xdata here is data_transformed (all extracted channels);
## pass data_for_clustering to restrict clustering to the selected markers.
cluster_FlowSOM <- cytof_cluster(
  xdata = data_transformed,
  method = "FlowSOM",
  FlowSOM_k = 20,
  flowSeed = 123)
```

```{r}
## combine data
data_all <- cbind(
  data_transformed,
  data_transformed_tsne,
  PhenoGraph = cluster_PhenoGraph,
  ClusterX = cluster_ClusterX,
  FlowSOM = cluster_FlowSOM)
data_all <- as.data.frame(data_all)
## Rename columns: keep only the short marker name found between < and >
colnames(data_all) <- gsub(".+<(.+?)>$", "\\1", colnames(data_all))
## Final view
colnames(data_all)
```
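The renaming step relies on a regular expression, so it may help to see its effect in isolation. The sketch below applies the same `gsub()` pattern to a small, hand-written vector of names. The vector `cols` is an assumption made for illustration: it mixes two channel names from the list above with two names that carry no `<...>` suffix, standing in for the dimension-reduction and cluster columns.

```{r}
# Illustrative names: two channel names plus two columns without a <...> suffix
cols <- c("CD3(110:114)Dd<CD3>", "HLA-DR(Yb174)Dd<HLA-DR>", "tsne_1", "PhenoGraph")

# Same pattern as in the renaming step: capture the text between the
# final "<" and the closing ">" at the end of the name.
short <- gsub(".+<(.+?)>$", "\\1", cols)
short
# [1] "CD3"        "HLA-DR"     "tsne_1"     "PhenoGraph"

# Check that shortening did not create duplicate column names
anyDuplicated(short) == 0
```

Names that do not match the pattern are returned unchanged, which is why the t-SNE and cluster columns keep their labels. The duplicate check is worth keeping when two channels share the same short marker name.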