cytofkit: step-by-step commands

We revisit cytofkit from the command line. We will define a script that runs the different steps and fine-tunes the parameters. The starting point is the step-by-step documentation of cytofkit 1.11.3, available under Bioconductor 3.7.
Comments to come…
## Run with Commands (Step-by-Step)

This is an R Markdown document. To follow it step by step, copy the code 
enclosed between the opening fence ```{r} and the closing fence ```.

### Initialization

```{r}
# Libraries
library(flowCore)
library(cytofkit)
# Configuration
fcsDir <- "C:/demo/PBMC8_30min"
prjName <- "cytofkit_1000evts"
# To build the channel list, inspect the column names of the merged matrix
# (colnames(data_transformed), printed in the Pre-processing step). This
# selection is required by the Cell Subset Detection step.
channels <- c("CD3(110:114)Dd<CD3>", "CD45(In115)Dd<CD45>", "CD4(Nd145)Dd<CD4>",
  "CD20(Sm147)Dd<CD20>", "CD33(Nd148)Dd<CD33>", "CD123(Eu151)Dd<CD123>", 
  "CD14(Gd160)Dd<CD14>", "IgM(Yb171)Dd<IgM>", "HLA-DR(Yb174)Dd<HLA-DR>", 
  "CD7(Yb176)Dd<CD7>")
```

### Pre-processing

```{r}
## List the FCS files (the dot is escaped so that only the ".fcs" extension matches)
fcsFiles <- list.files(fcsDir, pattern = "\\.fcs$", full.names = TRUE)
## File names
fcsFiles
## Selected channels
channels
```
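The `pattern` argument of `list.files` is a regular expression, so a dot matches any character unless it is escaped. The following minimal sketch shows the difference on a few hypothetical file names (invented for illustration, not taken from the demo folder).

```r
# Hypothetical file names, for illustration only
candidates <- c("sample1.fcs", "sample2.FCS", "notes_fcs", "sample1.fcs.bak")

# Unescaped dot: "." matches any character, so "notes_fcs" is accepted too
loose <- grepl(".fcs$", candidates)

# Escaped dot: only a literal ".fcs" at the end of the name matches;
# ignore.case = TRUE additionally accepts the upper-case ".FCS" extension
strict <- grepl("\\.fcs$", candidates, ignore.case = TRUE)

candidates[loose]
candidates[strict]
```

`list.files` also has an `ignore.case` argument, which can be useful when the files come from instruments that write upper-case extensions.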

First, we need to read the files and build a single merged dataset. 
This means read, compensate, transform and merge the files. The merging 
consists in sampling each file, then in concatenating the sampled data.

The following code allows one to view the read/compensate/transform 
process on a single file. The full process is in the next chunk of code.

```{r, eval=FALSE}
## Extract the expression matrix of a single file, with transformation
## (here the first file of the list)
data_transformed <- cytof_exprsExtract(fcsFile = fcsFiles[1], 
                                       comp = FALSE, 
                                       transformMethod = "cytofAsinh")
```
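To give an intuition of what an arcsinh-type transformation does to the raw intensities, here is a minimal sketch in base R. It is a generic arcsinh with a cofactor of 5, which is an assumption; the `cytofAsinh` method of cytofkit may handle low or negative values differently, and `asinh_transform` is a hypothetical helper name.

```r
# Generic arcsinh transform (NOT cytofkit's exact implementation).
# The cofactor of 5 is an assumption, commonly used for mass cytometry data.
asinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Toy raw intensities spanning several orders of magnitude
raw <- c(0, 1, 5, 50, 500, 5000)
transformed <- asinh_transform(raw)

# Near zero the transform is roughly linear (x / cofactor);
# for large values it grows like a logarithm
round(transformed, 2)
```

The transform compresses the high intensities while keeping the values around zero well behaved, which is why it is preferred to a plain logarithm.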

The building of a single merged dataset takes place here.

```{r}
## Get the transformed, combined expression data
message("Extract expression data...")
## When analysing flow cytometry data, set comp to TRUE or provide a
## compensation matrix to apply compensation.
## With multiple FCS files, the expression data is extracted from each file,
## sampled, and combined.
## sampleSeed initialises the random generator so that the sampling is reproducible.
data_transformed <- cytof_exprsMerge(
  fcsFiles = fcsFiles, 
  comp = FALSE,
  transformMethod = "cytofAsinh",
  mergeMethod = "ceil",
  fixedNum = 2000,
  sampleSeed = 42)
## change mergeMethod to apply different combination strategy
## Take a look at the extracted expression matrix
## Total dimension after merging
cat(paste(dim(data_transformed), collapse = " x "), " data was extracted!\n")
## Top left of the matrix
head(data_transformed[ ,1:3])
## Column names of the matrix
colnames(data_transformed)
```
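The sample-then-concatenate logic can be sketched in base R on toy matrices. This is an illustration only, assuming that the "ceil" strategy keeps at most `fixedNum` events per file, sampled without replacement; `sample_ceil` and the toy matrices are hypothetical and are not part of cytofkit.

```r
set.seed(42)  # same role as sampleSeed: makes the sampling reproducible

# Two toy "files" with different numbers of events (rows) and 3 markers
markers <- c("m1", "m2", "m3")
file_a <- matrix(rnorm(3000 * 3), ncol = 3, dimnames = list(NULL, markers))
file_b <- matrix(rnorm(500 * 3), ncol = 3, dimnames = list(NULL, markers))
files <- list(a = file_a, b = file_b)

fixed_num <- 2000

# Keep at most n events per file, without replacement (hypothetical helper)
sample_ceil <- function(m, n) {
  keep <- sample(nrow(m), min(n, nrow(m)))
  m[sort(keep), , drop = FALSE]
}

# Sample each file, then stack the samples row-wise
merged <- do.call(rbind, lapply(files, sample_ceil, n = fixed_num))
dim(merged)
```

With these toy sizes, the first file contributes 2000 events and the second, being smaller than the cap, contributes all of its 500 events.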

### Cell Subset Detection

```{r, message=FALSE}
message("Run clustering and dimension reductions...")
## use clustering algorithm to detect cell subsets
## keep only selected markers
data_for_clustering <- data_transformed[, channels]
## Dimensions
dim(data_for_clustering)
## Optional: to speed up a test run, keep only the first 100 cells.
## Run this line before data_for_clustering is built, so that all objects
## share the same rows.
#data_transformed <- data_transformed[1:100, ]
##
## Run PhenoGraph
## Rphenograph works directly on the high-dimensional xdata
## Rphenograph_k is the number of nearest neighbours (k)
## TODO: does Rphenograph use a seed?
cluster_PhenoGraph <- cytof_cluster(
  xdata = data_for_clustering,
  method = "Rphenograph",
  Rphenograph_k = 30)
##
## Run ClusterX
## ClusterX clusters the dimension-reduced data (ydata).
## First reduce the dimension of the data, then cluster.
## The following t-SNE parameters can be tuned (default values shown):
## perplexity = 30, controls how many nearest neighbours are taken into
##   account when constructing the embedding in the low-dimensional space
## theta = 0.5, speed/accuracy trade-off (increase for less accuracy),
##   set to 0.0 for exact t-SNE
## max_iter = 1000, increasing the number of iterations usually improves
##   the separation of the islands (5000 is used below)
data_transformed_tsne <- cytof_dimReduction(
  data = data_for_clustering,
  method = "tsne",
  tsneSeed = 123,
  max_iter = 5000)
## Now use the 2D t-SNE result for clustering
cluster_ClusterX <- cytof_cluster(
  ydata = data_transformed_tsne,
  method = "ClusterX")
```
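To make the role of `Rphenograph_k` more concrete: PhenoGraph builds a k-nearest-neighbour graph, weights each edge by the Jaccard similarity of the two neighbourhoods, then looks for communities in that graph. The sketch below covers only the first two steps in base R on random toy data; it is not the Rphenograph implementation, and all names in it are hypothetical.

```r
set.seed(1)

# Toy data: 200 cells, 4 markers
x <- matrix(rnorm(200 * 4), ncol = 4)
k <- 10  # plays the role of Rphenograph_k

# Full pairwise Euclidean distances (fine for a toy example only)
d <- as.matrix(dist(x))

# For each cell, the indices of its k nearest neighbours.
# order(row)[1] is the cell itself (distance 0), so it is skipped.
knn <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))

# Jaccard similarity between the neighbourhoods of cells i and j
jaccard <- function(i, j) {
  shared <- length(intersect(knn[i, ], knn[j, ]))
  total <- length(union(knn[i, ], knn[j, ]))
  shared / total
}

# Edge weight between cell 1 and its closest neighbour
jaccard(1, knn[1, 1])
```

A larger k connects each cell to more neighbours, which tends to merge small populations into larger clusters; a smaller k does the opposite.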

```{r, eval=FALSE}
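## Run DensVM (takes a long time, skipped here)
## DensVM clusters the dimension-reduced ydata and uses xdata to train the model.
## Note: xdata below contains all columns of data_transformed; pass
## data_for_clustering instead to restrict the model to the selected channels.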
cluster_DensVM <- cytof_cluster(
  xdata = data_transformed, 
  ydata = data_transformed_tsne,
  method = "DensVM")
```

```{r, message=FALSE}
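## Run FlowSOM
## FlowSOM works directly on the high-dimensional xdata
## Note: xdata below contains all columns of data_transformed; pass
## data_for_clustering instead to restrict the clustering to the selected channels.
## By default, the grid is 10 x 10
## FlowSOM_k is the number of meta-clusters computed from the nodes of the grid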
cluster_FlowSOM <- cytof_cluster(
  xdata = data_transformed,
  method = "FlowSOM",
  FlowSOM_k = 20,
  flowSeed = 123)
```
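The two-level idea (many grid nodes first, then a small number of meta-clusters built from the nodes) can be illustrated in base R. In this sketch, k-means stands in for the self-organising map and hierarchical clustering stands in for the meta-clustering step; both are swapped-in techniques chosen to keep the example self-contained, not what FlowSOM actually runs.

```r
set.seed(123)

# Toy data: 2000 cells, 4 markers
x <- matrix(rnorm(2000 * 4), ncol = 4)

n_nodes <- 100  # stands in for the 10 x 10 grid
n_meta <- 20    # plays the role of FlowSOM_k

# Level 1: assign every cell to one of the nodes
# (k-means used here as a stand-in for the SOM)
nodes <- kmeans(x, centers = n_nodes, iter.max = 50)

# Level 2: group the node centres into meta-clusters
# (hierarchical clustering used here as a stand-in)
tree <- hclust(dist(nodes$centers), method = "average")
node_meta <- cutree(tree, k = n_meta)

# Each cell inherits the meta-cluster of its node
cell_meta <- unname(node_meta[nodes$cluster])
table(cell_meta)
```

The number of nodes sets the resolution of the first pass, while the number of meta-clusters sets how many populations are finally reported.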

```{r}
## combine data
data_all <- cbind(
  data_transformed,
  data_transformed_tsne, 
  PhenoGraph = cluster_PhenoGraph,
  ClusterX = cluster_ClusterX, 
  FlowSOM = cluster_FlowSOM)
data_all <- as.data.frame(data_all)
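## Rename columns: keep only the marker name found between "<" and ">";
## columns without this pattern (t-SNE axes, cluster labels) are left unchanged
colnames(data_all) <- gsub(".+<(.+?)>$", "\\1", colnames(data_all))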
## Final view
colnames(data_all)
```
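The effect of the renaming pattern can be checked on a few column names before applying it to the whole table. The first two names below come from the channel list above; `tsne_1` is an assumed name for a t-SNE column, used only for illustration.

```r
# Sample column names: two channels, one assumed t-SNE column, one cluster column
cols <- c("CD3(110:114)Dd<CD3>", "HLA-DR(Yb174)Dd<HLA-DR>", "tsne_1", "PhenoGraph")

# Keep the text between "<" and ">" when the name ends with "<...>";
# names that do not match the pattern are returned unchanged
short <- gsub(".+<(.+?)>$", "\\1", cols)
short

# Shortened names must stay unique, otherwise columns become ambiguous
anyDuplicated(short) == 0
```

Checking for duplicates is worthwhile because two channels sharing the same marker description would end up with the same column name.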