As technology improves, scientists can generate high-throughput data faster and at lower cost. Consequently, the biological sciences increasingly rely on data science tools, such as machine learning methods, to analyze and sort big data. The Complex Object Parametric Analyzer and Sorter (COPAS) is a large-particle flow cytometer that can perform high-throughput fluorescence screens on small animals such as Caenorhabditis elegans. The COPAS reports extinction coefficient (EXT), time of flight (TOF, an arbitrary unit of length), and fluorescence. However, COPAS output includes unwanted objects such as bubbles and bacteria, and some animals are not straight as they pass through the flow cell, producing abnormal profiles that lead to inaccurate developmental staging. In this thesis, I created an R package, COPASProfiler, that generates experiment-specific supervised machine learning (ML) classification models to detect and remove abnormal profiles, enabling standardized fluorescence quantification and analysis. I used COPASProfiler to develop a pipeline that automates fluorescence analysis of high-throughput COPAS data sets. Using R Shiny, I also created a web application with a graphical user interface that allows users to view, annotate, and classify COPAS-generated data sets and to quantify their fluorescence. COPASProfiler is available on GitHub and can be installed with a single R command. It comes with multiple tutorials and examples and was designed to accommodate users with minimal programming experience. COPASProfiler should enable robust high-throughput fluorescence studies of regulatory elements (e.g., enhancers, promoters, and 3′ UTRs) and of long-term epigenetic silencing in C. elegans.
KAUST Research Repository