test
Search publications, data, projects and authors

Thesis

English

ID: <

10670/1.7vh6p9

>

Where these data come from
Statistical methods for data mining in genomics databases (Gene Set En- richment Analysis)

Abstract

Our focus is on statistical testing methods, that compare a given vector of numeric values, indexed by all genes in the human genome, to a given set of genes, known to be associated to a particular type of cancer for instance. Among existing methods, Gene Set Enrichment Analysis is the most widely used. However it has several drawbacks. Firstly, the calculation of p-values is very much time consuming, and insufficiently precise. Secondly, like most other methods, it outputs a large number of significant results, the majority of which are not biologically meaningful. The two issues are addressed here, by two new statistical procedures, the Weighted and Doubly Weighted Kolmogorov-Smirnov tests. The two tests have been applied both to simulated and real data, and compared with other existing procedures. Our conclusion is that, beyond their mathematical and algorithmic advantages, the WKS and DWKS tests could be more informative in many cases, than the classical GSEA test and efficiently address the issues that have led to their construction.

Your Feedback

Please give us your feedback and help us make GoTriple better.
Fill in our satisfaction questionnaire and tell us what you like about GoTriple!