My third internship at the University of Perpignan (France)
During the second year of my master's degree in environmental genomics, I did a six-month internship at the Ecology and Evolution of Interaction lab (now the IHPE lab) at the University of Perpignan (France).
You can download the French version of this page (PDF).
The subject of this internship was:
Pan-species analysis of DNA methylation with bioinformatics tools
- Christoph Grunau
- Jan Bulla
Keywords: DNA methylation, CpG o/e, gene body, prediction, Gaussian and non-Gaussian distributions
DNA methylation of cytosines in CpG dinucleotides is a fundamental feature common to most species. Since it does not change how the genetic code is translated, nature "uses it" to carry an additional layer of information: epigenetic information that bears on the transcriptional state of genes. DNA methylation cannot be determined by simple sequencing, although in recent years methods based on genomic sequencing (e.g. bisulfite genomic sequencing) have made it possible to determine the methylation state of each cytosine in a given sequence. However, over the course of evolution, methylation leaves an imprint in the gene sequence, because methylated cytosines tend to mutate into thymines, which depletes CpG dinucleotides. The observed/expected ratio of CpG (CpG o/e) tends to be close to 1 in genes or genomes without methylation and is lower than 1 in methylated genes (figure 1).
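As a minimal sketch of how a CpG observed/expected ratio can be computed for a single sequence, the Python function below uses one common normalization: (number of CpG × sequence length) / (number of C × number of G). The exact formula used in the project is not stated on this page, so this normalization and the function name `cpg_oe` are assumptions for illustration only. The project itself was carried out in R.

```python
def cpg_oe(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Assumed normalization (not taken from the project):
        (count of "CG" * sequence length) / (count of C * count of G)
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("ratio is undefined without both C and G")
    # "CG" cannot overlap with itself, so str.count sees every CpG.
    n_cpg = s.count("CG")
    return n_cpg * len(s) / (n_c * n_g)
```

Under this normalization, a sequence in which CpG occurs as often as its C and G content predicts gives a ratio of 1, and a CpG-depleted sequence gives a ratio below 1.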
The goal of this project is to systematically analyze the data in GeneDB/estDB for several thousand species and to identify distinct types of DNA methylation by statistically describing the CpG observed/expected ratios in ESTs. As figure 1 shows, mixtures of Gaussian distributions can be used to differentiate unimodal and bimodal densities. The estimation procedures for this type of density are well established and easy to use. Unfortunately, they are not entirely satisfactory from a statistical viewpoint and do not reliably identify the correct number of modes. Consequently, it is necessary to implement estimation algorithms for mixture models that use non-Gaussian variables. The programming language used is R, an open-source statistical software environment. The candidate needs at least basic knowledge of R or the willingness to acquire it (mainly independently).
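To illustrate the Gaussian baseline described above, here is a small sketch that fits one Gaussian and a two-component Gaussian mixture (by expectation-maximization) to a list of ratios, then compares them with the Bayesian information criterion (BIC). It is written in Python for illustration, whereas the project used R. The function names, the quartile-based initialization, the BIC criterion and the simulated data are all assumptions, not the project's actual method, and the sketch does not cover the non-Gaussian mixtures that the project set out to implement.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_one_gaussian(data):
    """Maximum-likelihood fit of a single Gaussian; returns the log-likelihood."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)


def fit_two_gaussians(data, n_iter=200):
    """EM for a two-component Gaussian mixture; returns the log-likelihood."""
    xs = sorted(data)
    n = len(xs)
    # Crude initialization (an assumption): means at the quartiles,
    # both spreads equal to the overall standard deviation.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [sd_all, sd_all]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
    return sum(
        math.log(sum(weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)))
        for x in xs
    )


def bic(loglik, n_params, n):
    return n_params * math.log(n) - 2.0 * loglik


def preferred_components(data):
    """Return 1 or 2, whichever number of Gaussian components has the lower BIC."""
    n = len(data)
    bic_one = bic(fit_one_gaussian(data), 2, n)   # mean, sd
    bic_two = bic(fit_two_gaussians(data), 5, n)  # 2 means, 2 sds, 1 weight
    return 1 if bic_one <= bic_two else 2


if __name__ == "__main__":
    random.seed(0)
    # Simulated ratios for demonstration only, not real data.
    simulated = [random.gauss(0.4, 0.08) for _ in range(300)]
    simulated += [random.gauss(1.0, 0.08) for _ in range(300)]
    print(preferred_components(simulated))
```

A model-selection criterion such as BIC is only one way to choose the number of components. As noted above, Gaussian mixtures can misjudge the number of modes when the underlying distributions are skewed.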
The internship will be co-supervised by Ch. Grunau (UPVD) and J. Bulla (University of Caen).
I will add the graphical abstract soon.
Publication and communication
I continued this work during my PhD and wrote these publications:
- "Notos - a Galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types", in BMC Bioinformatics (2018).
- "Universality of the DNA methylation codes in Eucaryotes", in Scientific Reports (2019).
I also presented the following poster:
B. Aliaga, V. Lacal, I. Bulla, D. Duval, J. Bulla, C. Grunau. A pan-species study of DNA methylation patterns by means of distributions of CpG o/e ratios. European Society for Evolutionary Biology (ESEB), 10th-14th August 2015, Lausanne, Switzerland.
Letter of recommendation
I did my PhD with Christoph Grunau, my advisor for this internship. You can find a letter of recommendation that he wrote after my PhD.