Degree: Bachelor, Master
Nationality: International Students
Application deadlines: Open
Scientific question. Humans are diploid, and hence there exist two versions of each chromosome, one inherited from the mother and the other from the father. Determining the DNA sequences of these two chromosomal copies (called haplotypes) is important for many applications, ranging from population history to clinical questions. Existing sequencing technologies cannot read a chromosome from start to end, but instead deliver small pieces of sequence (called reads). As in a jigsaw puzzle, the underlying genome sequences are reconstructed from the reads by finding the overlaps between them. We develop algorithms to solve the genome assembly problem for diploid genomes, that is, “to simultaneously solve two jigsaw puzzles with very similar yet different images”. We will apply this method to cancer genomes that have complex rearrangements.
Approach. Sequencing errors in the reads, together with heterozygous and repetitive genomic regions, make the assembly problem challenging. Over the past few decades, researchers have solved it by casting it as an overlap graph problem, where nodes are the reads and edges represent the overlaps between reads. To detect regions where the haplotypes differ (called heterozygosity), we look for simple local structures called bubbles. A bubble is a type of directed acyclic subgraph with a single source vertex and a single sink vertex, consisting of multiple edges (all with the same direction) between this pair of vertices. Once bubbles have been identified, they are simplified by removing structures that most likely result from sequencing errors. The resulting bubbles can then be used to solve the “phasing problem”: finding haplotype paths based on a maximum-likelihood framework.
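To make the bubble definition concrete, here is a minimal Python sketch of bubble detection on a small directed graph. It is an illustration under stated assumptions, not the project's actual method: the function name find_simple_bubbles, the read labels, and the toy edge list are all hypothetical, and each branch is assumed to be a non-branching chain of reads (which would collapse to a single edge once such chains are compacted).

```python
from collections import defaultdict


def find_simple_bubbles(edges):
    """Return a list of (source, sink, branches) for each simple bubble.

    Assumption for this sketch: a simple bubble is a source vertex with at
    least two outgoing edges whose branches are non-branching chains that
    all end at the same sink vertex.
    """
    succ = defaultdict(list)  # vertex -> list of successors
    pred = defaultdict(list)  # vertex -> list of predecessors
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    bubbles = []
    for source in list(succ):
        if len(succ[source]) < 2:
            continue  # a bubble needs at least two alternative branches
        branches, sinks = [], set()
        for start in succ[source]:
            path, node = [], start
            # Follow the chain while there is exactly one way in and one way out.
            while len(pred[node]) == 1 and len(succ[node]) == 1:
                path.append(node)
                node = succ[node][0]
            branches.append(path)
            sinks.add(node)
        # All branches must reconverge at one vertex distinct from the source.
        if len(sinks) == 1:
            sink = sinks.pop()
            if sink != source:
                bubbles.append((source, sink, branches))
    return bubbles


# Toy overlap graph (hypothetical reads): r3a and r3b stand for reads that
# cover the same heterozygous site on the two haplotypes.
edges = [
    ("r1", "r2"),
    ("r2", "r3a"), ("r2", "r3b"),
    ("r3a", "r4"), ("r3b", "r4"),
    ("r4", "r5"),
]
print(find_simple_bubbles(edges))
# prints [('r2', 'r4', [['r3a'], ['r3b']])]
```

In this toy graph the two branches between r2 and r4 play the role of the two haplotypes at one heterozygous site. Real overlap graphs also contain transitive edges, repeats, and error-induced branches, so a practical implementation would simplify the graph before searching for bubbles.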
1. Programming: C++, Python, shell scripting, graph algorithms
2. Basic knowledge of bioinformatics tools
3. Enthusiasm to solve the problem
Possibility to work remotely, with regular meetings on the campus.
What you will get:
– Extensive mentorship in computational methods
– Knowledge of how, conceptually, we can solve biological problems using computational methods
– The opportunity to work in a diverse environment that includes people with vastly different,
but complementary skill sets.
– Responsibility and satisfaction of owning your own project.
Candidates will be invited to a short discussion (interview) to assess their creativity,
reasoning, and problem-solving skills.
For further information, please contact Ms Feng, Department of Computing, University of Surrey. If you’re interested in inventing the future of biology using computational techniques, please contact Shilpa Garg ( [email protected] , [email protected] ) and include your CV.