I joined a new lab this Fall semester at the University of Nebraska, Lincoln, working with Dr. Clarke. In this project, we are trying to infer genomic distances between various E. coli strains. This information could be useful for identifying clusters of pathogenic E. coli and for tracking down root sources of contamination to minimize foodborne outbreaks. How? E. coli genomes vary from location to location, so determining a “distance” between the genomes can help determine whether E. coli samples are from the same outbreak or not. Since genome sequencing is now quick and cost-effective thanks to the rise of “Next-Generation Sequencing” technologies, accurate algorithmic pipelines to process that information are critical. That is where this project fits in. Calculating a “distance” between two genomes is a complicated problem, and how complicated depends on how deep down the rabbit hole you want to go. For starters, an average E. coli genome is about 5 million base pairs long, and bacterial genomes are relatively dynamic in terms of large insertions and deletions.
Relatedly, the FDA already has a whole genome sequencing program designed to help public officials identify and understand pathogens isolated from patients, the environment or food.
I’m looking forward to the work I will be doing for this project. It will be a mix of manipulating genomic data on high-performance computing clusters and statistical work to validate the distances.
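To give a rough feel for what a “distance” between two sequences can mean, here is a toy sketch in Python. It is not the method this project uses (I haven't described that here); it is just one simple, well-known idea: break each sequence into overlapping substrings of length k (k-mers) and compare the two sets with a Jaccard distance. The function names, the choice of k, and the short example sequences are all made up for illustration, and a real pipeline working on 5-million-base-pair genomes with large insertions and deletions would need far more care.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """Jaccard distance between the k-mer sets of two sequences.

    0.0 means the k-mer sets are identical; 1.0 means they share nothing.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real E. coli data.
    sample_1 = "ACGTACGTTAGC"
    sample_2 = "ACGTACGATAGC"
    print(jaccard_distance(sample_1, sample_2, k=4))
```

A small distance suggests the two sequences share most of their k-mers, while a distance near 1 suggests they have little in common. Note that this ignores where in the genome the k-mers occur, which is one reason it is only a starting point.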