Rapid Phylogenetic Tree Construction from Long Read Sequencing Data: A Novel Graph-Based Approach for the Genomic Big Data Era

Authors

  • Harisankar Sadasivan, Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA
  • Luke Ross, Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA
  • Chih-Yu Chang, Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA
  • Kushantha Upulanga Attanayake, Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

Genomics is the largest producer of big data, with an expected 40 exabytes (EB) of data generated every year. The rapid growth of genomic data necessitates efficient methods for analysis and classification. We present a novel, automated pipeline for swift phylogenetic tree construction from long-read sequencing data. Our approach addresses computational challenges by using compact repeat graphs instead of full genome assemblies. We integrate advanced graph embedding techniques, combining structural and content-based approaches, to capture genomic relationships efficiently. Applying our method to 20 bacterial genomes across 5 classes, we achieve a cophenetic correlation of 0.53 with the ground-truth phylogenetic tree. Our pipeline reconstructs meaningful evolutionary relationships directly from sequencing reads without requiring complete assemblies or time-consuming alignments. This work represents a significant advance toward rapid pathogen classification during outbreaks and offers a scalable solution for analyzing the expanding universe of sequenced organisms. By bridging graph theory, machine learning, and genomics, our method paves the way for more efficient phylogenetic analysis in the era of big data biology.
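
The abstract reports a cophenetic correlation against a ground-truth tree but does not say how it is computed. The sketch below shows one common reading of that metric, not the authors' implementation: build a tree from each pairwise distance matrix by average-linkage (UPGMA) clustering, record for every pair of taxa the height at which they first join (the cophenetic distance), and take the Pearson correlation of the two sets of heights. The function names, the choice of UPGMA, and the toy 4-taxon matrices are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Cophenetic distances {(i, j): height} from average-linkage clustering.

    `dist` is a symmetric n x n list of lists with zeros on the diagonal.
    (Hypothetical helper; UPGMA is an assumed clustering choice.)
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member leaves
    coph = {}
    next_id = n
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean inter-cluster distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        # Leaves split across the two merged clusters first join at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def cophenetic_correlation(dist_pred, dist_true):
    """Correlate the cophenetic distances of two trees over the same taxa."""
    c_pred = upgma_cophenetic(dist_pred)
    c_true = upgma_cophenetic(dist_true)
    pairs = sorted(c_pred)
    return pearson([c_pred[p] for p in pairs], [c_true[p] for p in pairs])


# Toy 4-taxon distance matrices (made up for illustration only).
toy_true = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
toy_pred = [[0, 6, 2, 6], [6, 0, 6, 4], [2, 6, 0, 6], [6, 4, 6, 0]]
```

Calling `cophenetic_correlation(toy_pred, toy_true)` returns a value in [-1, 1]; values near 1 mean the two trees group the taxa at similar relative depths. In a pipeline like the one described, the predicted matrix would presumably hold distances between graph embeddings of the genomes, which is also an assumption here.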

Published

2020-02-08

How to Cite

Sadasivan, H., Ross, L., Chang, C.-Y., & Attanayake, K. U. (2020). Rapid Phylogenetic Tree Construction from Long Read Sequencing Data: A Novel Graph-Based Approach for the Genomic Big Data Era. Journal of Engineering and Technology, 2(1), 1–14. Retrieved from http://mzjournal.com/index.php/JET/article/view/150