Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster
Hi! I’m Martin, and I will be working on Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster under the mentorship of In Kee Kim. Our work is driven by the scale of the computing systems that host data commons: we believe that performance characterization of genomics workloads should be done rapidly and at a scale similar to production settings. Feel free to check our proposal for more details!
We propose GenScale, a genomics workload benchmarking tool that can achieve both the scale and the speed necessary for characterizing performance in large-scale settings. GenScale will be built on top of an industrial-grade cluster manager (e.g., Kubernetes) and a metrics collection and monitoring system (e.g., Prometheus), and it will support a comprehensive set of applications used in state-of-the-art genomics workflows. The initial version developed during this project will include DNA and RNA alignment workflows.
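To give a rough feel for the metrics side, here is a minimal sketch of how a benchmarking tool might ask Prometheus for the CPU usage of one workflow pod. It is not GenScale code. The server address and the pod name `dna-align-0` are made up for illustration, and I am assuming the cluster exposes the cAdvisor metric `container_cpu_usage_seconds_total` with a `pod` label, which can differ between Kubernetes versions. The sketch only builds the request URL for the Prometheus instant-query endpoint and does not contact any server.

```python
from urllib.parse import urlencode


def cpu_rate_query(pod: str, window: str = "5m") -> str:
    """Build a PromQL expression for the per-second CPU usage of one pod.

    Assumes the cAdvisor metric `container_cpu_usage_seconds_total`
    is scraped and carries a `pod` label (an assumption, not a given).
    """
    return f'sum(rate(container_cpu_usage_seconds_total{{pod="{pod}"}}[{window}]))'


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical server address and pod name, used only for illustration.
url = instant_query_url("http://prometheus.example:9090", cpu_rate_query("dna-align-0"))
print(url)
```

Fetching this URL from a live Prometheus server would return a JSON document with the sampled value. Keeping the query construction separate from the network call makes it easy to log the exact query next to each benchmark result, which helps when someone wants to reproduce a run.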
Finally, we believe that open access and reproducible research will greatly accelerate the pace of scientific discovery. We aim to package our artefacts and generated datasets in ways that make them easy to replicate, analyze, and build upon. I personally look forward to learning from and contributing to the open source community!