SpaRC: Scalable Sequence Clustering using Apache Spark

Author: insideHPC Report

Uploaded: 2018-02-26

Views: 312

Description:

In this video from the Stanford HPC Conference, Zhong Wang from the Genome Institute, Lawrence Berkeley National Laboratory presents: SpaRC: Scalable Sequence Clustering using Apache Spark.

"Whole genome shotgun based next generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) sequence data derived from tens of thousands of different genes or microbial species. Assembly of these data sets requires tradeoffs between scalability and accuracy. Current assembly methods optimized for scalability often sacrifice accuracy and vice versa. An ideal solution would both scale and produce optimal accuracy for individual genes or genomes. Here we describe an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC) that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomes and metagenomes from both short and long read sequencing technologies. It achieves near linear scalability with input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. Our results demonstrate that SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar large scale sequence data analysis problems. The software is available under the BSD license at https://bitbucket.org/LizhenShi/sparc."
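As a rough illustration of what "partitioning reads based on their molecule of origin" can look like, here is a minimal single-machine sketch that groups reads sharing at least one k-mer, using union-find. This is an assumption made for illustration only: the abstract does not describe SpaRC's actual algorithm, and the names `kmers`, `cluster_reads`, and the value `k=5` are hypothetical. The sketch also ignores reverse complements, sequencing errors, and distribution across a Spark cluster.

```python
from collections import defaultdict


def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def cluster_reads(reads, k=5):
    """Group reads that are connected through shared k-mers (union-find)."""
    parent = list(range(len(reads)))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            if kmer in first_seen:
                # Merge this read's cluster with the earlier read's cluster.
                parent[find(idx)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Toy input (hypothetical): the first two reads overlap, the third does not.
reads = [
    "ACGTACGTGA",
    "CGTGATTTCC",
    "GGGGCCCCAA",
]
clusters = cluster_reads(reads, k=5)
print(clusters)
```

Each resulting cluster is a list of read indices that could then be handed to a downstream assembler independently of the others.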

Learn more: https://bitbucket.org/LizhenShi/sparc and http://hpcadvisorycouncil.com


