Genes and geography -- a bioinformatics project
Author: OMGenomics
Uploaded: 2021-12-29
Views: 32814
This is a full walkthrough of a bioinformatics project: running PCA and t-SNE on population genotype data.
00:00 Intro
01:07 Hunting for data
04:55 Inspecting the VCF
06:02 Finding population labels for the samples
10:20 Parsing VCF with pysam
16:02 Going from alleles to numbers for a numpy array
21:47 When to work in Colab versus a Python script
26:00 Saving data with pandas
28:42 Adding population labels from the panel file
33:33 To Colab!
36:54 PCA
40:17 First plot! Mission accomplished :)
42:03 Using Altair for plotting with labels
44:51 Second plot with population labels!
46:05 Merging with the igsr_population.tsv data
49:43 TSNE
53:36 Exercise: PCA on the SNPs
54:21 Conclusion and origin story for this project
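A minimal sketch of the "Parsing VCF with pysam" and "alleles to numbers" steps, assuming each genotype is encoded as its count of non-reference alleles (0, 1 or 2) and missing calls become NaN. This is one common encoding, not necessarily the exact one used in the video; the helper names `gt_to_dosage` and `vcf_to_matrix` and the `max_variants` cap are hypothetical. pysam's `VariantFile` yields records whose `samples[name]["GT"]` is a tuple of allele indices such as `(0, 1)`, with `None` for a missing allele.

```python
import numpy as np


def gt_to_dosage(gt):
    """Hypothetical helper: count non-reference alleles in a GT tuple like (0, 1).

    Returns NaN if the call or any allele is missing (None).
    Any non-zero allele index (1, 2, ...) counts as one alternate allele.
    """
    if gt is None or any(a is None for a in gt):
        return np.nan
    return float(sum(1 for a in gt if a != 0))


def vcf_to_matrix(path, max_variants=None):
    """Hypothetical helper: read a VCF into a (samples x variants) dosage matrix.

    max_variants is an assumed cap, since whole-chromosome 1000 Genomes
    VCFs are very large.
    """
    import pysam  # imported lazily so gt_to_dosage works without pysam installed

    vcf = pysam.VariantFile(path)
    samples = list(vcf.header.samples)
    columns = []
    for i, rec in enumerate(vcf):  # iterating the file needs no index
        if max_variants is not None and i >= max_variants:
            break
        columns.append([gt_to_dosage(rec.samples[s]["GT"]) for s in samples])
    # One column per variant, then transpose so rows are samples.
    matrix = np.array(columns, dtype=float).reshape(len(columns), len(samples))
    return samples, matrix.T
```

For example, `samples, X = vcf_to_matrix("chr22.vcf.gz", max_variants=10000)` would return one row of dosages per sample; the file name is a placeholder.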
Download a VCF of population genotypes from the 1000 Genomes project.
Use pysam to parse it, summarize the genotypes into a 2D NumPy array for PCA, and save that as a pandas DataFrame.
Run PCA and t-SNE on it and visualize the results with both matplotlib and Altair, coloring the points by the ancestry labels.
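A minimal sketch of the PCA and labeling steps, assuming a samples-by-variants dosage matrix and a panel table with a `sample` column plus a population column (the 1000 Genomes panel file has `sample`, `pop` and `super_pop` columns, but check your download). PCA is computed here directly with NumPy's SVD rather than with a library class, and missing genotypes are mean-imputed per variant; `pca` and `label_coords` are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def pca(X, n_components=2):
    """Hypothetical helper: PCA scores via SVD of the column-centered matrix."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_means, X)  # mean-impute missing genotypes per variant
    Xc = X - X.mean(axis=0)  # center each variant (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    # Sample coordinates on the top principal components (signs are arbitrary).
    return U[:, :n_components] * S[:n_components]


def label_coords(samples, coords, panel):
    """Hypothetical helper: attach panel labels to PCA coordinates by sample ID."""
    df = pd.DataFrame(coords, columns=[f"PC{i + 1}" for i in range(coords.shape[1])])
    df["sample"] = samples
    # Left merge keeps every sample, in the original order, even if unlabeled.
    return df.merge(panel, on="sample", how="left")
```

For t-SNE, scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(X)` is a common choice, often run on the top principal components rather than the raw matrix. An Altair scatter plot colored by population could then be `alt.Chart(df).mark_circle().encode(x="PC1", y="PC2", color="super_pop")`.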
If you're interested in the origin story, I first mentioned this project in my project ideas video: Bioinformatics project ideas.
All code, including the Python script, download URLs for the input files, and the Colab notebook: https://github.com/MariaNattestad/pca...