Expectation Maximization (EM) for MEME Motif Discovery in Bioinformatics (Part 2 of 3)

Author: Saniya Khullar

Uploaded: 2021-02-19

Views: 1391

Description:

Please note: MEME stands for Multiple Expectation Maximizations for Motif Elicitation. In bioinformatics, motifs are typically sequence patterns that occur many times in a group of related protein or DNA sequences. Motifs are usually associated with some biological function (e.g. Transcription Factor Binding Sites, where Transcription Factors bind to regulatory elements like promoters/enhancers). Saniya goes through a detailed toy example of applying the MEME algorithm to learn a Position Weight Matrix (PWM) and the associated motif occurrences.

Please note this is the 2nd detailed video walking through an example of using MEME to discover motifs for TF binding.
Part 1 of 3 (previous video): Expectation Maximization (EM) for MEME Mot...
Part 2 of 3 (current video): Expectation Maximization (EM) for MEME Mot...
Part 3 of 3 (next video): Expectation Maximization (EM) for MEME Mot...

Please note PWM stands for Position Weight Matrix, not Probability Weight Matrix. Sorry!
********* Please note this toy example: **********
L = 6 bases (length of each DNA sequence)
W = 3 bases (motif width); please note this is a parameter we selected.
N = 4 sequences

4 DNA sequences:
1. GTCAGG
2. GAGAGT
3. ACGGAG
4. CCAGTC

Using the MEME algorithm, please find the Position Weight Matrix (PWM), or P-matrix, including background (non-motif) probabilities. Please also find the occurrences of motifs in these 4 sequences. :)

Assumptions: when initializing a PWM from a candidate motif, please set the probability of the matching letter at each motif position to some value pi (= 0.7).
11 unique motifs (distinct subsequences of length W = 3) are found across all 4 sequences :)
GTC, TCA, CAG, AGG, GAG, AGA, AGT, ACG, CGG, GGA, CCA.
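
The list of 11 candidate motifs can be reproduced by sliding a window of width W = 3 over each sequence and keeping every subsequence the first time it appears. The short Python sketch below does this; the function name `distinct_kmers` is only illustrative and does not come from the video.

```python
def distinct_kmers(seqs, w):
    """Return the distinct length-w subsequences, in order of first appearance."""
    seen = []
    for s in seqs:
        for j in range(len(s) - w + 1):
            kmer = s[j:j + w]
            if kmer not in seen:
                seen.append(kmer)
    return seen


SEQS = ["GTCAGG", "GAGAGT", "ACGGAG", "CCAGTC"]  # the 4 toy sequences
W = 3                                            # motif width chosen above

candidates = distinct_kmers(SEQS, W)
print(len(candidates), candidates)
```

Each sequence contributes 4 windows (16 in total), and 5 of those are repeats, which leaves the 11 motifs listed above.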
Here, m = # of possible start positions for a motif in a DNA sequence: m = L - W + 1 = 6 - 3 + 1 = 4 (as Saniya shows). :)
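
As a small illustration of how m feeds into the starting point of the algorithm, the sketch below computes m from L and W and builds the initial Z matrix with every entry set to 1/m (the initialization mentioned in the time stamps). The variable names are illustrative, and storing Z as a list of rows is just one convenient choice.

```python
L, W, N = 6, 3, 4          # sequence length, motif width, number of sequences

m = L - W + 1              # number of possible motif start positions per sequence

# Initial Z matrix: N rows (sequences) by m columns (start positions),
# every start position equally likely before any data is used.
Z = [[1.0 / m for _ in range(m)] for _ in range(N)]

for row in Z:
    print(row)
```

Every row sums to 1, because each row describes where the motif starts in one sequence.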
Typos found:
*47m 55s: 0.0766% is the probability of getting Sequence 1 given that the motif for Sequence 1 starts at position 3.
*1h 27m 11s: numerator should be 0.23.
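
The corrected 0.0766% value can be checked numerically. The sketch below assumes that, in the initial PWM for GAG, the matching base at each motif position gets pi = 0.7 and the three other bases share the remaining 0.3 equally (0.1 each), and that every background position has probability 0.25. The equal split of the remainder is an assumption here, although it is consistent with the 0.0766% figure. The function names are illustrative.

```python
BASES = "ACGT"


def init_pwm(motif, pi=0.7):
    """Initial PWM: pi for the matching base, (1 - pi) / 3 for each other base (assumed)."""
    other = (1.0 - pi) / (len(BASES) - 1)
    return [{b: (pi if b == c else other) for b in BASES} for c in motif]


def prob_seq_given_start(seq, start, pwm, background=0.25):
    """P(sequence | motif starts at 0-based index `start`)."""
    w = len(pwm)
    p = 1.0
    for pos, base in enumerate(seq):
        if start <= pos < start + w:
            p *= pwm[pos - start][base]   # motif position: use the PWM column
        else:
            p *= background               # non-motif position: use the background
    return p


pwm_gag = init_pwm("GAG")
# Position 3 in the description is index 2 here (the window CAG in GTCAGG).
p = prob_seq_given_start("GTCAGG", 2, pwm_gag)
print(f"{p:.4%}")  # roughly 0.0766%
```

The product is 0.25 * 0.25 (G, T as background) * 0.1 * 0.7 * 0.7 (C, A, G against the GAG PWM) * 0.25 (final G as background) = 0.000765625.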

**********************************************************

Please reach out with any and all questions and please subscribe to Saniya's YouTube channel for more updates. :)

TIME STAMPS:
00:00 Expectation Maximization (EM) for MEME Motif Discovery in Bioinformatics (Part 2 of 3)
00:21 Z matrix (probability of motif starting in given position of sequence)
05:54 Initial Z matrix in our example (based on initializing each value to 1/m).
07:51 How to initialize the Position Weight Matrix (PWM) for a given motif based on our default values.
10:01 What is a background (non-motif) position? What is a motif position? Interpret a PWM
11:48 Initial assumption for background (non-motif) positions: 25% prob. for each base
12:05 Rule we use to initialize our PWM for a given motif
13:59 11 unique motifs: GTC, TCA, CAG, AGG, GAG, AGA, AGT, ACG, CGG, GGA, CCA.
14:07 Example of initializing PWM for motif: GTC
15:28 Initial Position Weight Matrices for 11 unique motifs
18:48 Checklist of Info we gathered so far for MEME algorithm
22:26 Count # of each type of DNA base across all of the sequences
23:00 Overview of the basic EM approach (Expectation Maximization) for Motif Discovery
==== E-step: ====
26:57 Probability of a Sequence given a motif starting position
29:00 Little Break in between :) (around 30 secs)
29:36 Interpreting the formula for the probability of a sequence given a motif starting position
32:06 Focus on the GAG motif (out of the 11 motifs) as the running example going forward. (Please apply similar concepts to the other 10 motifs. Saniya randomly chose GAG for illustration.)
32:43 Focus on the GAG motif and Sequence 1: if we use the initial PWM for motif GAG, what is the probability of observing Sequence 1 given that the motif for Sequence 1 starts at position 3 (corresponds to CAG) instead of positions 1, 2, or 4?
47:55 Correct Probability should be 0.0766% (typo was made!)
48:15 Calculate probability of motif starting in positions 1, 2, or 4 for DNA Sequence 1 if we use PWM for motif GAG.
53:01 Normalize each row of the Z matrix: the columns in a row hold the probability that the motif for that sequence starts at each respective position, so the values across a row should sum to 1.
54:54 Repeating this same step for the other sequences (2 to 4) to fully update the Z matrix based on our PWM initialized for GAG. Corrections: the motifs shown for seq 3 and seq 4 are off. Also, GAG for seq 2 should be 70% * 70% * 70% (0.7 * 0.7 * 0.7) for j = 1.
==== M-step: ====
01:00:17 M-step: re-estimate the P-matrix (our PWM) using the updated Z-matrix values: first find the expected # of each DNA base at each motif position
01:18:01 Update our motif counts
01:19:21 n_T,2: expected # of T bases in position 2 of the motif
01:21:04 Find background (non-motif) counts for bases
01:26:00 Probability of DNA base in particular position (based on our counts)
==== Summary: ====
01:28:04 Summary of approach for GAG (E & M steps): 1 iteration
01:29:43 Probability of each X given updated Z and P matrices
01:30:48 Calculate Log-Likelihood (Video 3)
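
One E-step and one M-step for the GAG motif, as outlined in the time stamps above, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the exact worksheet from the video: it assumes one motif occurrence per sequence with every start position equally likely beforehand, a probability of 0.1 for each non-matching base in the initial PWM, and a hypothetical `pseudocount` parameter (0 by default) that the video may set differently. All function names are illustrative.

```python
BASES = "ACGT"
SEQS = ["GTCAGG", "GAGAGT", "ACGGAG", "CCAGTC"]


def init_pwm(motif, pi=0.7):
    """Initial PWM: pi for the matching base, (1 - pi) / 3 for the others (assumed)."""
    other = (1.0 - pi) / 3
    return [{b: (pi if b == c else other) for b in BASES} for c in motif]


def e_step(seqs, pwm, background):
    """Z[i][j] = P(motif in sequence i starts at j), each row normalized to sum to 1."""
    w = len(pwm)
    Z = []
    for s in seqs:
        row = []
        for j in range(len(s) - w + 1):
            p = 1.0
            for pos, base in enumerate(s):
                inside = j <= pos < j + w
                p *= pwm[pos - j][base] if inside else background[base]
            row.append(p)
        total = sum(row)
        Z.append([p / total for p in row])
    return Z


def m_step(seqs, Z, w, pseudocount=0.0):
    """Re-estimate the PWM and background from expected base counts."""
    # Expected count of each base at each motif position, weighted by Z.
    motif_counts = [{b: 0.0 for b in BASES} for _ in range(w)]
    for s, row in zip(seqs, Z):
        for j, z in enumerate(row):
            for k in range(w):
                motif_counts[k][s[j + k]] += z
    # Background count = total count of the base minus its expected motif count.
    total = {b: sum(s.count(b) for s in seqs) for b in BASES}
    bg_counts = {b: total[b] - sum(col[b] for col in motif_counts) for b in BASES}

    def normalize(counts):
        denom = sum(counts[b] + pseudocount for b in BASES)
        return {b: (counts[b] + pseudocount) / denom for b in BASES}

    return [normalize(col) for col in motif_counts], normalize(bg_counts)


background0 = {b: 0.25 for b in BASES}
Z1 = e_step(SEQS, init_pwm("GAG"), background0)
pwm1, background1 = m_step(SEQS, Z1, w=3)

for i, row in enumerate(Z1, start=1):
    print(f"seq {i}:", [round(z, 4) for z in row])
for k, col in enumerate(pwm1, start=1):
    print(f"motif position {k}:", {b: round(p, 4) for b, p in col.items()})
print("background:", {b: round(p, 4) for b, p in background1.items()})
```

Because every start position leaves exactly 3 background bases, the 0.25 factors cancel when a row of Z is normalized in this first iteration. For Sequence 1 that gives 0.049 / 0.064 for position 3 (the CAG window), so that position takes most of the weight in the first row.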
