sequence alignment problem

1. Setup:

Input:
Strings $X = x_1 . . . x_m , Y = y_1 . . . y_n$ over some alphabet Σ like {A,C,G,T}.
Penalty $α_{gap}$ for inserting a gap, $α_{ab}$ for matching a & b [presumably $α_{ab} = 0$ of a = b]
Feasible solutions:
Alignments - i.e., insert gaps to equalize lengths of the string
Goal:
Alignment with minimum possible total penalty

2. Analysis

Optimal Substructure
For a optimal alignment of X, Y, there are 3 relevant possibilities for the contents of the final position:
$$case 1: x_m & y_n $$

$$ case 2: x_m & gap$$

$$ case 3: y_n & gap$$
If case 1 holds, then induced alignment of X’ & Y’ is optimal;
If case 2 holds, then induced alignment of X’ & Y is optimal;
If case 3 holds, then induced alignment of X & Y’ is optimal. (contradiction)
Relevant Subproblems
$(X_i , Y_j)$ where:
$X_i$ = 1st i letters of X
$Y_j$ = 1st j letters of Y
(Only peel off letters from the right end of the strings.)
Recurrence
$P_ij$ = penalty of optimal alignment of $X_i$ & $Y_j$ .
For all $i = 1, . . . , m$ and $j = 1, . . . , n$:
1

P_ij = min(a[xi][yi]+P[i-1][j-1], a_gap+P[i][j-1], a_gap+P[i-1][j])
Base case:
$$P_{i,0} = P_{0,i} = i * alpha_{gap}$$

3. The Algorithm

A[][] is a 2-D array
A[i][0] = A[0][i] = i * a_gap
for i = 1:m
    for j = 1:n
        A[i][j] = min(A[i-1][j-1]+a_xiyj, A[i-1][j]+a_gap, A[i][j-1]+a_gap);

Running time: $O(mn)$

4. Reconstructing a Solution

Trace back from A[m][n].

while (m > 0 && n > 0) {
    switch matching_case {
      case 1: match x_i, y_j, i--, j--
      case 2: match x_i, gap, i--
      case 3: match gap, y_j, j--
    }
}
while (n==0) {match remaining substring X using gaps of Y}
while (m==0) {match remaining substring Y using gaps of X}

sequence alignment problem

1. Setup:

2. Analysis

3. The Algorithm

4. Reconstructing a Solution

近期文章

近期评论

标签

热门

文章归档

分类目录

功能