optimal bsts

Setup:

Input: Frequencies $p_1, p_2,…, p_n$ for items 1,2,…,n

Goal: Compute a valid search tree that minimizes the weighted (average) search time.

$$C(T) = sum_{items} p_i * (depth(i)+1)$$

Analysis:

Optimal Substructure

Suppose an optimal BST for keys {1, 2, . . . , n} has root r , left subtree $T_1$ , right subtree $T_2$. Then $T_1$ is optimal for the keys {1, 2, . . . , r − 1} and $T_2$ for the keys {r + 1, r + 2, . . . , n} (contradiction)

At this time, we have:

$$C(T) = sum_{items} p_i + C(T_1) + C(T_2)$$

Relevant Subproblems

Contiguous intervals (S = {i, i + 1, . . . , j − 1, j} for every i ≤ j) Note: Since we sometimes have to calculate the suffix of a prefix(this is not neither a prefix or suffix, but an contiguous interval)

The Recurrence:

Notation: For 1 ≤ i ≤ j ≤ n, let $C_ij$ = weighted search cost of an optimal BST for the items { i, i + 1, . . . , j−1, j } [with probabilities $[ p_i , p_{i+1} , . . . , p_j ]$

Recurrence: For every 1 ≤ i ≤ j ≤ n: $$C_{ij}=min_{r=1,…,j}{sum^j_{k=i}p_k + C_{i,r-1}+C_{r+1,j}}$$

Algorithm:

Important: Solve smallest subproblems first

1
2
3
4
5
Let A = 2D array
for s = 0:n-1
for i = 1:n
A[i][i+s] = min_r(sum(p(i:i+s) + A[i][r-1] + A[r+1][i+s])
return A[1][n]

Complexity Analysis:

$-Theta(n^2)$ Subproblems

$-Theta(j-i)$ time to compute A[i][j]

$-Theta(n^3)​$ time overall