Issue
I'm implementing the forward recursion for a Hidden Markov Model; the necessary steps (a-b) are shown here.
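In standard notation, the scaled forward recursion I'm computing (with Gaussian-mixture emission densities) is:

\alpha_1(i) = \pi_i \, b_i(o_1), \qquad c_t = 1 \big/ \textstyle\sum_i \alpha_t(i), \qquad \hat{\alpha}_t(i) = c_t \, \alpha_t(i)

\alpha_t(i) = \Big[ \textstyle\sum_j \hat{\alpha}_{t-1}(j) \, a_{ji} \Big] \, b_i(o_t), \qquad t = 2, \dots, T

where b_i(o_t) = \sum_m w_{i,m} \, \mathcal{N}(o_t ; \mu_m, \sigma_m) is the mixture density for state i.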
Below is my implementation
from scipy.stats import norm
import numpy as np
import random
def gmmdensity(obs, w, mu, sd):
    # compute probability density function of gaussian mixture
    gauss_mixt = norm.pdf(obs, mu, sd)[:,None]*w
    return gauss_mixt

def alpha(obs, states, A, pi, w, mu, sigma):
    dens = np.sum(gmmdensity(obs, w, mu, sigma), axis = 2)

    # scaling factor is used to renormalize probabilities in order to
    # avoid numerical underflow
    scaling_factor = np.ones(len(obs))
    alpha_matrix = np.zeros((len(states), len(obs)))

    # for t = 0
    alpha_matrix[:,0] = pi*dens[0]
    scaling_factor[0] = 1/np.sum(alpha_matrix[:,0], axis = 0)
    alpha_matrix[:,0] *= scaling_factor[0]

    # for t == 1:T
    for t in range(1, len(obs)):
        alpha_matrix[:,t] = np.matmul(alpha_matrix[:,t-1], A)*dens[t]
        scaling_factor[t] = 1/np.sum(alpha_matrix[:,t], axis = 0)
        alpha_matrix[:,t] *= scaling_factor[t]

    return alpha_matrix, scaling_factor
Let's generate some data to run the algorithm
obs = np.concatenate((np.random.normal(0, 1, size = 500),
                      np.random.normal(1.5, 1, size = 500))).reshape(-1,1)
N = 2 # number of hidden states
M = 3 # number of mixture components
states = list(range(N))
pi = np.array([0.5, 0.5]) # initial probabilities
A = np.array([[0.8, 0.2], [0.3, 0.7]]) # transition matrix
mu = np.array([np.min(obs), np.median(obs), np.max(obs)]) # means of mixture components
sigma = np.array([1, 1, 1]) # standard deviations of mixture components
w = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]) # weights of mixture components
Let's see how fast the algorithm is
%timeit alpha(obs, states, A, pi, w, mu, sigma)
13.6 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Is there any way to make this code faster? I thought about using numba or cython, but I never got them to fully work in this case.
Solution
TL;DR: Numba can strongly speed up this code, especially when combined with basic loops. This answer comes later than the others, but it provides the fastest implementation so far.
This is indeed typically a situation where Numba can strongly speed up the computation. On my machine, np.matmul takes more than 1 µs just to multiply a vector of only 2 items by a 2x2 matrix, while the arithmetic itself should take no more than about 0.01 µs (assuming the data are in the L1 cache). Most of the time is lost in NumPy's call overhead. Likewise, np.sum takes more than 3 µs while it should take only a few nanoseconds, for the same reason: it is just a sum of 2 numbers!
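You can see this overhead yourself by timing the operations on equally tiny inputs (the exact numbers are machine-dependent):

v = np.array([0.5, 0.5])                  # vector of 2 items
A = np.array([[0.8, 0.2], [0.3, 0.7]])    # 2x2 matrix

# Nearly all of the time measured below is NumPy dispatch/call overhead,
# not the arithmetic on a handful of numbers.
%timeit np.matmul(v, A)   # on the order of a microsecond per call
%timeit np.sum(v)         # several µs per call for a sum of 2 numbers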
To solve this efficiently in Numba, you need to use basic loops and avoid creating new temporary arrays (allocations are expensive in such hot loops). Note that gmmdensity cannot easily be translated to Numba since it calls SciPy's norm.pdf, which AFAIK is not supported by Numba yet. Here is a fast implementation:
import numba as nb
import numpy as np
@nb.njit('(int64, int64, float64[:,::1], float64[:,::1], float64[::1])')
def fast_iteration(obsLen, statesLen, dens, A, pi):
    # scaling factor is used to renormalize probabilities in order to
    # avoid numerical underflow
    scaling_factor = np.ones(obsLen)
    alpha_matrix = np.zeros((statesLen, obsLen))

    # for t = 0
    alpha_matrix[:,0] = pi*dens[0]
    scaling_factor[0] = 1/np.sum(alpha_matrix[:,0], axis = 0)
    alpha_matrix[:,0] *= scaling_factor[0]

    tmp = np.zeros(statesLen)

    # for t == 1:T
    for t in range(1, obsLen):
        # Line: alpha_matrix[:,t] = np.matmul(alpha_matrix[:,t-1], A)*dens[t]
        for i in range(statesLen):
            s = 0.0
            for k in range(statesLen):
                s += alpha_matrix[k, t-1] * A[k, i]
            tmp[i] = s * dens[t, i]

        # Line: scaling_factor[t] = 1/np.sum(alpha_matrix[:,t], axis = 0)
        s = 0.0
        for i in range(statesLen):
            s += tmp[i]
        scaling_factor[t] = 1.0 / s

        # Line: alpha_matrix[:,t] *= scaling_factor[t]
        for i in range(statesLen):
            alpha_matrix[i, t] = tmp[i] * scaling_factor[t]

    return alpha_matrix, scaling_factor
@nb.njit('(float64[:,::1], float64[::1], float64[::1])')
def fast_normal_pdf(obs, mu, sigma):
    n, m = obs.shape[0], mu.size
    assert obs.shape[1] == 1
    assert mu.size == sigma.size

    inv_sigma = 1.0 / sigma
    one_over_sqrt_2pi = 1 / np.sqrt(2 * np.pi)

    result = np.empty((n, m))

    for i in range(n):
        for j in range(m):
            tmp = (obs[i, 0] - mu[j]) * inv_sigma[j]
            tmp = np.exp(-0.5 * tmp * tmp)
            result[i, j] = one_over_sqrt_2pi * inv_sigma[j] * tmp

    return result
@nb.njit('(float64[:,::1], float64[:,::1], float64[::1], float64[::1])')
def fast_gmmdensitysum(obs, w, mu, sigma):
    pdf = fast_normal_pdf(obs, mu, sigma)
    result = np.zeros((pdf.shape[0], w.shape[0]))

    for i in range(pdf.shape[0]):
        for k in range(w.shape[0]):
            s = 0.0
            for j in range(pdf.shape[1]):
                s += pdf[i, j] * w[k, j]
            result[i, k] = s

    return result
def fast_alpha(obs, states, A, pi, w, mu, sigma):
    dens = fast_gmmdensitysum(obs, w, mu, sigma)
    return fast_iteration(len(obs), len(states), dens, A, pi)
# Usage
obs = np.concatenate((np.random.normal(0, 1, size = 500),
                      np.random.normal(1.5, 1, size = 500))).reshape(-1,1)
N = 2 # number of hidden states
M = 3 # number of mixture components
states = list(range(N))
pi = np.array([0.5, 0.5]) # initial probabilities
A = np.array([[0.8, 0.2], [0.3, 0.7]]) # transition matrix
mu = np.array([np.min(obs), np.median(obs), np.max(obs)]) # means of mixture components
sigma = np.array([1, 1, 1], dtype=np.float64) # standard deviations of mixture components
w = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]) # weights of mixture components
Note that the dtype of sigma is now provided explicitly so that it is a float64 array, as required by the Numba signatures. fast_normal_pdf and fast_gmmdensitysum are optimized implementations inspired by the good answer of @NickODell.
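As a quick sanity check (assuming the original alpha from the question is still defined), the Numba version should return the same values up to floating-point rounding:

alpha_ref, scale_ref = alpha(obs, states, A, pi, w, mu, sigma)
alpha_fast, scale_fast = fast_alpha(obs, states, A, pi, w, mu, sigma)

print(np.allclose(alpha_ref, alpha_fast))    # expected: True
print(np.allclose(scale_ref, scale_fast))    # expected: True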
Benchmark
Here are performance results on my machine with an i5-9600KF CPU:

Initial code:          10417.4 µs    x1
Andrej Kesely:           974.6 µs    x11
Nick ODell:              335.1 µs    x31
This implementation:      53.9 µs    x193    <------
While this implementation is longer than the others, it is clearly the fastest by a large margin: 193 times faster than the initial implementation on my machine, and more than 6 times faster than the next fastest one so far.
fast_gmmdensitysum is the most expensive part (>60%), especially the np.exp function (~40%), which is hard to optimize further.
Answered By - Jérôme Richard