Issue
I'm implementing the forward recursion for a Hidden Markov Model; the necessary steps (a-b) are shown here.
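In standard notation, the scaled forward recursion I'm computing (with Gaussian-mixture emission densities) is:

\alpha_1(i) = \pi_i \, b_i(o_1), \qquad c_t = 1 \big/ \textstyle\sum_i \alpha_t(i), \qquad \hat{\alpha}_t(i) = c_t \, \alpha_t(i)

\alpha_t(i) = \Big[ \textstyle\sum_j \hat{\alpha}_{t-1}(j) \, a_{ji} \Big] \, b_i(o_t), \qquad t = 2, \dots, T

where b_i(o_t) = \sum_m w_{i,m} \, \mathcal{N}(o_t ; \mu_m, \sigma_m) is the mixture density for state i.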
Below is my implementation
from scipy.stats import norm
import numpy as np
import random
def gmmdensity(obs, w, mu, sd):
    # compute probability density function of gaussian mixture
    gauss_mixt = norm.pdf(obs, mu, sd)[:,None]*w
    return gauss_mixt

def alpha(obs, states, A, pi, w, mu, sigma):
    dens = np.sum(gmmdensity(obs, w, mu, sigma), axis = 2)

    # scaling factor is used to renormalize probabilities in order to
    # avoid numerical underflow
    scaling_factor = np.ones(len(obs))
    alpha_matrix = np.zeros((len(states), len(obs)))

    # for t = 0
    alpha_matrix[:,0] = pi*dens[0]
    scaling_factor[0] = 1/np.sum(alpha_matrix[:,0], axis = 0)
    alpha_matrix[:,0] *= scaling_factor[0]

    # for t == 1:T
    for t in range(1, len(obs)):
        alpha_matrix[:,t] = np.matmul(alpha_matrix[:,t-1], A)*dens[t]
        scaling_factor[t] = 1/np.sum(alpha_matrix[:,t], axis = 0)
        alpha_matrix[:,t] *= scaling_factor[t]

    return alpha_matrix, scaling_factor
Let's generate some data to run the algorithm
obs = np.concatenate((np.random.normal(0, 1, size = 500),
                      np.random.normal(1.5, 1, size = 500))).reshape(-1,1)
N = 2 # number of hidden states
M = 3 # number of mixture components
states = list(range(N))
pi = np.array([0.5, 0.5]) # initial probabilities
A = np.array([[0.8, 0.2], [0.3, 0.7]]) # transition matrix
mu = np.array([np.min(obs), np.median(obs), np.max(obs)]) # means of mixture components
sigma = np.array([1, 1, 1]) # standard deviations of mixture components
w = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]) # weights of mixture components
Let's see how fast the algorithm is
%timeit alpha(obs, states, A, pi, w, mu, sigma)
13.6 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Is there any way to make this code faster? I thought about using numba or cython, but I never got them to fully work in this case.
Solution
TL;DR: Numba can strongly speed up this code, especially when combined with basic loops. This answer comes later than the others, but it provides the fastest implementation so far.
This is indeed typically a situation where Numba can strongly speed up the computation. On my machine, np.matmul takes more than 1 µs just to multiply a vector of only 2 items by a 2x2 matrix, while the arithmetic itself should take no more than about 0.01 µs (assuming the data are in the L1 cache). Most of the time is lost in NumPy's call overhead. Likewise, np.sum takes more than 3 µs while it should take only a few nanoseconds, for the same reason: it is just a sum of 2 numbers!
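You can see this overhead yourself by timing the operations on equally tiny inputs (the exact numbers are machine-dependent):

v = np.array([0.5, 0.5])                  # vector of 2 items
A = np.array([[0.8, 0.2], [0.3, 0.7]])    # 2x2 matrix

# Nearly all of the time measured below is NumPy dispatch/call overhead,
# not the arithmetic on a handful of numbers.
%timeit np.matmul(v, A)   # on the order of a microsecond per call
%timeit np.sum(v)         # several µs per call for a sum of 2 numbers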
To solve this efficiently in Numba, you need to use basic loops and avoid creating new temporary arrays (allocations are expensive in such hot loops). Note that gmmdensity cannot easily be translated to Numba since it calls SciPy's norm.pdf, which AFAIK is not supported by Numba yet. Here is a fast implementation:
import numba as nb
import numpy as np
@nb.njit('(int64, int64, float64[:,::1], float64[:,::1], float64[::1])')
def fast_iteration(obsLen, statesLen, dens, A, pi):
    # scaling factor is used to renormalize probabilities in order to
    # avoid numerical underflow
    scaling_factor = np.ones(obsLen)
    alpha_matrix = np.zeros((statesLen, obsLen))

    # for t = 0
    alpha_matrix[:,0] = pi*dens[0]
    scaling_factor[0] = 1/np.sum(alpha_matrix[:,0], axis = 0)
    alpha_matrix[:,0] *= scaling_factor[0]

    tmp = np.zeros(statesLen)

    # for t == 1:T
    for t in range(1, obsLen):
        # Line: alpha_matrix[:,t] = np.matmul(alpha_matrix[:,t-1], A)*dens[t]
        for i in range(statesLen):
            s = 0.0
            for k in range(statesLen):
                s += alpha_matrix[k, t-1] * A[k, i]
            tmp[i] = s * dens[t, i]

        # Line: scaling_factor[t] = 1/np.sum(alpha_matrix[:,t], axis = 0)
        s = 0.0
        for i in range(statesLen):
            s += tmp[i]
        scaling_factor[t] = 1.0 / s

        # Line: alpha_matrix[:,t] *= scaling_factor[t]
        for i in range(statesLen):
            alpha_matrix[i, t] = tmp[i] * scaling_factor[t]

    return alpha_matrix, scaling_factor
@nb.njit('(float64[:,::1], float64[::1], float64[::1])')
def fast_normal_pdf(obs, mu, sigma):
    n, m = obs.shape[0], mu.size
    assert obs.shape[1] == 1
    assert mu.size == sigma.size

    inv_sigma = 1.0 / sigma
    one_over_sqrt_2pi = 1 / np.sqrt(2 * np.pi)

    result = np.empty((n, m))

    for i in range(n):
        for j in range(m):
            tmp = (obs[i, 0] - mu[j]) * inv_sigma[j]
            tmp = np.exp(-0.5 * tmp * tmp)
            result[i, j] = one_over_sqrt_2pi * inv_sigma[j] * tmp

    return result
@nb.njit('(float64[:,::1], float64[:,::1], float64[::1], float64[::1])')
def fast_gmmdensitysum(obs, w, mu, sigma):
    pdf = fast_normal_pdf(obs, mu, sigma)
    result = np.zeros((pdf.shape[0], w.shape[0]))

    for i in range(pdf.shape[0]):
        for k in range(w.shape[0]):
            s = 0.0
            for j in range(pdf.shape[1]):
                s += pdf[i, j] * w[k, j]
            result[i, k] = s

    return result
def fast_alpha(obs, states, A, pi, w, mu, sigma):
    dens = fast_gmmdensitysum(obs, w, mu, sigma)
    return fast_iteration(len(obs), len(states), dens, A, pi)
# Usage
obs = np.concatenate((np.random.normal(0, 1, size = 500),
                      np.random.normal(1.5, 1, size = 500))).reshape(-1,1)
N = 2 # number of hidden states
M = 3 # number of mixture components
states = list(range(N))
pi = np.array([0.5, 0.5]) # initial probabilities
A = np.array([[0.8, 0.2], [0.3, 0.7]]) # transition matrix
mu = np.array([np.min(obs), np.median(obs), np.max(obs)]) # means of mixture components
sigma = np.array([1, 1, 1], dtype=np.float64) # standard deviations of mixture components
w = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]) # weights of mixture components
Note that the dtype of sigma is now provided explicitly so that it is a float64 array, as required by the Numba signatures. fast_normal_pdf and fast_gmmdensitysum are optimized implementations inspired by the good answer of @NickODell.
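As a quick sanity check (assuming the original alpha from the question is still defined), the Numba version should return the same values up to floating-point rounding:

alpha_ref, scale_ref = alpha(obs, states, A, pi, w, mu, sigma)
alpha_fast, scale_fast = fast_alpha(obs, states, A, pi, w, mu, sigma)

print(np.allclose(alpha_ref, alpha_fast))    # expected: True
print(np.allclose(scale_ref, scale_fast))    # expected: True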
Benchmark
Here are performance results on my machine with an i5-9600KF CPU:

Initial code:          10417.4 µs    x1
Andrej Kesely:           974.6 µs    x11
Nick ODell:              335.1 µs    x31
This implementation:      53.9 µs    x193    <------
While this implementation is longer than the others, it is clearly the fastest by a large margin: 193 times faster than the initial implementation on my machine, and more than 6 times faster than the next fastest one so far.
fast_gmmdensitysum is the most expensive part (>60%), especially the np.exp function (~40%), which is hard to optimize further.
Answered By - Jérôme Richard