Issue
The question is simple. I have three equations of the form OmegaMIX = beta1*Omega1 + beta2*Omega2, and I want to find the best coefficients by linear regression with the intercept fixed at 0. I have working code, but beta1 and beta2 are probabilities, so they must satisfy beta1 + beta2 = 1, which is not the case for the fitted values. How can this constraint be imposed? This is the existing code:
import pandas as pd
from sklearn import linear_model
inputfilename = 'JO.csv' # input file
df = pd.read_csv(inputfilename)
x = df.drop('OmegaMIX',axis=1) # predictors: Omega1, Omega2
y = df['OmegaMIX'] # target: OmegaMIX
regr = linear_model.LinearRegression(positive=True,fit_intercept=False)
regr.fit(x, y)
print('\u03B21 = ', regr.coef_[0])
print('\u03B22 = ', regr.coef_[1])
print("Quality = ", regr.score(x, y, sample_weight=None))
and the output is:
β1 = 0.33995522604783796
β2 = 0.5794270911721245
Quality = 0.9995968335914979
The input file is:
OmegaMIX,Omega1,Omega2
2.70,4.43,2.09
1.84,3.00,1.37
0.50,1.19,0.17
Solution
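I don't think this can be done directly in sklearn: LinearRegression has no option for an equality constraint on the coefficients. I would rather use CVXPY, which lets you state the constraints explicitly.
Here's an example: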
import pandas as pd
import cvxpy as cp
df = pd.read_csv('JO.csv')
# Setting up our matrix X and our vector y; cvxpy needs them to be NumPy arrays
X = df.drop('OmegaMIX', axis=1).values
y = df['OmegaMIX'].values
# Defining the variables of the problem
beta1 = cp.Variable()
beta2 = cp.Variable()
# setting the constraints
constraints = [beta1 >= 0, beta2 >= 0, beta1 + beta2 == 1]
# Defining the linear regression problem
objective = cp.Minimize(cp.sum_squares(X[:, 0] * beta1 + X[:, 1] * beta2 - y))
# Solving the problem
problem = cp.Problem(objective, constraints)
problem.solve()
# printing the betas
print('\u03B21 =', beta1.value)
print('\u03B22 =', beta2.value)
print('\u03B21 + \u03B22 =', (beta1.value + beta2.value))
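# problem.value is the minimized sum of squared residuals (lower is better), not the R^2 that regr.score returns
print("Sum of squared residuals =", problem.value)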
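With only two coefficients, the constraint beta1 + beta2 = 1 can also be handled by substitution: setting beta2 = 1 - beta1 gives OmegaMIX - Omega2 = beta1*(Omega1 - Omega2), a one-variable least-squares fit whose solution can then be clipped to [0, 1]. What follows is a minimal sketch of that idea in plain Python, offered as an alternative and not as a replacement for the CVXPY code. The function name fit_mixture is made up for illustration, and the three data rows from the question are hard-coded so that it runs without JO.csv.

```python
# The three rows of JO.csv from the question, hard-coded for a standalone run.
omega1 = [4.43, 3.00, 1.19]
omega2 = [2.09, 1.37, 0.17]
omega_mix = [2.70, 1.84, 0.50]


def fit_mixture(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 with b1 + b2 = 1 and b1, b2 >= 0.

    Hypothetical helper: substitutes b2 = 1 - b1, solves the resulting
    one-variable problem in closed form, then clips b1 to [0, 1].
    """
    d = [a - b for a, b in zip(x1, x2)]          # x1 - x2
    denom = sum(v * v for v in d)
    if denom == 0.0:
        raise ValueError("x1 and x2 are identical; beta1 is not identifiable")
    num = sum(dv * (yv - bv) for dv, yv, bv in zip(d, y, x2))
    # The objective is a convex quadratic in b1, so clipping the
    # unconstrained minimizer gives the minimizer on the interval [0, 1].
    b1 = min(1.0, max(0.0, num / denom))
    return b1, 1.0 - b1


beta1, beta2 = fit_mixture(omega1, omega2, omega_mix)

# Sum of squared residuals of the constrained fit.
fitted = [beta1 * a + beta2 * b for a, b in zip(omega1, omega2)]
rss = sum((yv - fv) ** 2 for yv, fv in zip(omega_mix, fitted))

# R^2 defined as 1 - SS_res / SS_tot, with SS_tot taken around the mean of y.
mean_y = sum(omega_mix) / len(omega_mix)
ss_tot = sum((yv - mean_y) ** 2 for yv in omega_mix)
r2 = 1.0 - rss / ss_tot

print('\u03B21 =', beta1)
print('\u03B22 =', beta2)
print('\u03B21 + \u03B22 =', beta1 + beta2)
print('Sum of squared residuals =', rss)
print('R^2 =', r2)
```

This shortcut only works because there are exactly two coefficients. With three or more components the substitution still leaves several unknowns with inequality constraints, and a solver such as CVXPY is likely the simpler route.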
Answered By - Jeremy Tarrieu