Issue
I'm attempting to use LinearRegression from sklearn.linear_model to find the multiple linear regression coefficients for several variables at each latitude and longitude point, regressing along the time dimension, like so:
import numpy as np
from sklearn.linear_model import LinearRegression

for i in range(len(data.lat)):
    for j in range(len(data.lon)):
        # Predictors as columns, shape (n_time, 2); response, shape (n_time,)
        X = np.column_stack((data.ivar1.values[:, i, j], data.ivar2.values[:, i, j]))
        y = data.dvar.values[:, i, j]
        storage_dframe[i, j, :] = LinearRegression().fit(X, y).coef_
While this general form works, my data contains abundant NaN values because it comes from real observations. I would rather not impute data if I can avoid it, so as to preserve whatever real relationships there might be. Is it possible to replicate the behavior of scipy.stats.linregress, where "Missing values are considered pair-wise: if a value is missing in x, the corresponding value in y is masked"? This feels like the best route; otherwise, could I add a conditional clause along the lines of
if data.ivar1[:, i, j].isnull() or data.ivar2[:, i, j].isnull() == True:
    storage_dframe[i, j, :] = np.nan
else:
    X = np.column_stack((data.ivar1.values[:, i, j], data.ivar2.values[:, i, j]))
    y = data.dvar.values[:, i, j]
    storage_dframe[i, j, :] = LinearRegression().fit(X, y).coef_
I've attempted essentially that, with no success. Please feel free to chime in!
Solution
This boolean check handles it by assigning NaN to any grid point where at least one variable contains a missing value:
if (data.isel(lat=i, lon=j).ivar1.isnull().any()
        or data.isel(lev=2, lat=i, lon=j).ivar2.isnull().any()
        or data.isel(lev=2, lat=i, lon=j).ivar3.isnull().any()
        or data.isel(lev=0, lat=i, lon=j).ivar4.isnull().any()
        or data2.isel(lat=i, lon=j).dvar.isnull().any()):
    storage_dframe[i, j, :] = np.nan
else:
    storage_dframe[i, j, :] = LinearRegression(...)
where ivarx is the xth independent variable and dvar is the dependent variable.
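The check above discards a grid point entirely as soon as one value is missing. The question also asked about pair-wise masking in the style of scipy.stats.linregress, which could instead drop only the affected time steps and fit on the rest. The following is a minimal sketch of that alternative, not the method used in the answer. It assumes plain NumPy arrays standing in for the `.values` of the xarray variables. The helper name `fit_ignoring_nans` and its `min_samples` threshold are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_ignoring_nans(X, y, min_samples=3):
    """Fit y ~ X using only the time steps where every value is finite.

    X has shape (n_time, n_predictors); y has shape (n_time,).
    Returns the coefficients, or all-NaN if too few complete rows remain.
    (Hypothetical helper; min_samples is an arbitrary illustrative threshold.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A row is usable only if no predictor and no response value is NaN or infinite
    valid = np.isfinite(X).all(axis=1) & np.isfinite(y)
    if valid.sum() < min_samples:
        return np.full(X.shape[1], np.nan)
    return LinearRegression().fit(X[valid], y[valid]).coef_


# Synthetic stand-in for one grid point: y = 2*x1 - 3*x2 + 1, no noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2.0 * x1 - 3.0 * x2 + 1.0

# Knock out a few observations in different variables
x1[3] = np.nan
x2[10] = np.nan
y[20] = np.nan

coefs = fit_ignoring_nans(np.column_stack((x1, x2)), y)
print(coefs)  # should be close to 2 and -3
```

Inside the lat/lon loop, the same helper could be called with X built by np.column_stack from the independent variables at (i, j) and y taken from the dependent variable, with the result assigned to storage_dframe[i, j, :]. Whether dropping time steps is acceptable depends on how the gaps are distributed, so the all-or-nothing check remains the more conservative choice.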
Answered By - Logan