Issue
I'm trying to practice a simple exercise in imputing categorical variables. I'm aware that SimpleImputer works directly on categorical variables, but I'm just doing an exercise for myself. I'm having a very strange error with the final line of this code.
import pandas as pd; import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder
df=pd.DataFrame({"cat":["S","M","L","M","S","S",np.nan]})
encoder=OrdinalEncoder(); imputer=SimpleImputer(strategy="most_frequent")
df["encoded"]=encoder.fit_transform(df[["cat"]])
df["encoded_imp"]=imputer.fit_transform(df[["encoded"]])
#This final line is not working
df["encoded_cat"]=encoder.inverse_transform(df[["encoded_imp"]])
The final line above is not working.
The error is (Please note that my Python installation is installed by/within Julia):
ValueError Traceback (most recent call last)
Cell In[10], line 9
6 df["encoded"]=encoder.fit_transform(df[["cat"]])
7 df["encoded_imp"]=imputer.fit_transform(df[["encoded"]])
----> 9 df["encoded_cat"]=encoder.inverse_transform(df[["encoded_imp"]])
File ~\.julia\conda\3\x86_64\lib\site-packages\pandas\core\frame.py:4091, in DataFrame.__setitem__(self, key, value)
4088 self._setitem_array([key], value)
4089 else:
4090 # set column
-> 4091 self._set_item(key, value)
File ~\.julia\conda\3\x86_64\lib\site-packages\pandas\core\frame.py:4300, in DataFrame._set_item(self, key, value)
4290 def _set_item(self, key, value) -> None:
4291 """
4292 Add series to DataFrame in specified column.
4293
(...)
4298 ensure homogeneity.
4299 """
-> 4300 value, refs = self._sanitize_column(value)
4302 if (
4303 key in self.columns
4304 and value.ndim == 1
4305 and not isinstance(value.dtype, ExtensionDtype)
4306 ):
4307 # broadcast across multiple columns if necessary
4308 if not self.columns.is_unique or isinstance(self.columns, MultiIndex):
File ~\.julia\conda\3\x86_64\lib\site-packages\pandas\core\frame.py:5040, in DataFrame._sanitize_column(self, value)
5038 if is_list_like(value):
5039 com.require_length_match(value, self.index)
-> 5040 return sanitize_array(value, self.index, copy=True, allow_2d=True), None
File ~\.julia\conda\3\x86_64\lib\site-packages\pandas\core\construction.py:608, in sanitize_array(data, index, dtype, copy, allow_2d)
606 subarr = data
607 if data.dtype == object:
--> 608 subarr = maybe_infer_to_datetimelike(data)
609 if (
610 object_index
611 and using_pyarrow_string_dtype()
612 and is_string_dtype(subarr)
613 ):
614 # Avoid inference when string option is set
615 subarr = data
File ~\.julia\conda\3\x86_64\lib\site-packages\pandas\core\dtypes\cast.py:1172, in maybe_infer_to_datetimelike(value)
1169 raise TypeError(type(value)) # pragma: no cover
1170 if value.ndim != 1:
1171 # Caller is responsible
-> 1172 raise ValueError(value.ndim) # pragma: no cover
1174 if not len(value):
1175 return value
ValueError: 2
EDIT: I can fix this with
df["encoded_cat"]=encoder.inverse_transform(df[["encoded_imp"]]).flatten()
but I'm still curious why the original error.
Solution
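When your values have dtype object, the shape of the right-hand side must match the shape of the left-hand side. The traceback shows why: for object arrays, sanitize_array calls maybe_infer_to_datetimelike, which raises ValueError(value.ndim) for anything that is not 1-D, hence "ValueError: 2".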
# shape=(7, ) vs shape=(7, 1) => it doesn't work
df["encoded_cat"] = encoder.inverse_transform(df[["encoded_imp"]])
# shape=(7, 1) vs shape=(7, 1) => it works
df[["encoded_cat"]] = encoder.inverse_transform(df[["encoded_imp"]])
# df[["encoded_cat"]] = encoder.inverse_transform(np.expand_dims(df["encoded_imp"], 1))
# shape=(7, ) vs shape=(7, ) => it works
df["encoded_cat"] = encoder.inverse_transform(df[["encoded_imp"]]).flatten()
# df["encoded_cat"] = encoder.inverse_transform(df[["encoded_imp"]]).squeeze()
So you need to expand the dimension so that encoder.inverse_transform works, then reduce the dimension to fit your column. NumPy offers many tools for manipulating dimensions: np.hstack, np.vstack, np.newaxis (broadcasting), np.expand_dims, np.squeeze, etc.
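As a rough illustration of the shape changes described above, here is a minimal NumPy-only sketch. The array col_2d is a hand-written stand-in (an assumption, not actual encoder output) for the (7, 1) object array that inverse_transform returns.

```python
import numpy as np

# Hypothetical stand-in for the (7, 1) object array returned by inverse_transform.
col_2d = np.array([["S"], ["M"], ["L"], ["M"], ["S"], ["S"], ["S"]], dtype=object)

# Reduce 2-D -> 1-D so the result fits a single DataFrame column.
flat_a = col_2d.ravel()          # view when possible
flat_b = col_2d.squeeze(axis=1)  # drops the length-1 axis
flat_c = col_2d[:, 0]            # plain indexing of the only column

# Expand 1-D -> 2-D, the layout scikit-learn transformers expect as input.
back_a = flat_a[:, np.newaxis]
back_b = np.expand_dims(flat_a, 1)
back_c = flat_a.reshape(-1, 1)

print(col_2d.shape, flat_a.shape, back_a.shape)
```

Any of the three reductions gives a 1-D array that can be assigned with df["encoded_cat"] = ...; which one to use is mostly a matter of taste.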
It also works if you force a "fixed" (non-object) dtype:
df["encoded_cat"] = encoder.inverse_transform(df[["encoded_imp"]]).astype(str)
Answered By - Corralien