Issue
Using the iris dataset as a hypothetical hello world example:
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
df = pd.DataFrame(iris['data'], columns = iris['feature_names'])
df['iris_class'] = pd.Series(iris['target'], name = 'target_values')
df['iris_class_name'] = df['iris_class'].replace([0,1,2], ['iris-' + species for species in iris['target_names'].tolist()])
df.columns = df.columns.str.replace("[() ]", "")
print(df.head())
Let us say I want to use tf.keras.layers.Embedding instead of one-hot/dummy encoding as part of ANN for regression. e.g.:
iris_class_name + sepalwidthcm + petallengthcm -> sepallengthcm
where sepallengthcm is the dependent variable. I came across this:
city_lookup = tf.keras.layers.StringLookup(vocabulary = city_vocabulary, mask_token = None);
city_embedding= tf.keras.Sequential([
city_lookup,
tf.keras.layers.Embedding(len(city_vocabulary) + 1, embedding_dimension)
], "city_embedding")
city = features["city"]
city_embedding_output = city_embedding(city)
but am not sure how to exactly use it in my use case. Any pointers very much welcome. Thanks!
Solution
You can map iris_class_name
to n-dimensional vector representations and then concatenate with the other continuous features:
import pandas as pd
from sklearn import datasets
import numpy as np
import tensorflow as tf
iris = datasets.load_iris()
df = pd.DataFrame(iris['data'], columns = iris['feature_names'])
df['iris_class'] = pd.Series(iris['target'], name = 'target_values')
df['iris_class_name'] = df['iris_class'].replace([0,1,2], ['iris-' + species for species in iris['target_names'].tolist()])
df.columns = df.columns.str.replace("[() ]", "")
vocab = df['iris_class_name'].unique()
embedding_dimension = 10
lookup = tf.keras.layers.StringLookup(vocabulary = vocab, mask_token = None)
embedding= tf.keras.Sequential([
lookup,
tf.keras.layers.Embedding(len(vocab) + 1, embedding_dimension)
])
names = df['iris_class_name'].to_numpy()
embedding_output = embedding(names)
features = np.concatenate((embedding_output, df[['sepalwidthcm', 'petallengthcm']].to_numpy()), axis=-1)
print(features.shape)
(150, 12)
Since you have 3 unique iris class names, you could also simply create an integer-to-vector dictionary manually, but it is up to you.
Answered By - AloneTogether
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.