Issue
I have designed a simple MLP model and trained it on 6k data samples.
class MLP(nn.Module):
    def __init__(self, input_dim=92, hidden_dim=150, num_classes=2):
        super().__init__()
        self.input_dim = input_dim
        self.num_classes = num_classes
        self.hidden_dim = hidden_dim
        #self.softmax = nn.Softmax(dim=1)
        self.layers = nn.Sequential(
            nn.Linear(self.input_dim, self.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hidden_dim, self.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hidden_dim, self.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hidden_dim, self.num_classes),
        )

    def forward(self, x):
        x = self.layers(x)
        return x
and the model has been instantiated:
model = MLP(input_dim=input_dim, hidden_dim=hidden_dim, num_classes=num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
with the hyperparameters:
num_epoch = 300 # 200e3//len(train_loader)
learning_rate = 1e-3
batch_size = 64
device = torch.device("cuda")
SEED = 42
torch.manual_seed(42)
My implementation mostly follows this question. I save the model's weights as model_weights.pth. The accuracy of the model on the test dataset is 96.80%.

Then I have another 50 samples (in finetune_loader) that I am trying to fine-tune the model on:
model_finetune = MLP()
model_finetune.load_state_dict(torch.load('model_weights.pth'))
model_finetune.to(device)
model_finetune.train()

# train the network
for t in tqdm(range(num_epoch)):
    for i, data in enumerate(finetune_loader, 0):
        #def closure():
        # Get and prepare inputs
        inputs, targets = data
        inputs, targets = inputs.float(), targets.long()
        inputs, targets = inputs.to(device), targets.to(device)
        # Zero the gradients
        optimizer.zero_grad()
        # Perform forward pass
        outputs = model_finetune(inputs)
        # Compute loss
        loss = criterion(outputs, targets)
        # Perform backward pass
        loss.backward()
        #return loss
        optimizer.step()

model_finetune.eval()
with torch.no_grad():
    outputs2 = model_finetune(test_data)
    #predicted_labels = outputs.squeeze().tolist()
    _, preds = torch.max(outputs2, 1)
    prediction_test = np.array(preds.cpu())

accuracy_test_finetune = accuracy_score(y_test, prediction_test)
accuracy_test_finetune
Output: 0.9680851063829787
The accuracy remains exactly the same as before fine-tuning on the 50 samples; I checked, and the output probabilities are unchanged as well.
What could be the reason? Am I making a mistake in the fine-tuning code?
Solution
You have to re-initialize the optimizer with the new model, i.e., the model_finetune object. As your code stands, the fine-tuning loop still uses the optimizer that was constructed with the old model's parameters (model.parameters()), so optimizer.step() updates the weights of model, not model_finetune. That is why the fine-tuned model's predictions and probabilities are identical to the pre-trained ones.
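As a minimal sketch of the corrected setup (reusing learning_rate, weight_decay, num_epoch, criterion, and finetune_loader from the question; adjust as needed):

model_finetune = MLP()
model_finetune.load_state_dict(torch.load('model_weights.pth'))
model_finetune.to(device)
model_finetune.train()

# Re-create the optimizer so it tracks the parameters of model_finetune,
# not those of the original pre-trained model.
optimizer = torch.optim.Adam(model_finetune.parameters(), lr=learning_rate, weight_decay=1e-4)

for t in tqdm(range(num_epoch)):
    for inputs, targets in finetune_loader:
        inputs = inputs.float().to(device)
        targets = targets.long().to(device)
        optimizer.zero_grad()
        loss = criterion(model_finetune(inputs), targets)
        loss.backward()
        optimizer.step()  # now updates model_finetune's weights

With only 50 samples you may also want a smaller learning rate (e.g. 1e-4) so that fine-tuning does not overwrite the pre-trained weights too aggressively, but the essential fix is constructing the optimizer from model_finetune.parameters().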
Answered By - Ashwath S