How does PyTorch backpropagate the loss values of several different loss functions during training?

Yanwei Liu
4 min read · Dec 12, 2021

Suppose output is the model's output, target is the ground-truth label, and criterion_a, criterion_b, and criterion_c are three different loss functions. In practice they may be written differently, but for brevity this article simplifies them to the following form:

loss_a = criterion_a(output, target)
loss_b = criterion_b(output, target)
loss_c = criterion_c(output, target)
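As a concrete illustration of the setup above, a minimal sketch might look like the following; the toy model, tensor shapes, and the particular criteria (MSE, L1, Smooth L1) are assumptions made here for demonstration only:

import torch
import torch.nn as nn

# Toy regression setup (model, shapes, and criteria are illustrative assumptions)
model = nn.Linear(10, 1)
output = model(torch.randn(8, 10))   # model prediction, shape (8, 1)
target = torch.randn(8, 1)           # ground-truth labels

# Three different loss functions applied to the same prediction/target pair
criterion_a = nn.MSELoss()
criterion_b = nn.L1Loss()
criterion_c = nn.SmoothL1Loss()

loss_a = criterion_a(output, target)
loss_b = criterion_b(output, target)
loss_c = criterion_c(output, target)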

There are two ways to handle this:

1. Sum the losses, then backpropagate once

This approach usually does not cause errors; the second approach introduced next may.

loss = loss_a + loss_b + loss_c
loss.backward()
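In practice the individual losses are often weighted before being summed, so that one term does not dominate the others. Instead of the plain sum above, one might write the following; the coefficients here are arbitrary example values, not recommendations:

# Weighted combination of the three losses (weights are example values)
loss = 1.0 * loss_a + 0.5 * loss_b + 0.1 * loss_c
loss.backward()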

2. Backpropagate each loss separately

Written as below, the code may raise RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.

loss_a.backward()
loss_b.backward()
loss_c.backward()

The fix is to pass retain_graph=True to every backward() call except the last one, as shown below:

loss_a.backward(retain_graph=True)
loss_b.backward(retain_graph=True) # corrected 2022/04/24
loss_c.backward()
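Because backward() accumulates gradients into each parameter's .grad attribute, backpropagating the losses one by one yields the same gradients as summing them first. The following sketch, reusing the toy model and criteria from the example at the top of the article, is one way to check this:

# Compare the gradients produced by the two approaches
x = torch.randn(8, 10)
target = torch.randn(8, 1)

model.zero_grad()
out = model(x)
(criterion_a(out, target) + criterion_b(out, target) + criterion_c(out, target)).backward()
grad_from_sum = model.weight.grad.clone()

model.zero_grad()
out = model(x)                        # rebuild the graph for the separate-backward version
criterion_a(out, target).backward(retain_graph=True)
criterion_b(out, target).backward(retain_graph=True)
criterion_c(out, target).backward()
grad_from_separate = model.weight.grad.clone()

print(torch.allclose(grad_from_sum, grad_from_separate))  # expected: True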

3. For independent networks, use separate optimizers

# References:
# https://discuss.pytorch.org/t/how-to-have-two-optimizers-such-that-one-optimizer-trains-the-whole-parameter-and-the-other-trains-partial-of-the-parameter/62966
# https://blog.csdn.net/weixin_44058333/article/details/99701876

# For different parts of the same model's parameters
optim1 = torch.optim.SGD(model.parameters(), lr=0.001)
optim2 = torch.optim.Adam(model.conv3.parameters(), lr=0.05)

# For two different models
optimizer1 = torch.optim.SGD(net1.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
.....

loss1 = criterion(net1(inputs), target)  # loss for net1 (criterion/inputs come from the elided code above)
loss2 = criterion(net2(inputs), target)  # loss for net2
optimizer1.zero_grad()                   # set net1's gradients to zero
loss1.backward(retain_graph=True)        # retain the graph for the second backward
optimizer1.step()
optimizer2.zero_grad()                   # set net2's gradients to zero
loss2.backward()
optimizer2.step()
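Putting the pieces together, one training step for two independent networks might look like the self-contained sketch below. The two toy networks, the shared criterion, and the hyperparameters are assumptions for illustration; note that when the two losses share no part of the computation graph, retain_graph=True is not strictly needed:

import torch
import torch.nn as nn

# Two independent toy networks (illustrative assumption)
net1 = nn.Linear(10, 1)
net2 = nn.Linear(10, 1)
criterion = nn.MSELoss()

optimizer1 = torch.optim.SGD(net1.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

inputs = torch.randn(8, 10)
target = torch.randn(8, 1)

loss1 = criterion(net1(inputs), target)   # loss for net1
loss2 = criterion(net2(inputs), target)   # loss for net2

optimizer1.zero_grad()                    # clear net1's gradients
loss1.backward()                          # independent graphs, so no retain_graph needed here
optimizer1.step()

optimizer2.zero_grad()                    # clear net2's gradients
loss2.backward()
optimizer2.step()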

References

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time — PyTorch Forums

A collection of errors in PyTorch programs — Zhihu (zhihu.com)

RuntimeError: Trying to backward through the graph a second time… — Huiyu Blog, CSDN
