Let's start with a simple working example: a plain loss function and a regular backward pass. We will build a short computational graph and do some gradient computations on it.

Code:

```python
import torch
from torch.autograd import grad
import torch.nn as nn

# Create some dummy data.
x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5  # "ground-truths"

# We will use MSELoss as an example.
loss_fn = nn.MSELoss()

# Do some computations.
v = x + 2
y = v ** 2

# Compute loss.
loss = loss_fn(y, gt)
print(f'Loss: {loss}')

# Now compute gradients:
d_loss_dx = grad(outputs=loss, inputs=x)
print(f'dloss/dx: {d_loss_dx}')
```

Output:

```
Loss: 42.25
dloss/dx: (tensor([[-19.5000, -19.5000],
        [-19.5000, -19.5000]]),)
```

OK, this works! Now let's try to reproduce the error "grad can be implicitly created only for scalar outputs". As you can see, the loss in the previous example is a scalar. By default, `backward()` and `grad()` deal with a single scalar value: `loss.backward(torch.tensor(1.))`. If you try to pass a tensor with more than one element, you will get an error.
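As a quick check of that default, here is a minimal sketch (reusing `x`, `gt` and `loss_fn` from the first example; the explicit-scalar variant is just an illustration of the default mentioned above, not part of the original code) showing that `loss.backward()` and `loss.backward(torch.tensor(1.))` fill `x.grad` with the same values:

```python
# Rebuild the graph so we can call backward() twice.
v = x + 2
y = v ** 2
loss = loss_fn(y, gt)

loss.backward(retain_graph=True)   # implicit scalar gradient of 1
print(x.grad)                      # tensor([[-19.5000, ...], ...])

x.grad.zero_()                     # clear accumulated gradients
loss.backward(torch.tensor(1.))    # explicit scalar gradient, same result
print(x.grad)
```

Now, back to reproducing the error with a non-scalar output: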

Code:

```python
v = x + 2
y = v ** 2

try:
    dy_hat_dx = grad(outputs=y, inputs=x)
except RuntimeError as err:
    print(err)
```

Output:

```
grad can be implicitly created only for scalar outputs
```

Therefore, when using `grad()` you need to specify the `grad_outputs` parameter as follows:

Code:

```python
v = x + 2
y = v ** 2

dy_dx = grad(outputs=y, inputs=x, grad_outputs=torch.ones_like(y))
print(f'dy/dx: {dy_dx}')

dv_dx = grad(outputs=v, inputs=x, grad_outputs=torch.ones_like(v))
print(f'dv/dx: {dv_dx}')
```

Output:

```
dy/dx: (tensor([[6., 6.],
        [6., 6.]]),)
dv/dx: (tensor([[1., 1.],
        [1., 1.]]),)
```
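Intuitively, passing `torch.ones_like(y)` weights every element of `y` equally: `grad_outputs` plays the role of the vector in the vector-Jacobian product that autograd computes. As a minimal sketch of this (the non-uniform weights below are an illustrative assumption, not part of the original example), each output element's gradient gets scaled by its weight:

```python
v = x + 2
y = v ** 2

# Hypothetical non-uniform weights for the elements of y.
weights = torch.tensor([[1., 2.],
                        [3., 4.]])
weighted = grad(outputs=y, inputs=x, grad_outputs=weights)
print(weighted)  # (tensor([[ 6., 12.], [18., 24.]]),) i.e. 2*v scaled elementwise by weights
```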

NOTE: If you are using `backward()` instead, simply do `y.backward(torch.ones_like(y))`.
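For completeness, here is a minimal sketch of that `backward()` variant (reusing `x` from above and assuming its `.grad` has been cleared first; the gradients accumulate into `x.grad` instead of being returned):

```python
v = x + 2
y = v ** 2

y.backward(torch.ones_like(y))
print(x.grad)  # tensor([[6., 6.], [6., 6.]])
```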

Source: stackoverflow.com