A Caffe net juggles two "streams" of numbers.

The first is the data "stream": images and labels pushed through the net. As these inputs progress through the net, they are converted into high-level representations and eventually into class-probability vectors (in classification tasks).

The second "stream" holds the parameters of the different layers: the convolution weights, the biases, etc. These parameters are changed and learned during the training phase of the net.

Despite the fundamentally different roles these two "streams" play, caffe nonetheless uses the same data structure, `blob`, to store and manage both.

However, for each layer there are two **different** blob vectors, one for each stream.
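To make the distinction concrete, here is a minimal numpy sketch of the idea (an illustration only, not caffe's actual `Blob` implementation): each blob carries one array of values and one matching array of gradients.

```
import numpy as np

class Blob:
    """Toy stand-in for a caffe blob: one array for values, one for gradients."""
    def __init__(self, shape):
        self.data = np.zeros(shape, dtype=np.float32)  # forward values
        self.diff = np.zeros(shape, dtype=np.float32)  # backward gradients

# data-stream blob: a batch of 32 RGB images of 224x224 pixels
data_blob = Blob((32, 3, 224, 224))
# parameter blobs of a conv layer: 64 filters of shape 3x7x7, plus 64 biases
weight_blob, bias_blob = Blob((64, 3, 7, 7)), Blob((64,))
```

The same structure serves both streams; only the shapes and the role of the numbers differ.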

Here's an example that I hope will clarify things:

```
import caffe
solver = caffe.SGDSolver( PATH_TO_SOLVER_PROTOTXT )
net = solver.net
```

If you now look at `net.blobs`, you will see a dictionary storing a "caffe blob" object for each layer of the net. Each blob has storage room for both the data and the gradient:

```
net.blobs['data'].data.shape # >> (32, 3, 224, 224)
net.blobs['data'].diff.shape # >> (32, 3, 224, 224)
```

And for a convolutional layer:

```
net.blobs['conv1/7x7_s2'].data.shape # >> (32, 64, 112, 112)
net.blobs['conv1/7x7_s2'].diff.shape # >> (32, 64, 112, 112)
```

`net.blobs` holds the first (data) stream; its shapes match those of the inputs, all the way from the raw images to the resulting class-probability vectors.

On the other hand, you can look at another member of `net`:

```
net.layers
```

This is a caffe vector storing the layers of the net; each layer stores its parameters in its own blob vector.

Looking at the first layer (the `'data'` layer):

```
len(net.layers[0].blobs) # >> 0
```

There are no parameters to store for an input layer.

On the other hand, for the first convolutional layer:

```
len(net.layers[1].blobs) # >> 2
```

The net stores one blob for the filter weights and another for the constant bias. Here they are:

```
net.layers[1].blobs[0].data.shape # >> (64, 3, 7, 7)
net.layers[1].blobs[1].data.shape # >> (64,)
```

As you can see, this layer performs 7x7 convolutions on a 3-channel input and has 64 such filters.
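The spatial halving from 224 to 112 follows from the standard convolution output-size formula; the stride of 2 is suggested by the layer name `conv1/7x7_s2`, and a padding of 3 is assumed here for illustration:

```
# standard convolution output-size formula
# stride=2 suggested by the layer name 'conv1/7x7_s2'; pad=3 is assumed
in_size, kernel, stride, pad = 224, 7, 2, 3
out_size = (in_size + 2 * pad - kernel) // stride + 1
print(out_size)  # 112
```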

Now, how do you get the gradients? Well, as you noted,

```
diffs = net.backward(diffs=['data','conv1/7x7_s2'])
```

returns the gradients of the *data* stream. We can verify this:

```
np.all( diffs['data'] == net.blobs['data'].diff ) # >> True
np.all( diffs['conv1/7x7_s2'] == net.blobs['conv1/7x7_s2'].diff ) # >> True
```

(**TL;DR**) You want the gradients of the *parameters*; these are stored in `net.layers`, alongside the parameters themselves:

```
net.layers[1].blobs[0].diff.shape # >> (64, 3, 7, 7)
net.layers[1].blobs[1].diff.shape # >> (64,)
```
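For intuition, this is what a solver does with those parameter diffs. A plain-SGD step, sketched in numpy with stand-in arrays (not caffe's actual solver code, and the learning rate is an arbitrary choice):

```
import numpy as np

lr = 0.01  # assumed learning rate, for illustration only
# stand-ins for net.layers[1].blobs[0].data and .diff
weights = np.random.randn(64, 3, 7, 7).astype(np.float32)
w_diff = np.random.randn(64, 3, 7, 7).astype(np.float32)

# vanilla SGD step: move the weights against the gradient of the loss
weights -= lr * w_diff
```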

To help you map between the names of the layers and their indices in the `net.layers` vector, you can use `net._layer_names`.
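For example, a small name-to-index helper (the layer names below are hard-coded stand-ins, since `net._layer_names` requires a loaded net):

```
# stand-in for list(net._layer_names); real names come from the loaded net
layer_names = ['data', 'conv1/7x7_s2', 'pool1/3x3_s2']

name_to_index = {name: i for i, name in enumerate(layer_names)}
# with a loaded net you could then write, e.g.:
# net.layers[name_to_index['conv1/7x7_s2']].blobs[0].diff
```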

**Update** regarding the use of gradients to visualize filter responses:

A gradient is normally defined for a **scalar** function. The loss is a scalar, and therefore you can speak of the gradient of the loss with respect to each pixel/filter weight. This gradient is a single number per pixel/filter weight.
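A tiny numpy example of this "one number per element" property, using a toy sum-of-squares loss (an illustration, not anything caffe computes):

```
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # stand-in for pixels / filter weights
loss = np.sum(x ** 2)           # a scalar loss
grad = 2 * x                    # d(loss)/dx: exactly one number per element of x
```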

If you want to get the input that results in maximal activation of a **specific** internal hidden node, you need an "auxiliary" net whose loss is exactly a measure of the activation of the specific hidden node you want to visualize. Once you have this auxiliary net, you can start from an arbitrary input and change it based on the gradients of the auxiliary loss with respect to the input layer:
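The whole loop can be sketched with a toy stand-in for the auxiliary net (pure numpy, all names assumed): the "activation" is modeled as a dot product, so its gradient with respect to the input is just the weight vector, and repeated ascent steps drive the activation up.

```
import numpy as np

# toy stand-in for the auxiliary net: the "loss" is the activation of one
# hidden node, modeled here as the dot product w . x
w = np.array([1.0, -2.0, 0.5])

def input_gradient(x):
    # gradient of the activation w . x with respect to the input x
    return w

x = np.zeros(3)   # arbitrary starting input
lr = 0.1
for _ in range(100):
    x = x + lr * input_gradient(x)  # gradient *ascent*: increase the activation
```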

```
update = prev_in + lr * net.blobs['data'].diff  # move the input along the gradient to increase the activation
```
```