Recipe for CNN creation

Yanwei Liu
2 min read · Jul 26, 2020

These notes are taken from Hands-On Convolutional Neural Networks with TensorFlow. If you find them helpful, please purchase the original book for a more detailed discussion of Convolutional Neural Networks.

  1. Use convolution layers with 3x3 kernels. Larger kernels are more expensive in terms of both parameters and computation. On top of this, as we saw in the earlier chapters, you can stack Conv layers to produce a bigger receptive field, with the added benefit of more nonlinear activations.
  2. First layer convolutions should generally have at least 32 filters. This way, deeper layers are not restricted by the number of features that the first layer extracted.
  3. Try to avoid the use of pooling layers, if possible. Instead, use convolution layers with a stride of 2. This downsamples the input just like pooling, but unlike pooling it doesn’t simply throw away valuable information. Using a strided convolution is like combining a Conv and a pooling layer in one (see the first sketch after this list).
  4. When you decrease the spatial size of your feature maps, you should increase the number of filters you use so that you don’t lose too much information too quickly. In deep networks, avoid reducing the spatial size too quickly in the first layers.
  5. Follow the advice in this chapter about starting your network design small and gradually increasing its complexity, so that you avoid overfitting issues.
  6. Use batchnorm. It really helps with training your networks!
  7. Gradually decrease the spatial size of your feature maps as you get deeper into your network.
  8. Minimize the number of FC layers (use dropout before the final layer). Use FC layers only if you need to concatenate some scalar features at the end. (You can even avoid that by encoding them as extra input channels.)
  9. If you require a large receptive field (detection, or classification where the object size is close to the total image size), try using dilated convolutions with an exponentially increasing dilation factor per layer (see the dilated-convolution sketch below). This way, you grow your receptive field very quickly while keeping the number of parameters low.
  10. If the network becomes deep and the training loss does not decrease, consider adding residual connections (see the residual-block sketch below).
  11. Once your network accuracy is within the expected range, and if computation cost is an issue, look at techniques like depthwise (separable) convolutions (see the sketch below).
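
To make tips 1–4 and 6–8 concrete, here is a minimal tf.keras sketch of such a network. It is only an illustration under assumed values (the input shape, filter counts, and number of classes are arbitrary, not taken from the book): 3x3 kernels, at least 32 filters in the first layer, strided convolutions instead of pooling, filter counts that double as the spatial size halves, batch norm after every convolution, and a single dense layer preceded by dropout.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_small_cnn(input_shape=(64, 64, 3), num_classes=10):
    """Small CNN illustrating tips 1-4 and 6-8 (shapes are arbitrary)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Tips 1 and 2: 3x3 kernels, at least 32 filters in the first layer.
        layers.Conv2D(32, 3, padding="same", use_bias=False),
        layers.BatchNormalization(),  # Tip 6: batch norm after each conv.
        layers.ReLU(),
        # Tips 3, 4 and 7: downsample with strided convs instead of pooling,
        # doubling the number of filters as the spatial size halves.
        layers.Conv2D(64, 3, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2D(128, 3, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        # Tip 8: no stack of FC layers; global pooling, dropout, one Dense.
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_small_cnn()
model.summary()
```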
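
For tip 9, a stack of 3x3 convolutions whose dilation rate doubles at each layer grows the receptive field roughly exponentially with depth, while each layer keeps the same number of parameters. A minimal sketch (filter count and depth are arbitrary assumptions):

```python
from tensorflow.keras import layers, models

def build_dilated_stack(input_shape=(128, 128, 3), filters=64, depth=4):
    """3x3 convs with dilation rates 1, 2, 4, 8, ... so the receptive
    field grows roughly exponentially without extra parameters."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for i in range(depth):
        x = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=2 ** i, activation="relu")(x)
    return models.Model(inputs, x)
```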
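
For tip 10, a residual connection simply adds the input of a block back to its output, giving gradients a shortcut through deep networks. A basic sketch of such a block (the exact layout varies between architectures; this one uses two 3x3 convolutions and a 1x1 projection when the channel count changes):

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convs with batch norm, plus a shortcut added before the
    final activation. A 1x1 conv adapts the shortcut when the channel
    count changes."""
    shortcut = x
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.ReLU()(y)
```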
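
For tip 11, a depthwise separable convolution (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution, as in MobileNet-style networks) approximates a standard 3x3 convolution at a fraction of the parameters and FLOPs. A rough sketch:

```python
from tensorflow.keras import layers

def depthwise_separable_conv(x, filters, stride=1):
    """3x3 depthwise conv followed by a 1x1 pointwise conv, replacing a
    standard 3x3 conv at much lower parameter and compute cost."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```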
