Coursera Deep Learning Specialization Course, Module 1

  • For binary classification tasks, use sigmoid in the last layer and ReLU in the hidden layers.
  • Almost never use the sigmoid function except in the last layer: for very large or very small values of x its gradient becomes very small, so learning is slow; tanh can be used instead in hidden layers.
  • Use ReLU, a = max(0, z), or leaky ReLU, a = max(0.01z, z); see the sketch after this list.
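
A minimal NumPy sketch of the activation functions listed above (the formulas and the 0.01 leaky slope come from the notes; the function names are just illustrative):

```python
import numpy as np

def sigmoid(z):
    # Squashes z into (0, 1); used in the output layer for binary classification.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered alternative to sigmoid for hidden layers.
    return np.tanh(z)

def relu(z):
    # a = max(0, z): the default choice for hidden layers.
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    # a = max(0.01z, z): keeps a small gradient when z < 0.
    return np.maximum(slope * z, z)
```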

  • We do not use a linear activation function anywhere except in one case, because linear activations prevent the network from learning anything new. Here is why: composing linear layers just gives another linear function, so the hidden layers add no representational power; see the sketch after this list.
  • A linear activation function can be used in this case:
    • y is a real number, y ∈ (-inf, inf), i.e. a regression output
    • only the output layer uses the linear activation; the hidden layers still use ReLU, tanh, etc.
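
A minimal sketch (layer sizes and weights are made up for illustration) showing why purely linear hidden layers collapse: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))          # one input example with 3 features
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))

# Two stacked layers with linear (identity) activation ...
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# ... equal exactly one linear layer with W = W2 W1 and b = W2 b1 + b2.
one_linear_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

assert np.allclose(two_linear_layers, one_linear_layer)
```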

Do not initialize the weights to 0: all hidden units then compute the same function and there is no learning at all. Initialize them to small random values instead; see the sketch below.
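
A minimal sketch of random initialization for one layer (the 0.01 scale is the usual small constant; the layer sizes are made up):

```python
import numpy as np

n_x, n_h = 4, 3   # made-up layer sizes: 4 inputs, 3 hidden units

# Small random weights break the symmetry between hidden units;
# zeros here would make every unit compute (and keep computing) the same thing.
W1 = np.random.randn(n_h, n_x) * 0.01

# Biases can safely start at zero because the random W1 already breaks symmetry.
b1 = np.zeros((n_h, 1))
```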

It is perfectly okay to use a for loop over the layers to compute the activations when there is more than one hidden layer; see the sketch below.
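
A minimal sketch of that layer loop, assuming the parameters are stored as W1, b1, ..., WL, bL in a dict (the helper names and dict layout are my own, not the course's exact code):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params, L):
    """Forward pass through L layers: ReLU in hidden layers, sigmoid in the last."""
    caches = []
    A = X
    for l in range(1, L + 1):                # explicit loop over the layers is fine
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = W @ A + b
        A_prev = A
        A = sigmoid(Z) if l == L else relu(Z)
        caches.append((A_prev, W, b, Z))     # cache for backward propagation
    return A, caches
```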

The values computed in forward propagation (the previous layer's A, plus W, b, and Z) are cached so they can be reused in backward propagation; see the sketch below.
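
A minimal sketch of how one cached tuple is consumed in the backward step for a ReLU hidden layer (the gradient formulas are the standard ones; the function and variable names are mine):

```python
import numpy as np

def relu_layer_backward(dA, cache):
    """Backward step for one hidden layer, reusing the values cached in the forward pass."""
    A_prev, W, b, Z = cache          # exactly what the forward loop stored for this layer
    m = A_prev.shape[1]              # number of examples

    dZ = dA * (Z > 0)                # ReLU derivative: 1 where Z > 0, else 0
    dW = (dZ @ A_prev.T) / m         # gradient w.r.t. this layer's weights
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ               # gradient passed back to the previous layer
    return dA_prev, dW, db
```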