DataScience: Neuron Activation Function

In our previous data science article on TensorFlow, we briefly reviewed the basics of how a neural network works. We were able to build a simple computational graph with TensorFlow, but understanding the details of the neuron activation function is critical. Let's explore the topic.

What is an Activation Function?

  • An activation function takes the weighted sum of a neuron’s inputs and biases and uses it to decide whether, and how strongly, the neuron is activated.
  • It transforms the data presented to the neuron and produces the output signal that is passed on through the network.
  • Activation functions are also referred to as transfer functions.
  • The activation function takes the decision of whether or not to pass the signal. In the simplest case, it is a step function with a single parameter – the threshold (see the sketch after this list).
  • Now, when we learn something new (or unlearn something), the threshold and the synaptic weights of some neurons change. This creates new connections between neurons, which is how the brain learns new things.
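As an illustration of the step-function neuron described above, here is a minimal sketch in Python; the inputs, weights, bias, and threshold are all made-up values for the example.

```python
import numpy as np

def step_neuron(inputs, weights, bias, threshold=0.0):
    """Fire (output 1) if the weighted sum of inputs plus bias exceeds the threshold, else output 0."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum > threshold else 0

# Made-up example: weighted sum is 0.5*0.4 + 0.8*0.6 - 0.3 = 0.38, which exceeds the threshold of 0.0
print(step_neuron(inputs=[0.5, 0.8], weights=[0.4, 0.6], bias=-0.3))  # prints 1
```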

Non-Linear Activation Functions

  • Modern neural network models use non-linear activation functions.
  • They allow the model to create complex mappings between the network’s inputs and outputs, which is essential for data such as images, video, audio, and other data sets that are non-linear or high-dimensional.
  • There are three commonly used types of non-linear activation functions:
    • Sigmoid Activation Function
    • Rectified Linear Unit (ReLU)
    • Tanh Activation

Sigmoid Activation

  • Its behavior is similar to that of a perceptron, but instead of a hard 0 or 1 it generates a continuous output between 0 and 1.
  • Because the function saturates near 0 and 1 for large-magnitude inputs, it is susceptible to the vanishing gradient problem, as illustrated in the sketch below.
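The sigmoid function is
\[f(x) = \frac{1}{1 + e^{-x}}\]
A minimal sketch (assuming TensorFlow 2.x with eager execution) applies TensorFlow’s built-in tf.nn.sigmoid to a few arbitrary sample values; note how large-magnitude inputs saturate towards 0 or 1.

```python
import tensorflow as tf

# Arbitrary sample inputs, from strongly negative to strongly positive
x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])

# Sigmoid squashes every input into the open interval (0, 1)
print(tf.nn.sigmoid(x).numpy())
# approx [0.0000454, 0.2689, 0.5, 0.7311, 0.99995]
```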

Tanh Activation

  • It is also known as the hyperbolic tangent activation function.
  • Similar to sigmoid, tanh also takes a real-valued number but squashes it into a range between -1 and 1.
  • Unlike sigmoid, tanh outputs are zero-centered since its range is between -1 and 1.
  • Vanishing gradient: when inputs become very small or very large, the function saturates at -1 or 1, with a derivative extremely close to 0. It then has almost no gradient to propagate back through the network, so the lower layers receive almost no learning signal (see the sketch below).
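A minimal sketch (assuming TensorFlow 2.x with eager execution) that makes the saturation visible: tf.GradientTape reports a gradient near 0 for large-magnitude inputs and the largest gradient at 0.

```python
import tensorflow as tf

x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])

with tf.GradientTape() as tape:
    tape.watch(x)        # track the constant so we can ask for d(tanh)/dx
    y = tf.nn.tanh(x)

print(y.numpy())                    # approx [-1.0, -0.76, 0.0, 0.76, 1.0]
print(tape.gradient(y, x).numpy())  # approx [0.0, 0.42, 1.0, 0.42, 0.0] – vanishing at the extremes
```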

ReLU

\[f(x) = \max(0,x)\]
  • ReLU accelerates the convergence of stochastic gradient descent compared with the saturating activations above, which increases the learning speed of the entire network. The trade-off is that inputs on the negative side are clipped to zero, so those neurons transmit nothing onward.
  • The output of ReLU has no maximum value (it does not saturate on the positive side), and this helps gradient descent (see the sketch below).
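A minimal sketch (assuming TensorFlow 2.x with eager execution) of f(x) = max(0, x) using tf.nn.relu; the sample inputs are arbitrary.

```python
import tensorflow as tf

x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])

# Negative inputs are clipped to 0; positive inputs pass through unchanged, with no upper cap
print(tf.nn.relu(x).numpy())  # [ 0.  0.  0.  1. 10.]
```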

Using TensorFlow with Activation Functions

  • Let’s now use TensorFlow to apply these activation functions.
  • We will use a TensorFlow session (the V1 workflow) to build and run a computational graph arranged as a graph of nodes.
  • Next, we will apply the 3 commonly used activation functions introduced above.
  • We also include ReLU6, which caps the output value at 6, and softplus, which uses the function below.
  • The softplus function is quite similar to the Rectified Linear Unit (ReLU) function, with the main difference being that softplus is differentiable at x = 0.
\[f(x) = \ln(1 + e^x)\]
  • Finally, we get a 5-element output array for each of the 5 activation functions, as shown in the sketch below.
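Putting it together, here is a minimal sketch of the walkthrough described above, using the V1 compatibility API (tf.compat.v1) to build the graph and run it in a session; the 5-element input array is arbitrary.

```python
import tensorflow as tf

# Use the V1-style graph-and-session workflow described above
tf.compat.v1.disable_eager_execution()

# Arbitrary 5-element input array
x = tf.constant([-3.0, -1.0, 0.0, 3.0, 8.0])

# The 3 commonly used activations plus ReLU6 and softplus
activations = {
    "relu":     tf.nn.relu(x),      # max(0, x)
    "relu6":    tf.nn.relu6(x),     # min(max(0, x), 6) – capped at 6
    "sigmoid":  tf.nn.sigmoid(x),   # 1 / (1 + e^-x)
    "tanh":     tf.nn.tanh(x),      # squashes into (-1, 1)
    "softplus": tf.nn.softplus(x),  # ln(1 + e^x), a smooth variant of ReLU
}

with tf.compat.v1.Session() as sess:
    for name, op in activations.items():
        print(name, sess.run(op))   # one 5-element output array per activation
```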

Thanks for reading about neural network activation functions using TensorFlow. See you again and take care.
