The softmax function, a.k.a. the normalized exponential function, squashes a K-dimensional input vector into a K-dimensional output vector of real values, where each entry is in the range (0, 1).
It’s like a sigmoid function, but it also normalizes each output so that the outputs sum to 1.
Hence, each output can be read as the probability that its class is the true one, relative to the other classes.
The formula is:

\[
\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad \text{for } j = 1, \dots, K.
\]
Where:
z = the vector of inputs
j = the index of the output unit, running from 1 to K
so each output is e raised to the power of its corresponding input z_j, divided by the sum of e raised to the power of every input in z.
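To make this concrete, here is a minimal NumPy sketch of the formula above; the function name `softmax` and the example scores are just illustrative, not part of any particular library.

```python
import numpy as np

def softmax(z):
    """Compute the softmax of a 1-D input vector z (a minimal sketch)."""
    # Subtract the max for numerical stability; this doesn't change the result
    # because softmax is invariant to adding a constant to every input.
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # roughly [0.659, 0.242, 0.099]
print(softmax(scores).sum())  # 1.0 — the outputs always sum to 1
```

Note that the largest input gets the largest probability, but the smaller inputs still receive nonzero shares, which is exactly the "squashing" behavior described above.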