*Figure 1: Curves you've likely seen before*

In deep learning, "logits" usually, and unfortunately, means the raw outputs of the last layer of a classification network, that is, the output of the layer before it is passed to an activation/normalization function, e.g. the sigmoid. This is what `sigmoid_cross_entropy_with_logits`, the core of Keras's `binary_crossentropy`, expects. In Keras, by contrast, the expectation is that the values in the variable `output` represent probabilities and are therefore bounded by (0, 1); that's why `from_logits` is set to `False` by default. So they need to be converted back to raw values before being fed into `sigmoid_cross_entropy_with_logits`. To repeat: we first run numbers through one function (sigmoid), only to convert them back using the inverse function (logit).

The real potential problem, though, is the numerical instability that this to and fro may cause, resulting in an overflow in the extreme case. Look at the output of `y = logit(sigmoid(x))` when `x` is of type float32, the default in Keras and, as far as I know, in most other frameworks, too:

*Figure 2: Numerical imprecisions with float32*

Starting at about x = 14.6, errors hit the 1% range, and above about x = 16.6 the game's over due to division by zero.
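To make the round trip concrete, here is a minimal NumPy sketch of `logit(sigmoid(x))` in float32. This is my own reconstruction, not the post's original code, and the exact error values depend on how the sigmoid is evaluated, but the overall pattern matches Figure 2:

```python
import numpy as np

def sigmoid(x):
    # Forward pass: squash raw values into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    # Inverse function: recover the raw value from a probability.
    return np.log(p / (1.0 - p))

x = np.arange(10.0, 18.0, 0.2, dtype=np.float32)

with np.errstate(divide="ignore"):  # logit(1.0) divides by zero
    y = logit(sigmoid(x))           # ideally, y == x exactly

rel_err = np.abs(y - x) / x
for xi, yi, e in zip(x, y, rel_err):
    print(f"x = {xi:5.2f} -> logit(sigmoid(x)) = {yi:8.4f} (rel. error {e:.2%})")
```

Once `sigmoid(x)` rounds to exactly 1.0 in float32, which happens for large enough `x`, `logit` divides by zero and returns `inf`; that is the breakdown the figure shows above x ≈ 16.6.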