Weight and bias initialization routines for Sigmoidal Feedforward Network
Authors: Apeksha Mittal, Amit Prakash Singh, Pravin Chandra
Abstract
The success of sigmoidal feedforward networks on complex learning tasks can be attributed to their universal approximation property. These networks are trained with non-linear iterative optimization methods (first-order or second-order) to solve a learning task. The convergence rate of sigmoidal feedforward network training is affected by the initial choice of weights; therefore, in this paper we propose two new weight initialization routines (Routine-1 and Routine-2) that use characteristics of the input and output data and properties of the activation function. Routine-1 exploits the linear dependence of the weight update step size on the derivative of the activation function: it initializes the weights and biases so that the activation function operates in the region near zero (input), where the derivative is maximal, thereby increasing the weight update step size and hence the convergence speed. The same principle is used to derive Routine-2, which initializes the weights and biases so that each node is activated at a distinct point within the significant range of the activation function (where the significant range denotes the non-saturated region), so that each node evolves independently of the others and acts as a distinct feature identifier. Initializing weights in the significant range reduces the chance of hidden nodes getting stuck in a saturated state. Networks initialized with the proposed routines converge faster and have a higher probability of reaching deeper minima. The efficiency of the proposed routines is evaluated by comparing them with the conventional random weight initialization routine and 11 weight initialization routines from the literature (4 well-established routines and 7 recently proposed routines) on several benchmark problems. The proposed routines are also tested on larger network sizes and larger datasets such as MNIST. The results show that the proposed routines outperform the conventional random weight initialization routine and the 11 established weight initialization routines.
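The following is a minimal NumPy sketch of the two ideas described in the abstract, not the paper's actual routines: the scaling constant, the zero-input operating point, and the assumed "significant range" of roughly [-4, 4] for the logistic sigmoid are illustrative assumptions only; the exact formulas are derived in the paper.

```python
# Hedged sketch of the initialization ideas from the abstract (not the
# paper's exact Routine-1/Routine-2 formulas). Constants are assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_routine1_like(n_in, n_hidden, seed=None):
    """Routine-1 idea: keep initial net inputs close to zero, where the
    sigmoid derivative is maximal, so early weight updates are large."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(n_in)          # assumed scale, not the paper's formula
    W = rng.uniform(-scale, scale, size=(n_hidden, n_in))
    b = np.zeros(n_hidden)               # zero bias keeps each node near sigmoid'(0), the maximum
    return W, b

def init_routine2_like(n_in, n_hidden, seed=None, sig_range=(-4.0, 4.0)):
    """Routine-2 idea: place each hidden node's operating point at a distinct
    location inside the non-saturated ("significant") range of the sigmoid,
    so nodes start as distinct feature identifiers and avoid saturation."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(n_in)
    W = rng.uniform(-scale, scale, size=(n_hidden, n_in))
    # Spread biases evenly over the assumed significant range so that, for
    # inputs near zero, each node activates at a different point of the curve.
    b = np.linspace(sig_range[0], sig_range[1], n_hidden)
    return W, b

# Usage: initialize a 784-to-64 hidden layer and inspect the activations
# produced by a near-zero input.
W, b = init_routine2_like(784, 64, seed=0)
x = np.zeros(784)
print(sigmoid(W @ x + b)[:5])
```

In this sketch, Routine-1's effect is obtained by concentrating pre-activations near zero, while Routine-2's effect is obtained by spreading the per-node operating points across the non-saturated region; the paper additionally conditions both routines on characteristics of the input and output data.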
Keywords: Weight initialization, Feedforward networks, Universal approximators, Bias weights
Paper URL: https://doi.org/10.1007/s10489-020-01960-5