
# 4 Key Design Considerations for Your Neural Network Model

## From the layout of layers to optimization rules, pay attention to these elements.

Artificial neural networks (ANNs) are a commonly used tool in deep learning. In this earlier tutorial, you can learn what they are, see their basic structure, and code a simple neural network with only one neuron. When you design your own neural networks, there are a number of considerations to take into account. This article will describe a few.

# Layout of Network Layers

Neural networks are rarely, if ever, as simple as one neuron; the majority have at least several layers. If the data are not linearly separable, you will need hidden layers: layers of artificial neurons between the input and output neuron layers.

You might want to think about each hidden neuron as a linear classifier. The number of neurons in the first hidden layer should equal the number of lines you would need to draw to classify your data. The later hidden layers and the output layer connect the various linear classifiers.
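As a concrete sketch of this intuition, here is a tiny numpy forward pass through a network with one hidden layer. The XOR-style data and hand-picked weights are illustrative assumptions of mine, not values from the article: each hidden neuron acts as one linear classifier (one "line"), and the output neuron combines them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy XOR-style data: not separable by a single line,
# so at least one hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked weights: each hidden neuron draws one "line" in input space.
W1 = np.array([[20.0, 20.0], [20.0, 20.0]])   # 2 inputs -> 2 hidden neurons
b1 = np.array([-10.0, -30.0])                 # lines at x+y=0.5 and x+y=1.5
W2 = np.array([[20.0], [-20.0]])              # hidden -> 1 output neuron
b2 = np.array([-10.0])

hidden = sigmoid(X @ W1 + b1)     # each column: one linear classifier
output = sigmoid(hidden @ W2 + b2)
print(np.round(output.ravel()))   # XOR of the two inputs: [0. 1. 1. 0.]
```

The output layer fires only when the first "line" is crossed but the second is not, which is exactly how several linear classifiers combine into a non-linear decision boundary.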

# Activation Functions

In neural networks, an **activation function** is a function that determines the artificial neuron’s output, given specific inputs. In my earlier tutorial, we used a sigmoid function, which has the advantage of forcing outputs to be within a specific range. Another advantage is that a sigmoid function is monotonic — in other words, the value order of inputs is the same as the value order of outputs. A disadvantage of sigmoid functions is that especially where the sigmoid curve is relatively flat, learning can be slow.
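To make this concrete, here is a minimal numpy sketch of the sigmoid and its derivative (the function names are my own). The gradient peaks at 0 and nearly vanishes where the curve flattens, which is why learning can be slow there.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Outputs are squashed into (0, 1), and input order is preserved (monotonic).
print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~ [0.0067, 0.5, 0.9933]

# Where the curve is flat, the gradient vanishes and learning slows.
print(sigmoid_grad(0.0))    # 0.25, the maximum slope
print(sigmoid_grad(10.0))   # ~4.5e-5, nearly zero
```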

Another popular type of activation function is called the **rectified linear unit (ReLU)**. The value of this function is simply 0 if *x* is less than 0; otherwise, it is *x*. ReLUs enable a faster learning process, even though they create an arbitrary distinction between negative and positive values of *x*. In their advantages and disadvantages, hyperbolic tangent (tanh) activation functions tend to be a happy medium between sigmoid and ReLU.
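A quick numpy comparison of ReLU and tanh on a few sample inputs (the values are illustrative choices of mine):

```python
import numpy as np

def relu(z):
    # 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))   # negative inputs clamped to 0, positive passed through
print(tanh(z))   # squashed into (-1, 1), zero-centered unlike sigmoid
```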

# Loss Functions

**Loss** is merely the prediction error of the neural network, determined in each pass-through. Ideally, it should be minimized. The loss function, which computes the loss from the predicted and actual output values, is used to update the weights of the neural network for the next pass-through. The calculation of the new weights is based in some way on the **gradient**, which gives the slope of the loss function at each point. Different types of loss functions should be used for different types of regression or classification tasks, as described in more detail here.
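As one example, here is mean squared error, a common regression loss, together with its gradient with respect to the predictions. The specific loss and sample values are my own choices for illustration, not prescriptions from the article.

```python
import numpy as np

# Mean squared error: a common loss for regression tasks.
def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

# Its gradient with respect to the predictions, used to update weights.
def mse_grad(y_pred, y_true):
    return 2.0 * (y_pred - y_true) / y_pred.size

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.8, 0.3, 0.6])
print(mse_loss(y_pred, y_true))  # ~0.0967
print(mse_grad(y_pred, y_true))  # the slope of the loss at this prediction
```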

# Optimization Rules

An **optimizer** is an algorithm or other method that updates the attributes of the neural network in order to minimize the loss. For example, it can account for the history of gradient updates, rather than only using the gradient from a single set — or **batch** — of data samples. It may incorporate **momentum** — in other words, the newest update will be a weighted average of all previous updates, with older updates decayed exponentially.
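Here is a minimal sketch of gradient descent with momentum, assuming the classic velocity formulation; the learning rate, decay factor, and toy objective are illustrative assumptions.

```python
# Gradient descent with momentum: the velocity is an exponentially
# decayed running combination of past gradients, so older updates fade out.
def sgd_momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = w^2, whose gradient is 2w.
w, velocity = 5.0, 0.0
for _ in range(100):
    w, velocity = sgd_momentum_step(w, 2.0 * w, velocity)
print(round(w, 4))  # close to the minimum at w = 0
```

With `beta = 0.0` this reduces to plain gradient descent; larger `beta` values keep more of the update history.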
