
Fundamentals of AI
Machine learning methods can cope with problems whose feature spaces are extremely high-dimensional. For example, the game of Go admits roughly $10^{10^{24}}$ possible games, $84 \times 84$ images span on the order of $10^{10^4}$ possible images, whilst WordNet's English database contains $\sim 150{,}000$ words.
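To see where the figure for images comes from, suppose each of the $84 \times 84 = 7056$ pixels takes one of $256$ grey levels (the bit depth is an assumption; it is not stated above). The number of distinct images is then
\[
256^{7056} = 10^{7056 \log_{10} 256} \approx 10^{1.7 \times 10^{4}},
\]
which matches the quoted order of $10^{10^4}$.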
Deep hierarchical representations can be constructed to solve complex classification problems. Binary classification tasks assign each input a label of $0$ or $1$ and can be posed in Cartesian or polar coordinates. Note that the $\operatorname{softmax}$ function will often be used to map values in these various coordinate systems into the $[0, 1]$ range. First, however, we shall consider the $\operatorname{heaviside}$, $\operatorname{sigmoid}$, and $\operatorname{tanh}$ functions.
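As a minimal sketch of these activation functions (assuming NumPy; the function names are our own, not taken from the text above), each maps the real line, or a vector of scores, into a bounded range:
\begin{verbatim}
import numpy as np

def heaviside(h):
    # Hard threshold: 1 where the pre-activation is non-negative, else 0.
    return np.where(h >= 0.0, 1.0, 0.0)

def sigmoid(h):
    # Smooth squashing of the real line into (0, 1).
    return 1.0 / (1.0 + np.exp(-h))

def tanh(h):
    # Smooth squashing into (-1, 1), zero-centred.
    return np.tanh(h)

def softmax(h):
    # Maps a vector of scores to a probability distribution in [0, 1];
    # subtracting max(h) is a standard trick for numerical stability.
    e = np.exp(h - np.max(h))
    return e / e.sum()
\end{verbatim}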
We shall establish a parametric function that models the relation between an input $x = \{x_1, \dots, x_N\}$ and an output $y = f(x;\theta)$, where $\theta = \{w = (w_1, \dots, w_N),\; w_0 = b \}$ denotes the collection of parameters, namely the weights and the bias. The input layer spans an $N$-dimensional input space, and the pre-activation of the hidden unit is $h = x \cdot w + b = \sum_{i=1}^{N} w_i x_i + b$, so that the output of the network can be expressed as $f(x;\theta) = H(h) = H(x \cdot w + b)$. Please refer to \autoref{fig: Basic Neural Network} for a simple example in which the $\operatorname{step}$ (Heaviside) function acts as a gate that maps the pre-activation to a class label.
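A minimal sketch of this single-unit network (the weight and bias values below are illustrative assumptions, not taken from the text):
\begin{verbatim}
import numpy as np

def single_unit(x, w, b):
    # Pre-activation h = x . w + b, followed by the Heaviside step H(h).
    h = np.dot(x, w) + b
    return np.where(h >= 0.0, 1.0, 0.0)

# Illustrative parameters for a 2-dimensional input space (N = 2).
x = np.array([0.5, -1.0])
w = np.array([1.0, 2.0])
b = 0.5
print(single_unit(x, w, b))  # h = 0.5 - 2.0 + 0.5 = -1.0, so output is 0.0
\end{verbatim}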
Let the goal be to approximate the objective function $f^*$ mapping the input space to the output space. A binary classifier $f^*(x) = y \in \{0, 1\}$ could be used, posed in either Cartesian or polar coordinates.
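As an illustration of why the choice of coordinates matters (the circular decision boundary here is a hypothetical example, not taken from the text), consider a target $f^*$ that labels points inside the unit circle as $1$: in Cartesian coordinates the boundary $x_1^2 + x_2^2 = 1$ is nonlinear, whereas in polar coordinates it reduces to a simple threshold on the radius $r$.
\begin{verbatim}
import numpy as np

def f_star_cartesian(x1, x2):
    # Nonlinear decision boundary in Cartesian coordinates.
    return 1.0 if x1**2 + x2**2 < 1.0 else 0.0

def f_star_polar(r, phi):
    # The same classifier is a threshold on r in polar coordinates.
    return 1.0 if r < 1.0 else 0.0

print(f_star_cartesian(0.3, 0.4))  # inside the unit circle -> 1.0
print(f_star_polar(np.hypot(0.3, 0.4), np.arctan2(0.4, 0.3)))  # also 1.0
\end{verbatim}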