
AI in baby steps, experiment 1: making and using a perceptron

  • Writer: Ruwan Rajapakse
  • Dec 10, 2024
  • 8 min read

Updated: Dec 11, 2024

Spoiler: VBA code snippet and working 10-dimensional perceptron emulator included below!

Hello, reader! I’m not an AI—though, over time, it might become increasingly difficult to tell the difference. If all goes well and I’m able to continue this series of curious experiments, we may find ourselves confronting that very question. And I don’t just mean that generative AI is improving so rapidly that its output could soon surpass what we flesh-and-blood humans can create.

No, what I’m suggesting runs deeper: the fundamental building blocks and organizing principles of AI may not be so different from those at work in our own brains. These principles, which enable our minds to learn from the world we’re born into and develop intuitive understandings of it, seem to resonate more closely than we might care to admit with the concepts underpinning the learning machines we’re designing and refining at an exponential pace.

Machines are quickly acquiring knowledge, forming perspectives, and developing intuitions about the world—and about how to interact with it—much like we humans do.

I’ll delve more into this unsettling thought later. But for now, let’s take a step back to understand what I’m trying to achieve with this series of experiments.

I aim to introduce you to several key AI building blocks—such as perceptrons, multi-layer perceptrons, convolutional neural networks, transformers, and more—by exploring their conceptual foundations through a simple and practical approach. We'll start by discussing the theory and motivation behind each building block, then move on to demonstrate how they can be implemented using basic imperative programming.

My goal is to make these concepts accessible to "ancillary" IT professionals—such as product managers, quality assurance engineers, business analysts, project managers, business managers, and others—who are eager to master the fundamentals of AI but may not have experience with advanced machine learning tools and frameworks. All you need is a first-year college-level understanding of mathematics and some prior experience with very simple programming.

To keep things straightforward, I won’t dive deeply into the history of how these building blocks came to be. Instead, I’ll take a practical approach: introducing what they are, how they work, and why they are useful. We’ll explore their utility by constructing each one and putting it to practical use!

With that groundwork laid, let’s jump into our first exciting experiment: understanding and constructing a perceptron—the most basic learning machine, which still serves as the foundation for many sophisticated deep learning systems today. Fascinating, isn’t it? Buckle up, and let’s get started!

The concept behind a perceptron is straightforward: can we design a machine capable of learning to identify and classify objects in the real world? For example, could we create a system that recognizes a flower as Iris setosa based on its petal length and width? Imagine providing the machine with a large dataset containing various flower types and their corresponding petal dimensions, enabling it to learn the characteristics of Iris setosa. A perceptron is precisely such a machine.

A perceptron is an artificial neuron that observes phenomena and classifies them, similar to how neurons in the human brain function. It accomplishes this by learning an algebraic function that maps observable quantitative data to a classification. This learning process involves training the perceptron with a dataset of pre-classified examples.

The example of identifying Iris setosa based solely on petal length and width is a simplified, or "toy," example. In reality, many flower types may overlap dimensionally with Iris setosa. However, for the sake of simplicity, let’s assume there are only three flower types in the world—one of them being Iris setosa—and that they are broadly distinct in terms of dimensions. Below are the first few rows from such a pre-classified dataset, commonly referred to as "training data" in machine learning.

| Petal Length (x1) | Petal Width (x2) | Flower (just FYI) | Classification (y) |
|---|---|---|---|
| 5.1 | 1.9 | Iris-setosa | 1 |
| 6.7 | 5.7 | Iris-virginica | -1 |
| 7.2 | 6.1 | Iris-virginica | -1 |
| 5.1 | 1.5 | Iris-setosa | 1 |
| 5 | 1.3 | Iris-setosa | 1 |
| 4.6 | 1.5 | Iris-setosa | 1 |
| 5.7 | 3.5 | Iris-versicolor | -1 |
| 5.2 | 1.5 | Iris-setosa | 1 |
| 5.8 | 5.1 | Iris-virginica | -1 |
| 6.4 | 5.3 | Iris-virginica | -1 |
| 4.8 | 1.4 | Iris-setosa | 1 |

If we plot the values of the two variables (petal length x1 and petal width x2) against each other, we can see that the dimensions of Iris setosa form a distinct cluster of data points (see the green cluster below), suggesting that we could identify an algebraic function defining a line that separates their data points from those of the other flowers (see the blue line).

At this point, let's think a little further. Suppose flowers were more easily identifiable not just by petal length and width but also by stalk length. In that case (as in the real world), we would have three variables—x1, x2, and x3​—that influence the classification of a flower.

Here’s a key intuition to grasp before moving forward: if there were three variables determining flower type, plotting x1, x2, and x3 in three-dimensional space would reveal clusters of data points for Iris setosa. The dataset representing the flower's petal length, petal width, and stalk length would occupy a specific, localized region in that 3D space. In this scenario, we could imagine a two-dimensional plane in the 3D space that separates the region corresponding to Iris setosa from those of other flowers.

Now, if we had four or more variables (e.g., petal length, petal width, stalk length, and flower height) that helped identify a flower even more accurately, the concept extends further. We could define a multi-dimensional "hyperplane"—a flat surface in higher-dimensional space. Although such a hyperplane is hard to visualize, it can be represented and manipulated mathematically. This intuition of unique clustering in a multi-dimensional feature space (e.g., for an Iris setosa flower) is essential to understanding how a perceptron learns.

With this understanding, the problem becomes conceptually straightforward. We need to identify a mathematical (algebraic) function that determines whether a new, unidentified flower’s dimensions fall on the correct side of the dividing boundary. This boundary could be a line (in the case of two variables), a plane (for three variables), or a hyperplane (for four or more variables).
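As a concrete (and deliberately simple) illustration of this side-of-the-boundary check, here is a short Python sketch. The weights and bias below are invented purely for illustration—they are not learned from any dataset:

```python
# Determine which side of a hyperplane a point lies on, in any number
# of dimensions. The sign of the weighted sum plus bias gives the side.
def side_of_hyperplane(weights, bias, point):
    # f(x1, ..., xn) = w1*x1 + ... + wn*xn + w0
    s = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if s > 0 else -1

# A 4-variable example: petal length, petal width, stalk length, flower
# height. These weights are made up for the sketch, not learned.
print(side_of_hyperplane([1.0, -2.0, 0.5, 0.3], 0.1, (5.1, 1.9, 30.0, 40.0)))  # -> 1
```

The same one-line computation covers a line in 2D, a plane in 3D, and a hyperplane in any higher dimension—only the length of the weight vector changes.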

A perceptron is precisely such a computational device. It learns the appropriate algebraic function algorithmically by adjusting the hyperplane until it neatly separates the training data for a specific object type in the multi-dimensional space. Once trained, the perceptron applies this function to new, unclassified data points, determining whether they fall on the correct side of the hyperplane. Interestingly, the perceptron’s conceptual structure—explored by McCulloch and Pitts in 1943—draws inspiration from the simplified functional workings of real neurons in the human brain.

A perceptron assumes the existence of a function f(x1, x2, x3, ..., xn) for a given object type (such as an Iris setosa flower). This function determines the perpendicular distance of a new unclassified object from the separating hyperplane. If f(x1, x2, x3, ..., xn) > 0, the object is on the correct side of the hyperplane. The perceptron learns this function by processing pre-classified datasets (as we will see shortly). Using this function, it performs a simple logic check, returning y = 1 if the object is on the correct side of the hyperplane and thus belongs to the target object type.

In other words:

  • y = 1, if f(x1, x2, x3, ..., xn) > 0

  • y = -1, if f(x1, x2, x3, ..., xn) <= 0

When a perceptron computes f(x1, x2, x3, ..., xn), it does so by multiplying each variable (x) by a corresponding "weight" (w), and then summing the results along with a constant bias term (w0).

f(x1, x2, x3, ..., xn) = w1x1 + w2x2 + w3x3 + ... + wnxn + w0

The idea is that, for a given object type (like an Iris setosa flower), each variable (x) contributes a specific weight (w) to determining the object’s classification. These weights represent how much influence each variable has in the context of the object’s clustering in hyperspace. Additionally, the constant bias term (w0​) accounts for an arbitrary baseline influence in the classification process.
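To make the weighted-sum-plus-bias computation concrete, here is a tiny Python sketch (the spreadsheet macro itself is VBA; this port is mine). The weights are illustrative values matching the example function f(x1, x2) = 4.1x1 - 11x2 + 2 discussed in this post, not weights learned from data:

```python
# Illustrative weights for a 2-variable perceptron:
# f(x1, x2) = 4.1*x1 - 11*x2 + 2
w1, w2, w0 = 4.1, -11.0, 2.0

def f(x1, x2):
    # Weighted sum of the inputs plus the bias term
    return w1 * x1 + w2 * x2 + w0

def classify(x1, x2):
    # y = 1 if the point falls on the "correct" side of the boundary, else -1
    return 1 if f(x1, x2) > 0 else -1

print(classify(5.1, 1.9))  # setosa-like dimensions -> 1
print(classify(6.7, 5.7))  # virginica-like dimensions -> -1
```

Note how a large petal width drags f negative here: the weight w2 = -11 encodes that wide petals argue strongly against Iris setosa.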

Now we arrive at the most crucial equation for the perceptron: the y-function. This equation is fundamental to understanding perceptrons and should be firmly ingrained in the minds of every AI student.

The equation is: y = 1 if wTx > 0, and y = -1 otherwise (in other words, y = sign(wTx)). Here wTx represents the dot product of the transpose of the weight column vector w with the variable vector x, where x0 is always 1. This dot product is equivalent to the function f(x1, x2, x3, ..., xn), as it measures (up to a constant scaling factor) the perpendicular distance from the given data point to the separating hyperplane.

Establishing that wTx represents the perpendicular distance involves some straightforward but rather lengthy mathematical reasoning. For a detailed explanation of how this equation is derived, I recommend referring to the first two chapters of Anil Ananthaswamy's "Why Machines Learn".

At this point, we have a function that determines whether a new, unclassified dataset belongs to the object of interest—such as an Iris setosa flower—or not. But how do we learn the weights and derive the specific equation representing the distance to the separating hyperplane, for example, f(x1, x2) = 4.1x1 - 11x2 + 2?

This is achieved using an iterative algorithm (explained in pseudocode below), which gradually adjusts the weight vector to its correct values. Once these values are determined, they can be plugged into the function f(x1, x2, x3, ..., xn) to classify new data points.

Initialize weight vector w = 0

Repeat:
    Set Updates = False
    For each data point (x, y) in the training dataset:
        If y * (w^T x) <= 0:
            Update the weight vector: w = w + y * x
            Set Updates = True
    End For
Until Updates = False

This algorithm works by iteratively updating the weights, shifting the hyperplane during execution until it reaches a position where all the data points labeled 1 fall on one side of the hyperplane and those labeled -1 fall on the other. It is guaranteed to converge for datasets that are linearly separable. For a detailed explanation of why this is the case, I once again recommend consulting Anil Ananthaswamy's Why Machines Learn.
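For readers who want a runnable version before opening the spreadsheet, here is a minimal Python sketch of the same learning loop (the macro itself is VBA; this port and its variable names are mine), trained on the sample rows from the table earlier in the post:

```python
# Perceptron learning rule, ported to Python for illustration.
# Each row: ((petal length x1, petal width x2), label y).
data = [
    ((5.1, 1.9),  1), ((6.7, 5.7), -1), ((7.2, 6.1), -1),
    ((5.1, 1.5),  1), ((5.0, 1.3),  1), ((4.6, 1.5),  1),
    ((5.7, 3.5), -1), ((5.2, 1.5),  1), ((5.8, 5.1), -1),
    ((6.4, 5.3), -1), ((4.8, 1.4),  1),
]

w = [0.0, 0.0, 0.0]  # [w0 (bias), w1, w2]; x0 is always 1

updates = True
while updates:
    updates = False
    for (x1, x2), y in data:
        x = (1.0, x1, x2)
        # A point is misclassified (or on the boundary) when y * (w . x) <= 0
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            # Nudge the hyperplane toward classifying this point correctly
            w = [wi + y * xi for wi, xi in zip(w, x)]
            updates = True

def predict(x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1
```

Because these sample rows are linearly separable, the loop is guaranteed to terminate, and at that point every training point sits on the correct side of the learned line by construction.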

Alright, enough theory! Now, let’s move on to the truly exciting part. Can we emulate a multi-dimensional perceptron in a simple, universally accessible way—one that allows anyone with a bit of curiosity to experiment with it, test it on various datasets, and uncover patterns in the data? And perhaps even explore the code to understand exactly how it works? The answer is yes!

Download the file below, which contains an Excel spreadsheet with a 10-dimensional perceptron macro and a sample dataset. (Please ensure you scan the file before use, as one can never fully guarantee the safety of files hosted on public servers.)



For those curious about how the learning algorithm's matrix manipulations are implemented, I’ve included a formatted image of the code below. Please note that my code is somewhat clunky, with several Select Case statements, and it’s not meant to be a model of elegance. My goal is simply to help newcomers to AI understand the perceptron algorithmically, without getting bogged down in the finer points of code optimization.

Can a basic, single perceptron be used to discern patterns in the real world? Absolutely! The example included in the spreadsheet contains the complete dataset for identifying a real Iris setosa flower against two other flower types, sourced from a publicly available dataset.

However, as we’ll see, while a single perceptron is useful for simpler tasks like this one, it falls short when tackling complex patterns, such as handwriting recognition or detecting skin tumors. That said, a multi-layered network of similar perceptual nodes, each with slight enhancements to their individual functions, can perform remarkably well in recognizing intricate patterns.
