Neural Network 101: Making Your Brain Cells Jealous

Ever wondered how your smartphone can recognise your face, or how your phone camera seems to magically recognise objects in the frame? No, there isn’t a tiny team of phone fairies analysing your pictures; it’s the work of neural networks at play! These clever systems are the brains behind the scenes, transforming our world in fascinating ways. Imagine this: you’re sipping your morning coffee, scrolling through your social media feed, and suddenly an ad pops up showcasing the exact item you were eyeing yesterday. How does it know? How can it possibly understand your preferences and desires as accurately as your best friend?

In the next few minutes, we’re going to unravel some of this mystery. Don’t worry, we are not going to dive deep; we’re just getting our feet wet. Picture this as a leisurely stroll through history and how we got here, while I try my best to simplify as much as possible.

Some History

The year is 1943. Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, teamed up to develop a mathematical model of an artificial neuron based on a real neuron from the human brain.

Fun Fact: Our brain is made up of roughly 86 billion neurons, all interconnected with one another, and it consumes more energy than any other organ in the body. And yes, Red Bull will not make you smarter!


Pretty sweet! These two brainiacs teamed up to cook up a mathematical recipe for artificial brains, long before Siri and Alexa could even say “Hello.”

Pitts was self-taught, and despite his lack of an officially recognized position, his work with McCulloch was influential. It was taken up by a psychologist named Frank Rosenblatt, who further developed the artificial neuron to give it the ability to learn. Even more importantly, he built the first device that actually used these principles, the Mark I Perceptron.


Perceptrons: Because even machines deserve a cool title!

Rosenblatt wrote about this work:

“We are now about to witness the birth of such a machine – a machine capable of perceiving, recognizing and identifying its surroundings without any human training or control.”

The perceptron was built and was able to successfully recognize simple shapes. However, due to its limitations and the lack of advanced hardware, its potential was overlooked, contributing to the onset of the AI winter – a period of reduced funding and interest in AI research. Despite being sidelined, the concept of the perceptron remained crucial, laying the groundwork for future advances in artificial intelligence.

The next pivotal work on neural networks came in 1986 with the multi-volume Parallel Distributed Processing (PDP) by David Rumelhart, James McClelland, and the PDP Research Group, published by MIT Press. The approach laid out in PDP is very similar to the approach used in today’s neural networks. The book defined parallel distributed processing as requiring:

  • A set of processing units
  • A state of activation
  • An output function for each unit
  • A pattern of connectivity among units
  • A propagation rule for propagating patterns of activities through the network of connectivities
  • An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce an output for the unit
  • A learning rule whereby patterns of connectivity are modified by experience
  • An environment within which the system must operate

Scientific jargon: Check ✅

Pretty daunting: Check ✅

Alright, let’s break it down.

Imagine a Busy Office:

  • Processing Units (Employees): In our office analogy, employees represent processing units. Each person has a specific role and skill set, contributing to the tasks at hand.
  • State of Activation (Employee Readiness): Employees are like switches; when they are ready and focused, they are in an activated state, prepared to work efficiently.
  • Output Function (Completed Tasks): The completed tasks, such as reports, presentations, or projects, represent the output. Each employee’s work contributes to the final outcomes of the office.
  • Pattern of Connectivity (Work Relationships): The relationships and interactions among employees form a pattern of connectivity. Some employees work closely together, sharing information and collaborating, while others have more distant connections.
  • Propagation Rule (Passing Information): Information and instructions pass through the office network. Employees communicate and share updates, ensuring everyone is on the same page, similar to patterns of activities propagating through the network.
  • Activation Rule (Employee Selection for Projects): Imagine you have various projects in the office, each requiring specific skills. The activation rule is like choosing the best-suited employee with the right expertise for a particular project.
  • Learning Rule (Skill Enhancement): As employees gain experience, they learn and adapt. The office environment provides learning opportunities, and employees modify their approaches based on past experiences, similar to modifying patterns of connectivity based on office experience.
  • Environment (Office Space): The office space represents the environment. It’s where employees interact, face challenges, and accomplish tasks. The environment shapes how employees work together and influences their productivity.

A burning question: Why Neural Networks?

Suppose we wanted a computer to distinguish between cats and dogs. Sounds simple to us, right? But for a clueless computer, it’s like asking a fish to climb a tree – utter chaos! Unlike our human intuition, computers need explicit instructions. Imagine explaining the subtle differences between cats and dogs in a language a computer understands. It’s as tricky as teaching a parrot to recite Shakespeare accurately!

Normally, it’s easy enough for us to write down the steps to complete a task when we’re writing a program. We just think about the steps we’d take if we had to do the task by hand, and then we translate them into code. For instance, we can write a function that sorts a list. In general, the program looks something like:

inputs → program → results

where inputs might be an unsorted list, and results a sorted list.
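
As a rough sketch in Python (the function name sort_list and the use of the built-in sorted are purely illustrative):

```python
def sort_list(inputs):
    # A traditional program: we spell out the logic ourselves.
    # Nothing here is learned; the steps are fixed by the programmer.
    results = sorted(inputs)
    return results

print(sort_list([4, 1, 3, 2]))  # -> [1, 2, 3, 4]
```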

But recognizing objects in a photo is a bit trickier: what are the steps we take when we recognize an object in a picture? We really don’t know, since it all happens in our brain without us being consciously aware of it!

At the dawn of computing in 1949, a visionary IBM researcher named Arthur Samuel embarked on a quest: finding a novel approach to make computers accomplish tasks. This pioneering endeavor marked the birth of “machine learning”. In his seminal 1962 essay, “Artificial Intelligence: A Frontier of Automation,” Samuel laid the foundation for a journey that continues to shape the future of technology.

“Programming a computer for such computations is, at best, a difficult task, not primarily because of any inherent complexity in the computer itself but, rather, because of the need to spell out every minute step of the process in the most exasperating detail. Computers, as any programmer will tell you, are giant morons, not giant brains.”

His basic idea was this: instead of telling the computer the exact steps required to solve a problem, show it examples of the problem to solve, and let it figure out how to solve it itself. This turned out to be very effective: by 1961 his checkers-playing program had learned so much that it beat the Connecticut state champion! Here’s how he described his idea (from the same essay as above):

  • The idea of a “weight assignment”
  • The fact that every weight assignment has some “actual performance”
  • The requirement that there be an “automatic means” of testing that performance
  • The need for a “mechanism” (i.e., another automatic process) for improving the performance by changing the weight assignments

More jargon, eh? Get your ovens ready; we will be baking some cookies.

(Image generated by Stable Diffusion. Pretty neat IMO!)

Imagine you are baking some cookies and want the perfect recipe.

Weight Assignment (Ingredient Proportions): Think of the ingredient proportions as the weights. You decide how much flour, sugar, and chocolate chips to use.

Actual Performance (Taste of the Cookies): After baking, the taste of your cookies is akin to their actual performance. It depends on how well you balanced the ingredient proportions. If you used too much sugar, your cookies might be too sweet.

Automatic Means of Testing (Taste Testers): You let your friends taste the cookies. Their feedback is your automatic means of testing. If they love the cookies, your proportions (weight assignments) were just right. If not, you know you need to adjust.

Mechanism for Improving (Recipe Adjustment): Based on your friends’ feedback, you adjust the recipe. Maybe use less sugar next time. This adjustment process is your mechanism for improving – a bit like trial and error to get the perfect cookie taste.

At this point, if you have baked the perfect cookies, you can send them over. 🙂

Here is the upgraded version of our previous program:

inputs + weights → model → results → performance → update the weights (and around we go again)
Well now our dumb computer is showing sparks of AGI!
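
To make that loop concrete, here is a toy sketch in plain Python. This is not Samuel’s actual program; the function names (model, performance, improve) and the random-nudge update are made up purely to illustrate a “weight assignment”, its “actual performance”, an “automatic means” of testing, and a “mechanism” for improving:

```python
import random

def model(inputs, weights):
    # The program now depends on weights as well as inputs.
    # A weighted sum is a stand-in for something far more complex.
    return sum(x * w for x, w in zip(inputs, weights))

def performance(weights, examples):
    # "Automatic means" of testing: how close are we on known examples?
    # Higher is better (we negate the squared error).
    return -sum((model(x, weights) - target) ** 2 for x, target in examples)

def improve(weights, examples, step=0.1):
    # "Mechanism" for improving: nudge the weights, keep the nudge if it helps.
    # (Real systems use gradients instead of random nudges; this is only the idea.)
    candidate = [w + random.uniform(-step, step) for w in weights]
    return candidate if performance(candidate, examples) > performance(weights, examples) else weights

# Each example is (inputs, desired result) - our "taste testers".
examples = [([1.0, 2.0], 5.0), ([2.0, 0.5], 3.0)]
weights = [0.0, 0.0]  # initial weight assignment (ingredient proportions)
for _ in range(1000):
    weights = improve(weights, examples)
print(weights)  # drifts towards weights that make the model match the examples
```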

After the performance test (taste test), the updated proportions are propagated backward to the initial weights (ingredient proportions), and that, my folks, is… wait for it… “backpropagation”.

Also note that once the model is trained—that is, once we’ve chosen our final, best, favorite weight assignment—then we can think of the weights as being part of the model, since we’re not varying them any more.

Therefore, actually using a model after it’s trained looks like:

inputs → model → results

This looks identical to our original diagram, just with the word program replaced with model. This is an important insight: a trained model can be treated just like a regular computer program.

It’s not too hard to imagine what the model might look like for a checkers program. There might be a range of checkers strategies encoded, and some kind of search mechanism, and then the weights could vary how strategies are selected, what parts of the board are focused on during a search, and so forth. But it’s not at all obvious what the model might look like for an image recognition program, or for understanding text, or for many other interesting problems we might imagine.

Coming back to the initial problem of differentiating cats from dogs: we cannot simply encode rules saying that dogs have four legs, because who else has four legs? Think about it… Yes, you are right: cats. We need something better. What we would like is some kind of function that is so flexible that it could be used to solve any given problem, just by varying its weights. Amazingly enough, this function actually exists! It is the neural network. Yes, the very daunting term is just a measly old function, albeit a very complex one.

That is, if you regard a neural network as a mathematical function, it turns out to be a function that is extremely flexible depending on its weights. A mathematical proof called the universal approximation theorem shows that, in theory, this function can solve any problem to any level of accuracy. Pretty dope!
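
To see what “just a function” means, here is a minimal, hand-rolled sketch of a one-hidden-layer network in plain Python. All the numbers are arbitrary, and real networks have millions of weights and use libraries such as PyTorch, but the shape of the idea is the same: inputs and weights go in, a number comes out.

```python
def neural_net(x, weights):
    # weights = (hidden_layer, output_layer); the sizes here are arbitrary.
    hidden_layer, output_layer = weights
    # Each hidden unit: weighted sum of the inputs plus a bias,
    # passed through a non-linearity (ReLU: negatives become zero).
    hidden = [max(0.0, sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in hidden_layer]
    # Output unit: weighted sum of the hidden activations plus a bias.
    out_ws, out_b = output_layer
    return sum(w * h for w, h in zip(out_ws, hidden)) + out_b

# Arbitrary weights: 2 inputs -> 3 hidden units -> 1 output.
weights = (
    [([0.5, -0.2], 0.1), ([1.0, 1.0], 0.0), ([-0.3, 0.8], -0.1)],  # hidden layer
    ([0.7, -0.5, 0.2], 0.05),                                      # output layer
)
print(neural_net([1.0, 2.0], weights))
```

Change those weights and the function computes something different; that flexibility is exactly what the universal approximation theorem is about.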

But what about that “mechanism” for updating the weights? One could imagine that you might need to find a new one for every problem, which would be laborious. What we’d like here as well is a completely general way to update the weights of a neural network, to make it improve at any given task. Conveniently, this also exists! It is called stochastic gradient descent (SGD). We’ll see how neural networks and SGD work in a later blog, as well as explain the universal approximation theorem.
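
As a tiny, hedged preview of the idea (a proper explanation comes in that later blog), here is plain gradient descent on a single weight, minimising the made-up loss (w - 3)²; “stochastic” just means the gradient is estimated on small batches of examples rather than computed on everything at once:

```python
w = 0.0                            # initial weight assignment
learning_rate = 0.1
for step in range(50):
    gradient = 2 * (w - 3)         # derivative of the loss (w - 3)**2 with respect to w
    w -= learning_rate * gradient  # nudge the weight downhill
print(w)  # ends up very close to 3, the weight that minimises the loss
```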

For now, however, we will instead use Samuel’s own words:

“We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would ‘learn’ from its experience.”

If you’ve reached the end, thank you for reading. I really appreciate it.

Here is what you can do next while you eagerly wait for the next blog:

How might neural networks revolutionize industries you’re passionate about? Whether it’s environmental conservation, business, or social change, the possibilities are endless.

Intrigued by the history of artificial intelligence and neural networks? Explore the pioneers, breakthroughs, and challenges that have shaped this fascinating field.

Discover the stories of individuals and organizations leveraging neural networks for social good, and consider how you can make a positive impact too.

Peace!

References:

  • Howard, J., & Gugger, S. (2020). Practical Deep Learning for Coders. O’Reilly.
  • Strickland, E. (2021, September 30). Mark 1 Perceptron. IEEE Spectrum. https://spectrum.ieee.org/history-of-ai

