Deep Learning: A Beginner’s Guide

In today’s AI-driven world, Deep Learning (DL) is transforming industries like healthcare, finance, and automation. But how does it work? Let’s break it down with simple explanations and practical insights.


Deep Learning vs. Machine Learning

Both Machine Learning (ML) and Deep Learning (DL) extract patterns from data, but they differ fundamentally:

Machine Learning: Relies on handcrafted features (e.g., edge detection in images). Simple models include:

  • Linear Regression:

    $$ f(x) = \mathbf{w}^T \mathbf{x} + b $$

  • Logistic Regression:

    $$ P(y=1) = \frac{1}{1 + e^{-(\mathbf{w}^T \mathbf{x} + b)}} $$

Deep Learning: Uses hierarchical feature learning via neural networks. A simple 2-layer network computes:

  • Hidden Layer:

    $$ \mathbf{h} = \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1) $$

  • Output Layer:

    $$ \mathbf{y} = \sigma(\mathbf{W}_2 \mathbf{h} + \mathbf{b}_2) $$
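
To make this concrete, here is a minimal sketch of the two-layer forward pass above (assuming PyTorch, which is also used later in this guide; the layer sizes and random values are arbitrary). Note that dropping the hidden layer collapses the same code back to the logistic regression formula from the previous section.

import torch

torch.manual_seed(0)

x = torch.randn(3)                           # input vector with 3 features (arbitrary size)
W1, b1 = torch.randn(4, 3), torch.randn(4)   # hidden-layer weights and bias (4 hidden units)
W2, b2 = torch.randn(1, 4), torch.randn(1)   # output-layer weights and bias

h = torch.sigmoid(W1 @ x + b1)               # hidden layer: h = sigma(W1 x + b1)
y = torch.sigmoid(W2 @ h + b2)               # output layer: y = sigma(W2 h + b2)

# Logistic regression is the special case with no hidden layer:
w, b = torch.randn(3), torch.randn(1)
p = torch.sigmoid(w @ x + b)                 # P(y=1) = sigmoid(w^T x + b)

print(h.shape, y.shape, p.shape)             # torch.Size([4]) torch.Size([1]) torch.Size([1])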


Understanding Deep Learning Architectures

DL is loosely inspired by the structure of the brain and is built from artificial neural networks (ANNs). Key architectures include:

  1. Artificial Neural Networks (ANNs):

    • Used for tabular data (e.g., predicting house prices).

    • Forward pass:

      $$ \mathbf{y} = \sigma(\mathbf{W}_n \cdots \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_n) $$

  2. Convolutional Neural Networks (CNNs):

    • Best for image recognition. A convolutional layer applies filters to detect patterns:

      $$ \mathbf{F}_{out} = \mathbf{F}_{in} * \mathbf{K} + \mathbf{b} $$

  3. Recurrent Neural Networks (RNNs):

    • Best for sequential data (e.g., text, speech). At time $t$:

      $$ \mathbf{h}_t = \sigma(\mathbf{W}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}) $$

    • LSTM (Long Short-Term Memory) improves long-term memory handling.
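
To make the three architectures more tangible, here is a rough sketch (assuming PyTorch; all tensor sizes are arbitrary toy values) of how each layer type transforms its input:

import torch
import torch.nn as nn

# ANN: a fully connected layer on tabular data (batch of 8 samples, 5 features)
ann = nn.Linear(5, 1)
print(ann(torch.randn(8, 5)).shape)           # torch.Size([8, 1])

# CNN: a convolution F_out = F_in * K + b (one 28x28 grayscale image, 16 filters)
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
print(conv(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 16, 28, 28])

# RNN: one recurrent step h_t = sigma(W_h h_{t-1} + W_x x_t + b)
rnn_cell = nn.RNNCell(input_size=10, hidden_size=20)
h_prev = torch.zeros(1, 20)
h_t = rnn_cell(torch.randn(1, 10), h_prev)
print(h_t.shape)                              # torch.Size([1, 20])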


How Deep Learning Learns: Training Process

  1. Backpropagation computes gradients using the chain rule:

    $$ \frac{\partial J}{\partial \mathbf{W}} = \frac{\partial J}{\partial \mathbf{y}} \cdot \frac{\partial \mathbf{y}}{\partial \mathbf{h}} \cdot \frac{\partial \mathbf{h}}{\partial \mathbf{W}} $$

  2. Gradient Descent updates the weights to minimize the loss $J$:

    $$ \mathbf{W}_{new} = \mathbf{W}_{old} - \eta \nabla_{\mathbf{W}} J $$
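
A minimal sketch of these two steps (assuming PyTorch autograd and a made-up one-parameter loss): backward() performs backpropagation, and the manual update applies gradient descent.

import torch

w = torch.tensor(2.0, requires_grad=True)     # a single weight W (made-up starting value)
eta = 0.1                                     # learning rate

for step in range(5):
    J = (w - 3.0) ** 2                        # toy loss J(W) = (W - 3)^2, minimized at W = 3
    J.backward()                              # backpropagation: computes dJ/dW via the chain rule
    with torch.no_grad():
        w -= eta * w.grad                     # gradient descent: W_new = W_old - eta * dJ/dW
    w.grad.zero_()                            # clear the gradient before the next step
    print(f"step {step}: w = {w.item():.3f}, J = {J.item():.4f}")

Here w moves from its starting value toward the minimizer at 3.0, and the loss shrinks at every step.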


Single-Layer Perceptron: The Simplest Neural Network

A perceptron is a basic ANN for binary classification (e.g., pass/fail decisions).

Mathematical Formulation:

  • Weighted Sum:

    $$ z = \mathbf{w}^T \mathbf{x} + b $$

Activation Functions:

  • Step Function (Hard decision):

    $$ f(z) = \begin{cases} 1 & \text{if } z \geq 0, \\ 0 & \text{otherwise}. \end{cases} $$

  • Sigmoid (Probability-based decision):

$$ f(z) = \frac{1}{1 + e^{-z}} $$
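
A quick, illustrative sketch of the two activations (plain PyTorch; the input values are made up):

import torch

def step(z):
    # Hard decision: 1 if z >= 0, else 0
    return (z >= 0).float()

def sigmoid(z):
    # Smooth, probability-like output in (0, 1)
    return 1 / (1 + torch.exp(-z))

z = torch.tensor([-2.0, 0.0, 2.0])
print(step(z))      # tensor([0., 1., 1.])
print(sigmoid(z))   # tensor([0.1192, 0.5000, 0.8808])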

Training Rule:

The perceptron updates its weights in proportion to the prediction error:

$$ \mathbf{w}_{new} = \mathbf{w}_{old} + \eta (y_{true} - y_{pred}) \mathbf{x} $$
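
Here is a minimal sketch of this update rule in plain NumPy (an illustrative toy loop on made-up AND-style data, not the article's dataset); the gradient-based PyTorch version of the perceptron follows below.

import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=10):
    # w holds the weights, b the bias; start from zero
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_pred = 1.0 if w @ x_i + b >= 0 else 0.0
            # w_new = w_old + eta * (y_true - y_pred) * x
            w += eta * (y_i - y_pred) * x_i
            b += eta * (y_i - y_pred)
    return w, b

# Toy linearly separable data (AND-style labels), for illustration only
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])
w, b = perceptron_train(X, y)
preds = [1.0 if w @ x_i + b >= 0 else 0.0 for x_i in X]
print("weights:", w, "bias:", b)
print("predictions:", preds)   # [0.0, 0.0, 0.0, 1.0] -- matches the labels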

Example Dataset:

| IQ ($x_1$) | Study Hours ($x_2$) | Exam Success ($y$) |
|------------|---------------------|--------------------|
| 90         | 6                   | 1                  |
| 95         | 3                   | 0                  |
| 110        | 2                   | 0                  |
| 100        | 5                   | 1                  |

The model learns weights $w_1$, $w_2$ and bias $b$ such that:

$$ w_1 x_1 + w_2 x_2 + b \geq 0 \Rightarrow y = 1 $$
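
For instance, one purely illustrative set of values that separates the four rows above is $w_1 = 0$, $w_2 = 1$, $b = -4$: the rule then predicts $y = 1$ exactly when $x_2 \geq 4$ study hours, which matches all four labels (6 and 5 hours succeed, 3 and 2 hours do not).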


PyTorch Implementation of the Perceptron

import torch
import torch.nn as nn
import torch.optim as optim

# Dataset (IQ, Study Hours)
X = torch.tensor([[90.0, 6], [95, 3], [110, 2], [100, 5]], dtype=torch.float32)
y = torch.tensor([[1], [0], [0], [1]], dtype=torch.float32)

# Define the Perceptron
class Perceptron(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))

model = Perceptron()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training Loop
epochs = 1000
for epoch in range(epochs):
    outputs = model(X)
    loss = criterion(outputs, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

# Predictions
with torch.no_grad():
    predicted = (model(X) > 0.5).float()
    print("\nPredictions:", predicted.squeeze().numpy())

Output

Epoch [100/1000], Loss: 0.5198  
Epoch [200/1000], Loss: 0.3989  
...  
Epoch [1000/1000], Loss: 0.1932  
Predictions: [1. 0. 0. 1.]  # Matches the true labels!

Limitations of Perceptrons and Solutions

Problem: Perceptrons fail on non-linearly separable data (e.g., XOR problem):

| $x_1$ | $x_2$ | $y$ |
|-------|-------|-----|
| 0     | 0     | 0   |
| 0     | 1     | 1   |
| 1     | 0     | 1   |
| 1     | 1     | 0   |

x2 = 1    ● (y=1)      ○ (y=0)

x2 = 0    ○ (y=0)      ● (y=1)
          x1 = 0       x1 = 1

No single straight line can separate the ● (y=1) points from the ○ (y=0) points, so a single-layer perceptron cannot represent XOR.

Solution: Multi-Layer Perceptrons (MLPs) add one or more hidden layers with non-linear activation functions:

$$ \mathbf{h} = \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1), \quad \mathbf{y} = \sigma(\mathbf{W}_2 \mathbf{h} + \mathbf{b}_2) $$

This enables complex, non-linear decision boundaries and real-world problem-solving, as the sketch below illustrates.
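
As a purely illustrative sketch (not part of the original example), a tiny MLP with one hidden layer can learn XOR where the single perceptron cannot; the hidden size, learning rate, and epoch count below are arbitrary choices:

import torch
import torch.nn as nn
import torch.optim as optim

# XOR dataset from the table above
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# One hidden layer with a non-linear activation is exactly what the perceptron lacks
mlp = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(mlp.parameters(), lr=0.5)

for epoch in range(2000):
    loss = criterion(mlp(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    print((mlp(X) > 0.5).float().squeeze())  # typically converges to tensor([0., 1., 1., 0.])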


Conclusion
Deep Learning’s strength lies in learning hierarchical representations through stacked layers. While single-layer perceptrons are limited, architectures like CNNs and RNNs enable breakthroughs in image recognition, speech processing, and more.

Next Up: We’ll explore CNNs for image classification! 🚀