diff --git "a/index.html" "b/index.html" --- "a/index.html" +++ "b/index.html" @@ -13117,7 +13117,7 @@ div#notebook {
-

How does a neural net really work

In this notebook I'm exploring fast.ai's Kaggle notebook on "How does a neural net really work". This relates to Lesson 3 of the fast.ai Deep Learning course. While the video provides a solid explanation, the enigmatic imports and variables can be difficult to comprehend. I'm reimplementing some sections to see if if sticks. In a nutshell, this is what is happening in this notebook:

+

How does a neural net really work

In this notebook I'm exploring fast.ai's Kaggle notebook on "How does a neural net really work". This relates to Lesson 3 and Lesson 5 of the fast.ai Deep Learning course. While the video provides a solid explanation, the enigmatic imports and variables can be difficult to comprehend. I'm reimplementing some sections to see if it sticks. In a nutshell, this is what is happening in this notebook:

  1. Revising Regressions
    • Plot a generic quadratic function ($ax^2 + bx + c$)
    • @@ -13127,9 +13127,10 @@ div#notebook {
  2. Understand and break down the Gradient Descent algorithm
  3. -
  4. The Basics of a Neural-Network
      -
    • Understand what is a ReLU
    • -
    • Create the simplest neural-network possible
    • +
    • The Basics of a Neural-Network using the Titanic Survival dataset from Kaggle
        +
      • Explore ReLUs and how they differ from a simple linear function
      • +
      • Build a single-layer neural network using a simple linear function $f(x) = m*x$ (m being an array of weights that we multiply by our features)
      • +
      • Do Deep Learning by layering coefficients/weights/neurons to build a multi-layer neural network
@@ -13142,7 +13143,7 @@ div#notebook {
# Installing the dependencies within the notebook to make it easier to run on colab
-%pip install -Uqq fastai==2.7.18 ipywidgets==8.1.5 plotly==5.24.1
+%pip install -Uqq fastai==2.7.18 ipywidgets==8.1.5 plotly==5.24.1 datasets==3.3.2
 
@@ -13162,7 +13163,8 @@ div#notebook {
-

1. Revising Regressions

+

1. Revising Regressions

This section, from the fast.ai course, sets the stage for understanding how neural networks learn "weights". +We'll plot some points on a graph and use visualizations to see how changing the coefficients helps the function fit those points better.
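As a minimal sketch of what this section builds towards (names and values here are made up, only torch and matplotlib are assumed), we can scatter some noisy points around a "true" quadratic and overlay a candidate quadratic whose coefficients we are free to tweak:

import torch
import matplotlib.pyplot as plt

def quad(a, b, c, x):
    # generic quadratic a*x^2 + b*x + c
    return a * x**2 + b * x + c

torch.manual_seed(0)
x = torch.linspace(-2, 2, 20)
y = quad(3, 2, 1, x) + torch.randn(20) * 1.5   # noisy samples around a=3, b=2, c=1

plt.scatter(x, y, label="noisy data")
plt.plot(x, quad(2.5, 1.5, 1.0, x), color="red", label="candidate fit")  # tweak a, b, c to fit better
plt.legend()
plt.show()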

@@ -13179,6 +13181,12 @@ div#notebook {
from fastai.basics import torch, plt
+import numpy as np, pandas as pd
+
+# Make pandas, numpy and torch use the full screen width
+np.set_printoptions(linewidth=140)
+torch.set_printoptions(linewidth=140, sci_mode=False, edgeitems=7)
+pd.set_option('display.width', 140)
 
 # Set the figure DPI to 90 for better resolution
 plt.rc('figure', dpi=90)
@@ -13310,12 +13318,12 @@ div#notebook {
 
-
+
@@ -13397,12 +13405,12 @@ Prediction: 4.0, Actual: 4.2, Absolute Difference: 0.200
-
+
@@ -13543,9 +13551,9 @@ var element = $('#259b5904-8016-4be9-a0f8-828e0f13c3d6');
-
@@ -13764,7 +13772,7 @@ var element = $('#82feb05b-de4e-42b1-9bc1-96b7dbb58213');
from fastai.metrics import mae
 
-def demo_auto_fit(steps=50):
+def demo_auto_fit(steps=20):
     x, y = generate_noisy_data(mk_quad(3,2,1))
 
     abc = torch.tensor([1.0,1.0,1.0], requires_grad=True)
@@ -13819,48 +13827,1255 @@ step=16; loss=0.76; abc=tensor([2.7889, 1.5358, 1.4100], requires_grad=True)
 step=17; loss=0.74; abc=tensor([2.8330, 1.5484, 1.3900], requires_grad=True)
 step=18; loss=0.71; abc=tensor([2.8771, 1.5611, 1.3700], requires_grad=True)
 step=19; loss=0.69; abc=tensor([2.9212, 1.5737, 1.3500], requires_grad=True)
-step=20; loss=0.68; abc=tensor([2.9155, 1.5863, 1.3100], requires_grad=True)
-step=21; loss=0.66; abc=tensor([2.9346, 1.5832, 1.2800], requires_grad=True)
-step=22; loss=0.65; abc=tensor([2.9289, 1.5958, 1.2400], requires_grad=True)
-step=23; loss=0.64; abc=tensor([2.9480, 1.5926, 1.2100], requires_grad=True)
-step=24; loss=0.62; abc=tensor([2.9672, 1.5895, 1.1800], requires_grad=True)
-step=25; loss=0.61; abc=tensor([2.9864, 1.5863, 1.1500], requires_grad=True)
-step=26; loss=0.60; abc=tensor([2.9807, 1.6000, 1.1200], requires_grad=True)
-step=27; loss=0.60; abc=tensor([3.0064, 1.5937, 1.1200], requires_grad=True)
-step=28; loss=0.59; abc=tensor([3.0018, 1.6105, 1.1000], requires_grad=True)
-step=29; loss=0.59; abc=tensor([3.0275, 1.6042, 1.1000], requires_grad=True)
-step=30; loss=0.59; abc=tensor([3.0283, 1.6137, 1.0900], requires_grad=True)
-step=31; loss=0.58; abc=tensor([3.0290, 1.6232, 1.0800], requires_grad=True)
-step=32; loss=0.58; abc=tensor([3.0547, 1.6168, 1.0800], requires_grad=True)
-step=33; loss=0.58; abc=tensor([3.0555, 1.6263, 1.0700], requires_grad=True)
-step=34; loss=0.58; abc=tensor([3.0563, 1.6358, 1.0600], requires_grad=True)
-step=35; loss=0.58; abc=tensor([3.0571, 1.6453, 1.0500], requires_grad=True)
-step=36; loss=0.58; abc=tensor([3.0828, 1.6389, 1.0500], requires_grad=True)
-step=37; loss=0.58; abc=tensor([3.0835, 1.6484, 1.0400], requires_grad=True)
-step=38; loss=0.58; abc=tensor([3.0843, 1.6579, 1.0300], requires_grad=True)
-step=39; loss=0.57; abc=tensor([3.0851, 1.6674, 1.0200], requires_grad=True)
-step=40; loss=0.57; abc=tensor([3.1136, 1.6558, 1.0300], requires_grad=True)
-step=41; loss=0.58; abc=tensor([3.0956, 1.6516, 1.0100], requires_grad=True)
-step=42; loss=0.57; abc=tensor([3.0992, 1.6558, 1.0100], requires_grad=True)
-step=43; loss=0.57; abc=tensor([3.1027, 1.6600, 1.0100], requires_grad=True)
-step=44; loss=0.57; abc=tensor([3.1063, 1.6642, 1.0100], requires_grad=True)
-step=45; loss=0.57; abc=tensor([3.0911, 1.6547, 1.0000], requires_grad=True)
-step=46; loss=0.57; abc=tensor([3.0946, 1.6589, 1.0000], requires_grad=True)
-step=47; loss=0.57; abc=tensor([3.0982, 1.6632, 1.0000], requires_grad=True)
-step=48; loss=0.57; abc=tensor([3.1017, 1.6674, 1.0000], requires_grad=True)
-step=49; loss=0.57; abc=tensor([3.1053, 1.6716, 1.0000], requires_grad=True)
-Best abc parameters: tensor([3.1053, 1.6716, 1.0000])
+Best abc parameters: tensor([2.9212, 1.5737, 1.3500])
 
+
+
+
+

3. The Basics of a Neural-Network

+
+
+
+
+
+
+

3.1 Introducing Non-Linearity with ReLU

We've seen that simple functions like quadratics can model some data, but real-world data is rarely so straightforward. Imagine trying to predict something complex, like whether a picture is a cat or a dog, based on many pixel values (our 'dimensions'). A simple quadratic or even a single linear function just won't be flexible enough to capture the intricate patterns in such high-dimensional data.

+

To handle this complexity, we need to build more powerful functions. Simply combining linear functions won't solve the problem because any combination of linear functions is still just a linear function! Linear functions can only model linear relationships in the data. Real-world data, like images of cats and dogs, is highly non-linear.
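A quick numeric check of that claim (just a sketch, not part of the original notebook): stacking two linear layers with no activation in between collapses into a single linear layer.

import torch

torch.manual_seed(0)
x = torch.randn(5, 4)       # 5 samples, 4 features
W1 = torch.randn(4, 10)     # "layer" 1
W2 = torch.randn(10, 1)     # "layer" 2

two_layers = (x @ W1) @ W2  # two stacked linear maps...
one_layer = x @ (W1 @ W2)   # ...equal one linear map with weights W1 @ W2
print(torch.allclose(two_layers, one_layer, atol=1e-5))  # True: still just linear in x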

+

To introduce non-linearity, we use activation functions. ReLU (Rectified Linear Unit) is a simple yet powerful one: applying it to the outputs of linear functions lets our models learn complex, non-linear patterns that purely linear models cannot capture. Next we'll look at what a ReLU is and build the simplest "neural network" possible with it.

+
+
+
+
+
+
In [12]:
+
+
+
def rectified_linear(m,b,x):
+    y = m*x+b
+    return torch.clip(y, 0.)
+
+plot_function(partial(rectified_linear, 1,1))
+
+
+
+
+
+
+
+
+
+[Output image: plot of rectified_linear with m=1, b=1, which is zero up to x = -1 and then a line with slope 1]
+
+
+
+
+
+
+
+

Combining two ReLUs allows us to create more complex, piecewise linear functions, as illustrated in the interactive plot below. This combination increases the flexibility of our model, enabling it to capture more intricate relationships in the data.

+
+
+
+
+
+
In [13]:
+
+
+
def double_relu(m1,b1,m2,b2,x):
+    return rectified_linear(m1,b1,x) + rectified_linear(m2,b2,x)
+
+@interact(m1=-1.5, b1=-1.5, m2=1.5, b2=1.5)
+def plot_double_relu(m1, b1, m2, b2):
+    plot_function(partial(double_relu, m1,b1,m2,b2), ylim=(-1,6))
+
+
+
+
+
+
+
+
+
+[Interactive output: plot of double_relu with sliders for m1, b1, m2, b2]
+
+
+
+
+
+
+
+

3.2 Building a Neural Network from-Scratch

From this point forward, we will be following this notebook: Linear model and neural net from scratch.

+

Important ⚠️: For simplicity, I'm skipping all the steps that involve data cleanup and preparation. This of course means that my model will most likely not perform very well.

+

We are using the Titanic competition from Kaggle. I have made a copy in my Hugging Face workspace, which, to be honest, I did to experiment with how Datasets work on Hugging Face.

+
+

The goal is to create a model to predict whether a passenger Survived, which is provided in our dataset.

+

In essence, we will now combine functions like those we've explored above, such as ReLUs, to construct a simple neural network. This network will receive passenger features as input, apply weights (similar to $m$ in our previous examples), and hopefully predict whether the passenger Survived with the lowest possible loss/error.

+
+
+
+
+
+
In [14]:
+
+
+
from datasets import load_dataset
+# Load only train and test splits.
+dataset = load_dataset("paulopontesm/titanic", data_files={"train": "train.csv", "test": "test.csv"})
+# Access the splits
+train_dataset_df = dataset["train"].to_pandas()
+test_dataset_df = dataset["test"].to_pandas()
+
+train_dataset_df
+
+
+
+
+
+
+
+
Out[14]:
+
    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
0   | 1   | 0 | 3 | Braund, Mr. Owen Harris                              | male   | 22.0 | 1 | 0 | A/5 21171        | 7.250000  | None | S
1   | 2   | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer)  | female | 38.0 | 1 | 0 | PC 17599         | 71.283302 | C85  | C
2   | 3   | 1 | 3 | Heikkinen, Miss. Laina                               | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925000  | None | S
3   | 4   | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel)         | female | 35.0 | 1 | 0 | 113803           | 53.099998 | C123 | S
4   | 5   | 0 | 3 | Allen, Mr. William Henry                             | male   | 35.0 | 0 | 0 | 373450           | 8.050000  | None | S
... | ... | ... | ... | ...                                              | ...    | ...  | ... | ... | ...            | ...       | ...  | ...
886 | 887 | 0 | 2 | Montvila, Rev. Juozas                                | male   | 27.0 | 0 | 0 | 211536           | 13.000000 | None | S
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith                         | female | 19.0 | 0 | 0 | 112053           | 30.000000 | B42  | S
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie"             | female | NaN  | 1 | 2 | W./C. 6607       | 23.450001 | None | S
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell                                | male   | 26.0 | 0 | 0 | 111369           | 30.000000 | C148 | C
890 | 891 | 0 | 3 | Dooley, Mr. Patrick                                  | male   | 32.0 | 0 | 0 | 370376           | 7.750000  | None | Q
+

891 rows × 12 columns

+
+
+
+
+
+
+
+
+

Since we need numerical data for our model, we'll just use the columns that already contain numbers as predictors.
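A small aside (assuming the train_dataset_df loaded above): pandas can list those numeric columns for us via select_dtypes, which is essentially the set of columns that describe() summarizes below.

import numpy as np

numeric_cols = train_dataset_df.select_dtypes(include=np.number).columns.tolist()
print(numeric_cols)
# expected something like ['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']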

+
+
+
+
+
+
In [15]:
+
+
+
import numpy as np
+
+train_dataset_df.describe(include=(np.number))
+
+
+
+
+
+
+
+
Out[15]:
+
      | PassengerId | Survived   | Pclass     | Age        | SibSp      | Parch      | Fare
count | 891.000000  | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000
mean  | 446.000000  | 0.383838   | 2.308642   | 29.699118  | 0.523008   | 0.381594   | 32.204208
std   | 257.353842  | 0.486592   | 0.836071   | 14.526497  | 1.102743   | 0.806057   | 49.693432
min   | 1.000000    | 0.000000   | 1.000000   | 0.420000   | 0.000000   | 0.000000   | 0.000000
25%   | 223.500000  | 0.000000   | 2.000000   | 20.125000  | 0.000000   | 0.000000   | 7.910400
50%   | 446.000000  | 0.000000   | 3.000000   | 28.000000  | 0.000000   | 0.000000   | 14.454200
75%   | 668.500000  | 1.000000   | 3.000000   | 38.000000  | 1.000000   | 0.000000   | 31.000000
max   | 891.000000  | 1.000000   | 3.000000   | 80.000000  | 8.000000   | 6.000000   | 512.329224
+
+
+
+
+
+
+
+
+

Now that we have numbers for the features, we can create tensors/arrays for our features (also known as independent variables) and our target (the dependent variable).

+

Even though I mentioned above that I didn't want to do a lot of data transformations, I think we really need to remove the NaNs and normalize the numbers.

+
+
+
+
+
+
In [16]:
+
+
+
from torch import tensor
+
+# A tensor for the target. Also known as the dependent variable or output
+t_dep = tensor(train_dataset_df.Survived)
+
+indep_cols = ['Age', 'SibSp', 'Parch', 'Fare']
+
+# We need to do 2 things before proceeding so that we can use our data.
+# 1. Replace all the nans by the mode of that column
+for col in indep_cols:
+    mode_val = train_dataset_df[col].mode()[0]
+    train_dataset_df[col] = train_dataset_df[col].fillna(mode_val)
+
+# 2. Normalize so that no column dominates the others, by making each column range between 0 and 1.
+# We can do this by dividing each entry by
+# the max value on that column
+for col in indep_cols:
+    max_val = train_dataset_df[col].max()
+    train_dataset_df[col] = train_dataset_df[col] / max_val
+
+# Create a tensor with our predictors. Also known as independent variables, features, or inputs.
+t_indep = tensor(train_dataset_df[indep_cols].values, dtype=torch.float)
+t_indep
+
+
+
+
+
+
+
+
Out[16]:
+
+
tensor([[0.2750, 0.1250, 0.0000, 0.0142],
+        [0.4750, 0.1250, 0.0000, 0.1391],
+        [0.3250, 0.0000, 0.0000, 0.0155],
+        [0.4375, 0.1250, 0.0000, 0.1036],
+        [0.4375, 0.0000, 0.0000, 0.0157],
+        [0.3000, 0.0000, 0.0000, 0.0165],
+        [0.6750, 0.0000, 0.0000, 0.1012],
+        ...,
+        [0.3125, 0.0000, 0.0000, 0.0138],
+        [0.4875, 0.0000, 0.8333, 0.0568],
+        [0.3375, 0.0000, 0.0000, 0.0254],
+        [0.2375, 0.0000, 0.0000, 0.0586],
+        [0.3000, 0.1250, 0.3333, 0.0458],
+        [0.3250, 0.0000, 0.0000, 0.0586],
+        [0.4000, 0.0000, 0.0000, 0.0151]])
+
+
+
+
+
+
+
+
+

Looks good (?)...

+

Because we want to calculate accuracy later, let's also hold back a chunk of the training data for that. This is called a validation set.

+

Note: This other notebook explains the difference between the validation set and the test set. They seem similar, but it's important not to confuse the two concepts. https://www.kaggle.com/code/jhoward/getting-started-with-nlp-for-absolute-beginners#Test-and-validation-sets
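For intuition, here is roughly what a random split does under the hood (a sketch only; fastai's RandomSplitter, used below, handles this for us with a 20% default validation fraction):

import torch

torch.manual_seed(42)
n = len(train_dataset_df)            # 891 rows
shuffled = torch.randperm(n)         # random permutation of the row indices
n_val = int(n * 0.2)                 # hold out ~20% for validation
val_idx, trn_idx = shuffled[:n_val], shuffled[n_val:]
print(len(trn_idx), len(val_idx))    # 713 178, matching the split sizes below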

+
+
+
+
+
+
In [17]:
+
+
+
from fastai.data.transforms import RandomSplitter
+trn_split,val_split=RandomSplitter(seed=42)(train_dataset_df)
+
+train_set_features,validation_set_features = t_indep[trn_split],t_indep[val_split]
+train_set_targets,validation_set_targets = t_dep[trn_split],t_dep[val_split]
+len(train_set_features),len(validation_set_features)
+
+
+
+
+
+
+
+
Out[17]:
+
+
(713, 178)
+
+
+
+
+
+
+
+
+

Now we can generate random weights (ms) for each of our features. We're using a linear model, effectively calculating a weighted sum of the features: $f(x) = m_{Age}*x_{Age} + m_{SibSp}*x_{SibSp} + m_{Parch}*x_{Parch} + m_{Fare}*x_{Fare}$. We will adjust these weights to predict passenger survival based on the features.
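As a tiny worked example of that weighted sum (the weights and the passenger row below are made-up, already-normalized values, not the notebook's real data):

import torch

weights = torch.tensor([0.3, -0.2, 0.1, 0.4])          # m_Age, m_SibSp, m_Parch, m_Fare (made up)
passenger = torch.tensor([0.275, 0.125, 0.0, 0.014])   # normalized Age, SibSp, Parch, Fare (made up)

# f(x) = m_Age*x_Age + m_SibSp*x_SibSp + m_Parch*x_Parch + m_Fare*x_Fare
prediction = (weights * passenger).sum()
print(prediction)   # a single score for this passenger: tensor(0.0631)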

+
+
+
+
+
+
In [18]:
+
+
+
def generate_random_coefficients(num_coeffs):
+    torch.manual_seed(42)
+    coeffs = torch.rand(num_coeffs)-0.5 # pick random numbers in the range (-0.5,0.5)
+    return coeffs.requires_grad_()
+
+nn_coeffs=generate_random_coefficients(num_coeffs=train_set_features.shape[1])
+nn_coeffs
+
+
+
+
+
+
+
+
Out[18]:
+
+
tensor([ 0.3823,  0.4150, -0.1171,  0.4593], requires_grad=True)
+
+
+
+
+
+
+
+
In [19]:
+
+
+
def calc_preds(coeffs, features): return (features*coeffs).sum(axis=1)
+
+predictions = calc_preds(nn_coeffs, train_set_features)
+predictions.topk(3)
+
+
+
+
+
+
+
+
Out[19]:
+
+
torch.return_types.topk(
+values=tensor([0.6265, 0.6265, 0.6118], grad_fn=<TopkBackward0>),
+indices=tensor([183,  94, 462]))
+
+
+
+
+
+
+
+
In [20]:
+
+
+
def calc_loss(coeffs, features, targets): return torch.abs(calc_preds(coeffs, features)-targets).mean()
+
+loss = calc_loss(coeffs=nn_coeffs, features=train_set_features, targets=train_set_targets)
+loss
+
+
+
+
+
+
+
+
Out[20]:
+
+
tensor(0.4199, grad_fn=<MeanBackward0>)
+
+
+
+
+
+
+
+
In [21]:
+
+
+
loss.backward()
+nn_coeffs.grad
+
+
+
+
+
+
+
+
Out[21]:
+
+
tensor([ 0.0990,  0.0191,  0.0030, -0.0091])
+
+
+
+
+
+
+
+
In [22]:
+
+
+
def one_epoch(coeffs, lr, train_set_features_set, train_set_targets_set):
+    loss = calc_loss(coeffs, train_set_features_set, train_set_targets_set)
+    loss.backward()
+    with torch.no_grad():
+        coeffs.sub_(coeffs.grad * lr)
+        coeffs.grad.zero_()
+    print(f"{loss:.3f}", end="; ")
+    
+def train_model(train_set_features_set, train_set_targets_set, epochs=60, lr=4):
+    torch.manual_seed(442)
+    coeffs = generate_random_coefficients(num_coeffs=t_indep.shape[1])
+    for i in range(epochs): one_epoch(coeffs, lr=lr, train_set_features_set=train_set_features_set, train_set_targets_set=train_set_targets_set)
+    return coeffs
+
+final_weights = train_model(train_set_features, train_set_targets)
+
+def show_coeffs(coeffs): return dict(zip(indep_cols, coeffs.requires_grad_(False)))
+
+show_coeffs(nn_coeffs)
+
+
+
+
+
+
+
+
+
+
0.420; 0.380; 0.403; 0.473; 0.430; 0.389; 0.427; 0.480; 0.434; 0.391; 0.406; 0.478; 0.431; 0.387; 0.407; 0.470; 0.422; 0.379; 0.427; 0.465; 0.416; 0.372; 0.436; 0.462; 0.411; 0.368; 0.435; 0.458; 0.407; 0.365; 0.418; 0.455; 0.404; 0.364; 0.387; 0.445; 0.394; 0.371; 0.432; 0.384; 0.393; 0.455; 0.401; 0.364; 0.367; 0.427; 0.453; 0.398; 0.366; 0.420; 0.371; 0.415; 0.454; 0.398; 0.367; 0.421; 0.372; 0.413; 0.453; 0.396; 
+
+
+
+
Out[22]:
+
+
{'Age': tensor(0.3823),
+ 'SibSp': tensor(0.4150),
+ 'Parch': tensor(-0.1171),
+ 'Fare': tensor(0.4593)}
+
+
+
+
+
+
+
+
+

We have weights, let's do predictions then.

+
+
+
+
+
+
In [23]:
+
+
+
calc_preds(nn_coeffs, validation_set_features)
+
+
+
+
+
+
+
+
Out[23]:
+
+
tensor([0.1258, 0.1216, 0.1212, 0.1519, 0.1311, 0.2332, 0.0116, 0.1454, 0.2672, 0.1671, 0.1600, 0.1607, 0.0614, 0.1217, 0.1683, 0.2388,
+        0.3344, 0.1232, 0.2097, 0.1559, 0.1218, 0.2914, 0.3181, 0.3005, 0.1019, 0.1882, 0.0471, 0.3201, 0.0890, 0.1216, 0.0586, 0.3388,
+        0.0979, 0.1530, 0.1544, 0.2061, 0.2551, 0.1768, 0.1219, 0.1784, 0.1002, 0.1219, 0.1587, 0.2213, 0.1028, 0.1339, 0.1749, 0.1963,
+        0.1209, 0.1116, 0.3947, 0.3054, 0.2396, 0.1258, 0.1337, 0.1623, 0.1216, 0.2378, 0.1076, 0.1506, 0.1682, 0.1394, 0.4092, 0.1075,
+        0.2854, 0.1501, 0.1263, 0.2080, 0.2374, 0.2271, 0.1313, 0.0893, 0.1575, 0.1232, 0.1379, 0.0834, 0.1219, 0.3141, 0.1024, 0.2380,
+        0.1240, 0.4003, 0.1170, 0.1212, 0.2483, 0.1696, 0.2355, 0.1873, 0.2072, 0.1216, 0.1604, 0.1219, 0.2388, 0.1498, 0.1051, 0.1836,
+        0.2431, 0.2393, 0.1121, 0.0968, 0.1254, 0.1216, 0.1600, 0.1504, 0.2511, 0.2123, 0.1519, 0.2585, 0.2530, 0.1968, 0.1419, 0.2161,
+        0.1416, 0.2902, 0.1219, 0.4060, 0.2535, 0.1216, 0.2075, 0.4413, 0.3000, 0.3901, 0.1216, 0.2205, 0.1212, 0.1655, 0.1264, 0.1313,
+        0.0850, 0.2854, 0.1540, 0.1810, 0.2472, 0.1670, 0.2979, 0.0980, 0.2940, 0.1471, 0.2341, 0.1861, 0.1550, 0.3705, 0.2213, 0.1599,
+        0.2736, 0.4228, 0.1893, 0.0957, 0.0923, 0.2085, 0.2059, 0.0826, 0.1098, 0.2523, 0.2393, 0.5530, 0.1219, 0.1563, 0.1741, 0.1277,
+        0.1591, 0.2153, 0.1845, 0.2924, 0.2974, 0.3754, 0.1787, 0.2123, 0.5154, 0.2902, 0.1512, 0.1038, 0.2167, 0.1232, 0.2174, 0.1966,
+        0.1420, 0.0896])
+
+
+
+
+
+
+
+
+

It's hard not to notice that we should be predicting a 0 or 1 value, but instead we're getting a spread of continuous scores.

+

For simplicity, let's ignore this and say that everything above 0.5 survived.

+
+
+
+
+
+
In [24]:
+
+
+
preds = calc_preds(nn_coeffs, validation_set_features)
+
+print(f"True count was {torch.sum(preds>0.5)} should have been {torch.sum(validation_set_targets.bool())}")
+print(f"False count was {torch.sum(preds<=0.5)} should have been {len( validation_set_targets.bool()) - torch.sum(validation_set_targets.bool())}")
+
+
+
+
+
+
+
+
+
+
True count was 2 should have been 72
+False count was 176 should have been 106
+
+
+
+
+
+
+
+
+
+

With this we can use our validation set and calculate the % of predictions that we get right.

+
+
+
+
+
+
In [25]:
+
+
+
def calc_accuracy(predictions, validation_set_features, validation_set_targets):
+    # Convert predictions to boolean values (True if > 0.5, False otherwise)
+    bool_predictions = predictions > 0.5
+    # Convert validation dependent variable to boolean values
+    bool_validation_set_targets = validation_set_targets.bool()
+    # Compare boolean predictions with boolean validation dependent variable to find correct predictions
+    correct_predictions = bool_validation_set_targets == bool_predictions
+    # Convert correct predictions (boolean) to float (1.0 for True, 0.0 for False)
+    accuracy_float = correct_predictions.float()
+    # Calculate the mean of the accuracy_float to get the overall accuracy
+    accuracy_val = accuracy_float.mean()
+    return accuracy_val
+
+accuracy_result = calc_accuracy(predictions=preds, validation_set_features=validation_set_features, validation_set_targets=validation_set_targets)
+print(accuracy_result)
+
+
+
+
+
+
+
+
+
+
tensor(0.5843)
+
+
+
+
+
+
+
+
+
+

Looks like we're doing slightly better than flipping a coin. I say this is a success 🤔❓🤔 I don't think so...

+

As seen in the counts above ("True count was 2 should have been 72", "False count was 176 should have been 106"), the model predicts most instances as False (not survived). +It correctly identifies many of the actual 'not survived' cases, contributing to the accuracy, but incorrectly classifies most 'survived' cases as 'not survived'. +This bias results in an accuracy that is better than random (50%), but not very high, around 58%.

+

The Linear model and neural net from scratch notebook goes much further on this exercise. It uses other techniques to clean up and normalize the data, it also makes use of the non-numerical columns by transforming them into numerical ones, and it then uses a sigmoid which, unlike our linear function, always gives a value between 0 and 1.
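To make that last point concrete, a tiny sketch (not the notebook's code) of how a sigmoid squashes any raw score into the (0, 1) range:

import torch

raw_scores = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])
print(torch.sigmoid(raw_scores))
# tensor([0.0474, 0.3775, 0.5000, 0.6225, 0.9526]) -- always strictly between 0 and 1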

+

For me, this was enough to get a better overview of what's happening inside a neural network.

+
+
+
+
+
+
+

3.3 Do Deep Learning

In the section above we implemented a simple Neural Network. Now let's explore Deep Learning, which is what truly unlocks the power of Neural Networks.

+

Deep Learning involves creating Neural Networks with multiple layers. Instead of a single layer, we stack layers of neurons, allowing the network to learn more complex patterns and representations from the data.

+
+
+
+
+
+
In [26]:
+
+
+
def generate_random_coefficients_for_deep_learning(n_coeff, num_neurons_per_hidden_layer=[10, 10]):
+    torch.manual_seed(42)
+    # Define the number of neurons for each layer, including input, hidden, and output layers.
+    # The input layer size is n_coeff, hidden layers sizes are from num_neurons_per_hidden_layer, and output layer size is 1.
+    num_neurons = [n_coeff] + num_neurons_per_hidden_layer + [1]
+    layers = []
+    for i in range(len(num_neurons)-1):
+        # Determine the size of the input for the current layer from the previous layer's neuron count
+        layer_input_size = num_neurons[i]
+        # Determine the size of the output for the current layer from the current layer's neuron count
+        layer_output_size = num_neurons[i+1]
+        # Initialize a layer with random weights between -0.5 and 0.5.
+        # torch.rand generates uniform random numbers between 0 and 1, then we shift and scale to get range [-0.5, 0.5].
+        # requires_grad_() is set to True to enable gradient tracking for these tensors, which is needed for backpropagation.
+        layer = (torch.rand(layer_input_size, layer_output_size)-0.5).requires_grad_()
+        layers.append(layer)
+    return layers
+
+dnn_layers_coeffs = generate_random_coefficients_for_deep_learning(n_coeff=train_set_features.shape[1], num_neurons_per_hidden_layer=[10, 10])
+dnn_layers_coeffs
+
+
+
+
+
+
+
+
Out[26]:
+
+
[tensor([[ 0.3823,  0.4150, -0.1171,  0.4593, -0.1096,  0.1009, -0.2434,  0.2936,  0.4408, -0.3668],
+         [ 0.4346,  0.0936,  0.3694,  0.0677,  0.2411, -0.0706,  0.3854,  0.0739, -0.2334,  0.1274],
+         [-0.2304, -0.0586, -0.2031,  0.3317, -0.3947, -0.2305, -0.1412, -0.3006,  0.0472, -0.4938],
+         [ 0.4516, -0.4247,  0.3860,  0.0832, -0.1624,  0.3090,  0.0779,  0.4040,  0.0547, -0.1577]], requires_grad=True),
+ tensor([[ 0.1343, -0.1356,  0.2104,  0.4464,  0.2890, -0.2186,  0.2886,  0.0895,  0.2539, -0.3048],
+         [-0.4950, -0.1932, -0.3835,  0.4103,  0.1440,  0.2071,  0.1581, -0.0087,  0.3913, -0.3553],
+         [ 0.0315, -0.3413,  0.1542, -0.1722,  0.1532, -0.1042,  0.4147, -0.2964, -0.2982, -0.2982],
+         [ 0.4497,  0.1666,  0.4811, -0.4126, -0.4959, -0.3912, -0.3363,  0.2025,  0.1790,  0.4155],
+         [-0.2582, -0.3409,  0.2653, -0.2021,  0.3035, -0.1187,  0.2860, -0.3885, -0.2523,  0.1524],
+         [ 0.1057, -0.1275,  0.2980,  0.3399, -0.3626, -0.2669,  0.4578, -0.1687, -0.1773, -0.4838],
+         [-0.2863,  0.1249, -0.0660, -0.3629,  0.0117, -0.3415, -0.4242, -0.2753, -0.4376, -0.3184],
+         [ 0.4998,  0.0944,  0.1541, -0.4663, -0.3284, -0.1664,  0.0782, -0.4400, -0.2154, -0.2993],
+         [ 0.0014, -0.1861, -0.0346, -0.3388, -0.3432, -0.2917, -0.1711, -0.3946,  0.4192, -0.0992],
+         [ 0.4302,  0.1558, -0.4234,  0.3460, -0.1376, -0.1917, -0.4150, -0.4971,  0.1431, -0.1092]], requires_grad=True),
+ tensor([[ 0.1947],
+         [-0.4103],
+         [ 0.3712],
+         [-0.3670],
+         [-0.0863],
+         [ 0.1044],
+         [ 0.2581],
+         [ 0.4037],
+         [ 0.4555],
+         [-0.3965]], requires_grad=True)]
+
+
+
+
+
+
+
+
+

We can test how we do without any training

+
+
+
+
+
+
In [27]:
+
+
+
def calc_preds_for_deep_learning(coeffs, features):
+    # @ is matrix multiplication in Python
+    # It was introduced in Python 3.5 as part of [PEP 465](https://peps.python.org/pep-0465/)
+    layer_features = features
+    for layer in coeffs[:-1]:
+        layer_features = layer_features @ layer
+    layer_features = layer_features @ coeffs[-1]
+    return layer_features.squeeze()
+
+def calc_loss_for_deep_learning(coeffs, features, targets): return torch.abs(calc_preds_for_deep_learning(coeffs, features)-targets).mean()
+
+dnn_preds = calc_preds_for_deep_learning(coeffs=dnn_layers_coeffs, features=validation_set_features)
+
+print(f"True count was {torch.sum(dnn_preds>0.5)} should have been {torch.sum(validation_set_targets.bool())}")
+print(f"False count was {torch.sum(dnn_preds<=0.5)} should have been {len( validation_set_targets.bool()) - torch.sum(validation_set_targets.bool())}")
+
+
+
+
+
+
+
+
+
+
True count was 12 should have been 72
+False count was 166 should have been 106
+
+
+
+
+
+
+
+
+
+

And we need to do gradient descent for all the coefficients on each layer.

+
+
+
+
+
+
In [28]:
+
+
+
def one_epoch_for_deep_learning(coeffs, lr, train_set_features_set, train_set_targets_set):
+    loss = calc_loss_for_deep_learning(coeffs, train_set_features_set, train_set_targets_set)
+    loss.backward()
+    with torch.no_grad():
+        for layer in coeffs:
+            layer -= layer.grad * lr
+            layer.grad.zero_()
+    
+def train_model_for_deep_learning(train_set_features_set, train_set_targets_set, num_neurons_per_hidden_layer=[10, 10], epochs=60, lr=4):
+    torch.manual_seed(442)
+    coeffs = generate_random_coefficients_for_deep_learning(n_coeff=train_set_features_set.shape[1], num_neurons_per_hidden_layer=num_neurons_per_hidden_layer)
+    for i in range(epochs): one_epoch_for_deep_learning(coeffs, lr=lr, train_set_features_set=train_set_features_set, train_set_targets_set=train_set_targets_set)
+    return coeffs # Returns the trained coefficients, which have the same structure as generate_random_coefficients_for_deep_learning
+
+
+
+
+
+
+
+
+

Let's test it then with different combinations of hidden layers and neurons per layer...

+
+
+
+
+
+
In [29]:
+
+
+
for num_neurons in [[10, 10], [20, 20],[5, 5, 5],[30], [], [2], [50], [2, 2], [50, 50], [5, 10, 5], [2, 2, 2, 2]]:
+    dnn_final_weights = train_model_for_deep_learning(train_set_features, train_set_targets, num_neurons_per_hidden_layer=num_neurons)
+    dnn_preds = calc_preds_for_deep_learning(coeffs=dnn_final_weights, features=validation_set_features)
+    accuracy = calc_accuracy(predictions=dnn_preds, validation_set_features=validation_set_features, validation_set_targets=validation_set_targets)
+
+    print(f"Hidden layers: {num_neurons}")
+    print(f"True count was {torch.sum(dnn_preds>0.5)} should have been {torch.sum(validation_set_targets.bool())}")
+    print(f"False count was {torch.sum(dnn_preds<=0.5)} should have been {len( validation_set_targets.bool()) - torch.sum(validation_set_targets.bool())}")
+    print(f"Accuracy: {accuracy}")
+    print("-" * 20) # Separator for readability
+
+
+
+
+
+
+
+
+
+
Hidden layers: [10, 10]
+True count was 0 should have been 72
+False count was 0 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [20, 20]
+True count was 0 should have been 72
+False count was 0 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [5, 5, 5]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [30]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: []
+True count was 5 should have been 72
+False count was 173 should have been 106
+Accuracy: 0.6123595237731934
+--------------------
+Hidden layers: [2]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [50]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [2, 2]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [50, 50]
+True count was 0 should have been 72
+False count was 0 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [5, 10, 5]
+True count was 3 should have been 72
+False count was 175 should have been 106
+Accuracy: 0.601123571395874
+--------------------
+Hidden layers: [2, 2, 2, 2]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+
+
+
+
+
+
+
+
+
+

Not a lot has changed...

+

Just for fun, we can see how adding a sigmoid and ReLU would affect the results... The code is copy-pasted from above, but with a smarter_calc_preds_for_deep_learning.

+
+
+
+
+
+
In [30]:
+
+
+
import torch.nn.functional as F
+
+def smarter_calc_preds_for_deep_learning(coeffs, features):
+    # @ is matrix multiplication in Python
+    # It was introduced in Python 3.5 as part of [PEP 465](https://peps.python.org/pep-0465/)
+    layer_features = features
+    for layer in coeffs[:-1]:
+        layer_features = F.relu(layer_features @ layer)
+    layer_features = layer_features @ coeffs[-1]
+    return torch.sigmoid(layer_features.squeeze())
+
+def smarter_calc_loss_for_deep_learning(coeffs, features, targets):
+    predictions = smarter_calc_preds_for_deep_learning(coeffs, features)
+    return F.binary_cross_entropy(predictions, targets) # Changed loss to Binary Cross Entropy
+
+def smarter_one_epoch_for_deep_learning(coeffs, lr, train_set_features_set, train_set_targets_set):
+    loss = smarter_calc_loss_for_deep_learning(coeffs, train_set_features_set, train_set_targets_set)
+    loss.backward()
+    with torch.no_grad():
+        for layer in coeffs:
+            layer -= layer.grad * lr
+            layer.grad.zero_()
+    
+def smarter_train_model_for_deep_learning(train_set_features_set, train_set_targets_set, num_neurons_per_hidden_layer=[10, 10], epochs=60, lr=4):
+    torch.manual_seed(442)
+    coeffs = generate_random_coefficients_for_deep_learning(n_coeff=train_set_features_set.shape[1], num_neurons_per_hidden_layer=num_neurons_per_hidden_layer)
+    for i in range(epochs): smarter_one_epoch_for_deep_learning(coeffs, lr=lr, train_set_features_set=train_set_features_set, train_set_targets_set=train_set_targets_set)
+    return coeffs # Returns the trained coefficients, which have the same structure as generate_random_coefficients_for_deep_learning
+
+
+
+
+
+
+
+
In [31]:
+
+
+
for num_neurons in [[10, 10], [20, 20],[5, 5, 5],[30], [], [2], [50], [2, 2], [50, 50], [5, 10, 5], [2, 2, 2, 2]]:
+    dnn_final_weights = smarter_train_model_for_deep_learning(train_set_features, train_set_targets.float(), num_neurons_per_hidden_layer=num_neurons)
+    dnn_preds = smarter_calc_preds_for_deep_learning(coeffs=dnn_final_weights, features=validation_set_features)
+    accuracy = calc_accuracy(predictions=dnn_preds, validation_set_features=validation_set_features, validation_set_targets=validation_set_targets)
+
+    print(f"Hidden layers: {num_neurons}")
+    print(f"True count was {torch.sum(dnn_preds>0.5)} should have been {torch.sum(validation_set_targets.bool())}")
+    print(f"False count was {torch.sum(dnn_preds<=0.5)} should have been {len( validation_set_targets.bool()) - torch.sum(validation_set_targets.bool())}")
+    print(f"Accuracy: {accuracy}")
+    print("-" * 20) # Separator for readability
+
+
+
+
+
+
+
+
+
+
Hidden layers: [10, 10]
+True count was 21 should have been 72
+False count was 157 should have been 106
+Accuracy: 0.6685393452644348
+--------------------
+Hidden layers: [20, 20]
+True count was 60 should have been 72
+False count was 118 should have been 106
+Accuracy: 0.6966292262077332
+--------------------
+Hidden layers: [5, 5, 5]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [30]
+True count was 29 should have been 72
+False count was 149 should have been 106
+Accuracy: 0.6910112500190735
+--------------------
+Hidden layers: []
+True count was 11 should have been 72
+False count was 167 should have been 106
+Accuracy: 0.6348314881324768
+--------------------
+Hidden layers: [2]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [50]
+True count was 33 should have been 72
+False count was 145 should have been 106
+Accuracy: 0.6910112500190735
+--------------------
+Hidden layers: [2, 2]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+Hidden layers: [50, 50]
+True count was 32 should have been 72
+False count was 146 should have been 106
+Accuracy: 0.6853932738304138
+--------------------
+
+
+
+
+
+
+
Hidden layers: [5, 10, 5]
+True count was 28 should have been 72
+False count was 150 should have been 106
+Accuracy: 0.6853932738304138
+--------------------
+
+
+
+
+
+
+
Hidden layers: [2, 2, 2, 2]
+True count was 0 should have been 72
+False count was 178 should have been 106
+Accuracy: 0.5955055952072144
+--------------------
+
+
+
+
+
+
+
+
+
+

Interesting. That definitely improved.

+

I will stop here for now. However, the next step will likely be to add boolean variables like is_male, is_female, is_class_1, etc. If I understood things correctly and I'm not making any mistakes, it should bring me to around 80% accuracy, like we see in the fast.ai notebook.
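A hedged sketch of that next step (the column names come from the Titanic CSV; the ~80% figure is not verified here): pd.get_dummies turns the categorical columns into the boolean indicator columns mentioned above, which could then be concatenated with the normalized numeric features.

import pandas as pd
import torch

# boolean indicator columns for the categorical features (is_male, is_class_1, ...)
dummies = pd.get_dummies(train_dataset_df[['Sex', 'Pclass', 'Embarked']].astype(str),
                         prefix=['is', 'is_class', 'embarked'])
print(dummies.columns.tolist())
# e.g. ['is_female', 'is_male', 'is_class_1', 'is_class_2', 'is_class_3', 'embarked_C', ...]

extra = torch.tensor(dummies.values, dtype=torch.float)
richer_features = torch.cat([t_indep, extra], dim=1)   # 4 numeric columns + the new indicators
print(richer_features.shape)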

+
+
+
+
+
+
In [ ]:
+
+
+
 
+
+
+
+
+