This repository contains my Jupyter Notebook files for the neural networks course taught by Andrej Karpathy. The course starts with neural network basics and progresses to more advanced topics. Each lecture is covered by its own Jupyter Notebook file (.ipynb).
Lecture 1: The spelled-out intro to neural networks and backpropagation: building micrograd
- Introduction to gradients and calculating the slope of a function using small increments (numerical differentiation)
- Recreating micrograd (the Value class) to build mathematical expressions that can be automatically backpropagated (a minimal sketch follows this list)
- Visualizing mathematical expressions with a computational graph composed of the operations tracked by micrograd
- Manual backpropagation for a simple neuron model and its activation function (tanh)
- Backpropagating through the same neuron using PyTorch's autograd for comparison (sketched after this list)
- Building a basic neural network (multi-layer perceptron) from scratch using micrograd and applying the tanh activation function.
- Training the neural network on a simple dataset: repeated forward and backward passes to minimize a squared-error loss via gradient descent (a training-loop sketch follows this list).
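
A minimal sketch of the Value-class idea from this lecture, supporting only addition, multiplication, and tanh; the single-neuron example at the end uses illustrative values:

```python
import math

class Value:
    """A minimal micrograd-style scalar that tracks its computation graph."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            self.grad += out.grad          # d(out)/d(self) = 1
            other.grad += out.grad         # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # d(tanh x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # topological order over the graph, then apply the chain rule in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# a single tanh neuron: out = tanh(x1*w1 + x2*w2 + b)
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.8813735870195432)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, x1.grad, w1.grad)
```

The same neuron can be backpropagated with PyTorch's autograd to check the gradients computed above (tensor values mirror the sketch):

```python
import torch

# same neuron as above, but letting PyTorch compute the backward pass
x1 = torch.tensor(2.0, requires_grad=True)
x2 = torch.tensor(0.0, requires_grad=True)
w1 = torch.tensor(-3.0, requires_grad=True)
w2 = torch.tensor(1.0, requires_grad=True)
b  = torch.tensor(6.8813735870195432, requires_grad=True)

out = torch.tanh(x1 * w1 + x2 * w2 + b)
out.backward()
print(out.item(), x1.grad.item(), w1.grad.item())
```

A compressed sketch of the training loop, written here with plain PyTorch tensors rather than the notebook's from-scratch Value/MLP classes; the toy dataset, layer sizes, learning rate, and step count are illustrative:

```python
import torch

# toy dataset in the spirit of the lecture: 4 inputs with 3 features each
xs = torch.tensor([[2.0, 3.0, -1.0],
                   [3.0, -1.0, 0.5],
                   [0.5, 1.0, 1.0],
                   [1.0, 1.0, -1.0]])
ys = torch.tensor([1.0, -1.0, -1.0, 1.0])

# a 3 -> 4 -> 4 -> 1 MLP with tanh activations and random initial weights
g = torch.Generator().manual_seed(42)
W1 = torch.randn((3, 4), generator=g, requires_grad=True)
b1 = torch.randn(4, generator=g, requires_grad=True)
W2 = torch.randn((4, 4), generator=g, requires_grad=True)
b2 = torch.randn(4, generator=g, requires_grad=True)
W3 = torch.randn((4, 1), generator=g, requires_grad=True)
b3 = torch.randn(1, generator=g, requires_grad=True)
params = [W1, b1, W2, b2, W3, b3]

for step in range(100):
    # forward pass
    h1 = torch.tanh(xs @ W1 + b1)
    h2 = torch.tanh(h1 @ W2 + b2)
    ypred = torch.tanh(h2 @ W3 + b3).squeeze()
    loss = ((ypred - ys) ** 2).sum()       # squared-error loss

    # backward pass
    for p in params:
        p.grad = None
    loss.backward()

    # gradient descent update
    with torch.no_grad():
        for p in params:
            p -= 0.05 * p.grad

print(loss.item(), ypred.tolist())
```
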
Lecture 2: The spelled-out intro to language modeling: building makemore
- Bigram Generation:
- Creating bigrams from the dataset and counting their occurrences.
- Visualizing the bigram frequency using a heatmap.
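
A minimal sketch of the bigram counting and heatmap steps, assuming the names.txt dataset used in the lecture (lowercase words, one per line, a 26-letter alphabet plus a '.' start/end token):

```python
import torch
import matplotlib.pyplot as plt

# assumes a names.txt file with one lowercase word per line
words = open('names.txt', 'r').read().splitlines()

# character vocabulary with '.' as the start/end token at index 0
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}

# count every bigram (ch1, ch2) into a 27x27 matrix
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# heatmap of bigram frequencies
plt.figure(figsize=(16, 16))
plt.imshow(N, cmap='Blues')
plt.show()
```
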
- Probability Calculations:
- Initializing probability matrix 'P' based on bigram counts for each character.
- Smoothing the counts to avoid zero probabilities (optional).
- Generating new words using the trained model.
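
Continuing from the count matrix N above, a sketch of the probability matrix, the +1 smoothing, and word generation; the seed and number of samples are arbitrary:

```python
# turn counts into row-normalized probabilities; the +1 is the (optional) smoothing
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# sample new words by walking the bigram chain until '.' is produced again
g = torch.Generator().manual_seed(2147483647)
for _ in range(5):
    out, ix = [], 0
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))
```
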
- Model Quality Evaluation:
- Calculating the likelihood and the negative log likelihood of the training data under the bigram model.
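
A sketch of this evaluation step, reusing words, stoi, and the probability matrix P from the sketches above:

```python
# average negative log likelihood of the data under the bigram model
log_likelihood = 0.0
n = 0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        prob = P[stoi[ch1], stoi[ch2]]
        log_likelihood += torch.log(prob)
        n += 1
nll = -log_likelihood / n
print(f'average nll: {nll.item():.4f}')   # lower is better
```
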
- Neural Network Approach - Bigrams:
- Building a simple neural network for bigram prediction.
- Performing a forward pass using random weights and one-hot encoding.
- Exponentiating log counts to obtain counts and converting to probabilities using softmax.
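
A sketch of the single-layer network's forward pass, reusing words and stoi from above; the seed is arbitrary:

```python
import torch.nn.functional as F

# training set of bigrams: xs holds the first character index, ys the one to predict
xs, ys = [], []
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# a single linear layer with random weights: 27 inputs -> 27 outputs
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g, requires_grad=True)

# forward pass: one-hot encode, compute "log counts" (logits), softmax to probabilities
xenc = F.one_hot(xs, num_classes=27).float()
logits = xenc @ W          # interpreted as log counts
counts = logits.exp()      # analogous to the count matrix N
probs = counts / counts.sum(dim=1, keepdim=True)   # softmax
loss = -probs[torch.arange(len(ys)), ys].log().mean()
print(loss.item())
```
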
- Optimization:
- Implementing gradient descent to optimize the neural network.
- Incorporating regularization into the loss function.
- Sampling from the trained neural network to generate new words.
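
A sketch of the optimization loop and of sampling from the trained network, continuing from the forward-pass sketch above; the regularization strength, learning rate, and step count are illustrative:

```python
# gradient descent on the one-layer network, with a small L2-style regularization term
for step in range(200):
    xenc = F.one_hot(xs, num_classes=27).float()
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(dim=1, keepdim=True)
    loss = -probs[torch.arange(len(ys)), ys].log().mean() + 0.01 * (W ** 2).mean()

    W.grad = None
    loss.backward()
    with torch.no_grad():
        W -= 50 * W.grad   # a large step size works here because the problem is tiny

# sampling from the trained network mirrors sampling from the count-based P matrix
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    p = (F.one_hot(torch.tensor([ix]), num_classes=27).float() @ W).exp()
    p = p / p.sum(dim=1, keepdim=True)
    ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print(''.join(out))
```
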