I discovered that I learn best when I take good notes. To share this (very large) collection of notes I've made on topics in AI and mathematics, I've exported them as HTML pages, which you can find below. Because of the conversion process, some links may be broken. Let me know about any broken links, as well as any general comments, here.

*Instructors: I generally only include derivations that have instructional significance beyond a problem set (which also means that they are often publicly available elsewhere). If you believe I've included a derivation that is specific to a problem set in a listed course (i.e. not useful beyond that particular question), please contact me and I will take it down.*

This is a collection of subjects that are necessary for understanding machine learning. We will look at how we construct distributions, linear transformations, and the very fabric of calculus. These notes are not listed in strict order of difficulty.

Linear algebra is the study of vectors and the linear transformations applied to them. It's the foundation of many machine learning concepts, where models are represented as a composition of matrices. These (somewhat messy) notes cover linear algebra from abstract and numerical approaches. The notes come from Math 51, Math 113, and various other AI courses at Stanford.

Real Analysis builds the principles of calculus from the ground up. The results are pretty simple (nothing beyond a high school calculus class), but the formalism creates a certain proof mindset that I've found indispensable in my other AI courses. These notes include formal definitions of the basic operations, limits, continuity, derivatives, and integration. There is also a bit of point-set topology. These notes come from Math 115 at Stanford.

One of the most fundamental topics of machine learning: how do we model real-world stochastic phenomena? These notes are a cobbling-together of various classes, and they go through simple probability and distributions for both the single-variable and multivariate cases. These notes come from various AI courses at Stanford.

Computers do stuff. How can we make them do stuff faster? Or, in some cases, turn a difficult problem into an easier one? These notes cover some of the basic algorithms that are a must-have for computer scientists. In many cases, knowing these algorithms can help AI researchers know what is tractable and what is really hard. These notes come from CS 161 at Stanford.

In this section, we explore the core principles behind AI models. How do we model the real world? How do we optimize these models? How do we evaluate them? These notes are not listed in strict order of difficulty.

Say we know what we want. Can we get there? These notes cover this journey. We focus mostly on gradient-based methods. You'll find all the matrix calculus basics here, as well as gradient-descent algorithms and a brief discussion of second-order methods. These notes do not cover Convex Optimization, which I will be including separately. These notes come from CS 229 at Stanford.

Optimization is a very hard problem, but what happens if the optimization terrain has certain guarantees? Convex optimization focuses on a class of problems that is easy to solve yet broad in its applications. Many problems can be formulated as convex optimization problems. These notes include convexity theory (including duality) as well as many applications of optimization. These notes come from EE 364A at Stanford.

Before neural networks, there were many other learning algorithms that work pretty well in constrained settings. Unlike neural networks, these methods have many more provable properties. Furthermore, they can be part of a larger pipeline that involves neural networks. Therefore, it is still essential to know how these classical approaches work. These notes include PCA, Naive Bayes, Support Vector Machines, and more. These notes come from CS 229 at Stanford.

Neural networks are the building blocks of many machine learning models. Before we can talk about all the cool stuff, we need to understand the basics: how are they defined, and how are they optimized? These notes also cover some core ML principles like feature selection and the bias-variance tradeoff. These notes come from CS 229 at Stanford.

The real world is stochastic but also highly dependent. Clouds are correlated with rain, and winter with snow. Probabilistic Graphical Models are a very elegant way of modeling these dependencies. Here, we introduce Bayesian networks, Markov Random Fields, and their properties. We see how they can be sampled, evaluated, and trained. Finally, we look at some (pretty complicated) theories of variational inference. These notes come from CS 228 at Stanford.

Information Theory deals with the question of communication. We make a lot of noise, and sometimes this noise forms important things. How can we measure degrees of randomness? How can we compress randomness? These notes also include discussions of Markov Processes and other probabilistic processes. Information Theory is important to AI because many of our objectives can be decomposed into quantities introduced by information theorists: entropy, divergence, etc. These notes come from EE 276 at Stanford.

When it comes to fitting models, modern machine learning has us taking a lot of things for granted. But some of these assumptions are less obvious than they seem. How close can we really get to the true model by training on a sample of the data distribution? How fast can we converge using gradient descent? Can we take a crack at understanding neural networks? These notes try to answer these questions and more. The content may not be mind-blowing, but the formalism can be very helpful. These notes come from CS 229M at Stanford.

These notes cover various AI approaches, including ways of seeing the world (computer vision), processing language (NLP), learning from reinforcement (RL), and others. Some of these notes include state-of-the-art methods.

Vision is a very important sensory modality for understanding the world. These notes give an overview of the methods we use to make computers see. They include discussions of CNNs and other algorithms. These notes come from CS 231N at Stanford.

One of the things that sets us apart from animals is natural language, which allows for a very efficient way of exchanging information. In these notes, we will look at how we can equip machines with the ability to read and comprehend. These notes come from CS 224N at Stanford.

Much of machine learning is modeling a distribution. In these notes, we will look deeply at advanced distribution modeling techniques, like Variational Autoencoders, GANs, and even diffusion models. These notes come from CS 236 at Stanford.

Humans and animals learn from experiencing good and bad things. We can get computers to do something similar. These notes are a (decently) comprehensive overview of reinforcement learning techniques. They include some theoretical analysis of tabular algorithms, as well as algorithms that involve neural networks and other complicated structures. This is a conglomerate of notes taken from CS 234 (Stanford), CS 285 (Berkeley), and CS 224R (Stanford).

As we grow up, we gain knowledge, but we also learn how to learn. More specifically, we learn certain strategies that speed up this learning process. In the human world, we might call these study skills. In the AI world, we call this meta-learning. In these notes, we will cover the AI models capable of meta-learning, and how we might apply them to real-world problems. These notes come from CS 330 at Stanford.

The problem of robot learning encompasses many fields of AI: computer vision, reinforcement learning, embodied learning, and others. But to get there, we need to understand how robots actually work, and this starts with the basics. How do you model a robot arm? It's a harder problem than you might think. You need to express the locations of the links, and then you need to model the physics of a multi-linked object. Here, we introduce all of the things that make robot learning possible. These notes come from CS 223A at Stanford.
