1.1 AI History
This section provides an overview of Artificial Intelligence, including its origins, core concepts, and goals.
To get started, here are some foundational resources that offer a general understanding before we delve deeper into specific areas.
AI is commonly said to have originated at the Dartmouth Conference in 1956, where leading thinkers gathered to discuss the possibilities of machine intelligence. While this event gave the field its name and popularized it, the concept of thinking machines dates back to the 1930s and 1940s, with early work from pioneers like Alan Turing, John von Neumann, and Claude Shannon, who laid the groundwork for modern computing and artificial intelligence.
Rather than a detailed history, here is a timeline of key milestones in AI, with links to more in-depth resources.
- Timeline of History of AI
- Wiki Page (I understand it is a wiki; however, most of the general ideas in it are accurate.)
- Should Read
1.2 Expert Systems
Before exploring modern AI, it's essential to understand the foundational work from the 1950s to the 1980s that set the stage for today's advancements. During AI's early years, much of the focus was on developing expert systems and logical reasoning frameworks. These systems, such as MYCIN for medical diagnosis and XCON for configuring computer hardware, were rule-based engines designed to simulate human expertise in specific domains, relying heavily on symbolic reasoning and hand-crafted decision rules.
- The following is one of the better sources for a Deep Dive into Expert Systems.
- Despite its age, this paper offers valuable insights. Early Expert Systems: Where Are They Now?
1.3 Machine Learning (ML)
Terms like Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are often used interchangeably, but each has specific distinctions. Broadly speaking, machine learning can be thought of as a process of solving optimization problems. By adjusting parameters and learning from data, ML algorithms can identify patterns, make predictions, and support decision-making in a variety of contexts. Each approach to ML has its own strengths and biases, and some methods are better suited for specific types of problems.
The simplest way to categorize machine learning is by its approach to learning from data, also known as the learning paradigm. I will keep most of this short, sweet, and full of resources.
1.3.1 Supervised Learning
Supervised learning involves training an algorithm on a labeled dataset, where each data point includes both input features and the correct output (label). The algorithm makes predictions, calculates the error between its predicted output and the actual output, and then adjusts its internal parameters (weights) to minimize this error over time, improving its predictions.
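To make the predict, measure, adjust loop concrete, here is a minimal sketch that fits a line to invented data with gradient descent; the data, learning rate, and iteration count are all arbitrary choices for illustration.

```python
# The supervised-learning loop in miniature: predict, measure error,
# adjust weights. Fits y = w*x + b by gradient descent on MSE.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, size=100)  # true w=3.0, b=0.5

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * X + b                   # 1. predict
    err = pred - y                     # 2. error vs. the true labels
    w -= lr * 2 * np.mean(err * X)     # 3. adjust parameters to
    b -= lr * 2 * np.mean(err)         #    shrink the error

print(f"learned w={w:.2f}, b={b:.2f}")  # close to 3.00 and 0.50
```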
Decision Trees
- Decision Trees are simple and powerful; the best solution to a problem is often the simplest one. Great Example
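For a sense of how little code a decision tree takes, here is a quick scikit-learn sketch on its built-in iris dataset; the depth limit is an arbitrary choice.

```python
# Fit and evaluate a small decision tree on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(f"test accuracy: {tree.score(X_test, y_test):.2f}")
```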
Naive Bayes
Naive Bayes is a simple but powerful algorithm for classification tasks. It’s based on the assumption that features (like words in an email) are independent of each other, which makes it a “naive” approach. Despite its simplicity, it works well for many problems, including spam filtering.
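Here is a toy version of the spam-filtering use case; the handful of invented messages stands in for a real corpus.

```python
# A tiny spam filter: word counts as features, multinomial Naive Bayes
# as the classifier (which treats those words as independent).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "free money click here",
         "meeting at noon tomorrow", "lunch with the team today"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

vec = CountVectorizer()
X = vec.fit_transform(texts)           # bag-of-words features
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vec.transform(["free prize meeting"])))
```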
Nearest Neighbor
The *k-Nearest Neighbor* algorithm is like asking your closest friends for advice. To make a prediction, it looks at the "k" closest data points in the dataset and classifies based on the majority of those neighbors. It’s simple, intuitive, and works well for many problems, but can be computationally expensive with large datasets.
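The "ask your neighbors" idea in a few lines of scikit-learn; k = 5 is an arbitrary choice.

```python
# k-NN: no real training, just store the data and vote at query time.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)  # the 5 nearest points vote
knn.fit(X, y)                              # "fit" only stores the data
print(knn.predict(X[:3]))                  # majority vote of neighbors
```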
Artificial Neural Networks (ANN)
Artificial Neural Networks are the backbone of Deep Learning. These networks are inspired by the human brain, with layers of "neurons" that process information. They excel at recognizing patterns in large datasets, from images to natural language, and form the foundation for many cutting-edge technologies like facial recognition and language translation.
Support Vector Machines (SVM)
SVMs are like boundary setters. They aim to find the optimal boundary (or hyperplane) that best separates data into categories. The "support vector" refers to the data points that are closest to this boundary and help define it. SVMs are particularly powerful for complex classification tasks.
- Great intro to support vector machines
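A quick sketch of the boundary-setting idea on invented blob data:

```python
# Fit a linear SVM and inspect the support vectors that pin down
# the maximum-margin boundary.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear").fit(X, y)

# The support vectors are the points closest to the boundary.
print(f"{len(svm.support_vectors_)} support vectors define the hyperplane")
```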
1.3.2 Unsupervised Learning
Unsupervised learning involves training an algorithm on a dataset without labeled outputs. The model tries to learn the underlying patterns or structure in the data, which is particularly useful for tasks like clustering, anomaly detection, and dimensionality reduction.
k-Means Clustering
The *k-Means Clustering* algorithm groups data points into clusters based on their similarities. It works by partitioning the dataset into "k" clusters, assigning each data point to the cluster with the nearest centroid. This method is commonly used for customer segmentation and image compression.
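A minimal sketch on invented data, with k = 3 chosen to match the three generated blobs:

```python
# k-means: partition points into k clusters around learned centroids.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(km.cluster_centers_)   # one centroid per cluster
print(km.labels_[:10])       # each point's assigned cluster
```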
Hierarchical Clustering
*Hierarchical Clustering* builds a hierarchy of clusters, which can be visualized using a tree-like diagram called a dendrogram. It’s useful when the number of clusters is unknown and provides insight into how clusters relate at various levels. This approach is often applied in social network analysis and gene expression data.
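SciPy's hierarchy module makes this concrete: build the merge history a dendrogram would draw, then cut it at a chosen number of clusters.

```python
# Hierarchical (agglomerative) clustering with SciPy.
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=1)
Z = linkage(X, method="ward")                    # the merge hierarchy
labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
# dendrogram(Z) would draw the tree (requires matplotlib)
print(labels)
```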
Principal Component Analysis (PCA)
*Principal Component Analysis (PCA)* is a dimensionality reduction technique that transforms data into a lower-dimensional space while preserving as much information as possible. It’s commonly used for reducing features in high-dimensional data for visualization or to improve model performance.
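A short sketch: project the 4-dimensional iris measurements down to 2 components and check how much variance survives.

```python
# PCA: reduce dimensionality while preserving variance.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)               # 150 x 4  ->  150 x 2
print(pca.explained_variance_ratio_)    # variance kept per component
```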
Anomaly Detection
*Anomaly Detection* algorithms identify unusual patterns or data points that deviate from the norm. These techniques are valuable in identifying fraud, defects, or faults in various domains, especially when anomalies occur infrequently.
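One common technique is the Isolation Forest; here is a sketch on invented data with a few planted outliers.

```python
# Isolation Forest: anomalies are easier to isolate, so they get
# flagged with -1; normal points get 1.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))       # dense, normal cluster
outliers = rng.uniform(6, 8, size=(5, 2))      # a few far-away points
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
print(iso.predict(X)[-5:])   # the planted outliers, flagged as -1
```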
Association Rule Learning
*Association Rule Learning* is used to discover interesting relationships or patterns within data, typically in transactional datasets. The Apriori and Eclat algorithms are two popular methods that help identify associations between items in large datasets, often applied in market basket analysis.
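Rather than a full Apriori implementation, here is the core support-counting idea in plain Python, on an invented basket dataset:

```python
# Market-basket flavor of association rules: count how often item
# pairs co-occur (their "support") across transactions.
from collections import Counter
from itertools import combinations

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk", "butter"}]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, "support =", count / len(baskets))
```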
1.3.3 Semi-Supervised Learning
Semi-supervised learning is a technique where an algorithm is trained on a dataset that contains a small amount of labeled data and a larger amount of unlabeled data. This approach combines elements of supervised and unsupervised learning, leveraging the labeled data to guide the model in making predictions while learning from the structure within the unlabeled data.
Self-Training
- Self-training is one of the simplest and most intuitive semi-supervised learning methods. Detailed Overview
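scikit-learn ships a generic wrapper for this; here is a sketch where most labels are hidden (marked -1), and the confidence threshold is an arbitrary choice.

```python
# Self-training: the model labels the unlabeled points itself,
# keeping only predictions above the confidence threshold.
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1                    # pretend most labels are missing

base = SVC(probability=True)           # base model must expose probabilities
model = SelfTrainingClassifier(base, threshold=0.9).fit(X, y_partial)
print(f"accuracy on true labels: {model.score(X, y):.2f}")
```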
Co-Training
Co-training uses multiple views of the data to train separate models that teach each other. Each model makes predictions on unlabeled data, and their most confident predictions are used to create additional training data for the other models. This approach works particularly well when the different views of the data provide complementary information.
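A minimal sketch, assuming the two "views" are simply the two halves of the feature vector; a real application would use genuinely different views, like the text of a web page and the links pointing to it.

```python
# Co-training sketch: two models on two feature views pseudo-label
# their most confident unlabeled points for the next round.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
view_a, view_b = X[:, :10], X[:, 10:]

y_train = np.full(len(X), -1)       # -1 marks "unlabeled"
y_train[:40] = y[:40]               # only 40 points start labeled

for _ in range(10):                 # a few co-training rounds
    lab = y_train != -1
    clf_a = LogisticRegression().fit(view_a[lab], y_train[lab])
    clf_b = LogisticRegression().fit(view_b[lab], y_train[lab])
    unlab = np.flatnonzero(~lab)
    if len(unlab) == 0:
        break
    # each model pseudo-labels its single most confident unlabeled
    # point, which joins the shared pool for the next round
    for clf, view in ((clf_a, view_a), (clf_b, view_b)):
        proba = clf.predict_proba(view[unlab])
        i = unlab[proba.max(axis=1).argmax()]
        y_train[i] = clf.predict(view[[i]])[0]

print(f"accuracy vs. true labels: {clf_a.score(view_a, y):.2f}")
```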
Generative Models
Generative models in semi-supervised learning learn the underlying distribution of the data. The labeled examples anchor which regions of that distribution belong to which class, while the much larger pool of unlabeled examples refines the estimate of the distribution itself, improving classification with very few labels.
Graph-Based Methods
Graph-based methods utilize the structure within data points, representing them as nodes in a graph and using edges to indicate similarity. These methods spread labels through the graph, leveraging connections between labeled and unlabeled data to make predictions.
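scikit-learn's LabelSpreading implements this idea; in the sketch below, edges connect each point to its nearest neighbors and labels flow along them.

```python
# Graph-based semi-supervised learning: spread 10 known labels
# through a similarity graph over 200 points.
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
y_partial = y.copy()
y_partial[10:] = -1                   # keep only 10 labels

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print(f"accuracy on all points: {model.score(X, y):.2f}")
```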
1.3.4 Reinforcement Learning (RL)
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment, receiving rewards or penalties based on its actions. Over time, the agent learns to take actions that maximize cumulative reward, making RL particularly useful in scenarios like robotics, game playing, autonomous driving, and, more recently, training large language models (LLMs).

To start, let's cover some of the basic components of RL. Markov Decision Processes (MDPs) form the mathematical foundation of RL. An MDP defines the environment as a set of states, actions, rewards, and transition probabilities, where the next state depends only on the current state and action (the Markov property). This setup allows agents to learn optimal policies for decision-making under uncertainty.
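Here is a toy MDP written out explicitly, with invented states and transitions; we will reuse it for value iteration in the model-based section below.

```python
# A toy MDP: for each (state, action) pair, a list of
# (probability, next_state, reward) outcomes.
mdp = {
    "states": ["s0", "s1", "goal"],
    "actions": ["left", "right"],
    "transitions": {
        ("s0", "right"):   [(0.9, "s1", 0.0), (0.1, "s0", 0.0)],
        ("s0", "left"):    [(1.0, "s0", 0.0)],
        ("s1", "right"):   [(0.9, "goal", 1.0), (0.1, "s1", 0.0)],
        ("s1", "left"):    [(1.0, "s0", 0.0)],
        ("goal", "right"): [(1.0, "goal", 0.0)],   # goal is absorbing
        ("goal", "left"):  [(1.0, "goal", 0.0)],
    },
}
```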
(Model Free) Policy Optimization
*Policy Optimization* methods focus on directly improving the policy, which defines the agent's behavior. Common algorithms like REINFORCE, PPO (Proximal Policy Optimization), and Actor-Critic methods iteratively adjust the policy to maximize expected rewards. These techniques are effective in complex environments where learning a model of the environment is infeasible.
- One of the better explanations of Policy Optimization I have seen.
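As a bare-bones illustration of the policy-gradient idea behind REINFORCE, here is a sketch on a two-armed bandit (no states, just actions); the reward probabilities, learning rate, and step count are all invented.

```python
# REINFORCE in miniature: sample an action from a softmax policy,
# observe a reward, and nudge the policy toward rewarded actions.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                   # policy parameters, one per arm
reward_prob = np.array([0.3, 0.8])    # arm 1 pays off more often

for _ in range(2000):
    probs = np.exp(theta) / np.exp(theta).sum()   # softmax policy
    a = rng.choice(2, p=probs)
    r = float(rng.random() < reward_prob[a])      # sampled reward
    grad = -probs                                 # grad of log pi(a)
    grad[a] += 1.0
    theta += 0.1 * r * grad                       # REINFORCE update

print(probs)   # the policy now strongly prefers arm 1
```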
(Model Free) Gradient Free
*Gradient-Free* methods, like evolutionary strategies and genetic algorithms, search for optimal policies without using gradient information. These methods are often used in RL when gradient-based optimization is challenging, such as in high-dimensional or non-differentiable environments. We will cover some of the breakthroughs and applications of these algorithms in another section of the site.
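A tiny gradient-free search to make the idea concrete: a (1+1) evolution strategy that climbs a simple fitness function by mutate-and-keep-if-better. The quadratic fitness here stands in for episode return in a real RL task.

```python
# (1+1) evolution strategy: mutate the parameters, keep the
# candidate only if its fitness improves. No gradients anywhere.
import numpy as np

rng = np.random.default_rng(0)
fitness = lambda p: -np.sum((p - 3.0) ** 2)   # peak at p = [3, 3]

params = np.zeros(2)
for _ in range(500):
    candidate = params + rng.normal(0, 0.5, size=2)   # mutate
    if fitness(candidate) > fitness(params):          # select
        params = candidate

print(params)   # close to [3.0, 3.0], found without any gradients
```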
(Model Based) Learning the Model
In *Model-Based* RL, the agent learns a model of the environment's dynamics and uses it to plan actions. Algorithms like Dyna-Q integrate planning and learning by updating the model and generating simulated experiences, helping the agent to achieve faster convergence. When the *Model* is already provided or accurately approximated, planning algorithms like Value Iteration and Policy Iteration can be applied. These techniques use the model to compute an optimal policy without additional interaction, making them valuable in structured environments like simulations.
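Here is value iteration run on the toy `mdp` dict from the MDP sketch earlier in this section; the discount factor is an arbitrary choice.

```python
# Value iteration: repeatedly back up expected rewards through the
# known model until state values converge.
gamma = 0.9
V = {s: 0.0 for s in mdp["states"]}

for _ in range(100):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2])
                for p, s2, r in mdp["transitions"][(s, a)])
            for a in mdp["actions"]
        )
        for s in mdp["states"]
    }

print(V)   # states closer to the goal end up with higher values
```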
1.3.5 Additional Learning Techniques
There are other learning techniques beyond the paradigms above, such as meta-learning, transfer learning, and imitation learning, each of which builds on or combines the approaches covered in this section.
1.4 Deep Learning
Deep Learning is a subset of machine learning that uses neural networks with many layers (hence "deep") to model complex patterns in large datasets. It's the backbone of advances in computer vision, natural language processing, speech recognition, and more. Deep learning has been transformative in AI, enabling systems to learn directly from data, often without needing human-defined features.
Artificial Neural Networks
Artificial neural networks are computational models inspired by the human brain, consisting of layers of interconnected "neurons." Each neuron performs a weighted sum of its inputs, applies an activation function, and passes the result to the next layer. These networks are trained through backpropagation, adjusting weights to minimize error in predictions.
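A single neuron in NumPy, exactly as described: weighted sum, then activation. The numbers are arbitrary.

```python
# One neuron: weighted sum of inputs -> activation -> output.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])     # inputs from the previous layer
w = np.array([0.4, 0.3, -0.2])     # learned weights
b = 0.1                            # bias

output = sigmoid(w @ x + b)        # weighted sum, then activation
print(output)                      # passed on to the next layer
```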
Convolutional Neural Networks (CNNs)
CNNs are designed for processing structured grid data, like images. By using convolutional layers to detect spatial hierarchies (such as edges, shapes, and textures), CNNs are especially effective in image recognition and computer vision tasks.
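A minimal CNN sketch in PyTorch: two convolutional layers that build up spatial features, then a linear classifier. The shapes assume 28x28 grayscale images (MNIST-sized), and the layer sizes are illustrative.

```python
# A small CNN: convolutions detect local patterns, pooling shrinks
# the grid, a final linear layer classifies.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # edges, textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level shapes
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # 10-class output
)

x = torch.randn(1, 1, 28, 28)   # one fake grayscale image
print(cnn(x).shape)             # torch.Size([1, 10])
```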
Recurrent Neural Networks (RNNs) and LSTMs
RNNs are specialized for sequential data, where each data point depends on previous points, making them suitable for time series and text data. Long Short-Term Memory networks (LSTMs) improve upon RNNs by addressing the issue of vanishing gradients, allowing them to capture long-term dependencies.
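A quick PyTorch sketch of an LSTM over a toy sequence; each step sees the current input plus a memory of earlier steps. The dimensions are arbitrary.

```python
# LSTM over sequences: one hidden state per time step, plus a
# final hidden state summarizing each sequence.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 20, 8)          # 4 sequences, 20 steps, 8 features

out, (h, c) = lstm(x)
print(out.shape)   # [4, 20, 16] -> hidden state at every time step
print(h.shape)     # [1, 4, 16]  -> final hidden state per sequence
```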
Transformers and Attention Mechanisms
Transformers are the foundation of modern natural language processing. They use an "attention" mechanism to weigh the importance of different words in a sequence, enabling models like GPT and BERT to understand language context and semantics on a large scale.
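The core of the Transformer, scaled dot-product attention, fits in a few lines of NumPy: each position weighs every other position by the similarity of its query to their keys.

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row softmax
    return weights @ V                                # weighted mix

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 8))   # 5 tokens, 8 dims (self-attention)
print(attention(Q, K, V).shape)       # (5, 8)
```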
Generative Adversarial Networks (GANs)
GANs consist of two networks, a generator and a discriminator, competing against each other. The generator creates data, while the discriminator tries to distinguish between real and generated data. This approach is effective for creating realistic images, art, and synthetic data.
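A miniature GAN sketch in PyTorch: the generator tries to turn noise into samples from a shifted Gaussian, while the discriminator tries to tell real from fake. The target distribution, network sizes, and step counts are invented.

```python
# The adversarial loop: train D to separate real from fake,
# train G to fool D.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(2000):
    real = torch.randn(64, 1) + 4.0              # samples from N(4, 1)
    fake = G(torch.randn(64, 8))                 # generated samples
    # discriminator step: push real -> 1, fake -> 0
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # generator step: make D call the fakes real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # drifts toward 4.0
```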
Autoencoders and Variational Autoencoders (VAEs)
Autoencoders are neural networks trained to compress and reconstruct data, often used for noise reduction, feature learning, and anomaly detection. Variational Autoencoders (VAEs) introduce a probabilistic approach, making them useful for generating new, unique data points.
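A minimal autoencoder in PyTorch: squeeze 784-dimensional inputs through a 32-dimensional bottleneck and reconstruct them. The sizes are illustrative (784 matching a flattened 28x28 image).

```python
# Autoencoder: encoder compresses, decoder reconstructs, and the
# reconstruction error drives training.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                                     nn.Linear(128, 32))   # bottleneck
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
                                     nn.Linear(128, 784))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(16, 784)                     # fake flattened images
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error
print(loss.item())
```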
Deep Learning Frameworks: TensorFlow and PyTorch
TensorFlow and PyTorch are two of the most popular deep learning frameworks, providing essential tools for building and training neural networks. TensorFlow, developed by Google, and PyTorch, developed by Meta (formerly Facebook), each have strengths that make them suitable for different types of deep learning tasks.
In programming, frameworks are platforms that provide tools and structures to help build AI systems more easily. Libraries are collections of pre-written code that developers can use to perform specific tasks, like handling data or doing math. Packages are bundles of libraries and other tools, designed to make specific types of AI tasks, like image recognition or language processing, faster and more efficient. Together, these tools help developers build AI systems without starting from scratch.
1.5 Domains in AI
While Machine Learning (ML) and Deep Learning (DL) are technical approaches in AI, domains like Natural Language Processing (NLP) and Computer Vision (CV) represent application areas. These domains utilize ML and DL techniques to solve specific challenges, such as understanding language or recognizing images.
Natural Language Processing (NLP)
NLP is a field focused on enabling computers to understand, interpret, and generate human language. NLP techniques use ML and DL to perform tasks such as translation, sentiment analysis, and text summarization, making it crucial in applications like chatbots, search engines, and personal assistants.
Computer Vision (CV)
Computer Vision enables computers to interpret and process visual information from the world. By leveraging DL models like CNNs, CV applications can recognize objects, detect faces, and analyze scenes, making it essential for fields like robotics, medical imaging, and autonomous driving.
Speech Recognition
Speech Recognition involves transcribing spoken language into text. ML and DL models process audio signals to identify phonetic patterns, enabling applications like virtual assistants and automated transcription services to accurately understand and respond to voice commands.
Robotics
Robotics integrates AI with physical systems, enabling robots to perform complex tasks autonomously. Using ML and CV, robots can perceive their environment, make decisions, and act, making advancements possible in manufacturing, healthcare, and exploration.