Posted 2025-03-11 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 10: Reinforcement Learning )

Introduction

Reinforcement Learning (RL) is a fascinating branch of machine learning where an agent learns to interact with an environment to maximize long-term cumulative rewards. Unlike supervised learning, RL relies on feedback through interaction instead of labeled data.
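As a minimal sketch of what "long-term cumulative reward" can mean, the snippet below computes a discounted return for one episode. The discount factor `gamma` and the reward values are illustrative assumptions, not values taken from this post.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma**t for time step t."""
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode_rewards = [1.0, 0.0, 2.0]  # hypothetical rewards from one episode
episode_return = discounted_return(episode_rewards, gamma=0.5)
print(episode_return)
```

A smaller `gamma` makes the agent value immediate rewards more; a value close to 1 weights distant rewards almost as much as near ones.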

The core of RL is built upon Markov Decision Processes (MDPs), which provide a mathematical framework for modeling decision-making under uncertainty.

This blog post explores the key components of RL, including value functions, Q-functions, the Bellman equation, Actor-Critic architectures, PPO, and commonly used tools in real-world RL implementations.

Posted 2025-03-11 | Updated 2025-06-30 | Learning Notes

Transformer ( Part 3: Transformer Architecture )

Encoder & Decoder

The Transformer consists of two main parts: an encoder and a decoder. They are connected by Cross-Attention.

Encoder: Processes the input sequence using multiple layers of self-attention and feed-forward networks.
Decoder: Takes the encoder’s output and generates the target sequence using self-attention and cross-attention mechanisms.
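The link between the two parts can be sketched in plain Python: in cross-attention, each decoder vector acts as a query over the encoder's output vectors. This is a simplified illustration only; the learned query/key/value projections of a real Transformer are omitted, and the names `cross_attention`, `encoder_out`, and `decoder_in` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states, encoder_states):
    """Each decoder vector attends over all encoder vectors.

    Assumption: learned Q/K/V projections are left out, so the encoder
    vectors serve directly as both keys and values.
    """
    dim = len(encoder_states[0])
    outputs = []
    for query in decoder_states:
        # Scaled dot product between the query and every encoder vector.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in encoder_states
        ]
        weights = softmax(scores)
        # Weighted sum of encoder vectors gives the context vector.
        context = [
            sum(w * value[i] for w, value in zip(weights, encoder_states))
            for i in range(dim)
        ]
        outputs.append(context)
    return outputs


encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source tokens
decoder_in = [[1.0, 0.0], [0.0, 2.0]]               # 2 target tokens
context_vectors = cross_attention(decoder_in, encoder_out)
print(len(context_vectors), len(context_vectors[0]))
```

The output has one context vector per target token, regardless of how many source tokens the encoder produced.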

The Transformer Architecture:

Posted 2025-03-01 | Updated 2025-06-30 | Learning Notes

Transformer ( Part 2: Multi-Head Attention )

Before the Transformer, sequence models like RNNs and LSTMs struggled to capture long-range dependencies and were hard to parallelize, because each time step depends on the previous one. Self-Attention was introduced as an alternative: it processes all positions in parallel and relates any two positions directly, regardless of their distance.

However, a single-head Self-Attention mechanism has a limitation:
It produces only one set of attention weights per position, so it tends to capture a single type of relationship or pattern in the data.

Multi-Head Attention overcomes this by using multiple attention heads that capture different aspects of the input, improving the model’s expressiveness.
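The idea can be sketched as follows: split each token's features into slices, run attention on each slice independently, and concatenate the results. This is a simplified illustration, assuming no learned projection matrices (a real Transformer applies separate learned projections per head and a final output projection); the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention; tokens act as Q, K and V."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        out.append([
            sum(w * tok[i] for w, tok in zip(weights, tokens))
            for i in range(dim)
        ])
    return out


def multi_head_attention(tokens, num_heads):
    """Split features into num_heads slices, attend per slice, concatenate."""
    dim = len(tokens[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        start, stop = h * head_dim, (h + 1) * head_dim
        head_tokens = [tok[start:stop] for tok in tokens]
        head_outputs.append(self_attention(head_tokens))
    # Concatenate the per-head outputs back into one vector per token.
    return [
        sum((head[t] for head in head_outputs), [])
        for t in range(len(tokens))
    ]


tokens = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
two_head_output = multi_head_attention(tokens, num_heads=2)
print(len(two_head_output), len(two_head_output[0]))
```

Because each head sees a different slice of the features, its attention weights can differ from those of the other heads, which is where the extra expressiveness comes from.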

Posted 2025-02-12 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 9: Recurrent Neural Network )

Recurrent Neural Networks (RNNs) are a class of neural networks designed for sequential data, making them highly effective for tasks like natural language processing (NLP), time series prediction, and speech recognition. Unlike traditional feedforward networks, RNNs maintain a hidden state that captures temporal dependencies.

How RNNs Work

A traditional feedforward neural network processes inputs independently. However, for sequential tasks, the order of the data is crucial. RNNs address this by maintaining a memory of previous inputs through hidden states.
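A minimal sketch of the hidden-state update, assuming the common vanilla RNN form h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The weights below are hand-picked for illustration rather than learned, and the function names are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, bias):
    """One step of a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_prev + b)."""
    h_t = []
    for i in range(len(h_prev)):
        pre_activation = bias[i]
        pre_activation += sum(w * x for w, x in zip(w_xh[i], x_t))
        pre_activation += sum(w * h for w, h in zip(w_hh[i], h_prev))
        h_t.append(math.tanh(pre_activation))
    return h_t


def run_rnn(sequence, w_xh, w_hh, bias):
    """Feed a sequence through the RNN and return every hidden state."""
    h = [0.0] * len(bias)  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_xh, w_hh, bias)
        states.append(h)
    return states


# Hand-picked weights: input size 1, hidden size 2 (illustrative only).
w_xh = [[0.5], [-0.5]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.0]

forward = [[1.0], [0.0]]
backward = [[0.0], [1.0]]
final_forward = run_rnn(forward, w_xh, w_hh, bias)[-1]
final_backward = run_rnn(backward, w_xh, w_hh, bias)[-1]
print(final_forward, final_backward)
```

The two sequences contain the same values in a different order, yet the final hidden states differ, which is the order sensitivity a feedforward network lacks.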

Posted 2025-02-12 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 8: Convolution Neural Networks )

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision, enabling significant advancements in image recognition, object detection, and segmentation tasks. This post explores the key concepts behind CNNs and how they work.

What is a CNN?

A Convolutional Neural Network (CNN) is a type of deep learning model specifically designed for processing structured grid data, such as images. Unlike traditional fully connected neural networks, CNNs leverage convolutional layers to capture spatial hierarchies in the data.
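To make the convolutional layer concrete, here is a minimal sketch of a single 2D filter sliding over a small image (no padding, stride 1). The image and the edge-detecting kernel are toy values chosen for illustration, and `conv2d_valid` is a hypothetical helper name.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum products.

    Strictly speaking this is cross-correlation, which is what deep
    learning libraries usually compute under the name "convolution".
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            acc = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    acc += image[row + i][col + j] * kernel[i][j]
            out_row.append(acc)
        output.append(out_row)
    return output


# Toy image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to left-to-right brightening
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The same small kernel is reused at every position, so the filter responds to the edge wherever it appears; this weight sharing is what distinguishes a convolutional layer from a fully connected one.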

Posted 2025-02-09 | Updated 2025-06-30 | Learning Notes

Transformer ( Part 1: Word Embedding )

Word Embedding is one of the most fundamental techniques in Natural Language Processing (NLP). It represents words as dense, continuous vectors whose geometry captures semantic relationships between them.

Why Do We Need Word Embeddings?

Before word embeddings, one common method to represent words was One-Hot Encoding. In this approach, each word is represented as a high-dimensional sparse vector.

For example, if our vocabulary has 10,000 words, each word becomes a 10,000-dimensional vector with a single 1 at that word's index and 0 everywhere else:
$$
\text{dog} = [0, 1, 0, 0, \dots, 0]
$$
However, this method has significant drawbacks:

High dimensionality – A large vocabulary results in enormous vectors.
No semantic similarity – “dog” and “cat” are conceptually related, but their one-hot vectors are completely different.

Word embeddings solve these issues by learning low-dimensional, dense representations that encode semantic relationships between words.
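The contrast can be illustrated with cosine similarity. In this minimal sketch the vocabulary is tiny and the dense vectors are hand-picked toy values, not learned embeddings; all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vocab = ["cat", "dog", "car", "tree", "book"]


def one_hot(word):
    """Sparse vector with a single 1 at the word's vocabulary index."""
    return [1.0 if w == word else 0.0 for w in vocab]


# Hand-picked 2-dimensional toy embeddings (illustrative, not learned).
toy_embedding = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}

one_hot_sim = cosine_similarity(one_hot("dog"), one_hot("cat"))
dense_sim_pets = cosine_similarity(toy_embedding["dog"], toy_embedding["cat"])
dense_sim_car = cosine_similarity(toy_embedding["dog"], toy_embedding["car"])
print(one_hot_sim, dense_sim_pets, dense_sim_car)
```

Any two distinct one-hot vectors are orthogonal, so their similarity is always zero, while the dense vectors can place "dog" closer to "cat" than to "car".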

Posted 2025-02-06 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 7: Artificial Neural Network )

Bayes’ theorem

$$
P(y|X) = \frac{P(X|y) P(y)}{P(X)}
$$

where:

$P(y|X)$: Posterior probability of class $y$ given input $X$.
$P(X|y)$: Likelihood of seeing $X$ if the class is $y$.
$P(y)$: Prior probability of class $y$.
$P(X)$: Marginal probability of $X$ (the evidence), which acts as a normalization factor.
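A small worked example of the formula for a binary class, where the denominator is expanded with the law of total probability. The numbers (1% prior, 90% true-positive rate, 5% false-positive rate) are hypothetical, as is the function name.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(y|X) via Bayes' theorem for a binary class y.

    prior                = P(y)
    likelihood           = P(X|y)
    likelihood_given_not = P(X|not y)
    """
    # P(X) = P(X|y) P(y) + P(X|not y) P(not y)
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only to illustrate the calculation.
p_y_given_x = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p_y_given_x, 3))  # 0.154
```

Even with a high likelihood, the posterior stays modest here because the prior is small, which is why the $P(y)$ term matters.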

Bayes Network (Bayesian Network, BN)

A Bayesian network (BN) is a graphical model representing probabilistic dependencies between variables. It consists of:

Nodes: Represent variables (e.g., symptoms, diseases).
Edges: Represent conditional dependencies.
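As a minimal sketch, consider a three-node network in which a disease node has edges to two symptom nodes. The joint distribution then factorizes into one term per node given its parents. The structure and all probabilities below are hypothetical values for illustration.

```python
from itertools import product

# Hypothetical network: disease -> fever, disease -> cough.
p_disease = {True: 0.1, False: 0.9}       # P(disease)
p_fever_given = {True: 0.8, False: 0.1}   # P(fever=True | disease)
p_cough_given = {True: 0.7, False: 0.2}   # P(cough=True | disease)


def bernoulli(p_true, value):
    """Probability of a boolean value given P(value=True)."""
    return p_true if value else 1.0 - p_true


def joint(disease, fever, cough):
    """P(disease, fever, cough) = P(disease) P(fever|disease) P(cough|disease)."""
    return (
        p_disease[disease]
        * bernoulli(p_fever_given[disease], fever)
        * bernoulli(p_cough_given[disease], cough)
    )


# Sanity check: the joint sums to 1 over all 8 assignments.
total = sum(joint(d, f, c) for d, f, c in product([True, False], repeat=3))

# Inference by enumeration: P(disease | fever=True, cough=True).
numerator = joint(True, True, True)
evidence = numerator + joint(False, True, True)
p_disease_given_symptoms = numerator / evidence
print(round(p_disease_given_symptoms, 3))
```

The edges encode which conditional tables are needed: three small tables replace a full table over all eight joint assignments.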

Posted 2025-02-02 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 6: KNN vs. K-means )

In machine learning, K-Nearest Neighbors (KNN) and K-means Clustering are two commonly used algorithms. Despite their similar names, they serve different purposes and have distinct working principles.

KNN (K-Nearest Neighbors)

KNN is a supervised learning algorithm used for classification and regression tasks.

The core idea of KNN is:

Given a new data point, find the K most similar instances in the training dataset (neighbors) and use them to predict the output.

KNN is a lazy learning algorithm: it has no explicit training phase that fits model parameters. Instead, it stores the training data and defers all computation to prediction time, when it compares the new point against the stored examples.
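A minimal sketch of KNN classification with Euclidean distance and majority voting. The data points and labels are toy values, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # "Training" is just keeping the data; all work happens here.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy dataset: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2), k=3))      # a
print(knn_predict(points, labels, (8.5, 8.5), k=3))  # b
```

Every prediction scans the whole training set, so the cost of KNN shifts from training time to query time.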

Posted 2025-02-02 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 5: Support Vector Machine )

Support Vector Machine (SVM)

Support Vector Machines (SVMs) are among the most widely used supervised learning algorithms for classification and regression tasks.

The Hyperplane

In a binary classification problem, the goal of SVM is to find a hyperplane that best separates two classes. Given a training dataset:

$$
D = \{ (\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_n, y_n) \}, \quad \mathbf{x}_i \in \mathbb{R}^d, \quad y_i \in \{-1, +1\}
$$

$\mathbf{x}_i$: $d$-dimensional feature vector (e.g., pixel values in an image).
$y_i$: Class label ($+1$ for “cat”, $-1$ for “dog”).
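A minimal sketch of how a hyperplane separates such a dataset, assuming the usual parameterization by a weight vector `w` and bias `b`, with the class given by the sign of w·x + b. The hyperplane and the four data points are hand-picked for illustration; no training is performed.

```python
import math


def decision_value(w, b, x):
    """w . x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


def predict(w, b, x):
    """Class label in {-1, +1} from the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane."""
    return decision_value(w, b, x) / math.sqrt(sum(w_i * w_i for w_i in w))


# Hypothetical hyperplane x1 + x2 = 3 and a toy 2-D dataset.
w, b = [1.0, 1.0], -3.0
data = [
    ([1.0, 1.0], -1),
    ([0.0, 2.0], -1),
    ([3.0, 2.0], +1),
    ([2.0, 3.0], +1),
]

separates = all(predict(w, b, x) == y for x, y in data)
# Smallest distance from any correctly classified point to the hyperplane.
margin = min(y * signed_distance(w, b, x) for x, y in data)
print(separates, round(margin, 4))
```

Many hyperplanes can separate the same data; the quantity computed as `margin` is what distinguishes a better separator from a worse one.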

Posted 2025-01-21 | Updated 2025-06-30 | Learning Notes

About Machine Learning ( Part 4: Decision Tree )

A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It organizes data into a tree-like structure, where each internal node represents a decision based on a feature, and each leaf node provides a prediction. Decision trees are simple, interpretable, and capable of handling both categorical and numerical data.
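The tree structure described above can be sketched as nested dictionaries, where internal nodes hold a feature test and leaves hold a prediction. The tree below is hand-written for illustration (not learned from data), and the feature names and thresholds are hypothetical.

```python
# Internal nodes: {"feature", "threshold", "left", "right"}; leaves: {"label"}.
tree = {
    "feature": "weight_kg",
    "threshold": 10.0,
    "left": {"label": "cat"},
    "right": {
        "feature": "barks",
        "threshold": 0.5,
        "left": {"label": "cat"},
        "right": {"label": "dog"},
    },
}


def predict(node, sample):
    """Follow feature tests from the root until a leaf is reached."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(predict(tree, {"weight_kg": 4.0, "barks": 0}))   # cat
print(predict(tree, {"weight_kg": 25.0, "barks": 1}))  # dog
```

Each prediction corresponds to one root-to-leaf path, which is why the decisions of a tree are easy to read off and explain.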

Classification Tree

A Classification Tree is a decision tree used for classifying data into distinct categories or classes. The main objective of a classification tree is to predict the category or class to which a given input belongs based on various features.