Classification Problem

In machine learning, when we are predicting a discrete label, such as determining whether an email is spam or not, we are dealing with a classification problem. Logistic regression is commonly used for binary classification tasks, where the goal is to predict one of two classes, typically represented as 0 or 1.

The logistic function (also called the sigmoid function) is the core of logistic regression, as it maps input features to probabilities between 0 and 1. These probabilities represent the likelihood of the sample belonging to a particular class.

The logistic function is defined as:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Where $z = \omega_0 + \mathbf{\omega}^T \mathbf{x}$, the linear combination of the input features $\mathbf{x}$ and the model’s parameters $\mathbf{\omega}$.

Posted 2025-01-07Learning Notes

About Machine Learning ( Part 2: Linear Regression )

Dataset

In prediction tasks, we often use independent features to predict a dependent variable. If we have a dataset:

$$
{ x_d^{(i)}, t^{(i)} }
$$

where:

$x_d^{(i)}$: The $d$-th feature of the $i$-th instance in the dataset.
$t^{(i)}$: The target value (dependent variable) for the $i$-th instance.
$i = 1, \dots, N$: $i$ indexes the instances, and $N$ is the total number of instances in the dataset. ( Here $i$ is not power )
$d = 1, \dots, D$: $d$ indexes the features, and $D$ is the total number of independent features.

Each feature in the dataset can be expressed as:

$$
x_d^{(i)}
$$

For simplicity, the following focuses on a single feature $x$, meaning $D = 1$.

Posted 2025-01-06Learning Notes

About Machine Learning ( Part 1: Gradient Descent )

Data Science

Target Variable

The target variable is the variable the model aims to predict or explain. It’s also called the dependent variable or label.

Attributes

Attributes are the features or variables that describe each instance in a dataset. They are also known as features, columns, or independent variables.

Instances

Instances represent individual samples or data points in a dataset. They are also referred to as samples, rows, or observations.