Supervised Learning Algorithms
Linear Regression
Overview:
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
It assumes a linear relationship between the input variables and the target variable.
Key Concepts:
Simple Linear Regression: Modeling the relationship between two variables using a straight line.
Multiple Linear Regression: Extending linear regression to multiple independent variables.
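As an illustration of simple linear regression, the sketch below fits a straight line with the ordinary least squares closed form. The function name fit_simple_linear_regression and the sample data are assumptions made for this example, not part of any particular library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```

Multiple linear regression follows the same idea with one coefficient per independent variable, and is usually solved with matrix methods rather than by hand.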
Logistic Regression
Overview:
Logistic regression is a classification algorithm used to model the probability of a binary outcome.
It predicts the probability that a given input belongs to a particular class.
Key Concepts:
Logistic Function: Converts the linear combination of input features (the log-odds) into a probability score.
Sigmoid Function: Another name for the logistic function, sigma(z) = 1 / (1 + e^(-z)), which maps any real-valued number to a value between 0 and 1.
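A minimal sketch of how the sigmoid turns a linear combination of features into a probability. The names sigmoid and predict_proba, and the weights used, are hypothetical and chosen only for illustration; in practice the weights are learned from data.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one input row."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Assumed (not learned) weights, for illustration only.
p = predict_proba([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
label = 1 if p >= 0.5 else 0  # a 0.5 threshold is a common default
print(p, label)
```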
k-Nearest Neighbors (kNN)
Overview:
k-Nearest Neighbors is a non-parametric algorithm used for both classification and regression tasks.
It makes predictions based on the majority vote (for classification) or the average (for regression) of the k-nearest neighbors in the feature space.
Key Concepts:
Distance Metrics: Measures used to compute the similarity or distance between data points, such as Euclidean distance or Manhattan distance.
Hyperparameter k: The number of nearest neighbors considered when making predictions.
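The sketch below shows kNN classification with a pluggable distance metric. The function names and the toy data set are assumptions made for this example; a real implementation would typically use a spatial index instead of sorting every point.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train_points, train_labels, query, k, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    top_labels = [label for _, label in ranked[:k]]
    # On a tied vote, the label seen first among the nearest neighbors wins.
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical two-cluster data set.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(points, labels, query=(2, 2), k=3))
print(knn_classify(points, labels, query=(6, 5), k=3, distance=manhattan))
```

For regression, the vote would be replaced by the average of the neighbors' target values.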
Decision Trees and Random Forests
Overview:
Decision trees are versatile supervised learning models used for classification and regression tasks.
Random forests are an ensemble learning method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting.
Key Concepts:
Tree Structure: A hierarchical structure consisting of nodes (decision points) and edges (branches) that represent feature splits.
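Entropy and Information Gain: Entropy measures how mixed the class labels are at a node; information gain is the reduction in entropy produced by a split. At each node, the feature whose split yields the highest information gain is chosen.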
Ensemble Learning: Combining the predictions of multiple models to achieve better performance than any individual model.
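To make the split criterion concrete, here is a minimal sketch that computes entropy (in bits) and the information gain of a candidate split. The function names and the label lists are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    proportions = [count / n for count in Counter(labels).values()]
    return sum(p * math.log2(1.0 / p) for p in proportions)


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing.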
Support Vector Machines (SVM)
Overview:
Support Vector Machines are powerful supervised learning algorithms used for classification and regression tasks.
They find the optimal hyperplane that separates data points into different classes while maximizing the margin between classes.
Key Concepts:
Margin: The distance between the hyperplane and the nearest data points (support vectors).
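Kernel Trick: Technique that lets SVMs learn non-linear decision boundaries by replacing inner products with a kernel function, which behaves as if the input features had been mapped into a higher-dimensional space without computing that mapping explicitly.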
Regularization Parameter (C): Hyperparameter that controls the trade-off between maximizing the margin and minimizing classification errors.
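The following sketch illustrates the three concepts above under stated assumptions: the hyperplane is written as w.x + b = 0 with the margin boundaries at w.x + b = +1 and -1, the soft-margin objective uses the hinge loss, and the kernel shown is the degree-2 polynomial kernel (x.y)^2. All function names, weights, and data points are hypothetical; this is not a training routine.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def margin_width(w):
    """Distance from the hyperplane to either margin boundary: 1 / ||w||."""
    return 1.0 / math.sqrt(dot(w, w))


def soft_margin_objective(w, b, points, labels, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses); labels must be +1 or -1."""
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(points, labels))
    return 0.5 * dot(w, w) + C * hinge


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2


def poly2_features(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# A larger C penalizes the one point inside the margin more heavily.
points = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, 1]
print(soft_margin_objective([1.0, 0.0], 0.0, points, labels, C=1.0))
print(soft_margin_objective([1.0, 0.0], 0.0, points, labels, C=10.0))

# Kernel trick: same value with or without the explicit mapping.
a, c = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(a, c), dot(poly2_features(a), poly2_features(c)))
```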
Definition of Terms
Model:
In the context of machine learning, a model is a mathematical representation of a real-world process or phenomenon. It is created by training an algorithm on data to make predictions or decisions. Models can be used for various tasks such as classification, regression, clustering, and more.
Regression:
Regression is a statistical method used to model the relationship between a dependent variable (target) and one or more independent variables (features). It is commonly used for predicting continuous numeric values. Linear regression is a specific type of regression where the relationship between variables is assumed to be linear.
Classification:
Classification is a supervised learning task where the goal is to categorize input data into one of several predefined classes or categories. It is commonly used for predicting categorical outcomes or labels. For example, classifying emails as spam or non-spam, or classifying images of animals into different species.
k-Nearest Neighbors (kNN):
k-Nearest Neighbors is a non-parametric algorithm used for both classification and regression tasks. It predicts from the k nearest neighbors of a data point in the feature space, by majority vote for classification or by averaging for regression. It is a lazy learner: there is no explicit model-fitting step, but the training data must be stored and searched at prediction time.
Decision Trees:
Decision trees are versatile supervised learning models used for classification and regression tasks. They partition the feature space into regions and make predictions by traversing the tree from the root node to a leaf node based on feature values.
Random Forests:
Random forests are an ensemble learning method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting. Each tree in the forest is typically trained on a bootstrap sample of the training data (drawn at random with replacement) and considers only a random subset of features at each split. The final prediction is determined by a majority vote (classification) or by averaging (regression).
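The two ingredients described above, random resampling of the training data and combining per-tree predictions, can be sketched as follows. The helper names, the rows, and the per-tree predictions are hypothetical; the tree-building step itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine class predictions from several trees (classification)."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Combine numeric predictions from several trees (regression)."""
    return sum(predictions) / len(predictions)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [("sunny", 1), ("rainy", 0), ("cloudy", 1), ("rainy", 1)]
sample = bootstrap_sample(rows, rng)  # training set for one tree

# Assumed outputs of three already-trained trees for a single input.
print(majority_vote(["cat", "dog", "cat"]))
print(average([2.0, 3.0, 4.0]))
```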
Support Vector Machines (SVM):
Support Vector Machines are supervised learning algorithms used for classification and regression tasks. They find the optimal hyperplane that separates data points into different classes while maximizing the margin between classes. SVMs can handle both linear and non-linear decision boundaries using kernel functions.