Solve the hard-margin Support Vector Machine optimization problem using Stochastic Gradient Descent

Introduction

This post considers the very basics of the SVM problem in the context of hard margin classification and linearly separable data. There is minimal discussion of soft margins and no discussion of kernel tricks, the dual formulation, or more advanced solving techniques. Some prior knowledge of linear algebra, calculus, and machine learning objectives is necessary.

1. Support Vector Machines

The first thing to understand about SVMs is what exactly a “support vector” is. To understand this, it is also important to understand the goal of SVM, as it differs slightly from that of logistic regression and other classification techniques.

SVM aims to draw a decision boundary through linearly separable classes such that the boundary is as robust as possible. This means that the position of the boundary is determined by those points which lie nearest to it. The decision boundary is a line or hyperplane that has as large a distance as possible from the nearest training instance of either class, as shown in the plot below. …


[Image by author]
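The margin-maximization idea can be previewed with a short sketch. The snippet below is a minimal illustration, not the post's own code: it runs SGD on the regularized hinge loss, which on linearly separable data with a small regularization weight converges toward the hard-margin boundary. The function name and hyperparameters (`svm_sgd`, `lr`, `lam`) are assumptions chosen for clarity.

```python
import numpy as np

def svm_sgd(X, y, lr=0.01, lam=0.001, epochs=1000, seed=0):
    """SGD on the regularized hinge loss. With separable data and a
    small lam, this approximates the hard-margin SVM solution."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # inside the margin (or misclassified): hinge gradient step
                w -= lr * (lam * w - y[i] * X[i])
                b += lr * y[i]
            else:
                # outside the margin: only the regularizer shrinks w
                w -= lr * lam * w
    return w, b

# Toy separable data: class +1 around (2, 2), class -1 around (-2, -2)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])
w, b = svm_sgd(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

On data this well separated, the learned hyperplane classifies every training point correctly, with the nearest points to the boundary acting as the support vectors.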

Use linear regression and Newton’s method to maximize a company’s production output

Introduction

Economics and data science can look very similar at times. Many techniques that economists have used for decades are integral to data science and machine learning. The intersection comes not only from the close relationship of both fields to statistics, but also from the mathematics that drives their modeling processes.

In this post I will show the usefulness of applying economic methods to a data science-like problem. Although the technique used in this blog post is slightly more complicated than necessary for the problem, practicing on an easy task is a great way to learn a complicated procedure. …
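As a taste of the procedure, here is a minimal sketch (not the post's code) of Newton's method applied to a hypothetical concave production function, output = 10L − 0.5L². The maximum sits where the first derivative vanishes, so Newton's method is run on f′ rather than f itself; the function names and numbers are assumptions for illustration.

```python
def newtons_method(f_prime, f_double_prime, x0, tol=1e-8, max_iter=100):
    """Find a stationary point of f by running Newton's method on f'."""
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical production function: f(L) = 10*L - 0.5*L**2
# f'(L) = 10 - L, f''(L) = -1, so the optimum is at L* = 10
x_star = newtons_method(lambda L: 10 - L, lambda L: -1.0, x0=1.0)
```

Because f′ is linear here, Newton's method lands on the optimum in a single step; on a messier objective it would iterate until the step size falls below `tol`.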


The fundamental reasons for minimizing binary cross entropy (log loss) with probabilistic classification models

Introduction

This post discusses why logistic regression necessarily uses a different loss function than linear regression. First, the simple yet inefficient way to solve logistic regression will be presented, then the slightly less simple but much more efficient way will be explained and compared.

The simple way

Linear regression is the predecessor of logistic regression for most people studying statistics or machine learning. …
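To preview why the two models need different losses, the sketch below (a minimal numpy illustration, not the post's code) computes binary cross entropy and shows how much more heavily it penalizes confident wrong predictions than mildly wrong ones — the behavior that squared error lacks for probabilistic outputs.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Binary cross entropy averaged over samples; eps guards log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 1, 0, 0])
# Predictions that are confidently wrong on two samples...
confident_wrong = np.array([0.01, 0.99, 0.01, 0.99])
# ...versus predictions that are only mildly wrong on the same two
mild_wrong = np.array([0.4, 0.6, 0.4, 0.6])

bce_confident = log_loss(y, confident_wrong)
bce_mild = log_loss(y, mild_wrong)
```

The confident mistakes incur a far larger loss (roughly 2.3 versus 0.7 here), which is exactly the gradient signal a probabilistic classifier needs.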


[By NASA — NASA Human Space Flight Gallery (image link), Public Domain, https://commons.wikimedia.org/w/index.php?curid=181762]

The simple statistics that demonstrate the high probability of the Challenger disaster occurring

Introduction

I recently watched the 2020 Netflix docuseries entitled Challenger: The Final Flight, which tells the fascinating yet heartbreaking story of the tragedy of NASA’s Challenger space shuttle. After watching the series, I was inspired to explore the simple statistical modeling that describes the event. For those who have yet to watch the documentary or who are unfamiliar with Challenger’s story, I’ll start with a brief overview.

Netflix: Challenger: The Final Flight

The 1980s marked a very exciting time for NASA and space exploration. NASA was experiencing success after success in their relatively new Space Shuttle program. Astronauts were being sent to and from orbit on the same vessel, edging NASA closer and closer to their goal of commercial space flight. …


[Kepler’s first light. Image credit: NASA/Ames/J. Jenkins, https://www.nasa.gov/content/keplers-first-light]

Use local outlier factors for both outlier and novelty detection in order to perform imbalanced classification on Kepler data

Introduction

The presence of imbalanced class sizes when discriminating class membership in a body of data can be a large problem if one’s results are not interpreted appropriately. Achieving high accuracy, the so-called “white whale” of most classification problems, becomes a trivial task if an imbalance is not properly addressed. Although it is often better to optimize metrics such as sensitivity and specificity, this can be difficult with many of the popular supervised learning models. For this reason, one might consider turning to unsupervised/semi-supervised methods instead.

One common application of unsupervised/semi-supervised learning is anomaly detection. In this specific context, unsupervised learning focuses on outlier detection, or identifying anomalies within the known data, while semi-supervised learning focuses on novelty detection, or looking for anomalies that come from new data. …
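To make the local outlier factor concrete, here is a minimal plain-numpy sketch (not the post's code; in practice one would more likely reach for scikit-learn's `LocalOutlierFactor`, whose `novelty=True` mode covers the semi-supervised case described above). LOF compares each point's local reachability density to that of its neighbors; scores well above 1 mark points sitting in sparser regions than their neighborhood.

```python
import numpy as np

def local_outlier_factor(X, k=5):
    """Plain-numpy LOF: scores >> 1 flag points in sparser regions
    than their k nearest neighbors (potential outliers)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                 # exclude self-distance
    knn = np.argsort(D, axis=1)[:, :k]          # k nearest neighbor indices
    k_dist = np.sort(D, axis=1)[:, k - 1]       # distance to k-th neighbor
    # reachability distance from p to neighbor o: max(k-dist(o), d(p, o))
    reach = np.maximum(k_dist[knn], np.take_along_axis(D, knn, axis=1))
    lrd = 1.0 / reach.mean(axis=1)              # local reachability density
    return lrd[knn].mean(axis=1) / lrd          # LOF score per point

# A tight cluster plus one clearly anomalous point
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), [[5.0, 5.0]]])
scores = local_outlier_factor(X)
```

The isolated point at (5, 5) receives a score far above 1, while the cluster members score close to 1 — the same signal an imbalanced-classification pipeline can exploit by treating the rare class as anomalous.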


Select features for multi-variate regression analysis by calculating the distances of transformed correlation coefficients

Introduction

Performing multiple regression analysis from a large set of independent variables can be a challenging task. Identifying the best subset of regressors for a model involves optimizing against things like bias, multicollinearity, exogeneity/endogeneity, and threats to external validity. Such problems become difficult to understand and control in the presence of a large number of features. Professors will often tell you to “let theory be your guide” when going about feature selection, but that is not always so easy.

This blog considers the issue of multicollinearity and suggests a method of avoiding it. Proposed here is not a “solution” to collinear variables, nor is it a perfect way of identifying them. …
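Since the excerpt cuts off before the method is spelled out, here is one hedged reading of the "transformed correlation" idea: apply the Fisher z-transform to the pairwise correlation coefficients and flag feature pairs whose transformed magnitude exceeds a threshold. The function name, the threshold value, and the toy data are all assumptions for illustration.

```python
import numpy as np

def collinear_pairs(X, names, threshold=1.0):
    """Flag feature pairs whose Fisher z-transformed correlation
    magnitude exceeds a threshold (one reading of a transformed-
    correlation screen; z = 1 corresponds to |r| ~ 0.76)."""
    r = np.corrcoef(X, rowvar=False)
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))  # Fisher z-transform
    pairs = []
    d = len(names)
    for i in range(d):
        for j in range(i + 1, d):
            if abs(z[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent of both
X = np.column_stack([x1, x2, x3])
flagged = collinear_pairs(X, ["x1", "x2", "x3"])
```

Only the (x1, x2) pair is flagged here; the transform stretches correlations near ±1 apart, which makes a fixed cutoff on z more discriminating than a cutoff on raw r.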


[Image by author]

Use Markov chains to generate synthetic documents that correct for imbalanced class sizes in text analysis

Introduction

Classification problems in supervised machine learning are often troubled by the issue of imbalanced class sizes. Given binary classified data, an imbalanced stratification of the two classes will bias the predictions of a model fit to it. A model trained on data made up of 1,000 samples labeled class “0” and 100 samples labeled class “1” could naively predict class “0” for every test instance and report roughly 91% accuracy. Such an accuracy score is deceptive, as the model is not actually “learning” any trends from the data. This can cause serious problems in deployment. …
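A first-order Markov chain text generator can be sketched in a few lines of standard-library Python (a minimal illustration; the post's own implementation is not shown in this excerpt). Each word maps to the list of words observed to follow it, and synthetic documents are produced by walking that map.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, start, length=8, seed=0):
    """Walk the chain from a start word to produce a synthetic sentence."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        nxt = chain.get(out[-1])
        if not nxt:          # dead end: no observed successor
            break
        out.append(rng.choice(nxt))
    return " ".join(out)

# Tiny stand-in corpus for the minority class
corpus = "the model predicts the class and the model learns the class"
chain = build_chain(corpus)
sample = generate(chain, "the")
```

Generating many such samples from minority-class documents is one way to rebalance the training set before fitting a text classifier, at the cost of synthetic text that only mimics local word-to-word statistics.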
