Building Logistic Regression from Scratch in Python
In this blog post, we'll dive into the logistic regression model, a fundamental algorithm for binary classification. We'll write the model and its optimization routine from scratch in Python, using only NumPy for array math rather than a library such as scikit-learn. This will give you a deeper understanding of the inner workings of logistic regression.
Logistic Regression Overview
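Logistic regression is a statistical method for modeling an outcome that depends on one or more independent variables. The outcome is a dichotomous variable, meaning it has only two possible values (here, 0 and 1). The model computes a linear combination of the inputs, z = β0 + β1x1 + ... + βnxn, and passes it through the logistic function to obtain the probability of class 1. The logistic function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$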
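As a quick sanity check, the logistic function, 1 / (1 + e^(-z)), maps any real z into the interval (0, 1). The short sketch below evaluates it at a few values of z. The helper name sigmoid and the sample inputs are illustrative assumptions, not part of the implementation that follows.

```python
import numpy as np

def sigmoid(z):
    # Hypothetical helper name: the logistic function 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

# Illustrative inputs, symmetric around zero
z_values = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probabilities = sigmoid(z_values)

for z, p in zip(z_values, probabilities):
    print(f"z = {z:+.1f} -> sigmoid(z) = {p:.4f}")
```

At z = 0 the output is exactly 0.5, and the outputs for z and -z always sum to 1, which is why 0.5 is a natural default cut-off between the two classes.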
Code Implementation
Let's start by implementing the logistic function, followed by the cost function which we aim to minimize.
import numpy as np

# Logistic Function
# x is expected to include a leading column of ones, so beta[0] acts as the intercept
def logistic_function(x, beta):
    z = np.dot(x, beta)
    return 1 / (1 + np.exp(-z))

# Cost Function (average negative log-likelihood)
def cost_function(x, y, beta):
    m = len(y)
    # Clip the probabilities so np.log never receives exactly 0 or 1
    h = np.clip(logistic_function(x, beta), 1e-12, 1 - 1e-12)
    total_cost = -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return total_cost

Now, let's implement the optimization function using Gradient Descent to find the optimal parameters β.

# Gradient Descent Function to minimize the cost function
def gradient_descent(x, y, beta, learning_rate, iterations):
    m = len(y)
    beta = np.array(beta, dtype=float)  # work on a copy so the caller's array is untouched
    cost_history = np.zeros(iterations)
    for i in range(iterations):
        # Gradient with respect to every coefficient (intercept included),
        # computed once so all coefficients are updated simultaneously
        gradient = np.dot(x.T, logistic_function(x, beta) - y) / m
        beta = beta - learning_rate * gradient
        cost_history[i] = cost_function(x, y, beta)
    return beta, cost_history
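One way to gain confidence in a hand-written gradient is to compare it with a finite-difference approximation of the cost. The sketch below does that for the log-loss gradient X^T (h - y) / m, assuming x already carries a leading column of ones. The function names and the tiny dataset are made up for illustration and are not taken from the code above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(x, y, beta):
    # Average negative log-likelihood; x is assumed to include a column of ones
    h = sigmoid(x @ beta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def analytic_gradient(x, y, beta):
    # Closed-form gradient used by gradient descent: X^T (h - y) / m
    return x.T @ (sigmoid(x @ beta) - y) / len(y)

def numerical_gradient(x, y, beta, eps=1e-6):
    # Central finite differences, perturbing one coefficient at a time
    grad = np.zeros_like(beta)
    for j in range(len(beta)):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (cost(x, y, beta + step) - cost(x, y, beta - step)) / (2 * eps)
    return grad

# Small made-up dataset (illustrative only): intercept column plus one feature
x_demo = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y_demo = np.array([0.0, 0.0, 1.0, 1.0])
beta_demo = np.array([0.1, -0.2])

max_gap = np.max(np.abs(analytic_gradient(x_demo, y_demo, beta_demo)
                        - numerical_gradient(x_demo, y_demo, beta_demo)))
print("largest gap between analytic and numerical gradient:", max_gap)
```

If the two gradients agree to several decimal places, the update rule is very likely differentiating the cost correctly. A large gap usually points to a sign error or a shape mismatch.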
Prediction Function
# Prediction Function
def predict(x, beta):
    '''
    Returns the probability that each observation belongs to class 1
    '''
    return logistic_function(x, beta)

# Threshold Function
def classify(predictions, threshold=0.5):
    '''
    Classifies the predictions into class 0 or 1 based on a specified threshold
    '''
    classes = np.zeros_like(predictions)
    classes[predictions >= threshold] = 1
    return classes
In the predict function, we use the logistic function with the optimized coefficients to compute the probability that each new observation belongs to class 1. In the classify function, we apply a threshold (0.5 by default) to these probabilities to obtain binary class predictions: if the probability is greater than or equal to the threshold, the observation is assigned to class 1; otherwise, it is assigned to class 0.
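To make the effect of the threshold concrete, here is a minimal sketch that applies the same rule to a handful of made-up probabilities with two different cut-offs. The probability values are purely illustrative and do not come from a trained model.

```python
import numpy as np

def classify(predictions, threshold=0.5):
    # Same rule as described above: 1 when probability >= threshold, else 0
    classes = np.zeros_like(predictions)
    classes[predictions >= threshold] = 1
    return classes

# Made-up probabilities, for illustration only
example_probabilities = np.array([0.10, 0.45, 0.50, 0.70, 0.95])

default_classes = classify(example_probabilities)       # threshold 0.5
strict_classes = classify(example_probabilities, 0.8)   # threshold 0.8

print(default_classes)  # [0. 0. 1. 1. 1.]
print(strict_classes)   # [0. 0. 0. 0. 1.]
```

Raising the threshold makes the classifier more reluctant to predict class 1. Note that a probability exactly equal to the threshold is assigned to class 1.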
Let's put it all together and run our logistic regression on some data.
# Training data (x and y)
# Assume there are 20 data points
x = np.array([[2], [3], [10], [19], [23], [10], [18], [22], [7], [5],
              [24], [29], [30], [34], [35], [28], [33], [40], [42], [45]])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) # 0s and 1s indicating the two classes
new_x = np.array([[15], [25], [8], [36], [48]])
# Ensure x is in the correct shape for our functions
x = np.hstack((np.ones((x.shape[0], 1)), x)) # Adding a column of ones for the intercept term
new_x = np.hstack((np.ones((new_x.shape[0], 1)), new_x)) # Adding a column of ones for the intercept term
# Initial coefficients
beta = np.zeros(x.shape[1])
# Set learning rate and number of iterations
learning_rate = 0.01
iterations = 1000
# Define the logistic function, cost function, and gradient descent function as before
# Run Gradient Descent
optimized_beta, cost_history = gradient_descent(x, y, beta, learning_rate, iterations)
# Get probabilities
probabilities = predict(new_x, optimized_beta)
# Get binary class predictions
binary_predictions = classify(probabilities)
# Output the binary predictions
print(binary_predictions)
x is a 20x2 matrix, where each row represents a data point, and there are two columns (one for the intercept term and one for the single feature).
y is a vector of length 20, where each entry is the class label (0 or 1) for the corresponding data point in x.
new_x is a 5x2 matrix representing new data points we want to make predictions on, formatted similarly to x.
After running the logistic regression training, making predictions, and classifying the new data points, the binary_predictions vector will contain the predicted class labels for the new_x data points.
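Beyond printing the predicted labels, it can help to inspect how well the fitted model does on the training data itself. The sketch below retrains on the same 20 points with a compact, hypothetical fit helper (assuming a leading column of ones, as above), then reports the cost at the first and last iteration, the training accuracy, and the feature value at which the predicted probability crosses 0.5. The names fit, features, and labels are assumptions introduced for this example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit(x, y, learning_rate=0.01, iterations=1000):
    # Hypothetical compact trainer: batch gradient descent on the log-loss,
    # assuming x already contains a leading column of ones
    beta = np.zeros(x.shape[1])
    costs = np.zeros(iterations)
    for i in range(iterations):
        h = sigmoid(x @ beta)
        beta = beta - learning_rate * (x.T @ (h - y)) / len(y)
        # Clip before taking logs so the recorded cost stays finite
        h = np.clip(sigmoid(x @ beta), 1e-12, 1 - 1e-12)
        costs[i] = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return beta, costs

# Same training data as in the post
features = np.array([2, 3, 10, 19, 23, 10, 18, 22, 7, 5,
                     24, 29, 30, 34, 35, 28, 33, 40, 42, 45], dtype=float)
labels = np.array([0] * 10 + [1] * 10, dtype=float)
x_train = np.column_stack((np.ones_like(features), features))

beta_hat, costs = fit(x_train, labels)

# Fraction of training points whose thresholded prediction matches the label
train_classes = (sigmoid(x_train @ beta_hat) >= 0.5).astype(float)
accuracy = np.mean(train_classes == labels)

# With one feature, P(class 1) = 0.5 where beta0 + beta1 * x = 0
boundary = -beta_hat[0] / beta_hat[1]

print("cost after first and last iteration:", costs[0], costs[-1])
print("training accuracy:", accuracy)
print("decision boundary at x =", boundary)
```

In this toy dataset the largest class 0 value is 23 and the smallest class 1 value is 24, so any boundary between those two values separates the training points perfectly. How close gradient descent gets to that region depends on the learning rate and the number of iterations, so it is worth experimenting with both.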
Conclusion
In this blog post, we implemented a logistic regression model from scratch in Python. We defined the logistic function, cost function, and used gradient descent to optimize the model parameters. This exercise provides a clear understanding of the logistic regression algorithm, which is the foundation for more complex machine learning models.