Linear Regression Case Studies: Real-World Applications and Insights


Linear regression is one of the most fundamental statistical methods used in data analysis and predictive modeling. It allows us to explore relationships between variables and predict outcomes based on input data. In this blog post, we will dive into some practical case studies where linear regression has been used to solve real-world problems. These case studies will showcase how linear regression can be applied across various industries such as finance, healthcare, marketing, and more.


What is Linear Regression?

Before diving into case studies, let’s quickly review what linear regression is. Linear regression models the relationship between a dependent variable (target) and one or more independent variables (predictors) using a straight line. The model tries to fit the best line that minimizes the sum of squared errors between the observed values and predicted values.

In simple linear regression, the relationship between the dependent variable Y and the independent variable X is expressed as:

Y=β0+β1X+ϵ

Where:

  • Y = Dependent variable
  • X = Independent variable
  • β0 = Intercept (constant term)
  • β1 = Slope (coefficient for X)
  • ϵ = Error term (residuals)

Case Study 1: Predicting Housing Prices

Industry: Real Estate

In the real estate industry, linear regression is frequently used to predict the price of a house based on various features like size, location, number of bedrooms, etc. This is an excellent example of how linear regression helps both buyers and sellers make informed decisions.

Problem:

A real estate agency wants to predict the price of a house based on its size (square footage) and the number of bedrooms. The agency has data on previous home sales.

Solution:

Using linear regression, we can model the relationship between the price of the house (dependent variable) and the size of the house and number of bedrooms (independent variables). The resulting model will help predict the price of a house for potential buyers and sellers.

Sample Code (Python)

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Sample data
data = {
    'Size (sqft)': [1500, 1800, 2400, 3000, 3500],
    'Bedrooms': [3, 3, 4, 4, 5],
    'Price ($)': [400000, 450000, 500000, 600000, 650000]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Define independent variables (Size and Bedrooms)
X = df[['Size (sqft)', 'Bedrooms']]

# Define dependent variable (Price)
y = df['Price ($)']

# Create the linear regression model
model = LinearRegression()
model.fit(X, y)

# Predicting house price based on new data
predicted_price = model.predict([[2000, 3]])  # Predict price for 2000 sqft, 3 bedrooms

print(f"Predicted House Price: ${predicted_price[0]:,.2f}")

Interpretation:

  • The model will give us the price prediction based on the house's size and the number of bedrooms.
  • The coefficients β1 and β2 (for size and bedrooms) will tell us how much each factor influences the price.

Case Study 2: Predicting Employee Salaries

Industry: Human Resources

In the field of human resources, linear regression can be used to predict employee salaries based on experience, education level, and other factors. This is a practical example for businesses that want to make data-driven decisions about compensation.

Problem:

A company wants to predict an employee’s salary based on their years of experience and educational qualification. The company has data from previous employees.

Solution:

By using linear regression, the company can create a model that predicts salaries based on these predictors. This model helps businesses ensure competitive compensation while maintaining internal equity.

Sample Code (Python)

# Sample data
data = {
    'Experience (years)': [1, 2, 3, 4, 5],
    'Education Level': [1, 2, 2, 3, 3],  # 1=High School, 2=Bachelor's, 3=Master's
    'Salary ($)': [40000, 45000, 50000, 60000, 70000]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Define independent variables (Experience and Education)
X = df[['Experience (years)', 'Education Level']]

# Define dependent variable (Salary)
y = df['Salary ($)']

# Create the linear regression model
model = LinearRegression()
model.fit(X, y)

# Predicting salary based on new data
predicted_salary = model.predict([[6, 3]])  # Predict salary for 6 years of experience, Master's degree

print(f"Predicted Salary: ${predicted_salary[0]:,.2f}")

Interpretation:

  • The model uses the number of years of experience and education level to predict salary.
  • The coefficients will show how much salary increases with each additional year of experience and higher education level.

Case Study 3: Marketing Campaign Effectiveness

Industry: Marketing

Linear regression can help companies assess the effectiveness of their marketing campaigns by evaluating how different factors (e.g., budget, social media engagement) influence sales.

Problem:

A company wants to know how its advertising budget and social media engagement impact sales. They want to know if increasing the budget for online ads will result in increased sales.

Solution:

By using linear regression, the company can model the relationship between advertising budget, social media engagement, and sales. This helps the company understand the return on investment (ROI) from their marketing activities.

Sample Code (Python)

# Sample data
data = {
    'Ad Budget ($)': [1000, 2000, 3000, 4000, 5000],
    'Social Media Engagement': [150, 200, 250, 300, 350],
    'Sales ($)': [12000, 15000, 18000, 22000, 25000]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Define independent variables (Ad Budget and Social Media Engagement)
X = df[['Ad Budget ($)', 'Social Media Engagement']]

# Define dependent variable (Sales)
y = df['Sales ($)']

# Create the linear regression model
model = LinearRegression()
model.fit(X, y)

# Predicting sales based on new data
predicted_sales = model.predict([[3500, 320]])  # Predict sales for $3500 ad budget and 320 social media engagements

print(f"Predicted Sales: ${predicted_sales[0]:,.2f}")

Interpretation:

  • The model predicts sales based on the advertising budget and social media engagement.
  • The coefficients will show how much the sales are expected to increase with each additional dollar spent on advertising or each additional unit of engagement.

Case Study 4: Healthcare: Predicting Patient Recovery Time

Industry: Healthcare

In healthcare, linear regression is often used to predict patient outcomes based on various clinical factors. One example is predicting recovery time for patients after surgery based on factors like age, pre-existing conditions, and type of surgery.

Problem:

A hospital wants to predict how long it will take for a patient to recover after surgery. The hospital has data on patients' age, the type of surgery they had, and other medical factors.

Solution:

Using linear regression, the hospital can build a predictive model that estimates recovery time, helping to plan patient care and hospital resources.

Sample Code (Python)

# Sample data
data = {
    'Age (years)': [25, 40, 60, 30, 50],
    'Surgery Type': [1, 2, 2, 1, 3],  # 1=Minor, 2=Moderate, 3=Major
    'Recovery Time (days)': [7, 14, 30, 10, 25]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Define independent variables (Age and Surgery Type)
X = df[['Age (years)', 'Surgery Type']]

# Define dependent variable (Recovery Time)
y = df['Recovery Time (days)']

# Create the linear regression model
model = LinearRegression()
model.fit(X, y)

# Predicting recovery time based on new data
predicted_recovery = model.predict([[35, 2]])  # Predict recovery time for 35 years old, Moderate surgery

print(f"Predicted Recovery Time: {predicted_recovery[0]:,.2f} days")

Interpretation:

  • The model predicts recovery time based on age and surgery type.
  • The coefficients for each variable show how they influence the recovery time.