- Savidu Dias

# To Noobs with Love: Machine Learning: Simple Linear Regression

Welcome to my blog series on Machine Learning, where I plan on writing about the basic concepts of machine learning all the way up to some really advanced stuff (hopefully). As a machine learning noob myself, I'll be writing these posts as I learn, so consider them to be my own study notes. Let's start off with Simple Linear Regression.

## What is Simple Linear Regression?

You might be familiar with plotting line graphs with one X axis and one Y axis. The variable on the X axis is usually called the "independent variable", while the variable on the Y axis is called the "dependent variable". Simple Linear Regression models the relationship between one independent variable X and one dependent variable Y by fitting a straight line through the data.

To explain things in a more formal way, Simple Linear Regression is a statistical method that allows us to summarize the relationship between two variables.

## Simple Linear Regression Formula

__Regression Analysis__, a major part of data science, is the process of finding equations that fit a particular data set. Consider the following chart showing the relationship between the years of experience of a bunch of employees in a company and their salaries.

This type of representation is called a "scatter plot". Each cross (x) in this diagram represents a single employee, where the X axis represents their years of experience and the Y axis represents their salary. If we study the graph closely, we can see that the points roughly follow a straight line.

When data follows a straight line like this, we can use Simple Linear Regression to predict the salary of a future employee based on their experience. If you recall the algebra you learned when you were 5 years old, you'll remember that the equation for a straight line is **y = mx + c**. However, statisticians generally prefer to use the following equation:

$$y = b_0 + b_1 x$$

y represents the **dependent variable**

x represents the **independent variable**

$b_0$ and $b_1$ are the parameters (or coefficients) of the model, which need to be estimated from the data.

$b_0$ is known as the **intercept**. This is the point at which the line crosses the Y axis (the value of y when x = 0). In our example, this would be the predicted salary of a fresh graduate joining with no experience.

$b_1$ is the **slope of the line**. It shows how much y changes when x increases by one; in our example, that is the increase in salary for each additional year of experience.
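In code, the model is nothing more than this formula. Here is a minimal Python sketch; the function name `predict_salary` and the coefficient values are hypothetical, picked only to illustrate the formula, not estimated from the chart's data.

```python
def predict_salary(years_of_experience: float, b0: float, b1: float) -> float:
    """Apply the Simple Linear Regression formula y = b0 + b1 * x."""
    # b0 is the intercept: the predicted salary at 0 years of experience.
    # b1 is the slope: the predicted salary increase per extra year.
    return b0 + b1 * years_of_experience


# Hypothetical coefficients, chosen only to illustrate the formula.
b0 = 30000.0  # starting salary for a fresh graduate
b1 = 5000.0   # salary increase per year of experience

print(predict_salary(0, b0, b1))  # 30000.0 (the intercept)
print(predict_salary(4, b0, b1))  # 50000.0
```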

## Best Fitting Line

The red line in the graphs above is known as the "Best Fitting Line". This line represents the Simple Linear Regression model.

Developing a Simple Linear Regression model means finding the best fitting line for a collection of data.

Once we have this line, the model can easily predict the salary of a new employee based on how much experience they have.

The best fitting line above shows that a new employee with x1 years of experience should get a salary of y1.

**How do we draw this best fitting line?**

Let's take a look at one employee. $y_i$ represents the salary of the employee, and $\hat{y}_i$ represents the salary the model predicts for them (the point on the line at that employee's years of experience). In technical terms:

- $y_i$ is the actual observation
- $\hat{y}_i$ is the modeled observation

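To figure out how good this line is, we take the sum of $(y_i - \hat{y}_i)^2$ over all the plotted points. Squaring the differences keeps points above and below the line from canceling each other out. This can be represented by this equation, where $n$ is the number of employees:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$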
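To make this concrete, here is a small Python sketch that computes the sum of squared differences for a candidate line. The function name `sum_of_squared_errors` and the four employees' numbers are made up for illustration; they are not the data from the chart.

```python
def sum_of_squared_errors(xs, ys, b0, b1):
    """Sum of (y_i - y_hat_i)^2 over all data points for the line y = b0 + b1 * x."""
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        y_hat_i = b0 + b1 * x_i        # modeled observation
        total += (y_i - y_hat_i) ** 2  # squared gap to the actual observation
    return total


# Made-up data: years of experience and salaries of four hypothetical employees.
years = [1, 2, 3, 4]
salaries = [35000, 41000, 44000, 51000]

# Compare two candidate lines: the one with the smaller sum fits the data better.
print(sum_of_squared_errors(years, salaries, 30000, 5000))  # 3000000.0
print(sum_of_squared_errors(years, salaries, 28000, 5000))  # 23000000.0
```

The first line is the better fit of these two, since its sum of squares is smaller.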

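Conceptually, linear regression considers all possible lines, computes this sum of squares for each one, and picks the line with the **minimum** sum of squares. This is called the **ordinary least squares method**. In practice, the best values of $b_0$ and $b_1$ can be calculated directly with a formula instead of by trying every line.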
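For a single independent variable, the least-squares coefficients have a well-known closed-form solution: $b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of x and y. Below is a minimal pure-Python sketch of that formula; the function name `fit_simple_linear_regression` and the toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors (ordinary least squares)."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b1 = numerator / denominator
    # Intercept: the least-squares line always passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up data: years of experience and salaries of four hypothetical employees.
years = [1, 2, 3, 4]
salaries = [35000, 41000, 44000, 51000]
print(fit_simple_linear_regression(years, salaries))  # (30000.0, 5100.0)
```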

And that covers the basics of Simple Linear Regression! Here's a quick recap of how to predict values with a Simple Linear Regression model:

1. Get your dataset
2. Plot it on a graph
3. Figure out the best fitting line
4. Predict stuff