*This is part of my notes for the Edexcel S1 exam, which I will be taking in
2017*

In statistics, we make **observations** or **measurements** about a variable. This
information is known as **data**.

There are two main types of variable, **quantitative variables** and
**qualitative variables**.

Quantitative variables take numerical values, while qualitative variables describe non-numerical information.

For example, someone’s height is a quantitative variable, whereas their eye colour is qualitative.

There are also **discrete** and **continuous** variables. A discrete variable
can only take specific values (such as a number of people), whereas a continuous
variable can take any value in a range (such as a height).

# Frequency table

A frequency distribution is used to summarise large quantities of data. It shows each value of a variable, along with how often that value occurs.

For instance, it could be used to show how many people of each age there are in a room.

Age | Number of people |
---|---|
18 | 7 |
19 | 15 |
20 | 1 |
21 | 13 |
25 | 1 |
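
As a rough sketch, a frequency table like the one above could be built from raw observations in Python with `collections.Counter`. The list `ages` is an assumption: it is reconstructed so that its counts match the table, not taken from real data.

```python
from collections import Counter

# Hypothetical raw observations, reconstructed to match the table above.
ages = [18] * 7 + [19] * 15 + [20] * 1 + [21] * 13 + [25] * 1

# Counter maps each distinct value to the number of times it occurs.
frequencies = Counter(ages)

# Print one row per age, in ascending order.
for age in sorted(frequencies):
    print(age, frequencies[age])
```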

# Cumulative frequency

A problem with frequency tables is that the running total of frequencies is often not immediately obvious. For instance, with the table above, we would have to do some addition to work out the number of people aged 20 or under. A cumulative frequency table shows this directly by adding another column containing the running total.

Age | Number of people | Cumulative frequency |
---|---|---|
18 | 7 | 7 |
19 | 15 | 22 |
20 | 1 | 23 |
21 | 13 | 36 |
25 | 1 | 37 |
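
The cumulative frequency column is just a running sum of the frequency column. A minimal sketch in Python, using `itertools.accumulate` and the figures from the table (the variable names are my own):

```python
from itertools import accumulate

ages = [18, 19, 20, 21, 25]
frequencies = [7, 15, 1, 13, 1]

# accumulate() yields the running total of the frequencies.
cumulative = list(accumulate(frequencies))

for age, f, cf in zip(ages, frequencies, cumulative):
    print(age, f, cf)

# The number of people aged 20 or under is the cumulative
# frequency in the row for age 20.
aged_20_or_under = cumulative[ages.index(20)]
```

Here `aged_20_or_under` comes out as 23, matching the table.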

# Grouped data

This is when the frequencies in a table are associated with a class (a range of values) instead of a single value.

For example:

Age | Number of people |
---|---|
18-19 | 22 |
20-21 | 14 |
25+ | 1 |
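
One possible way to group the earlier frequency table into these classes is sketched below. The representation of a class as a `(label, lower, upper)` tuple, with `None` meaning "no upper bound", is just an assumption made for this example.

```python
# Ungrouped frequencies from the earlier frequency table.
frequencies = {18: 7, 19: 15, 20: 1, 21: 13, 25: 1}

# Each class is (label, lower bound, upper bound); both bounds are
# inclusive, and None means the class has no upper bound.
classes = [("18-19", 18, 19), ("20-21", 20, 21), ("25+", 25, None)]

grouped = {}
for label, low, high in classes:
    # Add up the frequencies of every age that falls inside the class.
    grouped[label] = sum(
        f for age, f in frequencies.items()
        if age >= low and (high is None or age <= high)
    )

print(grouped)
```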

# Measures of location

A set of data can be summarised by a single number. This is called a **measure of
location**, more commonly known as an **average**. There are three main types: the
**mean**, the **mode**, and the **median**.

## Mean

This is the sum of all observations divided by the total number of observations.

It is given by $$ \frac{\sum x}{n} $$

or, when the data is in a frequency table where each value $$ x $$ occurs with frequency $$ f $$, by $$ \frac{\sum fx}{\sum f} $$

$$ \bar{x} $$ is used to represent the mean of a sample, while $$ \mu $$ is used to
represent the mean of a population.
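
As a worked example, the frequency-table version of the formula can be applied to the age table from earlier. This is only a sketch, and the variable names are my own.

```python
ages = [18, 19, 20, 21, 25]
frequencies = [7, 15, 1, 13, 1]

# Sum of f*x: each value multiplied by its frequency.
sum_fx = sum(f * x for x, f in zip(ages, frequencies))

# Sum of f: the total number of observations.
sum_f = sum(frequencies)

mean = sum_fx / sum_f
print(sum_fx, sum_f, round(mean, 2))
```

The sums come out as 729 and 37, so the mean age is 729 / 37, or about 19.7.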

### Combining means

If two sets of sizes $$ n_1 $$ and $$ n_2 $$ have means $$ \bar{x}_1 $$ and $$ \bar{x}_2 $$, the mean of the combined set is: $$ \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} $$
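
A small sketch of the combined mean formula in Python. The function name `combined_mean` and the numbers in the example are made up for illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two sets combined, weighted by the size of each set."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical example: 10 values with mean 5 and 30 values with mean 9.
result = combined_mean(10, 5.0, 30, 9.0)
print(result)  # 8.0
```

Note that the result (8.0) is closer to 9 than to 5, because the second set is three times larger.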

## Mode

This is the value that occurs most often.
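
With a frequency table, finding the mode amounts to picking the value with the highest frequency. A rough sketch using the age table from earlier (it assumes there is a single most common value):

```python
# Frequencies from the age table: value -> how often it occurs.
frequencies = {18: 7, 19: 15, 20: 1, 21: 13, 25: 1}

# Pick the key whose frequency is largest.
mode = max(frequencies, key=frequencies.get)
print(mode)  # 19
```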

## Median

This is the middle value of an ordered set of data.
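
A minimal sketch of finding the median in Python. For an even number of values there is no single middle value, so this sketch takes the value halfway between the two middle ones, which is the usual convention. The list `ages` is reconstructed from the earlier table and is an assumption.

```python
def median(values):
    """Middle value of the data once it has been put in order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[middle]
    # Even number of values: halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical raw data matching the age table (37 observations).
ages = [18] * 7 + [19] * 15 + [20] * 1 + [21] * 13 + [25] * 1
median_age = median(ages)
print(median_age)  # 19
```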

# Coding

Coding is used to make numbers easier to work with. In the exam, it will usually be of this form:

$$ y = \frac{x - a}{b} $$

To find the mean of the original data, find the mean of the coded data, $$ \bar{y} $$, and substitute it into the coding to get $$ \bar{y} = \frac{\bar{x} - a}{b} $$. Solving for $$ \bar{x} $$ gives $$ \bar{x} = b\bar{y} + a $$.
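
As a quick check, here is a sketch in Python that codes some data with $$ y = \frac{x - a}{b} $$, finds the mean of the coded data, and then undoes the coding by rearranging to $$ \bar{x} = b\bar{y} + a $$. The data and the choices of `a` and `b` are invented for this example.

```python
# Hypothetical data and coding constants, chosen to make the numbers small.
x_values = [1010, 1020, 1030, 1040]
a = 1000
b = 10

# Apply the coding y = (x - a) / b to every observation.
y_values = [(x - a) / b for x in x_values]

# Mean of the coded data.
mean_y = sum(y_values) / len(y_values)

# Undo the coding: rearranging mean_y = (mean_x - a) / b
# gives mean_x = b * mean_y + a.
mean_x = b * mean_y + a

# Mean worked out directly from the original data, for comparison.
direct_mean_x = sum(x_values) / len(x_values)

print(mean_y, mean_x, direct_mean_x)
```

Both routes give the same mean for the original data, but the coded values (1, 2, 3, 4) are much easier to work with by hand.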