Stats Notes: Representation and Summary of Data: Location
Feb 23, 2017

This is part of my notes for the Edexcel S1 exam, which I will be taking in 2017.

In statistics, we make observations or measurements about a variable. This information is known as data.

There are two main types of variable: quantitative variables and qualitative variables.

The former is used with numerical values, while the latter is used for non-numerical information.

For example, someone’s height would be a quantitative variable, whereas their eye colour would be qualitative.

Variables can also be discrete or continuous. A discrete variable can take only specific values, while a continuous variable can take any value in a range.

Frequency table

A frequency distribution is used to represent large quantities of information. It shows the values of a variable, along with how often each value occurs.

For instance, it could be used to show how many people of each age there are in a room.

| Age | Number of people |
| --- | --- |
| 18 | 7 |
| 19 | 15 |
| 20 | 1 |
| 21 | 13 |
| 25 | 1 |

Cumulative frequency

A problem with frequency tables is that the running total of frequencies is often not immediately obvious. For instance, with the above table, we would have to do some addition to work out the number of people aged 20 or under. A cumulative frequency table shows this by adding another column.

| Age | Number of people | Cumulative frequency |
| --- | --- | --- |
| 18 | 7 | 7 |
| 19 | 15 | 22 |
| 20 | 1 | 23 |
| 21 | 13 | 36 |
| 25 | 1 | 37 |
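As a quick check of the running totals, here is a minimal Python sketch that rebuilds the cumulative frequency column from the frequency table above. The variable names are my own and only for illustration.

```python
from itertools import accumulate

# Frequency table from the notes: age -> number of people
ages = [18, 19, 20, 21, 25]
frequencies = [7, 15, 1, 13, 1]

# Running total of the frequencies, one entry per row of the table
cumulative = list(accumulate(frequencies))

for age, f, cf in zip(ages, frequencies, cumulative):
    print(age, f, cf)

# Number of people aged 20 or under: the cumulative frequency at age 20
aged_20_or_under = cumulative[ages.index(20)]
print(aged_20_or_under)  # 23
```

The last cumulative frequency is always the total number of observations, which is 37 here.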

Grouped data

This is when the frequencies in a table are associated with a class instead of a single value.

For example:

| Age | Number of people |
| --- | --- |
| 18-19 | 22 |
| 20-21 | 14 |
| 25+ | 1 |

Measures of location

A set of data can be summarised by a single number. This is called a measure of location, and is often referred to as an average. There are three main types: the mean, the mode, and the median.

Mean

This is the sum of all observations divided by the total number of observations.

It is given by $$ \bar{x} = \frac{\sum x}{n} $$

or, for data in a frequency table, $$ \bar{x} = \frac{\sum fx}{\sum f} $$

\(\bar{x}\) is used to represent the mean of a sample, while \(\mu\) is used to represent the mean of a population.
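To see the frequency-table form of the formula in action, here is a small Python sketch using the ages table from earlier in these notes. The function name `mean_from_frequencies` is just something I made up for the example.

```python
# Frequency table from the notes: age -> number of people
ages = [18, 19, 20, 21, 25]
frequencies = [7, 15, 1, 13, 1]


def mean_from_frequencies(values, freqs):
    """Return sum(f * x) / sum(f) for a frequency table."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    total_f = sum(freqs)
    return total_fx / total_f


mean_age = mean_from_frequencies(ages, frequencies)
print(round(mean_age, 2))  # 19.7
```

Here the sum of fx is 729 and the sum of f is 37, so the mean age is 729 / 37, roughly 19.7.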

Combining means

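If one set has \(n_1\) observations with mean \(\bar{x}_1\) and another has \(n_2\) observations with mean \(\bar{x}_2\), the mean of the combined set is: $$ \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} $$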
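A short sketch of the combined mean in Python. The sample sizes and means below are made-up numbers, not from any real data set, and `combined_mean` is a hypothetical helper name.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Weight each mean by its sample size, then divide by the total size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical example: 10 values with mean 4 and 30 values with mean 8
combined = combined_mean(10, 4.0, 30, 8.0)
print(combined)  # 7.0
```

Note that the result is not simply the average of 4 and 8, because the larger set pulls the combined mean towards its own mean.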

Mode

This is the value that occurs most often.

Median

This is the middle value of an ordered set of data. If there is an even number of observations, it is the mean of the two middle values.
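To check the mode and median for the ages table from earlier, here is a minimal sketch using Python's standard `statistics` module. It expands the frequency table into the full list of 37 observations first, which is fine for small tables like this one.

```python
from statistics import median, mode

# Frequency table from the notes: age -> number of people
ages = [18, 19, 20, 21, 25]
frequencies = [7, 15, 1, 13, 1]

# Expand the table into the full ordered list of observations
observations = [age for age, f in zip(ages, frequencies) for _ in range(f)]

age_mode = mode(observations)      # value with the highest frequency
age_median = median(observations)  # 19th of the 37 ordered values

print(age_mode)    # 19
print(age_median)  # 19

# With an even number of values, the two middle values are averaged
even_median = median([1, 3, 5, 7])
print(even_median)  # 4.0
```

The median matches what the cumulative frequency column suggests: the 19th value falls in the age 19 row, since the cumulative frequency passes 19 there.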

Coding

Coding is used to make numbers easier to work with. In the exam, it will usually be of this form:

$$ y = \frac{x - a}{b} $$

To find the mean of the original data, find the mean of the coded data, \(\bar{y}\), substitute it into the coding, and solve for \(\bar{x}\). Since \(\bar{y} = \frac{\bar{x} - a}{b}\), this gives \(\bar{x} = b\bar{y} + a\).
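Here is a small Python sketch of coding and then undoing it. Rearranging the coding gives x = by + a, so the mean of the original data is b times the coded mean, plus a. The data values and the choices a = 1000 and b = 10 are made up for illustration.

```python
# Hypothetical data and coding constants, chosen only for illustration
x_values = [1010, 1020, 1030, 1040]
a = 1000
b = 10

# Apply the coding y = (x - a) / b to every observation
y_values = [(x - a) / b for x in x_values]

# Mean of the coded data
mean_y = sum(y_values) / len(y_values)

# Undo the coding: mean of x = b * (mean of y) + a
mean_x = b * mean_y + a

print(y_values)  # [1.0, 2.0, 3.0, 4.0]
print(mean_y)    # 2.5
print(mean_x)    # 1025.0
```

The decoded mean agrees with the mean worked out directly from the original values, which is 4100 / 4 = 1025.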
