Introducing the Large Data Set to students

This week Will has written about the first statistics lesson, in which students were introduced to the Large Data Set (LDS) for the first time.

As a team, we felt it was crucial for students to start working on the LDS in the first week of the course. In their first week of term they had two lessons: their welcome to Y12 and introduction to mathematical proof, and then this lesson.

Starting to use the LDS early in the course, and revisiting it regularly, is the best way for students to become familiar with it. It also means that we won’t have to devote specific curriculum time to teaching it as a stand-alone topic.

So, on to LDS lesson 1. We jumped in at the deep end: the objective was to use the LDS to perform some calculations as well as to learn some new maths, so I picked means and standard deviations. Planned carefully, this would also allow us to use paper copies of the LDS rather than needing a computer lesson so early on.

Every student started the lesson with their own copy of the LDS on their desk, and I asked them to look through it and tell me what it was. For such an open question I got some very reasonable comments: “it’s got all the countries in the world on”, “everything is grouped by region”, “there are statistics about the people – births, deaths, ages, and about the country and stuff”. I was quite pleased to get that much information back, as we could easily have had a wall of silence.

We started with measures of central tendency, introducing some new notation for the mean, and summation notation. The trick here was to give them an explanation good enough for them to answer the question, but vague enough that they had to use the LDS to help. Example 1 required students to find the total before dividing by the 5 countries, and Example 2 specifically required students to look at their copy of the LDS to work out the denominator (i.e. how many countries are there in Sub-Saharan Africa?).
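For reference, the standard notation for the mean of n values x₁, x₂, …, xₙ is written out below, with the five-country case from Example 1 as an illustration. The symbols are the usual textbook ones rather than a copy of the slides used in the lesson.

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}
\qquad\text{e.g. for 5 countries:}\qquad
\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}
```

In Example 2 the hard part is n itself: it is the number of countries in the region, which students have to count from the LDS.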


Once I was satisfied that everyone could follow the examples, I gave them their first task: calculating the means for all the other regions.


Students can be expected to calculate the mean from summary statistics as well as from a list of data, which is why I gave students the totals for many of the regions (those with more countries) as well as for the smaller regions. We had a bit of disagreement over a couple of the means, most notably the population mean. Excel was giving me a mean of 19.43, whereas the students’ calculation gave a smaller answer. On inspecting the LDS we discovered that some of the countries did not have data for that field, so we should actually have been dividing by 224 instead. (That is going to be something to watch out for in the future.)
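The population-mean disagreement comes down to the denominator. Excel’s AVERAGE function ignores empty cells, so it divides only by the number of countries that actually have a value, which would explain why it matched dividing by 224. The short Python sketch below shows the two calculations side by side. The numbers are made up for illustration and are not taken from the LDS; `None` stands in for a blank cell, and the function names are hypothetical.

```python
# Illustrative sketch only: the values below are invented, NOT LDS figures.
# None represents a country with no data for this field (a blank cell).

def mean_from_summary(total, n):
    """Mean from summary statistics: x-bar = (sum of x) / n."""
    return total / n

def mean_ignoring_missing(values):
    """Mean over the values that are present, dividing by how many there are."""
    present = [v for v in values if v is not None]
    return mean_from_summary(sum(present), len(present))

populations = [12.0, 30.0, None, 18.0]

# Dividing by every country, including the one with no data: 60 / 4
naive = sum(v for v in populations if v is not None) / len(populations)

# Dividing only by countries that have data (what AVERAGE does with blanks): 60 / 3
correct = mean_ignoring_missing(populations)

print(naive, correct)  # prints 15.0 20.0
```

As in the lesson, counting the missing country in the denominator drags the mean down.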

Once I’d unveiled the graph showing all the means, I asked for someone to tell me something about the birth rates. One of our more confident students was quick off the mark: “Sub-Saharan Africa has a significantly higher birth rate than any other region”. Brilliant! I was hoping for a response like that, but I wanted a reason… and I wasn’t disappointed: “Sub-Saharan Africa possibly has less access to contraception than other regions”. I then asked why the two European regions might have the lowest birth rates, and someone suggested that “it is cultural, that in Europe families have fewer children”. It is those kinds of thoughts that I want to foster as we continue to dig through the data set.

Next we moved on to standard deviations. In the past I would not have combined these two topics in a first lesson, but I wanted to make sure that we covered something new, especially as it will allow us to discuss the average and spread of various groups of data in our next lesson. Some students found the new formulae quite complicated. We had started with a very simple example so that we could talk about calculating the sum of the squares in simple terms, but it was only when we started pulling figures from the LDS that the students who had been a bit at sea up to that point actually started to understand how to calculate standard deviations.
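The two routes to a standard deviation, working from deviations from the mean or working from the sum of the squares, can be checked against each other with a short Python sketch. The data are a toy list chosen so the arithmetic comes out exactly; they are not from the LDS, and the function names are hypothetical. Both functions divide by n; some specifications also use a sample standard deviation that divides by n − 1 (Python’s `statistics.stdev`), so it is worth checking which version your exam board expects.

```python
import math
import statistics

def sd_from_deviations(xs):
    """sqrt( sum of (x - mean)^2 / n ), using deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(sum_sq_dev / n)

def sd_from_summary(n, sum_x, sum_x2):
    """sqrt( sum of x^2 / n - mean^2 ), needing only n, sum of x and sum of x^2."""
    mean = sum_x / n
    return math.sqrt(sum_x2 / n - mean ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # toy values, not LDS data

n = len(data)                     # 8
sum_x = sum(data)                 # 40
sum_x2 = sum(x * x for x in data) # 232

print(sd_from_deviations(data))          # prints 2.0
print(sd_from_summary(n, sum_x, sum_x2)) # prints 2.0
print(statistics.pstdev(data))           # prints 2.0 (library cross-check, divides by n)
```

The second function is the one that works with summary statistics alone, which is what students need when they are given totals rather than the full list of values.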


And that is pretty much where we left it. I handed out sheets with the summary statistics on them and allowed the students to choose what they wanted to look at. I expect that most of them will continue to examine the birth rates, but it will be interesting to see whether anyone turns up to the next stats lesson having worked with other statistics.


Their homework was to spend another 45 minutes working on the data set, bearing in mind the following questions:

  • What do you notice?
  • Can you spot any trends?
  • Can you draw any conclusions / make any inferences?

At the beginning of our next stats lesson, we can discuss people’s findings before starting to explore how we can use Excel to handle the data.