Introducing the Large Data Set to students

This week Will has written about the first statistics lesson, with students being introduced to the LDS for the first time.

As a team, we felt that it was crucial for students to start their work on the LDS in the first week of the course. In their first week of term they had 2 lessons – their welcome to y12 & introduction to mathematical proof and then this lesson.

Being able to start using the LDS early in the course, and to revisit it regularly, is the best way for students to become familiar with it, and will mean that we won’t have to devote specific curriculum time to learning it as a stand-alone topic.

So onto LDS lesson 1: We jumped in at the deep end – the objective was to use the LDS to perform some calculations as well as to learn some new maths, so I picked means and standard deviations. Planned correctly this would also allow us to use paper copies of the LDS, rather than having to get into a computer lesson so early on.

Every student started the lesson with their own copy of the LDS on their desk – and I asked them to look through it and tell me what it was. For such an open question I had some very reasonable comments: “it’s got all the countries in the world on”, “everything is grouped by region”, “there are statistics about the people – births, deaths, ages, and about the country and stuff” – and I was quite pleased to get that much information back, as we could’ve had a wall of silence.

We started with measures of central tendency – introducing some new notation for the mean, and summation notation. The trick here was to give them a good enough explanation to be able to answer the question, but to be vague enough that they had to use the LDS to help. Eg 1 required students to find the total before dividing by 5 countries, and Eg 2 specifically required students to look at their copy of the LDS to work out the denominator (i.e. how many countries are there in Sub-Saharan Africa?).
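
For reference, a sketch of that notation in its usual form, assuming x stands for a single data value (one country’s figure) and n for the number of values; the symbols used on the lesson slides may differ:

```latex
\bar{x} = \frac{\sum x}{n}
\qquad\text{e.g. for 5 countries}\qquad
\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}
```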

After I was satisfied that everyone was able to follow the examples I gave them their first task – calculating the means of all the other regions.

Students can be expected to have to calculate the mean from summary statistics, as well as from a list of data, which is why I gave students a lot of the totals (for the regions with more countries) as well as for the smaller regions. We had a bit of disagreement with a couple of the means – most specifically the population mean. Excel was giving me the mean of 19.43, whereas when the students typed in their calculation they got a smaller answer – on inspecting the LDS we discovered that some of the countries did not have data for that field, so we should have actually been dividing by 224 instead. (That is going to be something to watch out for in the future.)
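
The trap here is dividing by the number of rows rather than the number of values that actually exist. Excel’s AVERAGE skips empty cells, which is why it was using the smaller divisor. A minimal Python sketch of the same idea follows; the figures are made up for illustration (not taken from the LDS), and `mean_ignoring_missing` is a hypothetical helper name:

```python
def mean_ignoring_missing(values):
    """Mean of the entries that hold data; None stands for a blank cell."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no data values to average")
    return sum(present) / len(present)


# Made-up figures for five countries, two of which have no data.
sample = [12.0, None, 30.0, 18.0, None]

# Naive approach: total of the known values divided by ALL five rows.
naive = sum(v for v in sample if v is not None) / len(sample)

# Correct approach: divide by the three values that are actually present.
correct = mean_ignoring_missing(sample)

print(naive, correct)
```

The naive figure comes out smaller than the correct one, which matches what the students saw on their calculators.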

Once I’d unveiled the graph featuring all the means I asked for someone to tell me something about the birth rates. One of our more confident students was quick off the mark: “Sub-Saharan Africa has a significantly higher birth rate than any other region”. Brilliant! I was hoping for a response like that, but I wanted a reason… and I wasn’t disappointed: “Sub-Saharan Africa possibly has less access to contraception than other regions”. I then asked why both European regions might have the lowest birth rate, and someone suggested that “it is cultural, that in Europe families have fewer children”. It is those types of thoughts that I want to foster as we continue to dig through the data set.

Next we moved onto standard deviations – in the past I would not have combined these 2 topics in the first lesson, but I wanted to make sure that we covered something new, especially as it will allow us to discuss average and spread of various groups of data in our next lesson. Some students found the new formulae quite complicated. We had started with a very simple example so we could talk about calculating the sum of the squares in simple terms, but it was actually only when we started pulling figures from the LDS that the students who had until that point been a bit at sea started to understand how to calculate standard deviations.
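
As a rough illustration of the calculation from summary statistics, here is a short Python sketch. It assumes the inputs are n, Σx and Σx², and it offers both divisors (n and n − 1) because specifications differ on which one they call “standard deviation”; check the definition your board uses. The data values are made up, and `sd_from_summary` is a hypothetical name:

```python
import math


def sd_from_summary(n, sum_x, sum_x2, sample=False):
    """Standard deviation from n, sum of x and sum of x squared.

    sample=False divides Sxx by n; sample=True divides by n - 1.
    """
    sxx = sum_x2 - sum_x ** 2 / n  # Sxx = sum(x^2) - (sum(x))^2 / n
    divisor = n - 1 if sample else n
    return math.sqrt(sxx / divisor)


# Made-up data, not taken from the LDS.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
sum_x = sum(values)                   # total of the values
sum_x2 = sum(v * v for v in values)   # total of the squared values

print(sd_from_summary(n, sum_x, sum_x2))               # divisor n
print(sd_from_summary(n, sum_x, sum_x2, sample=True))  # divisor n - 1
```

Working from the three summary figures mirrors what students have to do when only totals are given rather than the full list of data.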

And that is pretty much where we left it – I handed out sheets that had the summary statistics on (for  and ) and allowed the students to choose what they wanted to look at. I expect that most of them will continue to examine the birth rates, but it will be interesting if anyone turns up to the next stats lesson having calculated with other statistics.

Their homework was to spend another 45 minutes working on the data set, bearing in mind the following questions:

  • What do you notice?
  • Can you spot any trends?
  • Can you draw any conclusions / make any inferences?

At the beginning of our next stats lesson we can discuss people’s findings, before starting to explore how we can use Excel to handle the data.

Will’s Thoughts on the Large Data Set

Will Davies has been working with us on the scheme of work for the new A-level. Over the last few years he has predominantly taught the statistics content for the A-level courses. Here are his thoughts on the large data set:

“When the new specifications were announced the introduction of these “large data sets” (LDS) left me sceptical, and unsure of exactly how we were going to work with them. With time came a lot more clarity; actually being able to pick over the data sets that were released with the sample assessment material meant we could start to see how they were going to be assessed, and how they might fit into our teaching.

And I have come to this conclusion: the LDS is my joint-favourite thing about the new A-Level – the other aspect being that we’ve been able to tear up the old order of topics and build a curriculum that we feel teaches maths in the most logical order and in the best manner. Being able to combine the applied topics in with the pure topics they depend on is key: e.g. binomial theorem and binomial expansion, as well as teaching variable acceleration immediately after calculus is taught.

I have read on Twitter a lot of negativity about the LDS, and I am unsure why. My instinct is that the LDS is being perceived as a separate topic that needs to be taught in addition to other content (which we’re already unsure we can fit in satisfactorily). As a department, from very early in the process we realised that this shouldn’t be the case – the LDS is not a separate topic, it is instead the tool that you use to teach all the data-handling parts of the course.

Every time you do an example – it comes from the LDS.

Every time you set an exercise in class – it comes from the LDS.

Every time you set a homework – it comes from the LDS.

The more the students immerse themselves in the LDS the more familiar they become with it. Homeworks can be to do some calculations or create some charts (and email them to us in advance where appropriate), then as a group we can discuss them next lesson. My other big idea for embedding the LDS into our lessons is to have at least once per week a Show-me / Tell-me starter (regardless of whether the lesson is going to be on stats or not). Students will be encouraged to do a little investigation themselves, and then the class will discuss the potential causes together (e.g. our outliers). This will be a way in which we can, as a class, build up a bank of interesting observations of our LDS, just like the observation we made when we were examining the MEI sample assessment material.

In this question from the MEI sample A Level assessment we were drawn to the very long tail at the bottom of the Sub-Saharan Africa box-plot, and wondered which countries were causing this. Looking at the LDS we quickly came up with 3 countries with very low birth rates: Saint Helena, Mauritius, Seychelles – all island nations. That feels like a nice fact: the island nations of Sub-Saharan Africa have significantly lower birth rates than other countries in that region.

This brings me onto our choice of exam board – the data sets are not provided in the exam, yet students are expected to be able to use some very specific knowledge of them in order to gain some marks in their exams. With the large LDSs (like Edexcel’s weather data) you could study one for a couple of years and maybe still not have examined all the key pieces of data.

So, MEI has the smallest large data set (covering information about the 237 countries of the world), and that brings its own advantages – it is printable. The bulk of it fits on 3 A3 pages, and I have created a single A4 page that expands on the Dependency status of relevant countries. So now all our students have a hard copy of their data set to use – meaning that we don’t always have to be in an IT room when we’re working on it. The other major advantage is that on being presented with the data set, students immediately felt that, because it actually wasn’t “too big”, knowing it well was going to be achievable.

When it comes to using technology there are various ways in which we plan to incorporate this with the LDS. The ClassWiz calculator is clearly going to be key, as is learning a bit about Excel. Filters, sorting and a deep look into the inbuilt statistical formulae will all need to take place – not just for the sake of the LDS, but because Excel skills are incredibly useful. We’re also going to look to support/enhance teaching & learning by graphing some of the data in GeoGebra and Gnumeric. (Gnumeric is apparently a very good tool for creating box-plots, although I am yet to explore that any further.) I have also built in Excel a sampler tool that will create random samples from the LDS, although it still needs perfecting. When it is complete I will share it here.
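
For anyone who would rather script a sampler than build one in a spreadsheet, the idea can be sketched in a few lines of Python. This is not the Excel tool described above, just one possible approach: it assumes the data set has been loaded as a list with one entry per country, and the country names below are placeholders. `simple_random_sample` is a hypothetical name:

```python
import random


def simple_random_sample(rows, k, seed=None):
    """Return k distinct rows chosen at random, without replacement.

    Passing a seed makes the sample repeatable, which is handy when
    the whole class needs to work from the same sample.
    """
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Placeholder rows standing in for the 237 countries in the data set.
countries = [f"Country {i}" for i in range(1, 238)]

picked = simple_random_sample(countries, 10, seed=1)
print(picked)
```

Leaving `seed` as `None` gives a different sample each time it is run.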

When it comes to assessments, starting work on the LDS from lesson 2 means we will be able to include it in assessments from half term 1 – to start with we will make sure we write the assessments so we know that students have seen (in one form or other) what we will be asking about, then we can progressively choose more and more obscure statistics to include. Finally we plan to set students extended projects to do. These will likely ask them to choose some aspect of the data set, be it a group of regions or a group of fields, calculate some statistics, create some charts, draw some conclusions, and write up a little report on their findings.

Identifying the smallest data set, and revisiting it weekly for 2 years, will give students the best chance of becoming as familiar as they can be with the LDS (short of dedicating too much curriculum time to it). I suppose the bottom line is that we feel that using the LDS to teach all data topics is going to be such an improvement on using (essentially random) examples that we are using a similar approach with our GCSE statistics. In lessons our year 9s and 10s are currently populating their own data set (containing information about themselves). They have really enjoyed the data collection (although I did receive a complaint from the English classroom underneath the standing long-jump) – now to analyse it!”

Thoughts on Large Data Sets

One of the thoughts that came out of my most recent meeting with Simon was that the choice of exam board will be influenced by the large data set. I had previously been of the opinion that I could leave the choice until January 2018, seeing if any more specimen/mock papers became available and analysing question types. However, this would mean not spending as long familiarising students with the specific large data set for whichever exam board we choose. As a result of this I have downloaded the data sets for AQA, Edexcel and both specifications of OCR. I should point out that I am not a statistician; I have taught S1 once and try to avoid it if I can!

I have started to look at the data sets to see which is most usable, and which will give students the best insight to draw on in their exams. We want to be revisiting the data constantly, so that students are really familiar with it. This means that portability is important, as we will not always be able to access computer facilities.

AQA – Purchased quantities of household food & drink by Government Office Region and Country

The data given is split into 10 regions (under separate tabs), with the average amounts of various foods and drinks per person per week. There is also a tab with averages for the whole of England. Having spent some time in Excel playing around with the data it is possible to fit each region onto a single sheet of A3 paper (total of 11 sheets).

Looking at the questions in the specimen paper, students are expected to be able to recall information about the average amounts of certain food groups from different regions. This is something that could only be known by someone who has done extensive work with the data set before, and given the sheer scale of the data is unlikely to be something that you could repeat for all of the different food groups.

Later questions involving the data set give a small excerpt and ask questions about it. These are much more accessible to students who do not have as much familiarity, but will be easier for those who are aware of the context. For example, there is a question about the total amount of confectionery purchased, which does not state that it is based on averages.

Total Marks based on Large Data Set in AS Spec Paper: 9 (Out of 80 on paper 2, 160 across the AS)

OCR A – Method of Travel / Age Structure

The OCR A specification looks at the methods of travel to work, broken down into regions, taken from the national census in 2001 and 2011 (separated into two sheets). There is also data about the ages of the residents of the regions (2 further separate sheets). Each tab can be set to cover three A3 pages, so a total of 12 will be needed for a portable copy.

In the question pictured here it would be advantageous to be familiar with the data set, particularly for part (ii), as there are different codes for the authorities based on their type. If you knew this then you would know how to separate the authorities further and would merely have to explain this.

For the other question based on the data set (not pictured), a summary table has been created. It is not as obvious what the benefits to knowing that data are here, although general familiarity and having looked at possible summary statistics will help.

Total Marks based on Large Data Set in AS Spec Paper: 8 (Out of 75 on paper 1 and 150 across the AS)

OCR B (MEI) – Population data and Olympic success

The first thing to note here is that the MEI specification (OCR B) has taken a very different position to the other boards. There will be three different data sets that will be used in rotation. The data sets that will be used for ‘live’ specifications are not available yet.

The data set that is available for the specimen papers is far less ‘large’ than the others, reducing to two A3 sheets. The question included here really grabbed me as being interesting – what were the outliers in Sub-Saharan Africa? On inspection, the data that stood out was that from islands, rather than countries on the continent.

This data set seems much more manageable than the others, and over two years I would expect students to be able to become very familiar with it.

Total Marks based on Large Data Set in AS Spec Paper: 7 (Out of 70 on paper 2 and 140 across the AS)

Edexcel – Weather Data

Edexcel’s weather data consists of 5 weather stations in the UK and 3 from abroad, with readings from both 1987 and 2015. I have been able to fit the data for each station, for a single year, on one A3 sheet (total 16 sheets).

The questions based on this data set again seemed not to require much detailed knowledge of the readings. In the question shown here it is only the fact that there is one reading per day that will help with part (b).

Of course, as Edexcel has not been accredited yet, this may change.

Total Marks based on Large Data Set in AS Spec Paper: 11 (Out of 60 on paper 2 and 160 across the AS)
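
To put the four totals on a common footing, the marks can be expressed as a share of the whole AS. The short Python sketch below uses only the figures quoted in the sections above; the variable names are illustrative:

```python
# (marks based on the large data set, total marks across the AS),
# as quoted for each specimen paper above.
lds_marks = {
    "AQA": (9, 160),
    "OCR A": (8, 150),
    "OCR B (MEI)": (7, 140),
    "Edexcel": (11, 160),
}

# Percentage of the AS marks that depend on the large data set.
shares = {
    board: 100 * marks / total
    for board, (marks, total) in lds_marks.items()
}

# List the boards from the largest share to the smallest.
for board, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{board}: {share:.2f}% of AS marks")
```

On these specimen papers every board sits somewhere between 5% and 7% of the AS marks, so the differences between them are small.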

Summary

While the use of the data set will only form part of my decision on which exam board to use, I have found the process of sifting through the data sets, and the questions that relate to them, extremely useful. It has also shown me the benefits of this approach. In starting to look at the data sets it is already noticeable how the data is starting to feel familiar. I think that this will develop much more ownership of the data and make structuring easier. Now students know they are expected to know the data set, they are more likely to see the value in using it as part of exercises.