Exploring Climate Data With Google Spreadsheets

Looking at large data sets

It's important that students learn not to be overwhelmed when they look at a computer screen full of numbers-- and without the skills to productively tackle a spreadsheet and extract useful information, it is very easy to feel overwhelmed. But large data sets are fast becoming commonplace in every workplace and even in everyday life. Finding useful information in a page full of numbers is an important 21st century work skill, and students will need to become efficient at it if they want jobs in anything from healthcare to running a warehouse or administering a school district.

Teachers, for example, have to record and track grades, which requires good data management skills-- there are lots of examples like this that students can easily relate to.

Luckily Excel and Google Spreadsheets give us a lot of easy tools to explore large data sets, so this isn't as difficult as it used to be. And it can even be fun-- that's the goal of this exercise, to discover ways to enjoy exploring numbers.

Why teach with spreadsheets?

Exploratory data analysis vs. hypothesis testing

When teaching science, we need to make students aware that scientists use data for two very different kinds of activities:

1. We all think of hypothesis testing when we think of science. This is a very important activity that allows us to use the results of experiments and surveys to statistically differentiate between alternative hypotheses based on evidence. This is what we generally refer to as "the scientific method," and it has been instrumental in the rapid advancement of scientific knowledge in the 20th century.

2. But scientists also do a lot of exploratory data analyses, generally to find preliminary patterns in data that help them to construct their hypotheses. This second type of science is not talked about as much, but it is the way in which we come up with the new ideas that form the basis for our experiments and surveys. After all, the hypotheses that we test have to come from somewhere!

As scientists, we often take a set of data and look for patterns, come up with ideas about what these patterns may indicate, and then go out and perform new experiments or surveys to test our ideas using the scientific method of hypothesis testing. The important point is that exploring existing data sets can lead to new insights that can open new areas of research. The data that we explore to generate new hypotheses, however, are not then used to test the hypotheses-- that would be circular. You need to collect new data, or use independently collected existing data sets to test your hypotheses.

Exploratory data skills are very useful job skills for non-scientists as well. Even if you never conduct research, you need to learn to efficiently summarize large data sets and look for useful patterns that can help make a workplace more efficient.

Public Databases

Many existing data sets can be found online and downloaded for use in your lessons. These are a rich source of information and can give students experience in handling real-world data. Here are some examples of both visual mapping data and numerical data:

1. Examples of map files compatible with Google Earth and Google Maps
a. Climate and weather maps
b. Streamflow and drought maps
c. Pollutants

2. Examples of data formatted as spreadsheets and graphs 

Exploring weather data

In today's exercise we will explore existing data on long-term weather in Kansas. These data were originally obtained from the Kansas Weather Library and then reorganized into smaller data sets that are each a decade in duration.

Begin by choosing a decade and deciding how to summarize the information in a way that is easy to communicate to non-scientists. For example, you can write a paragraph, create a chart, make a simple table, etc. This is the type of science reporting that you see on the nightly weather segment on local TV-- the meteorologists have to distill down a large amount of information and complex scientific modeling, and then communicate it in an accurate but understandable fashion to a lay audience. This is your chance to see if you can do the same.

The class can take the results of their individual decade-long analyses and look at century-long patterns as a way of demonstrating that data sets can be explored in a stepwise fashion to look at larger trends. This will give you a chance to discuss why the decade-long data sets must all be analyzed in the same fashion before they can be combined into a larger analysis.

For example, one student may have created a graph and another may have written a summary paragraph; one student may have used the mean as an average while another used the mode. There is no single acceptable way to explore data sets, which makes it different from hypothesis testing; in exploratory data analysis, students can summarize their data several different ways to see which is most informative. Since we are not doing statistical hypothesis testing, it is perfectly acceptable to look at the data from different perspectives during an exploratory analysis (for a statistician, this means that exploratory analyses are not restricted by the danger of multiple testing that arises in statistical hypothesis tests, since no probability values are attached to exploratory summaries).
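For teachers who want to show how different "averages" can describe the same numbers, here is a minimal Python sketch. The temperatures are made up for illustration and are not taken from the Lawrence record; the variable names are also just assumptions for this example.

```python
from statistics import mean, median, mode

# Made-up daily high temperatures (degrees F) for one week.
# These are NOT real values from the Lawrence, Kansas record.
daily_highs = [88, 91, 91, 95, 102, 91, 86]

# Three common ways to report a "typical" value for the same data.
summary = {
    "mean": mean(daily_highs),      # arithmetic average
    "median": median(daily_highs),  # middle value when sorted
    "mode": mode(daily_highs),      # most frequent value
}

for name, value in summary.items():
    print(name, value)
```

A single hot day pulls the mean above the mode here, which is one reason two students summarizing the same decade may report slightly different "averages."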

It is a useful exercise to have students discuss their data summaries, learn that there are a lot of different approaches that can be used, and then decide which approaches are the most useful, interesting and appealing ways to summarize the data. They can then decide on a way to standardize their data summaries so that they can be combined to create a larger analysis that looks at the century-long data set. A good exercise is to look at the different pictures you get of Kansas weather from the different decade-long data sets, since a decade is a relatively small sample size and the averages and amount of variation will be different from decade to decade for each of the measures (highs, lows, precipitation, etc.). Compare the decade-long summaries to what you see in a century-long data set and use this to discuss the difference between small samples and large samples, and between "weather" and "climate."
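The comparison between decade-long summaries and a longer record can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the precipitation totals are invented, each "decade" is shortened to five years to keep the example small, and the function names are hypothetical.

```python
from statistics import mean

# Invented annual precipitation totals (inches), grouped by decade.
# Each decade is truncated to five years; these are NOT real Kansas data.
annual_precip = {
    "1930s": [28.0, 31.5, 25.0, 22.5, 30.5],
    "1950s": [40.0, 33.0, 24.0, 45.0, 38.0],
    "1990s": [39.0, 41.0, 36.5, 44.5, 39.0],
}

def decade_means(records):
    """Average each decade separately (the small samples)."""
    return {decade: mean(values) for decade, values in records.items()}

def pooled_mean(records):
    """Average every year together (the large sample)."""
    all_years = [value for values in records.values() for value in values]
    return mean(all_years)

per_decade = decade_means(annual_precip)
overall = pooled_mean(annual_precip)

for decade, value in per_decade.items():
    print(decade, round(value, 1), "difference from overall:", round(value - overall, 1))
print("all years combined:", round(overall, 1))
```

Because every decade is summarized the same way (a mean of annual totals in inches), the decade results can be compared with each other and with the pooled value, which is the point of standardizing the summaries.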

Today we will do an exploratory analysis of the Kansas State University Weather Data Library long-term weather record. We are not going to do any formal hypothesis testing, but rather, we will look for patterns and explore different potential lines of research.

Step 1: Choose Your Decade

Below is a table with links to Google Spreadsheets containing long-term weather data for Lawrence, Kansas, obtained from the Kansas State University Weather Data Library. Pick a decade and click on the link to go to the Google Spreadsheet with your data.

Lawrence, Kansas

1. Google Spreadsheets are essentially a simplified form of Excel. They can be downloaded into Excel by clicking File>Download As>Excel. Likewise, Excel spreadsheets can be uploaded into Google Spreadsheets.

Download the file for your decade as an Excel Spreadsheet and save it on your computer desktop.

2. Upload the Excel Spreadsheet for your decade into a Google Spreadsheet in your personal Google Account. Go to your Google Drive, click on the upload icon, and select the file from your computer's desktop. The first time you upload into Google Docs/Spreadsheets you will need to turn the conversion on in order to upload it as a file that can be edited (otherwise it will be a read-only file).

Step 2: Explore Your Decade

Use the tools in Google Spreadsheet to explore your decade. You will be using your exploratory analysis to engage in a dialog with other teams about who had the most interesting weather. 

You will be the one to choose what you wish to explore in your decade-long weather record and how you will present your data to your colleagues. Some possibilities include the snowiest day, the wettest year, the lowest/highest temperature, average January low temperatures, average July high temperatures...there are many ways of investigating the database and summarizing your findings. Come up with the summary statistics that you believe are the most informative and be prepared to explain your choice during the class presentation.
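For anyone who wants to try the same kinds of summaries outside of a spreadsheet, here is a small Python sketch of two of the possibilities above: the wettest year and the average January low. The records and the column layout (date, low temperature, precipitation) are assumptions made for this example, not the layout of the actual class spreadsheets.

```python
from collections import defaultdict
from statistics import mean

# Made-up daily records: (date as YYYY-MM-DD, low temperature in F, precipitation in inches).
# The values and the column layout are assumptions for illustration only.
records = [
    ("1951-01-15", 12, 0.10),
    ("1951-07-11", 70, 3.20),
    ("1952-01-20", 20, 0.00),
    ("1952-07-04", 74, 0.40),
    ("1953-01-09", 16, 0.30),
    ("1953-06-30", 68, 1.10),
]

# Wettest year: add up precipitation for each year, then pick the largest total.
precip_by_year = defaultdict(float)
for date, low_f, precip_in in records:
    year = date[:4]
    precip_by_year[year] += precip_in
wettest_year = max(precip_by_year, key=precip_by_year.get)

# Average January low: keep only January rows, then average their lows.
january_lows = [low_f for date, low_f, _ in records if date[5:7] == "01"]
avg_january_low = mean(january_lows)

print("wettest year:", wettest_year)
print("average January low:", avg_january_low)
```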

Explore your decade by sorting and summarizing your data. To sort a column:

You can find the sum or average of a subset of your data by highlighting the values, then looking in the lower right-hand corner for the sum; click on the popup there to see the average.
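The same two operations, sorting by a column and taking the sum or average of a highlighted subset, look like this in Python. This is a rough sketch for comparison with the spreadsheet tools; the rows, column names, and values are invented for the example.

```python
from statistics import mean

# Invented rows standing in for a few lines of a weather spreadsheet.
# Column names (date, high_f, precip_in) are assumptions for this sketch.
rows = [
    {"date": "1936-07-10", "high_f": 104, "precip_in": 0.00},
    {"date": "1936-07-11", "high_f": 108, "precip_in": 0.00},
    {"date": "1936-07-12", "high_f": 101, "precip_in": 0.25},
    {"date": "1936-07-13", "high_f": 97, "precip_in": 0.75},
]

# Sort by the high-temperature column, hottest first (like sorting a column Z to A).
by_high = sorted(rows, key=lambda row: row["high_f"], reverse=True)
hottest_day = by_high[0]["date"]

# "Highlight" the last two rows and compute their sum and average.
selected = rows[2:4]
total_precip = sum(row["precip_in"] for row in selected)
avg_high = mean([row["high_f"] for row in selected])

print("hottest day:", hottest_day)
print("precipitation in selection:", total_precip)
print("average high in selection:", avg_high)
```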

The chart editor is simple to use and creates attractive graphs that are easy to download as jpg files and insert into Google Presentations.

Step 3: Research Your Decade

Using online resources, research interesting weather events (or just something fun) about your decade. Were there any extreme events like droughts, floods, heat waves, cold snaps, etc. that are of historical significance? Did you discover evidence of historical events in your data explorations in Step 2? You can add information you found from your online search to your presentation. Make sure to cite your sources (including all images).

Step 4: Prepare Your Presentation

You will now create 3-4 slides in the class Google Presentation to summarize the weather in your decade. Everyone will then present their slides in a round-robin style presentation.

You will all be given a link to the class Google Presentation. Google Presentations work a lot like a simplified version of PowerPoint (and can be downloaded as PowerPoint files).

1. Sign into your Google Account and use the instructions that you were given in the workshop to log onto the class Google Docs-Presentation. Everyone will be working online together to co-create the same slideshow, so please be respectful and be careful not to accidentally delete or change others' work.

2. Summarize your research from Step 3. Create an engaging slide that will help your audience understand what is most memorable about the weather during your decade.

3. Summarize the data from your database investigation in Step 2. You can use means, ranges, charts, and other summary statistics to make it easy for your audience to understand your overview.

4. Be prepared to give a 5-minute oral presentation when your turn comes up in the slideshow. Try to develop a cogent argument, based on your data and the research you have done, that explains why you think your decade was the hottest or coldest or snowiest or driest...what extreme event was most noteworthy? Or was your decade simply the most boring in Kansas history?

Example Group Presentation

Here is the Google Presentation that was created by the group during the ATOMS workshop:

Wild Kansas Weather