Thursday, 24 December 2015

Statistical or Dynamical? Which Model To Choose

In an earlier post ‘Model This and Model That’, we looked at some of the models used in the modelling and prediction of ENSO, collated and plotted by the IRI/CPC. We learnt a little about what statistical and dynamical models are, and touched on some of the reasons for having these different types of model. The modeller's knowledge and experience, and the purpose of the model, all contribute to the type of model that is built. Another large factor in deciding what type of model to build is resources. These include manpower, computing power, time and the availability of parameter data. For example, temperature is the most widely available meteorological variable and has the longest record of observations, whereas solar radiation data is harder to obtain.

You may have noticed that there are more dynamical than statistical models used in the IRI/CPC plots. To be precise, there are 17 dynamical and 9 statistical models used. So what does this say? Is it as simple as dynamical models being better than statistical ones because there are more of them? Here are some of the advantages and disadvantages of both statistical and dynamical models, to give you more of an understanding of why one would be built and used over the other.

Statistical


Advantages:
  • Models are relatively quick and simple to build and run
  • Little to no knowledge of underlying physical principles is required
  • Simple analytical methods allow fairly easy model inversion

Disadvantages: 
  • The model incorporates many parameter assumptions that hold only under the observed conditions, so extrapolating beyond them is hard to justify
  • The model does not enable further understanding of the physical processes


Dynamical


Advantages:
  • Can be applied to a wide range of conditions
  • Incorporates great complexity of processes via the use of numerical solutions
  • Increases understanding of physical processes

Disadvantages: 
  • Needs powerful computers to run complex models, which can still take a long time to run
  • All relevant processes and corresponding variables need to be accounted for in the model
  • Complicated to invert model due to difficulty in obtaining analytical solutions

Over the past few decades, a large proportion of models have improved in their ability to predict El Niño episodes (Guilyardi et al., 2012). These improvements are down to a combination of reasons, including more advanced technology for observing and recording data at a higher spatial resolution. Such improvements ultimately lead to a better understanding of the physical processes of ENSO, which in turn enables the models to be improved. Where models have not significantly improved their skill, the reason is thought to be the additional processes that they are now simulating. Such processes include the carbon cycle, ecosystems, the indirect effect of aerosols, and the interaction between the stratosphere and troposphere (Guilyardi et al., 2012). Although these additional processes and complications do not add to model skill in the short term, they offer areas to explore and understand better in the near future. This is where the conflict between oversimplifying and overcomplicating a model comes into play. Simple models can be very powerful tools, but sometimes they just miss the mark when it comes to usefulness. Statistical models are usually simpler than dynamical models, as we saw in the advantages and disadvantages above.

In the early 1990s the statistical and dynamical models used for the study of ENSO showed comparable skill (Barnston et al., 1994). This was reinforced later in the 90s by model predictions of the exceptionally strong El Niño of 1997/98. The forecasts from twelve statistical and dynamical models were studied, and the results again showed similar levels of skill (Landsea and Knaff, 2000). None of the dynamical models conclusively performed better than the El Niño–Southern Oscillation Climatology and Persistence (ENSO–CLIPER) model, a simple statistical model, which was used as the baseline for comparing model skill. Hence, it could be argued that statistical models were the preferred type due to their ease and lower cost of development.
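To make the idea of a baseline concrete, here is a minimal sketch of a persistence forecast scored by correlation. It is not the ENSO–CLIPER model itself, only the "persistence" ingredient on its own. The synthetic anomaly series, the coefficient of 0.9 and the function names are all assumptions made for illustration, not real ENSO data.

```python
import numpy as np

def persistence_forecast(anomalies, lead):
    """Forecast each value as the anomaly observed `lead` steps earlier."""
    return anomalies[:-lead]

def correlation_skill(forecast, observed):
    """Pearson correlation between forecasts and the verifying observations."""
    return float(np.corrcoef(forecast, observed)[0, 1])

# Synthetic monthly "SST anomaly" series: an AR(1) process (assumed, not real data).
rng = np.random.default_rng(0)
n_months = 600
anomalies = np.zeros(n_months)
for t in range(1, n_months):
    anomalies[t] = 0.9 * anomalies[t - 1] + rng.normal(scale=0.3)

# Score the persistence forecast at a few lead times.
skill_by_lead = {}
for lead in (1, 3, 6):
    forecast = persistence_forecast(anomalies, lead)
    observed = anomalies[lead:]
    skill_by_lead[lead] = correlation_skill(forecast, observed)
    print(f"lead {lead} months: correlation skill = {skill_by_lead[lead]:.2f}")
```

Any more elaborate model would need to beat these numbers at the same lead time before it could claim to add value, which is the role the baseline played in the comparison above.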

A more recent study examined the statistical and dynamical models used in the IRI/CPC plots from 2002 to 2011, with 8 statistical and 12 dynamical. The study found that, despite the models being analysed over only a short period (9 years), the skill of dynamical models has now exceeded that of statistical models, specifically for the months March to May, when shifts in ENSO are most likely and predictions are therefore most difficult (Barnston et al., 2012). Yet it is also acknowledged that the short study period makes it hard to show that the findings are statistically robust, although within its limited time frame the results are intriguing nonetheless. Dynamical models have received greater funding than statistical ones over the past few decades, meaning that the majority of the statistical models analysed have not been drastically altered in many years. This may play a part in why the dynamical models have been shown to have better predictive power than their statistical equivalents. However, dynamical models are proving their potential through their capability to model the non-linearity and rapid changes of state of ENSO (Barnston et al., 2012). Without denying the value of statistical models, it seems that dynamical models will be at the forefront of modelling El Niño, especially as continuing developments in technology mean that computing power increases and the associated costs decrease.

So I’ll finish by going off topic, but to wish you a very Merry Christmas! Without even doing any scientific analysis, I can (unfortunately) say that to a 5% significance level there is sufficient evidence to reject the hypothesis that it will snow on Christmas day of 2015 (if you’re in England that is). 

Source: Buzzfeed

Here’s hoping you didn’t place a bet that I advertised near the beginning of this blog... however if you did, remember, I did warn you that I wasn’t to blame if you lost! Enjoy the festivities wherever you may be, and whatever you may do, and we’ll catch up again in the New Year!



Saturday, 12 December 2015

Over To You

In the spirit of COP21, some of my fellow Environmental Modellers have been suggesting ways that we can, as individuals, contribute to the ethos of green living: Any Earth Left has been giving some great tips on how we can make a positive impact by reducing our emissions as consumers, and The Global Hot Potato has been providing some yummy environmentally friendly recipes to whet your appetite.

So, how else can I play my part I hear you ask?

Well, a team of climate scientists over at the University of Oxford have developed a novel way that you can contribute to the challenge we face of climate change, albeit not by reducing emissions, and without even leaving the comfort of your home. Climateprediction.net is ‘the world’s largest climate modelling experiment for the 21st century’, which relies on a community of volunteers who run climate models on their home computers via the computing platform BOINC (the Berkeley Open Infrastructure for Network Computing).

As I’ve mentioned in earlier posts, modelling the environment is paramount to understanding, learning about, and predicting what has happened, is happening, and will happen given certain conditions. When it comes to the climate, we have to do so on a large scale, hence the number of climate model runs needed is vast. Vast enough that even supercomputers struggle. Instead, teams in the Environmental Change Institute, the Oxford e-Research Centre, and the Atmospheric, Oceanic and Planetary Physics department at the University of Oxford have adopted a technique called ensemble modelling, in which the model is run many times with slightly different settings. Thousands of people each run a small share of these simulations on their personal computers (you don’t need anything fancy, but there are a few system requirements), and then send back the results to be interpreted. The site assures volunteers that the greenhouse gas emissions generated by leaving your computer running for longer than you might otherwise do are very small.
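For a feel of what an ensemble looks like, here is a toy sketch in Python. It has nothing to do with the real climateprediction.net code: the one-line "climate model", the parameter range and all the numbers are assumptions chosen purely for illustration. Each ensemble member runs the same model with a slightly different parameter value, and the results are then pooled.

```python
import numpy as np

def toy_model(feedback, forcing=3.7, heat_capacity=8.0, years=200):
    """Integrate dT/dt = (forcing - feedback * T) / heat_capacity with yearly Euler steps."""
    temperature = 0.0
    for _ in range(years):
        temperature += (forcing - feedback * temperature) / heat_capacity
    return temperature

rng = np.random.default_rng(42)

# Each "volunteer" runs one ensemble member with its own perturbed feedback value.
feedbacks = rng.uniform(0.8, 1.6, size=1000)
results = np.array([toy_model(f) for f in feedbacks])

# Pool the returned results into a summary of the whole ensemble.
low, high = np.percentile(results, [5, 95])
print(f"ensemble mean warming: {results.mean():.2f}")
print(f"5th to 95th percentile: {low:.2f} to {high:.2f}")
```

The useful output is the spread of results rather than any single run, since it shows how sensitive the answer is to the uncertain parameter.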

There are some really interesting projects currently underway including:

  • Assessing the risk of Atlantic meridional overturning circulation (AMOC) collapse in the coming century (see RAPID-RAPIT)

  • Investigating the response of rainfall, evaporation, and river run off, to changes in land use and the carbon cycle (see HYDRA)

  • What the impact of stratospheric aerosol particles and solar radiation management would be (see Geoengineering)

Over the past few months, in certain lectures, it’s often been said that even Climate Change students don’t see themselves as modellers. With this facility from the University of Oxford, now everyone can call themselves modellers, and I think even more impressively, climate modellers. 


Tuesday, 8 December 2015

Model This and Model That

Continuing from my previous post where we looked at ENSO forecasts into the spring of 2016, we shall now look at some of the different models used. First though, what is a model? Succinctly put, a model is a simplified version of a complex reality. Ok, good start, but let’s go deeper within this field and try to learn what statistical models, and dynamical models, are.

                                                                                  

Statistical Models


A statistical model allows us to infer things about a process from its observed data, and is represented through a set of variables. The model is constructed from 1 dependent variable and at least 1 independent variable. The dependent variable (aka the response/outcome) is what is being studied, and is the result of a combination of the independent (aka explanatory) variables. In a simplified case, each independent variable has no relationship with any of the other independent variables. Or, as is more often the case, there are relationships between them, and these relationships are known statistically as correlations. Both dependent and independent variables can be observed and recorded and, together with a set of assumptions about the data, the parameters (the unknown constants in the equation) can be estimated. An important part of a statistical model to be aware of is that it is non-deterministic. What this means is that some of the variables are stochastic, i.e. essentially random. Hence, a statistical model uses random variables to model the components of the process that are not currently fully understood scientifically.
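As a small illustration of these ideas, the sketch below builds about the simplest statistical model there is: one dependent variable, one independent variable, two parameters and a random noise term. The data are synthetic and the "true" parameter values are assumptions, chosen so that we can check how well they are recovered.

```python
import numpy as np

rng = np.random.default_rng(1)

# "True" parameters, unknown in practice (assumed here so the estimate can be checked).
true_intercept, true_slope = 2.0, 0.5

x = rng.uniform(0, 10, size=200)               # independent (explanatory) variable
noise = rng.normal(scale=1.0, size=x.size)     # stochastic component
y = true_intercept + true_slope * x + noise    # dependent (response) variable

# Estimate the parameters by ordinary least squares.
design = np.column_stack([np.ones_like(x), x])
estimates, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, slope_hat = estimates
print(f"estimated intercept = {intercept_hat:.2f}, slope = {slope_hat:.2f}")
```

The estimates land close to, but not exactly on, the true values; the noise term is the reason, and it is what makes the model non-deterministic.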

Let’s take a look at 3 of the statistical models used by the IRI as shown in my previous post:

CPC MRKOV – Is the National Centers for Environmental Prediction/Climate Prediction Center (NCEP/CPC) Markov Model. It is a linear statistical model built on three multivariate empirical orthogonal functions (EOFs) of observed sea surface temperature, surface wind stress, and sea level. EOF analysis is the decomposition of a signal into spatial patterns and the time series that describe how each pattern varies.
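EOF analysis is easier to picture with a small example. The sketch below is not the CPC code; it applies a singular value decomposition (one common way to compute EOFs) to a synthetic field made of a single oscillating spatial pattern plus noise. The field, its dimensions and the variable names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_time, n_space = 120, 50

# Synthetic field: one oscillating spatial pattern plus weak noise (assumed data).
pattern = np.sin(np.linspace(0, np.pi, n_space))
amplitude = np.cos(2 * np.pi * np.arange(n_time) / 48)
field = np.outer(amplitude, pattern) + 0.1 * rng.normal(size=(n_time, n_space))

# Remove the time mean at each location to get anomalies.
anomalies = field - field.mean(axis=0)

# SVD: rows of `eofs` are spatial patterns, columns of `pcs` are their time series.
u, s, eofs = np.linalg.svd(anomalies, full_matrices=False)
pcs = u * s
variance_fraction = s**2 / np.sum(s**2)
print(f"variance explained by EOF 1: {variance_fraction[0]:.1%}")
```

Here the first EOF recovers the pattern that was put in, and multiplying `pcs` by `eofs` rebuilds the anomalies exactly. A model like the one above keeps only the leading few EOFs and works with their time series instead of the full field.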

CPC CCA – Is the Climate Prediction Center Canonical Correlation Analysis Model. It is a multivariate linear statistical model whose predictors are mean sea level pressure and sea surface temperature. Canonical correlation analysis is a method for finding the linear combinations of two multidimensional variables that are maximally correlated with each other.
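Canonical correlation analysis (CCA) can be sketched in a few lines of NumPy. This is a generic textbook-style calculation, not the CPC model: the two data sets are synthetic, they share one hidden signal by construction, and the function name is hypothetical.

```python
import numpy as np

def canonical_correlations(x, y):
    """Canonical correlations between the columns of x and y (rows are samples)."""
    x_centred = x - x.mean(axis=0)
    y_centred = y - y.mean(axis=0)
    # Orthonormal bases for the two centred data sets.
    qx, _ = np.linalg.qr(x_centred)
    qy, _ = np.linalg.qr(y_centred)
    # Singular values of qx^T qy are the canonical correlations.
    return np.linalg.svd(qx.T @ qy, compute_uv=False)

rng = np.random.default_rng(3)
n_samples = 500
shared = rng.normal(size=n_samples)    # hidden signal common to both sets

# Two hypothetical multidimensional variables (stand-ins for, say, pressure and SST indices).
x = np.column_stack([shared + 0.5 * rng.normal(size=n_samples),
                     rng.normal(size=n_samples)])
y = np.column_stack([rng.normal(size=n_samples),
                     shared + 0.5 * rng.normal(size=n_samples),
                     rng.normal(size=n_samples)])

correlations = canonical_correlations(x, y)
print("canonical correlations:", np.round(correlations, 2))
```

The first canonical correlation is large because both sets contain the shared signal, while the second is close to zero because nothing else links them.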

FSU REGR – Is the Florida State University Regression Statistical Model. It is a multiple linear regression model with predictor variables of upper ocean heat content, wind stress, and sea surface temperatures. 


Dynamical Models


A dynamical model, as with a statistical model, represents the relationship between a set of variables; however, it uses functions to explain how the variables change over time given their current state. Equations that relate these functions to their derivatives are known as differential equations, and the derivatives can be thought of as rates of change.

An extremely famous example of a non-linear dynamical model is the Lorenz model. Edward Lorenz took the Navier-Stokes (fluid dynamics) equations and simplified them to get 3 differential equations with only 3 variables (I won’t go into the equations now, but take a look here if you wish to learn more about them). The point is that they look deceptively simple to solve, but plotting the solutions in 3D leads to an image such as the following:



If you were to place your finger at any point on the line in the above plot, and wanted to move say 1 position to the left (i.e. only 1 of your 3 variables in the equations changes by 1 unit), it would take a lot longer than expected. You can’t jump over the white space between the lines; you have to trace your finger round the line until you reach your ‘destination’. This is a system with chaotic behaviour, and its sensitivity is commonly known as the butterfly effect: a small change in one variable can result in massive changes in a later state.
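The butterfly effect can be seen directly by solving the Lorenz equations numerically. The sketch below uses the classic parameter values (sigma = 10, rho = 28, beta = 8/3) and a standard fourth-order Runge-Kutta scheme; the step size, the starting point and the size of the perturbation are assumptions chosen for illustration.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz(state)
    k2 = lorenz(state + 0.5 * dt * k1)
    k3 = lorenz(state + 0.5 * dt * k2)
    k4 = lorenz(state + dt * k3)
    return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

dt, n_steps = 0.01, 4000
a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])    # almost identical starting point

# Run both trajectories side by side and record how far apart they are.
separation = np.empty(n_steps)
for i in range(n_steps):
    a, b = rk4_step(a, dt), rk4_step(b, dt)
    separation[i] = np.linalg.norm(a - b)

print(f"separation after {dt * 100:.0f} time unit: {separation[99]:.1e}")
print(f"largest separation over {dt * n_steps:.0f} time units: {separation.max():.1e}")
```

The two runs start a hundred-millionth of a unit apart, yet they end up on completely different parts of the attractor, which is why long-range forecasts of chaotic systems are so hard.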

Now let’s take a look at 3 of the dynamical models used by the IRI as shown in my previous post:

NASA GMAO – Is the NASA Global Modeling and Assimilation Office, Goddard Earth Observing System Model (GEOS-5). It is constructed from an atmospheric model, a catchment land surface model, and an ocean model, which are all coupled together using the Earth System Modeling Framework.

UKMO – Is the UK Met Office General Circulation Dynamical Model. It is a coupled ocean-atmosphere GCM known as the GloSea (Global Seasonal) model. It is comprised of 3 models - an atmosphere, an ocean and a land surface model.

LDEO – Is the Lamont-Doherty Earth Observatory Model. It is an improved version of the original simple coupled ocean-atmosphere dynamical model by Zebiak and Cane, 1987.


Statistical-Dynamical Models


There also exist statistical-dynamical models which, as the name suggests, use a combination of both statistical and dynamical methods for different components. They usually take the statistical approach for parameters such as wind speed and direction, whilst using a dynamical method for modelling Newton’s Laws of Motion for energy diffusion.


Why all the models?


You may wonder why there are so many different approaches to modelling ENSO, and even within the different approaches, why so many different models exist. Well, as can be seen from looking at just a few of the different types of models above, numerous combinations of predictor variables are used, and all models incorporate different assumptions. What these variables and assumptions are depends on the modellers’ knowledge, experience, and aim in modelling. Hence, there exists such a variety of models. Models cannot be categorised as right or wrong; however, they can be shown to be more or less predictive in comparison to other models. But even this may hold true for only a certain period of time.