A few Interpolation methods

(Collected and organized from ArcMap Help files)

 

Introduction

Surface interpolation is any formal technique that uses values at sampled locations to predict values at unsampled locations.

The values may describe any quantitative geographic phenomenon. Common examples include elevation, rainfall, ozone concentration, temperature, and soil chemistry.

Interpolation methods fall into two broad groups. One is a group of deterministic interpolators. This group makes predictions from mathematical formulas that form weighted averages of nearby known values. Different methods use different ways to form the weighted averages. This group includes Inverse Distance Weighted, Global and Local polynomials, and Radial Basis Functions.

The other group uses weighted averages as well, but also probability models to make predictions. This group includes Kriging and all of its different submethods, including Universal and Indicator Kriging. Because these methods use probability calculations they are called stochastic interpolators.

With the exception of global polynomial interpolation, all of these methods use the idea of a prediction search neighborhood: for each prediction location, you look at the dozen or so known values that are nearest to it and ignore the rest of the data. Because this is repeated for every prediction location, all the data are used when making an interpolated surface.


Inverse Distance Weighted

Inverse Distance Weighted (IDW) interpolation implements a basic law of geography: things that are close to one another are more alike than things that are far apart.

To predict a value for any unmeasured location, IDW uses the measured values surrounding the prediction location. Those measured values closest to the prediction location have more influence on the predicted value than those that are farther away (hence the name “inverse distance weighted”). Which values are included in the calculation can be determined by specifying and customizing the search neighborhood, which is the region of the map around a prediction location in which data points are considered for the interpolation (see “The search neighborhood” under EXTRAS).

IDW assumes that each measured point has some local influence that diminishes with distance. In the figure below, you can see three different curves that show how fast the influence of a point decays with distance from the prediction location.

In IDW, the predictive influence (weight) of a measured value depends on its distance from the prediction location. The strength of the dependency can be adjusted.

 

In the blue curve, all locations get the same weight, regardless of how far they are from the prediction location. In the green curve, there is a mild decrease in a point’s influence as it gets farther from the prediction location. In the red curve, there is the most dramatic decrease in a point’s influence as it gets farther from the prediction location.

Notice that as the distance approaches zero, the relative weight approaches one. This means that if one measured point is very close to the prediction location, it will receive almost all of the weight. Thus, IDW is an "exact" interpolator, meaning that the predictions will be exactly equal to the data value when predictions occur at locations where data have already been collected.
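
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the Geostatistical Analyst implementation: the function name, the sample coordinates, and the elevation values are all hypothetical, and no search neighborhood is applied. The power argument controls how fast influence decays with distance; a power of 0 gives every sample the same weight.

```python
import math

def idw_predict(samples, x, y, power=2.0):
    """Predict a value at (x, y) from (xi, yi, value) samples by IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # The prediction location coincides with a sample, so return
            # its value. This is what makes IDW an exact interpolator.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical elevation samples: (x, y, elevation)
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 3.0, 30.0)]

print(idw_predict(samples, 0.0, 0.0))             # at a sample location
print(idw_predict(samples, 1.0, 1.0, power=0.0))  # equal weights: plain mean
print(idw_predict(samples, 1.0, 1.0, power=2.0))  # nearer samples dominate
```

With a power of 2, the prediction at (1, 1) is pulled toward the value of the nearest sample; with a power of 0 it is simply the mean of the three values.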

 

Global polynomial

Global polynomial interpolation fits a smooth surface to the sampled data points. In contrast to IDW, it does not use local information. If you are familiar with regression, global polynomial interpolation fits a polynomial regression of the measured values on the x- and y-coordinates.

Suppose that you collected the elevation data in the following figure.


The black points are measured elevation values.

 

A first-order polynomial fits a rigid plane to the data. Visualize fitting a flat sheet of paper to the elevation points. Of course, the elevation values will include lots of little dips and bumps besides the general trend seen in the figure above. The flat surface of a global polynomial smooths out all of the little bumps. Because the surface is rigid, it will not pass exactly through the sampled data points. This means that the global polynomial is not an exact interpolator; rather, it smooths over fine-scale details.

 


A first-order global polynomial surface in cross-section. The surface (red line) captures the coarse-scale pattern in the data. It does not pass through the sampled data (green points).
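
As a rough sketch of the first-order case, the following Python fits a plane z = a + b*x + c*y to all samples at once by least squares and then prints the residuals, which are not zero because the plane does not pass through the data. The sample values, the function names, and the small Gaussian elimination helper are illustrative assumptions, not part of Geostatistical Analyst.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_plane(samples):
    """Least-squares fit of z = a + b*x + c*y to (x, y, z) samples."""
    basis = [[1.0, x, y] for x, y, _ in samples]
    values = [z for _, _, z in samples]
    # Normal equations: (B^T B) coefficients = B^T z
    normal = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
              for i in range(3)]
    rhs = [sum(row[i] * z for row, z in zip(basis, values)) for i in range(3)]
    return solve_linear(normal, rhs)

# Hypothetical elevations: a general rise along x with small bumps
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.2), (2.0, 0.0, 2.9),
           (0.0, 1.0, 1.1), (1.0, 1.0, 1.8), (2.0, 1.0, 3.0)]

a, b, c = fit_plane(samples)
residuals = [z - (a + b * x + c * y) for x, y, z in samples]
print("plane coefficients:", a, b, c)
print("residuals:", residuals)
```

The residuals are the little dips and bumps that the rigid plane smooths over.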

 

A flat piece of paper will not represent a landscape with a valley. In that case, you can choose a second-order polynomial that lets you “bend” the piece of paper once in each direction.


A second-order polynomial allows a single bend in the surface.

 

Likewise, a third-order polynomial allows two bends, and so forth. You can choose up to a tenth-order polynomial in Geostatistical Analyst.

Global Polynomial interpolation is the only method in Geostatistical Analyst that does not use a search neighborhood. If you add the idea of a search neighborhood to Global Polynomial interpolation, you get Local Polynomial interpolation, which we will talk about next.

Local polynomial

As you just saw, global polynomial interpolation creates a surface from a single polynomial formula. Local polynomial interpolation creates a surface from many different formulas, each of which is optimized for a neighborhood.

The neighborhood shape, maximum and minimum number of points, and sector configuration can be specified. In addition, as with IDW, the sample points in a neighborhood can be weighted by their distance from the prediction location. Thus, local polynomial interpolation produces surfaces that better account for local variation.

A first-order local polynomial fits a single plane through the data points in the search neighborhood, but keeps only the fitted value at the prediction location. The neighborhood then slides over to the next prediction location (each neighborhood thus largely overlaps the ones around it) and the process is repeated. In each case, only the value at the prediction location is kept.

A second-order local polynomial fits a surface with a bend in it to each search neighborhood, a third-order local polynomial fits a surface with two bends to each neighborhood, and so on. Local polynomials are more flexible than global ones. For example, consider the case of a landscape that slopes, levels out, and then slopes again.


A different plane is fitted to each of the neighborhoods (blue outlines) that are centered on the prediction locations (yellow points).

 

A single global polynomial will not fit this landscape very well. Local polynomial interpolation, however, can fit a different plane to each neighborhood centered on a prediction location. As the interpolator considers each location in turn, the neighborhoods overlap. The value used for each prediction is that of the fitted polynomial at the center of the search neighborhood.

Although it is more flexible than global polynomial interpolation, local polynomial interpolation, unlike IDW, is not an exact interpolator.
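
The sliding-neighborhood idea can be sketched on a one-dimensional cross-section. This is a simplification under stated assumptions: the profile values are hypothetical, the neighborhood is just the three nearest samples, and the samples inside it are not distance-weighted. A line is fitted to each neighborhood and only its value at the prediction location is kept.

```python
def local_linear_predict(samples, x0, neighbors=3):
    """First-order local polynomial prediction at x0 along a 1-D profile."""
    # Search neighborhood: the `neighbors` samples nearest to x0.
    nearest = sorted(samples, key=lambda s: abs(s[0] - x0))[:neighbors]
    n = len(nearest)
    mean_x = sum(x for x, _ in nearest) / n
    mean_z = sum(z for _, z in nearest) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in nearest)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in nearest)
    slope = sxz / sxx
    # Keep only the fitted value at the prediction location.
    return mean_z + slope * (x0 - mean_x)

# Hypothetical profile that slopes, levels out, and then slopes again
profile = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 3.0),
           (5.0, 3.0), (6.0, 3.0), (7.0, 4.0), (8.0, 5.0)]

print(local_linear_predict(profile, 1.0))  # on the first slope
print(local_linear_predict(profile, 5.0))  # on the level stretch
print(local_linear_predict(profile, 3.0))  # at the bend: not the sample value
# Widening the neighborhood to every sample gives a single global line.
print(local_linear_predict(profile, 5.0, neighbors=len(profile)))
```

On the level stretch the local fit returns the flat value, while the single global line overshoots it. At the bend the local fit does not reproduce the measured value, which illustrates why the method is not exact.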

Radial basis functions

You can think of the surface created by radial basis functions as a rubber membrane that is fitted to each of the measured data points while minimizing the total curvature of the surface. Because the surface must pass through each sampled point, radial basis functions are exact interpolators.

Geostatistical Analyst uses five radial basis functions. They are similar, but create slightly different surfaces because they use different math to fit the surface to the sample points.
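
To see why a radial basis surface passes through every sample, here is a minimal sketch. It assumes the multiquadric kernel as one common choice of radial basis function; the text above does not say which five functions are offered, so that choice, the shape parameter, the sample data, and the helper names are all assumptions. One coefficient per sample is found by solving a linear system, so the fitted surface reproduces each measured value.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def rbf_fit(points, values, shape=1.0):
    """Return a predictor whose surface passes through every (point, value)."""
    def kernel(p, q):
        r = math.hypot(p[0] - q[0], p[1] - q[1])
        return math.sqrt(r * r + shape * shape)  # multiquadric (assumed)
    matrix = [[kernel(p, q) for q in points] for p in points]
    coefficients = solve_linear(matrix, values)
    def predict(location):
        return sum(c * kernel(location, p) for c, p in zip(coefficients, points))
    return predict

# Hypothetical sample locations and measured values
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [10.0, 12.0, 11.0, 15.0, 14.0]

surface = rbf_fit(points, values)
for point, value in zip(points, values):
    print(point, value, surface(point))  # fitted value matches the sample
print(surface((0.25, 0.75)))             # prediction between samples
```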

Interpolation using radial basis functions is shown by the purple surface; think of it as a fairly stiff rubber sheet that bends and folds to fit exactly to the sample data points.

 

Kriging

Kriging presents a different way to think about prediction than do the deterministic interpolators. In Kriging, a predicted value depends on two factors: a trend and an additional element of variability. This is an intuitive idea with plenty of analogies in the real world. For instance, if you go from the ocean to the top of a mountain, you have an upward trend in elevation. However, there is likely to be variation on the way—you will go both up and down when crossing valleys, streams, knobs and other features.

In Kriging, the large-scale part of a prediction is called the trend. The fluctuation part is called spatially-autocorrelated random error (see "What's autocorrelation?" under EXTRAS).

"Error" doesn't mean a mistake; it just means a fluctuation from the trend. "Random" means that the fluctuation (error) away from the trend is not known in advance: it could be up or down in elevation, just as the stock market could be above or below its average climb. "Spatially-autocorrelated" means that, while the fluctuations are not known exactly in advance, they tend to be above the average or below the average together whenever they are in close proximity. This is positive spatial autocorrelation. It is also possible to have negative spatial autocorrelation, where if one site is above the average, a nearby site tends to be below the average.

Two assumptions are made about the spatially-autocorrelated random error. The first assumption is that it is 0 on average. In other words, some fluctuations will be on one side of the trend and some will be on the other side, but the differences, on average, will cancel each other out. The second assumption is that the autocorrelation of the error is purely spatial; it depends only on the distance between locations and not on any other property (such as position) of a location. This assumption is technically known as "stationarity."

Ordinary Kriging is used when you assume there is no trend in the data or, if there is one, that it is weak enough to ignore. Assuming that there is no trend in the data is mathematically equivalent to assuming that the data have a constant mean value. Each point on the interpolated surface is a weighted average of the measured values in its search neighborhood.
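
A minimal sketch of an Ordinary Kriging prediction follows. It rests on several assumptions that are not in the text above: an exponential semivariogram model with made-up sill and range values (in practice the model is fitted to the data), hypothetical sample values, and a neighborhood that simply contains every sample. The weights come from solving a linear system whose last row forces them to sum to one, which is how the constant but unknown mean is handled.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def semivariance(h, sill=1.0, range_param=3.0):
    """Assumed exponential semivariogram model with illustrative parameters."""
    return sill * (1.0 - math.exp(-h / range_param))

def ordinary_kriging(samples, x, y):
    """Return (prediction, weights) at (x, y) from (xi, yi, value) samples."""
    n = len(samples)
    def gamma(a, b):
        return semivariance(math.hypot(a[0] - b[0], a[1] - b[1]))
    # Semivariances between samples, bordered by ones so that the
    # weights are constrained to sum to 1.
    matrix = [[gamma(p, q) for q in samples] + [1.0] for p in samples]
    matrix.append([1.0] * n + [0.0])
    rhs = [gamma(p, (x, y)) for p in samples] + [1.0]
    weights = solve_linear(matrix, rhs)[:n]  # drop the Lagrange multiplier
    prediction = sum(w * p[2] for w, p in zip(weights, samples))
    return prediction, weights

# Hypothetical samples: (x, y, value)
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (0.0, 2.0, 11.0), (3.0, 3.0, 15.0)]

prediction, weights = ordinary_kriging(samples, 0.5, 0.2)
print("weights:", weights, "sum:", sum(weights))
print("prediction:", prediction)
```

Unlike IDW, the weights depend on the semivariogram model and on how the samples are arranged relative to each other, not only on their distance from the prediction location.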

Universal Kriging assumes there is a trend in the data, but the terms of the trend function are not known in advance. The data values are thought of as the unknown trend plus random errors that fluctuate around it. The random errors are autocorrelated, meaning they tend to be above or below the trend in a way similar to their neighbors. Each point on the interpolated surface is the estimated trend at that location plus a weighted average of the errors in its search neighborhood.

Indicator Kriging uses a threshold value chosen in advance and predicts the probability that the value at a location is above or below that threshold.
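
The first step of this approach, coding each measured value as 0 or 1 against the threshold, can be sketched as follows. The ozone values, the threshold, and the function name are hypothetical. Only the coding step is shown; kriging the resulting indicators, which is what yields a predicted probability at unsampled locations, is not implemented here.

```python
def indicator_transform(values, threshold):
    """Code each value as 1 if it exceeds the threshold, otherwise 0."""
    return [1 if value > threshold else 0 for value in values]

# Hypothetical ozone concentrations and a threshold chosen in advance
ozone = [0.08, 0.13, 0.11, 0.15, 0.09]
threshold = 0.12

indicators = indicator_transform(ozone, threshold)
print(indicators)  # [0, 1, 0, 1, 0]
```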

Kriging also works with more than one variable, considering a different trend for each variable. This is called Cokriging: predicting a surface based on a probability model using more than one predictor. So, in addition to the autocorrelation of the errors for each variable, you have cross-correlation between the errors of the two variables. Cokriging can be applied to all the forms of Kriging: Ordinary Cokriging, Universal Cokriging, etc.

 

EXTRAS

 

What’s autocorrelation?

We usually think of correlation as the tendency for two types of variables to be related. For example, the stock market tends to go up when interest rates go down, so we say that they are negatively correlated. However, we can also say that the stock market is positively autocorrelated, or correlated with itself. In the stock market, two closing values will tend to be more alike if they are one day apart than if they are one year apart. Specifically, we can call this temporal autocorrelation, because it depends on a time relationship (nearness to itself in time tends to mean nearness to itself in value). Spatial autocorrelation is the same idea applied to distance. It is a statistical formulation of the first law of geography—things close together in space tend to be more alike than things far apart.
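
The temporal version of the idea is easy to compute. The sketch below uses a made-up series of closing values and a standard sample autocorrelation formula; both are illustrative assumptions. Values one step apart turn out to be more alike than values five steps apart.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((value - mean) ** 2 for value in series)
    covariance = sum((series[i] - mean) * (series[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance

# Hypothetical daily closing values that rise and then fall
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0]

print(autocorrelation(closes, 1))  # one day apart: strongly positive
print(autocorrelation(closes, 5))  # five days apart: much lower
```

Spatial autocorrelation replaces the time lag with the distance between locations.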

 

The search neighborhood

A basic law of geography says that things that are close to one another are more alike than things that are farther apart. As sampled locations get farther from the prediction location, their influence on the prediction location decreases. To speed up interpolation calculations, we can ignore those far-off sampled points. Also, if they are located in an area that is very different from the prediction location, their influence, however slight, would be undesirable.

It is common practice to specify a search neighborhood to limit the number of measured values that are used to predict an unknown value. The shape of the neighborhood defines the search boundaries. You can establish other parameters as well, placing further restrictions on which locations within the neighborhood will be used.

In the following image, five measured points will be used to predict a value for the unmeasured yellow point.


The five measured points are shown in red. The search neighborhood is outlined in blue. The black points are ignored—we have decided they are too far away to matter.
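
A circular search neighborhood with a cap on the number of points can be sketched like this. The function name, the radius, the cap, and the sample data are hypothetical; the sketch only selects which measured points would be passed on to an interpolator.

```python
import math

def search_neighborhood(samples, x, y, radius, max_points):
    """Return up to max_points samples within radius of (x, y), nearest first."""
    candidates = []
    for sample in samples:
        distance = math.hypot(sample[0] - x, sample[1] - y)
        if distance <= radius:
            candidates.append((distance, sample))
    candidates.sort(key=lambda pair: pair[0])
    return [sample for _, sample in candidates[:max_points]]

# Hypothetical samples: (x, y, value)
samples = [(1.0, 0.0, 5.0), (0.0, 2.0, 6.0), (3.0, 0.0, 7.0),
           (0.0, -1.5, 8.0), (6.0, 6.0, 9.0), (-2.0, -2.0, 4.0)]

used = search_neighborhood(samples, 0.0, 0.0, radius=3.5, max_points=3)
print(used)  # the far-off point (6, 6) is ignored
```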

 

You can change the shape of the search neighborhood. If there are no directional influences in the data, you want to give equal weight to sample points regardless of their direction from the prediction location. This means that you probably want your neighborhood to be a circle. On the other hand, if there is directional influence in your data (such as might be caused by water draining downhill), then you may want to make an ellipse with the major axis running uphill/downhill.


If water flow is relevant to the analysis, then points uphill from the prediction location may have more influence than points perpendicular to the drainage—even when they are farther away. An ellipse models this situation better than a circle.
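
Testing whether a sample falls inside an elliptical neighborhood amounts to rotating its offset into the ellipse's own axes and scaling by the axis lengths. In this sketch the angle is measured counterclockwise from the x-axis, which is an assumption made for illustration and may differ from the convention used in the software; the axis lengths and offsets are also made up.

```python
import math

def in_ellipse(dx, dy, major, minor, angle_deg):
    """True if the offset (dx, dy) from the prediction location lies inside
    an ellipse with the given semi-axes, major axis rotated by angle_deg."""
    theta = math.radians(angle_deg)
    along = dx * math.cos(theta) + dy * math.sin(theta)    # along the major axis
    across = -dx * math.sin(theta) + dy * math.cos(theta)  # along the minor axis
    return (along / major) ** 2 + (across / minor) ** 2 <= 1.0

# Major axis runs uphill/downhill (here: along y), 4 units long; minor axis 1 unit.
print(in_ellipse(0.0, 3.0, major=4.0, minor=1.0, angle_deg=90.0))  # True
print(in_ellipse(2.0, 0.0, major=4.0, minor=1.0, angle_deg=90.0))  # False
```

The point 3 units uphill is included while the point only 2 units away, perpendicular to the drainage, is not.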

 

Once a shape is specified, you can restrict which sample points within the neighborhood are used. You do this by specifying the maximum and minimum numbers of points to use and by dividing the neighborhood into sectors. If the neighborhood is sectored, then the maximum and minimum constraints are applied to each part.


The neighborhood is divided into four sectors and a minimum of one datum per sector has been specified. For one of the sectors, the search must expand beyond the neighborhood to find a datum.

 

Searching for neighbors by sector

When you use a sectored neighborhood and set the Neighbors to Include to 5, Geostatistical Analyst tries to use five data points from every sector. It must use at least two (because of the Include at Least setting). Let’s see how this works in the present case.

 

 

In the sector labeled 1, the five nearest points (corrected for the geometric distortion created by the ellipse) are colored brown and yellow. One point in the neighborhood has not been used because you only asked for five.

In Sector 2, all four points are used. If there were a fifth point, it would also be used.

Sector 3 only has one point, so the search is extended beyond the neighborhood to the nearest point in the sector (the green one). At least two points must be used.

In Sector 4, the two points at the boundary of the neighborhood are used.
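
The per-sector rules walked through above can be sketched as follows. Several simplifications are assumed: the neighborhood is a circle rather than an ellipse, the four sectors are plain quadrants around the prediction location, and the sample coordinates are invented to mimic the four cases. The parameter names include and at_least stand in for the Neighbors to Include and Include at Least settings.

```python
import math

def sector_of(dx, dy, sectors=4):
    """Index of the angular sector that the offset (dx, dy) falls in."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle // (2 * math.pi / sectors)) % sectors

def sector_search(samples, x, y, radius, include=5, at_least=2, sectors=4):
    """Pick up to `include` nearest samples per sector inside the radius,
    extending beyond it when a sector has fewer than `at_least` inside."""
    by_sector = [[] for _ in range(sectors)]
    for sx, sy in samples:
        dx, dy = sx - x, sy - y
        by_sector[sector_of(dx, dy, sectors)].append((math.hypot(dx, dy), (sx, sy)))
    chosen = []
    for members in by_sector:
        members.sort(key=lambda pair: pair[0])
        inside = [m for m in members if m[0] <= radius][:include]
        if len(inside) < at_least:
            # Too few inside the neighborhood: take the nearest points in
            # this sector even if they lie beyond the boundary.
            inside = members[:at_least]
        chosen.extend(point for _, point in inside)
    return chosen

# Hypothetical sample locations around a prediction location at the origin
samples = [
    (0.2, 0.2), (0.4, 0.3), (0.6, 0.5), (0.8, 0.6), (1.0, 0.8), (1.2, 1.0),  # six inside
    (-0.3, 0.3), (-0.6, 0.5), (-0.9, 0.4), (-1.0, 1.0),                      # four inside
    (-0.5, -0.5), (-2.0, -2.0), (-3.0, -3.0),                                # one inside
    (1.0, -1.0), (1.2, -1.4),                                                # two inside
]

chosen = sector_search(samples, 0.0, 0.0, radius=2.0)
print(len(chosen), "points used")
print(chosen)
```

In the first quadrant the sixth point is dropped because only five were requested, and in the third quadrant the search reaches outside the neighborhood to find a second point.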

Interpreting the data point colors

The rate of decay of influence is reflected in the data point colors. In the left-hand graphic, where the power value is 3.8, there is one red point—one point with an influence of more than 10%. Most of the others are green—their influence is 3% or less. (Ten of the fifteen points used to predict the test location have an influence of 3% or less, and six of these have an influence of just 1%.) The influence of the single nearest point to the test location is quite large—at least 57%.

In the right-hand graphic, where the power value is 1.2, three points have an influence of more than 10%, and nine of the ten others are between 5% and 10%. The influence is much more evenly distributed among the sample points.
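
The effect of the power value on how influence is shared can be checked with a short calculation. The distances below are made up and do not reproduce the percentages quoted for the graphics; the sketch only shows the general pattern that a high power concentrates the weight on the nearest point while a low power spreads it more evenly.

```python
def idw_weights(distances, power):
    """Normalized IDW weights (summing to 1) for the given distances."""
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical distances from the test location to five sample points
distances = [1.0, 2.0, 3.0, 4.0, 5.0]

for power in (3.8, 1.2):
    percentages = [round(100 * w, 1) for w in idw_weights(distances, power)]
    print("power", power, "influence (%):", percentages)
```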