A Few Interpolation Methods
(Collected and organized from ArcMap Help files)
Introduction
Surface
interpolation is any formal technique that uses values at sampled locations to
predict values at unsampled locations.
The values may describe any
quantitative geographic phenomenon. Common examples include elevation,
rainfall, ozone concentration, temperature, and soil chemistry.
Interpolation methods fall
into two broad groups. One is a group of deterministic interpolators. This
group makes predictions from mathematical formulas that form weighted averages
of nearby known values. Different methods use different ways to form the
weighted averages. This group includes Inverse Distance Weighted, Global and
Local polynomials, and Radial Basis Functions.
The other group uses
weighted averages as well, but also probability models to make predictions.
This group includes Kriging and all of its different submethods, including Universal and Indicator Kriging. Because these methods use probability calculations
they are called stochastic interpolators.
Most of these methods use the idea of
a prediction search neighborhood, where you look at the dozen or so known
values that are nearest to the prediction location and set the rest of the
data aside for that prediction. This is repeated for each prediction location, so all the data
end up being used when making an interpolated surface. Global Polynomial
interpolation, described below, is the exception: it fits all the data at once.
Inverse Distance Weighted
Inverse
Distance Weighted (IDW) interpolation implements a
basic law of geography—things that are close to one another are more alike than
things that are far apart.
To predict a value for any
unmeasured location, IDW uses the measured values
surrounding the prediction location. Those measured values closest to the
prediction location have more influence on the predicted value than those that
are farther away (hence the name “inverse distance weighted”). Which values are
included in the calculation can be determined by specifying and customizing the
search neighborhood (described under EXTRAS below), which is the
region of the map around a selected point in which data points are considered
for the interpolation.
IDW assumes that each measured point
has some local influence that diminishes with distance. In the figure below,
you can see three different curves that show how fast the influence of a point
decays with distance from the prediction location.

In
IDW, the predictive influence (weight) of a measured value
depends on its distance from the prediction location. The strength of the
dependency can be adjusted.
In the blue
curve, all locations get the same weight, regardless of how far they are from
the prediction location. In the green curve, there is a mild decrease in a
point’s influence as it gets farther from the prediction location. In the red
curve, there is the most dramatic decrease in a point’s influence as it gets
farther from the prediction location.
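The behavior of the three curves can be reproduced numerically. The following is a minimal sketch, assuming the common IDW formulation in which each weight is proportional to 1/d^p and the weights are rescaled to sum to one. The function name `idw_weights`, the sample distances, and the power values 0, 1 and 2 are illustrative assumptions; the help text does not say which power each curve uses.

```python
def idw_weights(distances, power):
    """Return normalized IDW weights, proportional to 1 / distance**power."""
    raw = [1.0 / (d ** power) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical distances from the prediction location to four measured points.
distances = [1.0, 2.0, 4.0, 8.0]

for power in (0, 1, 2):
    weights = idw_weights(distances, power)
    # power 0: equal weights; higher powers: faster decay with distance.
    print(power, [round(w, 3) for w in weights])
```

With a power of 0 every point gets the same weight; raising the power shifts more and more of the total weight onto the nearest point.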
Notice that as the distance
approaches zero, the relative weight approaches one. This means that if one
measured point is very close to the prediction location, it will receive almost
all of the weight. Thus, IDW is an "exact"
interpolator, meaning that the predictions will be exactly equal to the data
value when predictions occur at locations where data have already been
collected.
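The exactness property can be checked with a small example. This is a sketch under stated assumptions, not the Geostatistical Analyst implementation: the function `idw_predict`, the default power of 2, and the sample elevations are all hypothetical, and every sample is used (no search neighborhood is applied).

```python
import math


def idw_predict(samples, target, power=2.0):
    """Predict a value at `target` from (x, y, value) samples using IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # The prediction location coincides with a measured point, so
            # that point takes all of the weight: return its value exactly.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical elevation samples: (x, y, elevation).
samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (0.0, 1.0, 14.0), (1.0, 1.0, 20.0)]

print(idw_predict(samples, (0.0, 0.0)))   # at a measured point: its own value
print(idw_predict(samples, (0.01, 0.0)))  # very close to it: nearly the same
print(idw_predict(samples, (0.5, 0.5)))   # equidistant: the plain mean
```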
Global polynomial
Global
polynomial interpolation fits a smooth surface to the sampled data points. In
contrast to IDW, it does not use local information.
If you are familiar with regression, global polynomial interpolation fits a
polynomial regression to the x- and y-coordinates.
Suppose that you collected
the elevation data in the following figure.
The
black points are measured elevation values.
A first-order
polynomial fits a rigid plane to the data. Visualize fitting a flat sheet of
paper to the elevation points. Of course, the elevation values will include
lots of little dips and bumps besides the general trend seen in the figure
above. The flat surface of a global polynomial smooths out
all of the little bumps. Because the surface is rigid, it will not pass
exactly through the sampled data points. This means that the global polynomial
is not an exact interpolator; rather, it smooths over
fine-scale details.
A first order global polynomial surface in cross-section. The surface (red line)
captures coarse-scale pattern in the data. It does not pass through the sampled
data (green points).
A flat
piece of paper will not represent a landscape with a valley. In that case, you
can choose a second-order polynomial that lets you “bend” the piece of paper
once in each direction.
A
second-order polynomial allows a single bend in the surface.
Likewise, a
third-order polynomial allows two bends and so forth. You can choose up to a
tenth-order polynomial in the Geostatistical Analyst.
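The effect of the polynomial order can be illustrated in cross-section, where the surface reduces to a curve z(x). The sketch below fits first- and second-order polynomials by ordinary least squares using the normal equations. The helper names (`solve`, `fit_global_polynomial`, `evaluate`) and the valley-shaped data are assumptions made for illustration; a real surface would also include the y-coordinate.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_global_polynomial(xs, zs, order):
    """Least-squares polynomial z(x) of the given order (normal equations)."""
    k = order + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    atz = [sum(z * x ** i for x, z in zip(xs, zs)) for i in range(k)]
    return solve(ata, atz)  # coefficients c0, c1, ..., c_order


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical cross-section through a valley: elevation vs. distance.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
zs = [9.2, 4.1, 0.8, 0.3, 1.1, 3.8, 9.1]

worst_miss = {}
for order in (1, 2):
    coeffs = fit_global_polynomial(xs, zs, order)
    worst_miss[order] = max(abs(evaluate(coeffs, x) - z) for x, z in zip(xs, zs))
    print(order, round(worst_miss[order], 2))  # largest miss at a sampled point
```

The first-order fit misses the valley badly, while the second-order fit follows it closely but still does not pass exactly through the samples.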
Global Polynomial
interpolation is the only method in Geostatistical
Analyst that does not use a search neighborhood. If you add the idea of a
search neighborhood to Global Polynomial interpolation, you get Local
Polynomial interpolation, which we will talk about next.
Local polynomial
As
you just saw, global polynomial interpolation creates a surface from a single
polynomial formula. Local polynomial interpolation creates a surface from many
different formulas, each of which is optimized for a neighborhood.
The neighborhood shape,
maximum and minimum number of points, and sector
configuration can be specified. In addition, as with IDW,
the sample points in a neighborhood can be weighted by their distance from the
prediction location. Thus, local polynomial interpolation produces surfaces
that better account for local variation.
A first-order local
polynomial fits a single plane through the data points in the search
neighborhood, but keeps only the fitted value at the prediction location. The
neighborhood then slides over to the next prediction location (each
neighborhood thus largely overlaps the ones around it) and the process is
repeated. In each case, only the value at the prediction location is kept.
A second-order local
polynomial fits a surface with a bend in it to each search neighborhood,
a third-order local polynomial fits a surface with two bends to each
neighborhood, and so on. Local polynomials are more flexible than global ones.
For example, consider the case of a landscape that slopes, levels out, and then
slopes again.
A
different plane is fitted to each of the neighborhoods (blue outlines) that are
centered on the prediction locations (yellow points).
A single
global polynomial will not fit this landscape very well. Local polynomial
interpolation, however, can fit a different plane to each neighborhood centered
on a prediction location. As the interpolator considers each location in turn,
the neighborhoods overlap. The value used for each prediction is that of the
fitted polynomial at the center of the search neighborhood.
Although it is more
flexible than global polynomial interpolation, local polynomial interpolation
is not an exact interpolator like IDW.
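The sliding-neighborhood idea can be sketched in cross-section. The example below fits a distance-weighted straight line to the few samples nearest each prediction location and keeps only the fitted value at that location. The Gaussian weighting, the `bandwidth` and `n_neighbors` parameters, and the data are assumptions chosen for illustration; they are not taken from Geostatistical Analyst.

```python
import math


def local_linear_predict(xs, zs, x0, n_neighbors=4, bandwidth=2.0):
    """First-order local polynomial prediction at x0 in a 1-D cross-section."""
    # Search neighborhood: the n_neighbors samples nearest to x0.
    nearest = sorted(zip(xs, zs), key=lambda p: abs(p[0] - x0))[:n_neighbors]
    # Assumed Gaussian weights: nearer samples count more in the fit.
    w = [math.exp(-((x - x0) / bandwidth) ** 2) for x, _ in nearest]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw
    mz = sum(wi * z for wi, (_, z) in zip(w, nearest)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, nearest))
    sxz = sum(wi * (x - mx) * (z - mz) for wi, (x, z) in zip(w, nearest))
    slope = sxz / sxx if sxx > 0 else 0.0
    # Keep only the fitted value at the prediction location.
    return mz + slope * (x0 - mx)


# Hypothetical landscape that slopes, levels out, and then slopes again.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
zs = [0.0, 1.1, 1.9, 3.0, 3.1, 2.9, 3.0, 4.1, 4.9, 6.0]

print(local_linear_predict(xs, zs, 4.5))  # on the level stretch
print(local_linear_predict(xs, zs, 4.0))  # at a sample, but not equal to 3.1
```

Because each local line is a least-squares fit rather than a surface forced through the data, the prediction at a sampled location generally differs from the measured value there.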
Radial basis functions
You can
think of the surface created by radial basis functions as a rubber membrane
that is fitted to each of the measured data points while minimizing the total
curvature of the surface. Because the surface must pass through each sampled
point, radial basis functions are exact interpolators.
Geostatistical Analyst uses five radial basis
functions. They are similar, but create slightly different surfaces because
they use different math to fit the surface to the sample points.

Interpolation
using radial basis functions is shown by the purple surface; think of it as a
fairly stiff rubber sheet that bends and folds to fit exactly to the sample
data points.
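The "passes through every point" property can be demonstrated with one simple radial basis function. The sketch below uses a multiquadric kernel, sqrt(r^2 + c^2), chosen here only because it is easy to write down; the shape parameter `c`, the helper names and the data are assumptions, and the code is not claimed to match the formulations used by Geostatistical Analyst.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rbf_surface(points, values, c=1.0):
    """Return a function p -> z for a multiquadric RBF surface through the data."""
    def phi(r):
        return math.sqrt(r * r + c * c)

    # One basis function per sample; solve for the weights that reproduce
    # every measured value exactly.
    a = [[phi(math.dist(p, q)) for q in points] for p in points]
    weights = solve(a, values)

    def surface(p):
        return sum(w * phi(math.dist(p, q)) for w, q in zip(weights, points))

    return surface


# Hypothetical sample locations and measured values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [10.0, 12.0, 14.0, 20.0, 13.0]

surface = rbf_surface(points, values)
for p, z in zip(points, values):
    print(p, z, round(surface(p), 6))  # fitted value matches the measurement
print(surface((0.25, 0.75)))            # prediction at an unsampled location
```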
Kriging
Kriging
presents a different way to think about prediction than do the deterministic
interpolators. In Kriging, a predicted value depends
on two factors: a trend and an additional element of variability. This is an
intuitive idea with plenty of analogies in the real world. For instance, if you
go from the ocean to the top of a mountain, you have an upward trend in
elevation. However, there is likely to be variation on the way—you will go both
up and down when crossing valleys, streams, knobs and other features.
In Kriging,
the large-scale, systematic part of a prediction is called the trend. The fluctuation part is
called spatially-autocorrelated (autocorrelation is explained under EXTRAS below)
random error. "Error" doesn't mean a mistake—it just means a
fluctuation from the trend. "Random" means that the fluctuation
(error) away from the trend is not known in advance—it could be up or down in
elevation, it could be above or below the average climb of the stock market.
"Spatially-autocorrelated" means that, while the fluctuations are not
known exactly in advance, they have tendencies to be above the average or below
the average together whenever they are in close proximity. This is positive
spatial autocorrelation. It is also possible to have negative spatial
autocorrelation, where if one site is above the average, a nearby site tends to be
below the average. Two assumptions are made about the spatially-autocorrelated
random error. The first assumption is that it is 0 on average. In other words,
some fluctuations will be on one side of the trend and some will be on the
other side, but the differences, on average, will cancel each other out. The
second assumption is that the autocorrelation of the error is purely spatial;
it depends only on distance and not on any other property (such as position) of
a location. This assumption is technically known as "stationarity."
Ordinary Kriging is done when one assumes there is no trend in the
data, or, if there is one, it is weak enough that you can ignore it. Assuming
that there is no trend in the data is mathematically equivalent to assuming
that the data have a constant mean value. Each point on the
interpolated surface is a weighted average of the measured values in its search neighborhood.
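A compact sketch of an Ordinary Kriging prediction is given below. It assumes an exponential model for how dissimilarity grows with distance (a semivariogram) with made-up `range_` and `sill` values; in practice the model is fitted to the data. The function names and the sample data are hypothetical. The constant-mean assumption appears as the constraint that the weights sum to one.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(points, values, target, range_=2.0, sill=1.0):
    """Return (prediction, weights) at `target` under an assumed semivariogram."""
    def gamma(h):
        # Assumed exponential semivariogram with no nugget effect.
        return sill * (1.0 - math.exp(-h / range_))

    n = len(points)
    # Kriging system: semivariances between samples, bordered by the
    # sum-to-one constraint (the extra unknown is a Lagrange multiplier).
    a = [[gamma(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    prediction = sum(w * z for w, z in zip(weights, values))
    return prediction, weights


# Hypothetical samples in the search neighborhood.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 12.0, 14.0, 20.0]

prediction, weights = ordinary_kriging(points, values, (0.2, 0.3))
print(prediction, sum(weights))  # weights sum to one
```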
Universal Kriging assumes there is a trend in the data, but the terms
of the trend function are not known in advance. The data values are thought of
as random errors that fluctuate around the unknown trend. The random errors are
autocorrelated, meaning they tend to be above or below the trend in a way
similar to their neighbors. Each point on the interpolated surface is the estimated trend at that location
plus a weighted average of the fluctuations around the trend in its search neighborhood.
Indicator Kriging uses a predefined
threshold value and predicts the probability that the value at a location is
above or below that threshold.
Kriging also works with
more than one variable, considering a different trend for each variable. This
is called Cokriging: predicting a surface based on a
probability model using more than one predictor. So, in addition to
autocorrelation for the errors of each variable, you have cross-correlation
between the errors of the two variable types. Cokriging can be
applied to all the forms of Kriging (Ordinary Cokriging, Universal Cokriging,
etc.).
EXTRAS
We usually think of
correlation as the tendency for two types of variables to be related. For
example, the stock market tends to go up when interest rates go down, so we say
that they are negatively correlated. However, we can also say that the stock
market is positively autocorrelated, or correlated with itself.
In the stock market, two closing values will tend to be more alike if they are
one day apart than if they are one year apart. Specifically, we can call this
temporal autocorrelation, because it depends on a time
relationship (nearness to itself in time tends to mean nearness to itself in
value). Spatial autocorrelation is the same idea applied to distance. It is a
statistical formulation of the first law of geography—things close together in
space tend to be more alike than things far apart.
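The stock market analogy can be imitated with a simulated series. The sketch below builds a random walk as a stand-in for closing values (an assumption; it is not real market data) and compares how different two values are when they are one step apart versus many steps apart. The function name `mean_abs_difference` and the lag of 250 are illustrative choices.

```python
import random

random.seed(0)

# Hypothetical "closing values": a random walk, so each value starts from
# the previous one and values close in time stay close in value.
series = [100.0]
for _ in range(999):
    series.append(series[-1] + random.gauss(0.0, 1.0))


def mean_abs_difference(values, lag):
    """Average absolute difference between values `lag` steps apart."""
    diffs = [abs(b - a) for a, b in zip(values, values[lag:])]
    return sum(diffs) / len(diffs)


print(mean_abs_difference(series, 1))    # one step apart
print(mean_abs_difference(series, 250))  # many steps apart
```

Values one step apart differ much less on average than values 250 steps apart, which is what positive autocorrelation means. Replacing the time lag with a distance between locations gives the spatial version of the same idea.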
A basic law
of geography says that things that are close to one another are more alike than
things that are farther apart. As sampled locations get farther from the
prediction location, their influence on the prediction location decreases. To
speed up interpolation calculations, we can ignore those far-off sampled
points. Also, if they are located in an area that is very different from the
prediction location, their influence, however slight, would be undesirable.
It is common practice to
specify a search neighborhood to limit the number of measured values that are
used to predict an unknown value. The shape of the neighborhood defines the
search boundaries. You can establish other parameters as well, placing further
restrictions on which locations within the neighborhood will be used.
In the following image,
five measured points will be used to predict a value for the unmeasured yellow
point.
The
five measured points are shown in red. The search neighborhood is outlined in blue.
The black points are ignored—we have decided they are too far away to matter.
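A circular search neighborhood with a cap on the number of points can be sketched as follows. The function name `search_neighborhood`, its parameters, and the coordinates are hypothetical; they are arranged so that five points fall inside the neighborhood and three are ignored, as in the figure described above.

```python
import math


def search_neighborhood(samples, target, radius, max_points):
    """Return up to max_points samples within radius of target, nearest first."""
    inside = [s for s in samples if math.dist(s[:2], target) <= radius]
    inside.sort(key=lambda s: math.dist(s[:2], target))
    return inside[:max_points]


# Hypothetical measured points: (x, y, value).
samples = [
    (1.0, 0.0, 5.0), (0.0, 2.0, 6.0), (-1.0, -1.0, 4.0), (2.0, 2.0, 7.0),
    (-3.0, 0.0, 3.0), (6.0, 1.0, 9.0), (0.0, -7.0, 2.0), (8.0, 8.0, 1.0),
]

used = search_neighborhood(samples, (0.0, 0.0), radius=4.0, max_points=5)
print(len(used), "used,", len(samples) - len(used), "ignored")
```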
You can
change the shape of the search neighborhood. If there are no directional
influences in the data, you want to give equal weight to sample points
regardless of their direction from the prediction location. This means that you
probably want your neighborhood to be a circle. On the other hand, if there is
directional influence in your data (such as might be caused by water draining
downhill), then you may want to make an ellipse with the major axis running
uphill/downhill.
If
water flow is relevant to the analysis, then points uphill from the prediction
location may have more influence than points perpendicular to the drainage—even
when they are farther away. An ellipse models this situation better than a
circle.
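Whether a point falls inside an elliptical neighborhood can be tested by rotating its offset into the ellipse's own axes. This is a geometric sketch only; the function `in_ellipse`, its parameter names, and the example axis lengths are assumptions, not settings taken from Geostatistical Analyst.

```python
import math


def in_ellipse(point, center, major, minor, angle_deg):
    """True if `point` lies inside an ellipse centered on `center`.

    `major` and `minor` are semi-axis lengths; `angle_deg` is the direction
    of the major axis, measured counterclockwise from the x-axis.
    """
    dx, dy = point[0] - center[0], point[1] - center[1]
    a = math.radians(angle_deg)
    # Rotate the offset into the ellipse's own axes.
    u = dx * math.cos(a) + dy * math.sin(a)    # along the major axis
    v = -dx * math.sin(a) + dy * math.cos(a)   # along the minor axis
    return (u / major) ** 2 + (v / minor) ** 2 <= 1.0


# Hypothetical drainage running along the y-axis (uphill/downhill).
center = (0.0, 0.0)
print(in_ellipse((0.0, 3.0), center, major=4.0, minor=1.0, angle_deg=90.0))
print(in_ellipse((1.5, 0.0), center, major=4.0, minor=1.0, angle_deg=90.0))
```

In this setup the uphill point at distance 3 is inside the neighborhood, while the point at distance 1.5 perpendicular to the drainage is outside it.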
Once a
shape is specified, you can restrict which sample points within the
neighborhood are used. You do this by specifying the maximum and minimum
numbers of points to use and by dividing the neighborhood into sectors. If the
neighborhood is sectored, then the maximum and minimum constraints are applied
to each part.
The
neighborhood is divided into four sectors and a minimum of one datum per sector
has been specified. For one of the sectors, the search must expand beyond the
neighborhood to find a datum.
Searching for neighbors by sector
When
you use a sectored neighborhood and set the Neighbors to Include to 5, Geostatistical Analyst tries to use five data points from
every sector. It must use at least two (because of the Include at Least
setting). Let’s see how this works in the present case.
In the sector
labeled 1, the five nearest points (corrected for the geometric distortion
created by the ellipse) are colored brown and yellow. One point in the
neighborhood has not been used because you only asked for five.
In Sector 2, all four
points are used. If there were a fifth point, it would also be used.
Sector 3 only has one
point, so the search is extended beyond the neighborhood to the nearest point
in the sector (the green one). At least two points must be used.
In Sector 4, the two points
at the boundary of the neighborhood are used.
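The sector-by-sector logic described above can be sketched as follows. For simplicity the sketch assumes a circular neighborhood split into four quadrant-shaped sectors, whereas the example in the text uses an ellipse; the function `sector_search` and its parameters `include` and `at_least` are hypothetical stand-ins for the Neighbors to Include and Include at Least settings.

```python
import math


def sector_search(samples, target, radius, n_sectors=4, include=5, at_least=2):
    """Pick neighbors sector by sector around `target`.

    Each sector contributes up to `include` of its nearest samples inside
    `radius`; if fewer than `at_least` are found, the search extends beyond
    the radius to that sector's nearest samples.
    """
    sectors = [[] for _ in range(n_sectors)]
    for s in samples:
        dx, dy = s[0] - target[0], s[1] - target[1]
        angle = math.atan2(dy, dx) % (2 * math.pi)
        index = min(int(angle / (2 * math.pi / n_sectors)), n_sectors - 1)
        sectors[index].append((math.hypot(dx, dy), s))
    chosen = []
    for members in sectors:
        members.sort(key=lambda m: m[0])
        inside = [s for d, s in members if d <= radius][:include]
        if len(inside) < at_least:
            # Too few points inside: extend the search beyond the radius.
            inside = [s for d, s in members][:at_least]
        chosen.append(inside)
    return chosen


# Hypothetical sample locations (x, y) around a prediction location at (0, 0).
samples = [
    (1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (0.5, 0.5), (2.0, 2.0), (1.0, 0.5),
    (-1.0, 1.0), (-2.0, 1.0), (-1.0, 2.0), (-0.5, 1.0),
    (-1.0, -1.0), (-4.0, -3.0),
    (1.0, -2.0), (2.0, -1.0),
]

chosen = sector_search(samples, (0.0, 0.0), radius=3.0)
print([len(c) for c in chosen])
```

The first sector holds six points but only the nearest five are kept, the second uses all four of its points, and the third has to reach outside the radius to satisfy the minimum of two.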
Interpreting the data point colors
The rate of decay of
influence is reflected in the data point colors. In the left-hand graphic,
where the power value is 3.8, there is one red point—one point with an
influence of more than 10%. Most of the others are green—their influence is 3%
or less. (Ten of the fifteen points used to predict the test location have an
influence of 3% or less, and six of these have an influence of just 1%.) The influence of the single nearest point to the test
location is quite large—at least 57%.
In the right-hand graphic,
where the power value is 1.2, three points have an influence of more than 10%,
and nine of the ten others are between 5% and 10%. The influence is much more
evenly distributed among the sample points.