gmtregress(1) GMT gmtregress(1)
NAME
gmtregress - Linear regression of 1-D data sets
SYNOPSIS
gmtregress [ table ] [ -Amin/max/inc ] [ -Clevel ] [ -Ex|y|o|r ] [
-Fflags ] [ -N1|2|r|w ] [ -S[r] ] [ -Tmin/max/inc | -Tn ] [
-W[w][x][y][r] ] [ -V[level] ] [ -aflags ] [ -bbinary ] [ -dnodata ] [
-eregexp ] [ -ggaps ] [ -hheaders ] [ -iflags ] [ -oflags ]
Note: No space is allowed between the option flag and the associated
arguments.
DESCRIPTION
gmtregress reads one or more data tables [or stdin] and determines the
best linear regression model y = a + b*x for each segment using the
chosen parameters. The user may specify which data and model compo-
nents should be reported. By default, the model will be evaluated at
the input points, but alternatively you can specify an equidistant
range over which to evaluate the model, or turn off evaluation com-
pletely. Instead of determining the best fit we can perform a scan of
all possible regression lines (for a range of slope angles) and examine
how the chosen misfit measure varies with slope. This is particularly
useful when analyzing data with many outliers. Note: If you actually
need to work with log10 of x or y you can accomplish that transforma-
tion during read by using the -i option.
REQUIRED ARGUMENTS
None
OPTIONAL ARGUMENTS
table One or more ASCII (or binary, see -bi[ncols][type]) data table
file(s) holding a number of data columns. If no tables are given
then we read from standard input. The first two columns are
expected to contain the required x and y data. Depending on
your -W and -E settings we may expect an additional 1-3 columns
with error estimates of one or both of the data coordinates, and
even their correlation.
-Amin/max/inc
Instead of determining a best-fit regression we explore the full
range of regressions. Examine all possible regression lines
with slope angles between min and max, using steps of inc
degrees [-90/+90/1]. For each slope the optimum intercept is
determined based on your regression type (-E) and misfit norm
(-N) settings. For each segment we report the four columns
angle, E, slope, intercept, for the range of specified angles.
The best model parameters within this range are written into the
segment header and reported in verbose mode (-V).
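The scan can be pictured with the short Python sketch below. It is an illustration only, not GMT's implementation. It assumes the default -Ey and -N2 settings, for which the optimum intercept at a fixed slope is ymean - slope*xmean, and it simply skips the vertical angles -90 and +90, where the slope of y = a + b*x is undefined. The function name scan_slopes and the sample data are made up for this example.

```python
import math

def scan_slopes(x, y, amin=-90.0, amax=90.0, inc=1.0):
    """Return rows (angle, E, slope, intercept), one per trial slope angle.

    Assumes vertical misfit (-Ey) and the mean-squared norm (-N2).
    """
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    rows = []
    steps = int(round((amax - amin) / inc))
    for k in range(steps + 1):
        angle = amin + k * inc
        if abs(abs(angle) - 90.0) < 1e-9:
            continue  # vertical line: no finite slope, skipped in this sketch
        slope = math.tan(math.radians(angle))
        # For a fixed slope, the L2-optimal intercept puts the line through the centroid.
        intercept = ym - slope * xm
        misfit = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / n
        rows.append((angle, misfit, slope, intercept))
    return rows

# Made-up data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
rows = scan_slopes(x, y, 0.0, 89.0, 0.5)
best = min(rows, key=lambda r: r[1])  # row with the smallest misfit E
print(best)
```

Because the angles are sampled on a grid, the best row is only as close to the true slope as the step inc allows; a finer step narrows the gap.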
-Clevel
Set the confidence level (in %) to use for the optional calcula-
tion of confidence bands on the regression [95]. This is only
used if -F includes the output column c.
-Ex|y|o|r
Type of linear regression, i.e., select the type of misfit we
should calculate. Choose from x (regress x on y; i.e., the mis-
fit is measured horizontally from data point to regression
line), y (regress y on x; i.e., the misfit is measured verti-
cally [Default]), o (orthogonal regression; i.e., the misfit is
measured from data point orthogonally to nearest point on the
line), or r (Reduced Major Axis regression; i.e., the misfit is
the product of both vertical and horizontal misfits) [y].
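To make the four misfit types concrete, the Python sketch below computes the textbook closed-form least-squares slope for each of them on unweighted data. These are standard formulas, shown as an assumption about what each choice means under -N2; they are not taken from the GMT source, and the function name regression_slopes and the data are invented for illustration.

```python
import math

def regression_slopes(x, y):
    """Return {type: (intercept, slope)} for types y, x, o and r.

    Unweighted least-squares formulas; requires a nonzero covariance.
    """
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slopes = {
        "y": sxy / sxx,                          # vertical misfit (y on x)
        "x": syy / sxy,                          # horizontal misfit (x on y)
        "o": (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy),
        "r": math.copysign(math.sqrt(syy / sxx), sxy),  # reduced major axis
    }
    # Every one of these lines passes through the centroid (xm, ym).
    return {key: (ym - b * xm, b) for key, b in slopes.items()}

# Made-up, slightly noisy data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.0]
fits = regression_slopes(x, y)
for key in "yxor":
    print(key, fits[key])
```

For positively correlated data the y-on-x slope is the shallowest and the x-on-y slope the steepest, with the orthogonal and reduced major axis slopes in between.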
-Fflags
Append a combination of the columns you wish returned; the out-
put order will match the order specified. Choose from x
(observed x), y (observed y), m (model prediction), r (residual
= data minus model), c (symmetrical confidence interval on the
regression; see -C for specifying the level), z (standardized
residuals or so-called z-scores) and w (outlier weights 0 or 1;
for -Nw these are the Reweighted Least Squares weights)
[xymrczw]. As an alternative to evaluating the model, just give -Fp
and we instead write a single record with the model parameters
npoints xmean ymean angle misfit slope intercept sigma_slope
sigma_intercept.
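When post-processing the -Fp record in a script, it can help to attach the field names listed above to the values. The Python sketch below does that for one whitespace-separated record. The helper parse_fp_record is hypothetical and the sample record is invented for illustration; it is not real gmtregress output.

```python
# Field order as listed for -Fp; column numbering starts at 0.
FIELDS = [
    "npoints", "xmean", "ymean", "angle", "misfit",
    "slope", "intercept", "sigma_slope", "sigma_intercept",
]

def parse_fp_record(line):
    """Map one whitespace-separated -Fp record onto the field names."""
    values = [float(token) for token in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))

# Invented record, NOT actual program output.
sample = "5\t2\t5\t63.4349\t0\t2\t1\t0\t0"
params = parse_fp_record(sample)
print(params["slope"], params["intercept"])
```

With this ordering the slope sits in column 5, which is the column an -o5 selection would pick out.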
-N1|2|r|w
Selects the norm to use for the misfit calculation. Choose
among 1 (L-1 measure; the mean of the absolute residuals), 2
(Least-squares; the mean of the squared residuals), r (LMS; the
least median of the squared residuals), or w (RLS; Reweighted
Least Squares: the mean of the squared residuals after outliers
identified via LMS have been removed) [Default is 2]. Tradi-
tional regression uses L-2 while L-1 and in particular LMS are
more robust in how they handle outliers. As alluded to, RLS
implies an initial LMS regression which is then used to identify
outliers in the data, assign these a zero weight, and then redo
the regression using an L-2 norm.
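The Python sketch below illustrates how the three misfit measures respond to a single outlier, and what the final reweighting step of RLS amounts to. It is a simplified illustration, not GMT's algorithm: the LMS line is assumed to be already known, and the outlier cutoff is a hypothetical parameter supplied by the caller rather than the criterion GMT applies. All names and data are made up.

```python
import statistics

def misfit_measures(x, y, a, b):
    """Misfit of the line y = a + b*x under the L-1, L-2 and LMS measures."""
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sq = [r * r for r in res]
    return {
        "1": sum(abs(r) for r in res) / len(res),  # mean absolute residual
        "2": sum(sq) / len(sq),                    # mean squared residual
        "r": statistics.median(sq),                # median squared residual
    }

def l2_fit(x, y, w):
    """Weighted least-squares fit of y on x; returns (intercept, slope)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b

def rls_refit(x, y, a_lms, b_lms, cutoff):
    """Give zero weight to points far from the LMS line, then refit with L-2."""
    w = [1.0 if abs(yi - (a_lms + b_lms * xi)) <= cutoff else 0.0
         for xi, yi in zip(x, y)]
    return l2_fit(x, y, w), w

# Made-up data on y = 1 + 2x, except for the last point.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 40.0]
on_true_line = misfit_measures(x, y, 1.0, 2.0)
a_l2, b_l2 = l2_fit(x, y, [1.0] * len(x))
(a_rls, b_rls), weights = rls_refit(x, y, 1.0, 2.0, cutoff=1.0)
print(on_true_line, (a_l2, b_l2), (a_rls, b_rls), weights)
```

On this data the median-based measure ignores the outlier entirely, the plain L-2 fit is pulled strongly toward it, and the reweighted fit recovers the underlying line.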
-S[r] Restricts which records will be output. By default all data
records will be output in the format specified by -F. Use -S to
exclude data points identified as outliers by the regression.
Alternatively, use -Sr to reverse this and only output the out-
lier records.
-Tmin/max/inc | -Tn
Evaluate the best-fit regression model at the equidistant points
implied by the arguments. If -Tn is given instead we will reset
min and max to the extreme x-values for each segment and deter-
mine inc so that there are exactly n output values for each seg-
ment. To skip the model evaluation entirely, simply provide
-T0.
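The two ways of specifying evaluation points can be sketched in Python as follows. This is an illustration of the arithmetic implied by the description, not GMT code: the function names are invented, and how GMT treats a range that is not a whole multiple of inc, or a request for a single point, is not stated here, so the sketch stops at the last point not exceeding max and rejects n = 1.

```python
import math

def points_from_range(tmin, tmax, inc):
    """Equidistant points tmin, tmin+inc, ... up to tmax (as in -Tmin/max/inc)."""
    count = int(math.floor((tmax - tmin) / inc + 1e-9)) + 1
    return [tmin + k * inc for k in range(count)]

def points_from_count(x, n):
    """Exactly n points spanning the extreme x-values of a segment (as in -Tn)."""
    if n == 0:
        return []  # -T0: skip the model evaluation entirely
    if n < 2:
        raise ValueError("this sketch needs n = 0 or n >= 2")
    lo, hi = min(x), max(x)
    inc = (hi - lo) / (n - 1)  # n points leave n - 1 intervals
    return [lo + k * inc for k in range(n)]

grid = points_from_range(0.0, 1.0, 0.25)
spread = points_from_count([3.0, 1.0, 2.0], 5)
print(grid, spread)
```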
-W[w][x][y][r]
Specifies weighted regression and which weights will be pro-
vided. Append x if giving 1-sigma uncertainties in the x-obser-
vations, y if giving 1-sigma uncertainties in y, and r if giving
correlations between x and y observations, in the order these
columns appear in the input (after the two required and leading
x, y columns). Giving both x and y (and optionally r) implies
an orthogonal regression, otherwise giving x requires -Ex and y
requires -Ey. We convert uncertainties in x and y to regression
weights via the relationship weight = 1/sigma. Use -Ww if we
should interpret the input columns as precomputed weights
instead. Note: residuals with respect to the regression
line will be scaled by the given weights. Most norms will then
square this weighted residual (-N1 is the only exception).
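As a worked illustration of the weighting, the Python sketch below fits y on x when only 1-sigma uncertainties in y are given (the -Wy with -Ey case). It assumes the relationship stated above, weight = 1/sigma, with each residual scaled by its weight before being squared, so each point effectively contributes with 1/sigma^2. It is a textbook weighted least-squares solution rather than GMT's own code; the function name and the data are invented.

```python
def weighted_fit_y(x, y, sigma_y):
    """Minimise the mean of (w * residual)^2 with w = 1/sigma.

    Returns (intercept, slope, misfit).
    """
    w = [1.0 / s for s in sigma_y]   # weight = 1/sigma
    w2 = [wi * wi for wi in w]       # squaring the scaled residual gives 1/sigma^2
    sw = sum(w2)
    xm = sum(wi * xi for wi, xi in zip(w2, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w2, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w2, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w2, x, y))
    b = sxy / sxx
    a = ym - b * xm
    misfit = sum((wi * (yi - (a + b * xi))) ** 2
                 for wi, xi, yi in zip(w, x, y)) / len(x)
    return a, b, misfit

# Made-up data: the last point is far off the line but very uncertain.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 20.0]
sigma = [1.0, 1.0, 1.0, 1000.0]
a, b, misfit = weighted_fit_y(x, y, sigma)
print(a, b, misfit)
```

The large sigma on the last point shrinks its influence, so the fit stays close to the line y = 1 + 2x defined by the three well-determined points.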
-V[level] (more ...)
Select verbosity level [c].
-acol=name[...] (more ...)
Set aspatial column associations col=name.
-bi[ncols][t] (more ...)
Select native binary input.
-bo[ncols][type] (more ...)
Select native binary output. [Default is same as input].
-d[i|o]nodata (more ...)
Replace input columns that equal nodata with NaN and do the
reverse on output.
-e[~]"pattern" | -e[~]/regexp/[i] (more ...)
Only accept data records that match the given pattern.
-g[a]x|y|d|X|Y|D|[col]z[+|-]gap[u] (more ...)
Determine data gaps and line breaks.
-h[i|o][n][+c][+d][+rremark][+ttitle] (more ...)
Skip or produce header record(s).
-icols[+l][+sscale][+ooffset][,...] (more ...)
Select input columns and transformations (0 is first column).
-ocols[,...] (more ...)
Select output columns (0 is first column).
-^ or just -
Print a short message about the syntax of the command, then
exits (NOTE: on Windows just use -).
-+ or just +
Print an extensive usage (help) message, including the explana-
tion of any module-specific option (but not the GMT common
options), then exits.
-? or no arguments
Print a complete usage (help) message, including the explanation
of all options, then exits.
ASCII FORMAT PRECISION
The ASCII output formats of numerical data are controlled by parameters
in your gmt.conf file. Longitude and latitude are formatted according
to FORMAT_GEO_OUT, absolute time is under the control of FOR-
MAT_DATE_OUT and FORMAT_CLOCK_OUT, whereas general floating point val-
ues are formatted according to FORMAT_FLOAT_OUT. Be aware that the for-
mat in effect can lead to loss of precision in ASCII output, which can
lead to various problems downstream. If you find the output is not
written with enough precision, consider switching to binary output (-bo
if available) or specify more decimals using the FORMAT_FLOAT_OUT set-
ting.
EXAMPLES
To do a standard least-squares regression on the x-y data in points.txt
and return x, y, and model prediction with 99% confidence intervals,
try
gmt regress points.txt -Fxymc -C99 > points_regressed.txt
To just get the slope for the above regression, try
slope=`gmt regress points.txt -Fp -o5`
To do a reweighted least-squares regression on the data rough.txt and
return x, y, model prediction and the RLS weights, try
gmt regress rough.txt -Nw -Fxymw > points_regressed.txt
To do an orthogonal least-squares regression on the data crazy.txt but
first take the logarithm of both x and y, then return x, y, model pre-
diction and the normalized residuals (z-scores), try
gmt regress crazy.txt -Eo -Fxymz -i0-1+l > points_regressed.txt
To examine how the orthogonal LMS misfits vary with angle between 0 and
90 degrees in steps of 0.2 degrees for the data in points.txt, try
gmt regress points.txt -A0/90/0.2 -Eo -Nr > points_analysis.txt
REFERENCES
Draper, N. R., and H. Smith, 1998, Applied regression analysis, 3rd
ed., 736 pp., John Wiley and Sons, New York.
Rousseeuw, P. J., and A. M. Leroy, 1987, Robust regression and outlier
detection, 329 pp., John Wiley and Sons, New York.
York, D., N. M. Evensen, M. L. Martinez, and J. De Basabe Delgado,
2004, Unified equations for the slope, intercept, and standard errors
of the best straight line, Am. J. Phys., 72(3), 367-375.
SEE ALSO
gmt(1), trend1d(1), trend2d(1)
COPYRIGHT
2017, P. Wessel, W. H. F. Smith, R. Scharroo, J. Luis, and F. Wobbe
5.4.2 Jun 24, 2017 gmtregress(1)
