The methods available in the Statistica Generalized Additive Models facilities are implementations of techniques developed and popularized by Hastie and Tibshirani (1990). A detailed description of these and related techniques, the algorithms used to fit these models, and discussions of recent research in this area of statistical modeling can also be found in Schimek (2000).

Additive models. The methods described in this section, and available in the Statistica Generalized Additive Models facilities, represent a generalization of multiple regression (which is a special case of general linear models). Specifically, in linear regression, a linear least-squares fit is computed for a set of predictor or X variables, to predict a dependent Y variable. The well known linear regression equation with m predictors, to predict a dependent variable Y, can be stated as:

Y = b0 + b1*X1 + ... + bm*Xm

where Y stands for the (predicted values of the) dependent variable, X1 through Xm represent the values of the m predictor variables, and b0 and b1 through bm are the regression coefficients estimated by multiple regression. A generalization of the multiple regression model would be to maintain the additive nature of the model, but to replace the simple terms of the linear equation bi*Xi with fi(Xi), where fi is a non-parametric function of the predictor Xi.
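The difference between the two predictors can be sketched in a few lines of Python. This is only an illustration: the function names, the coefficient values, and the component functions are hypothetical and are not part of Statistica.

```python
import math

def linear_predictor(b0, b, x):
    """Multiple regression: b0 + b1*x1 + ... + bm*xm."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))

def additive_predictor(b0, fs, x):
    """Additive model: b0 + f1(x1) + ... + fm(xm), one function per predictor."""
    return b0 + sum(f(xi) for f, xi in zip(fs, x))

x = [2.0, 0.5]  # hypothetical values of two predictors for one case

# The linear model is the special case fi(xi) = bi*xi.
lin = linear_predictor(1.0, [0.5, -2.0], x)
add_same = additive_predictor(1.0, [lambda v: 0.5 * v, lambda v: -2.0 * v], x)

# Any other (here arbitrary, made-up) functions give a non-linear additive model.
add_curved = additive_predictor(1.0, [math.sqrt, lambda v: v ** 2], x)

print(lin, add_same, add_curved)
```

In a fitted additive model the functions fi are not chosen by hand as they are here; they are estimated from the data.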

Generalized linear models. To summarize the basic idea, the generalized linear model differs from the general linear model (of which multiple regression is a special case) in two major respects: First, the distribution of the dependent or response variable can be (explicitly) non-normal, and does not have to be continuous, e.g., it can be binomial; second, the dependent variable values are predicted from a linear combination of predictor variables, which are "connected" to the dependent variable via a link function. The general linear model for a single dependent variable can be considered a special case of the generalized linear model: In the general linear model the dependent variable values are expected to follow the normal distribution, and the link function is a simple identity function (i.e., the linear combination of values for the predictor variables is not transformed).

To illustrate, in the general linear model a response variable Y is linearly associated with values on the X variables while the relationship in the generalized linear model is assumed to be

Y = g(b0 + b1*X1 + ... + bm*Xm)

where g(…) is a function. Formally, the inverse function of g(…), say gi(…), is called the link function; so that:

gi(muY) = b0 + b1*X1 + ... + bm*Xm

where muY stands for the expected value of Y.

Note that Statistica contains a designated module to estimate the parameters of the generalized linear model for a wide variety of distributions and link functions. See Generalized Linear/Nonlinear Models for details.

Distributions and link functions. In the Generalized Additive Models module, you can choose from a wide variety of distributions for the dependent variable, and link functions for the effects of the predictor variables on the dependent variable (see McCullagh and Nelder, 1989; Hastie and Tibshirani, 1990; see also GLZ Introductory Overview - Computational Approach for a discussion of link functions and distributions):

Normal, Gamma, and Poisson distributions:

Log link: f(z) = log(z)

Inverse link: f(z) = 1/z

Identity link: f(z) = z

Binomial distributions:

Logit link: f(z)=log(z/(1-z))
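As a rough illustration, the four link functions listed above can be written down together with their inverses. In the notation used earlier, the link corresponds to gi(…) and its inverse to g(…). The dictionary name LINKS and the helper round_trip are hypothetical names chosen for this sketch.

```python
import math

# Each entry maps a link name to a pair: (link applied to the mean,
# inverse link applied to the linear predictor).
LINKS = {
    "log": (math.log, math.exp),
    "inverse": (lambda z: 1.0 / z, lambda eta: 1.0 / eta),
    "identity": (lambda z: z, lambda eta: eta),
    "logit": (lambda z: math.log(z / (1.0 - z)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def round_trip(name, mu):
    """Apply the link and then its inverse; this should return mu again."""
    link, inverse = LINKS[name]
    return inverse(link(mu))

for name, mu in [("log", 2.5), ("inverse", 2.5), ("identity", 2.5), ("logit", 0.25)]:
    link, _ = LINKS[name]
    print(name, link(mu), round_trip(name, mu))
```

Note that the logit link is only defined for values strictly between 0 and 1, and the log and inverse links in this sketch assume a positive mean.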

Generalized additive models. We can combine the notion of additive models with generalized linear models, to derive the notion of generalized additive models, as:

gi(muY) = Σi fi(Xi)

In other words, the purpose of generalized additive models is to maximize the quality of prediction of a dependent variable Y from various distributions, by estimating unspecific (non-parametric) functions of the predictor variables which are "connected" to the dependent variable via a link function.

Estimating the non-parametric function of predictors via scatterplot smoothers. A unique aspect of generalized additive models is the non-parametric functions fi of the predictor variables Xi. Specifically, instead of some kind of simple or complex parametric functions, Hastie and Tibshirani (1990) discuss various general scatterplot smoothers that can be applied to the X variable values, with the target criterion of maximizing the quality of prediction of the (transformed) Y variable values. One such scatterplot smoother is the cubic smoothing spline smoother, which generally produces a smooth generalization of the relationship between the two variables in the scatterplot.

To summarize, instead of estimating single parameters (like the regression weights in multiple regression), in generalized additive models, we find a general unspecific (non-parametric) function that relates the predicted (transformed) Y values to the predictor values.
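To make the idea of a scatterplot smoother concrete, here is a minimal sketch. It deliberately uses a Gaussian kernel-weighted average rather than the cubic smoothing spline named above, because the kernel version fits in a few lines of plain Python. The function name kernel_smooth, the bandwidth value, and the data are assumptions made for illustration only.

```python
import math

def kernel_smooth(x, y, bandwidth):
    """Smoothed y at each x: a weighted average of all y values, with
    Gaussian weights that decay with the distance between x values."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted

# Hypothetical scatterplot: a curve plus an alternating disturbance.
x = [i / 10 for i in range(21)]
y = [math.sin(3 * xi) + (0.2 if i % 2 == 0 else -0.2) for i, xi in enumerate(x)]

smooth = kernel_smooth(x, y, bandwidth=0.15)
for xi, yi, si in zip(x, y, smooth):
    print(f"{xi:4.1f} {yi:7.3f} {si:7.3f}")
```

The smoother estimates no slope or other single parameter. Its output is simply a curve of fitted values, and that curve plays the role of fi(Xi) in the additive model.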

A specific example: The generalized additive logistic model. Let us consider a specific example of the generalized additive models: A generalization of the logistic (logit) model for binary dependent variable values. As also described in detail in the context of the Nonlinear Estimation and Generalized Linear/Nonlinear Models modules of Statistica, the logistic regression model for binary responses can be written as follows:

y=exp(b0+b1*x1+...+bm*xm)/{1+exp(b0+b1*x1+...+bm*xm)}

Note that the distribution of the dependent variable is assumed to be binomial, i.e., the response variable can only assume the values 0 or 1 (e.g., in a market research study, the purchasing decision would be binomial: the customer either did or did not make a particular purchase). We can apply the logistic link function to the probability p (ranging between 0 and 1):

p' = log {p/(1-p)}

By applying the logistic link function, we can now rewrite the model as:

p' = b0 + b1*X1 + ... + bm*Xm

Finally, we substitute the simple single-parameter additive terms to derive the generalized additive logistic model:

p' = b0 + f1(X1) + ... + fm(Xm)
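The following sketch shows how a probability would be obtained from the generalized additive logistic model once the functions fi are known. The intercept and the two component functions are made up for illustration; in practice they would be estimated from the data by the smoother.

```python
import math

def logit(p):
    """Logistic link: p' = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(p_prime):
    """Inverse of the logistic link: maps p' back to a probability."""
    return 1.0 / (1.0 + math.exp(-p_prime))

def additive_logistic_probability(b0, fs, x):
    """p' = b0 + f1(x1) + ... + fm(xm), transformed back to a probability."""
    p_prime = b0 + sum(f(xi) for f, xi in zip(fs, x))
    return inv_logit(p_prime)

# Hypothetical intercept and component functions for two predictors.
b0 = -0.5
fs = [lambda v: 0.8 * v, lambda v: -(v - 1.0) ** 2]

p = additive_logistic_probability(b0, fs, [1.0, 2.0])
print(p, logit(p))
```

Applying logit to the returned probability recovers the additive predictor p', which is the relationship expressed by the equations above.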

An example application of this model can be found in Hastie and Tibshirani (1990).

Fitting generalized additive models. Detailed descriptions of how generalized additive models are fit to data can be found in Hastie and Tibshirani (1990), as well as Schimek (2000, p. 300). In general there are two separate iterative operations involved in the algorithm, which are usually labeled the outer and inner loop. The purpose of the outer loop is to maximize the overall fit of the model, by maximizing the overall likelihood of the data given the model (similar to the maximum likelihood estimation procedures as described in, for example,

Interpreting the results. Many of the standard results statistics computed by the Generalized Additive Models module are similar to those customarily reported by linear or nonlinear model fitting procedures. For example, Statistica will compute predicted and residual values for the final model, and display various graphs of the residuals to help the user identify possible outliers, etc. Refer also to the description of the residual statistics computed by the Generalized Linear/Nonlinear Models module for details.

The main result of interest, of course, is how the predictors are related to the dependent variable. Statistica will compute scatterplots showing the smoothed predictor variable values plotted against the partial residuals, i.e., the residuals after removing the effect of all other predictor variables.

This plot allows you to evaluate the nature of the relationship between the predictor and the residualized (adjusted) dependent variable values (see Hastie & Tibshirani, 1990; in particular formula 6.3), and hence the nature of the influence of the respective predictor in the overall model.
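The notion of partial residuals can be illustrated with a small backfitting sketch for an additive model with identity link. This is not the Statistica or GAMFIT implementation: it substitutes a Gaussian kernel smoother for the cubic spline, runs a fixed number of passes, and all names (kernel_smooth, partial_residuals, backfit) and the data are hypothetical.

```python
import math

def kernel_smooth(x, r, bandwidth):
    """Gaussian kernel-weighted average of r at each value of x."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return fitted

def partial_residuals(y, b0, f, j):
    """y minus the intercept and the fitted effects of all predictors except j."""
    return [
        y[i] - b0 - sum(f[k][i] for k in range(len(f)) if k != j)
        for i in range(len(y))
    ]

def backfit(columns, y, bandwidth=0.3, n_iter=20):
    """Cycle over predictors, smoothing each one's partial residuals."""
    n = len(y)
    b0 = sum(y) / n
    f = [[0.0] * n for _ in columns]
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            smooth = kernel_smooth(xj, partial_residuals(y, b0, f, j), bandwidth)
            center = sum(smooth) / n
            f[j] = [s - center for s in smooth]  # keep each fj centered at zero
    return b0, f

# Hypothetical data with one curved and one linear effect.
x1 = [i / 10 for i in range(30)]
x2 = [((7 * i) % 30) / 10 for i in range(30)]
y = [math.sin(a) + 0.5 * b for a, b in zip(x1, x2)]

b0, f = backfit([x1, x2], y)
residuals = [y[i] - b0 - f[0][i] - f[1][i] for i in range(len(y))]
pr1 = partial_residuals(y, b0, f, 0)  # values to plot against x1
```

Plotting pr1 against x1, together with the fitted curve f[0], would give the kind of partial residual plot described above for the first predictor.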

Degrees of freedom. To reiterate, the generalized additive models approach replaces the simple products of (estimated) parameter values times the predictor values with a cubic spline smoother for each predictor. When estimating a single parameter value, we lose one degree of freedom, i.e., we add one degree of freedom to the overall model. It is not clear how many degrees of freedom are lost due to estimating the cubic spline smoother for each variable. Intuitively, a smoother can either be very smooth, not following the pattern of data in the scatterplot very closely, or it can be less smooth, following the pattern of the data more closely. In the most extreme case, a simple line would be very smooth, and require us to estimate a single slope parameter, i.e., we would use one degree of freedom to fit the smoother (simple straight line); on the other hand, we could force a very "non-smooth" line to connect each actual data point, in which case we could "use-up" approximately as many degrees of freedom as there are points in the plot. The user interface for the Generalized Additive Models module allows you to specify the degrees of freedom for the cubic spline smoother; the fewer degrees of freedom you specify, the smoother is the cubic spline fit to the partial residuals, and typically, the worse is the overall fit of the model. The issue of degrees of freedom for smoothers is discussed in detail in Hastie and Tibshirani (1990).
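One common way to attach a number to "how many degrees of freedom a smoother uses" is the trace of the smoother matrix, i.e., the matrix that maps the observed values to the fitted values. The sketch below computes this for a Gaussian kernel smoother, which is used here only as a simple stand-in for the cubic spline smoother; the function names and bandwidth values are illustrative assumptions.

```python
import math

def smoother_matrix(x, bandwidth):
    """Row i holds the weights that turn the observed values into fitted value i."""
    rows = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        total = sum(w)
        rows.append([wi / total for wi in w])
    return rows

def effective_df(x, bandwidth):
    """Trace of the smoother matrix: the sum of its diagonal elements."""
    S = smoother_matrix(x, bandwidth)
    return sum(S[i][i] for i in range(len(x)))

x = [i / 10 for i in range(21)]  # 21 hypothetical points
for bw in (0.01, 0.1, 0.5, 100.0):
    print(bw, effective_df(x, bw))
```

With a very small bandwidth the fit passes through every point and the trace approaches the number of points. With a very large bandwidth the kernel smoother collapses to the overall mean and the trace approaches 1. The cubic spline smoother behaves analogously, except that its smoothest limit is a straight line rather than a constant.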

A word of caution. Generalized additive models are very flexible, and can provide an excellent fit in the presence of nonlinear relationships and significant noise in the predictor variables. However, note that because of this flexibility, one must be extra cautious not to over-fit the data, i.e., apply an overly complex model (with many degrees of freedom) to data so as to produce a good fit that likely will not replicate in subsequent validation studies. Also, compare the quality of the fit obtained from the analysis with Statistica Generalized Additive Models to the fit obtained via Statistica Generalized Linear/Nonlinear Models. In other words, evaluate whether the added complexity (generality) of generalized additive models (regression smoothers) is necessary in order to obtain a satisfactory fit to the data. Often, this is not the case, and given a comparable fit of the models, the simpler generalized linear model is preferable to the more complex generalized additive model. These issues are discussed in greater detail in Hastie and Tibshirani (1990).
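The risk of over-fitting can be demonstrated with a small experiment. The sketch below compares the error on the data used for fitting with a leave-one-out error, in which each point is predicted from all other points. It uses a Gaussian kernel smoother and made-up data; it is an illustration of the general point, not a procedure provided by Statistica.

```python
import math

def kernel_predict(x_train, y_train, x0, bandwidth):
    """Gaussian kernel-weighted average of y_train, evaluated at x0."""
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x_train]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)

def training_error(x, y, bandwidth):
    """Mean squared error when each point is also used to fit the curve."""
    return sum(
        (yi - kernel_predict(x, y, xi, bandwidth)) ** 2 for xi, yi in zip(x, y)
    ) / len(x)

def loo_error(x, y, bandwidth):
    """Mean squared error when each point is predicted from the other points."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        total += (y[i] - kernel_predict(x_rest, y_rest, x[i], bandwidth)) ** 2
    return total / len(x)

# Hypothetical data: a curve plus an alternating disturbance.
x = [i / 10 for i in range(21)]
y = [math.sin(3 * xi) + (0.2 if i % 2 == 0 else -0.2) for i, xi in enumerate(x)]

for bw in (0.02, 0.3):
    print(bw, training_error(x, y, bw), loo_error(x, y, bw))
```

With the very small bandwidth (a very flexible smoother with many degrees of freedom), the training error is close to zero while the leave-one-out error is much larger. This is the pattern to watch for when judging whether a good fit is likely to replicate.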

Another issue to keep in mind pertains to the interpretability of results obtained from (generalized) linear models vs. generalized additive models. Linear models are easily understood, summarized, and communicated to others (e.g., in technical reports). Moreover, parameter estimates can be used to predict or classify new cases in a simple and straightforward manner. Generalized additive models are not easily interpreted, in particular when they involve complex nonlinear effects of some or all of the predictor variables (and, of course, it is in those instances where generalized additive models may yield a better fit than generalized linear models). To reiterate, it is usually preferable to rely on a simple well understood model for predicting future cases, than on a complex model that is difficult to interpret and summarize.

Implementation of method in Statistica. The methods available in the Statistica Generalized Additive Models facilities are implementations of techniques developed and popularized by Hastie and Tibshirani (1990). Specifically, Statistica provides a convenient user interface to the popular GAMFIT program available at the StatLib library of the Department of Statistics at Carnegie Mellon University.

See Generalized Additive Models Program Overview and Generalized Additive Models Index for further details.