 Intrinsically Nonlinear Regression Models - Discontinuous Regression Models

Piecewise linear regression. It is not uncommon for the nature of the relationship between one or more independent variables and a dependent variable to change over the range of the independent variables. For example, suppose we monitor the per-unit manufacturing cost of a particular product as a function of the number of units manufactured (output) per month. In general, the more units we produce per month, the lower our per-unit cost, and this linear relationship may hold over a wide range of production levels. However, it is conceivable that above a certain point there is a discontinuity in the relationship between these two variables. For example, the per-unit cost may decrease more slowly once older (less efficient) machines have to be put on-line to cope with the larger volume. Suppose that the older machines go on-line when production output rises above 500 units per month; we can then specify a regression model for cost per unit as:

y = b0 + b1*x*(x ≤ 500) + b2*x*(x > 500)

In this formula, y stands for the estimated per-unit cost and x is the output per month. The expressions (x ≤ 500) and (x > 500) denote logical conditions that evaluate to 1 if true and to 0 if false. Thus, this model specifies a common intercept (b0) and a slope that equals b1 when x ≤ 500 is true (that is, equal to 1) and b2 when x > 500 is true (that is, equal to 1).
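Because the breakpoint is fixed at 500, the model is linear in b0, b1, and b2, so ordinary least squares on a design matrix with columns 1, x*(x ≤ 500), and x*(x > 500) recovers the coefficients. The sketch below shows this in plain Python. The function names, the synthetic data, and the use of ordinary least squares via the normal equations (rather than the Nonlinear Estimation procedure itself) are illustrative assumptions.

```python
# Least-squares fit of y = b0 + b1*x*(x <= 500) + b2*x*(x > 500) with the
# breakpoint fixed at 500 (illustrative sketch; all names are hypothetical).

def design_row(x, breakpoint=500.0):
    """One row of the design matrix; a Python bool acts as the 0/1 condition."""
    return [1.0, x * (x <= breakpoint), x * (x > breakpoint)]


def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_piecewise(xs, ys, breakpoint=500.0):
    """Return [b0, b1, b2] from the normal equations X'X b = X'y."""
    rows = [design_row(x, breakpoint) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free data generated from b0 = 20, b1 = -0.02, b2 = -0.015.
xs = [float(x) for x in range(100, 1001, 50)]
ys = [20.0 - 0.02 * x * (x <= 500) - 0.015 * x * (x > 500) for x in xs]
b0, b1, b2 = fit_piecewise(xs, ys)
```

With noise-free data the fit returns the generating coefficients up to rounding error; with real data the same code gives the least-squares estimates.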

Instead of specifying the point where the discontinuity in the regression line occurs (at 500 units per month in the example above), one could also estimate that point. For example, one might have noticed or suspected that there is a discontinuity in the cost per unit at some particular point without knowing where that point is. In that case, simply replace the 500 in the equation above with an additional parameter (e.g., b3). Nonlinear Estimation would then estimate the point of discontinuity. For more details, see Piecewise linear regression.
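When the breakpoint b3 is itself a parameter, the residual sum of squares changes only when b3 crosses an observed x value, so it is a step function of b3. One simple way to estimate it, sketched below, is to fit the fixed-breakpoint model at the midpoint between each pair of adjacent observed x values and keep the candidate with the smallest residual sum of squares. This exhaustive grid search is a swapped-in technique, not a description of how Nonlinear Estimation searches for b3; the function names and data are hypothetical.

```python
# Estimate the breakpoint of y = b0 + b1*x*(x <= bp) + b2*x*(x > bp) by trying
# every distinct partition of the data (illustrative sketch; names are hypothetical).

def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def fit_given_breakpoint(xs, ys, bp):
    """Least-squares fit for a fixed breakpoint; return (coefficients, SSE)."""
    rows = [[1.0, x * (x <= bp), x * (x > bp)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    beta = solve(xtx, xty)
    sse = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return beta, sse


def estimate_breakpoint(xs, ys):
    """Return (breakpoint, coefficients, SSE) minimizing SSE over candidate midpoints."""
    grid = sorted(set(xs))
    best = None
    # Keep at least two distinct x values on each side so X'X stays nonsingular.
    for i in range(1, len(grid) - 2):
        bp = (grid[i] + grid[i + 1]) / 2.0
        beta, sse = fit_given_breakpoint(xs, ys, bp)
        if best is None or sse < best[2]:
            best = (bp, beta, sse)
    return best


# Noise-free data whose true breakpoint lies at 500.
xs = [float(x) for x in range(100, 1001, 50)]
ys = [20.0 - 0.02 * x * (x <= 500) - 0.015 * x * (x > 500) for x in xs]
bp, (b0, b1, b2), sse = estimate_breakpoint(xs, ys)
```

Any breakpoint in [500, 550) splits this data identically, so the returned midpoint 525 identifies that interval rather than a unique point; the data alone cannot locate the breakpoint more precisely.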

Breakpoint regression. You could also adjust the equation above to reflect a "jump" in the regression line. For example, imagine that after the older machines are put on-line, the per-unit cost jumps to a higher level and then slowly goes down as volume continues to increase. In that case, simply specify an additional intercept (b3), so that:

y = (b0 + b1*x)*(x ≤ 500) + (b3 + b2*x)*(x > 500)
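Because the conditions (x ≤ 500) and (x > 500) split the observations into disjoint sets, and each segment has its own intercept and slope, least squares for this model separates into two independent simple regressions, one per segment. A minimal sketch, assuming the breakpoint is known and using hypothetical function names and synthetic data:

```python
# Breakpoint model with a jump:
#   y = (b0 + b1*x)*(x <= 500) + (b3 + b2*x)*(x > 500)
# The two conditions partition the data, so least squares reduces to one
# simple regression per segment (illustrative sketch; names are hypothetical).

def simple_ols(xs, ys):
    """Closed-form least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_jump_model(xs, ys, breakpoint=500.0):
    """Return (b0, b1, b2, b3) using the parameter names of the formula."""
    low = [(x, y) for x, y in zip(xs, ys) if x <= breakpoint]
    high = [(x, y) for x, y in zip(xs, ys) if x > breakpoint]
    b0, b1 = simple_ols(*zip(*low))   # segment x <= breakpoint
    b3, b2 = simple_ols(*zip(*high))  # segment x > breakpoint
    return b0, b1, b2, b3


def predict(x, b0, b1, b2, b3, breakpoint=500.0):
    """Evaluate the formula; Python booleans act as the 0/1 conditions."""
    return (b0 + b1 * x) * (x <= breakpoint) + (b3 + b2 * x) * (x > breakpoint)


# Noise-free data from b0 = 20, b1 = -0.02, b2 = -0.008, b3 = 18.
true_params = (20.0, -0.02, -0.008, 18.0)
xs = [float(x) for x in range(100, 1001, 50)]
ys = [predict(x, *true_params) for x in xs]
b0, b1, b2, b3 = fit_jump_model(xs, ys)
# Size of the jump just past the breakpoint: (b3 + b2*500) - (b0 + b1*500).
jump = predict(500.0 + 1e-9, b0, b1, b2, b3) - predict(500.0, b0, b1, b2, b3)
```

Here the line drops from 20 to 10 at x = 500 and then jumps up to about 14 (18 minus 0.008 × 500) just above it, matching the scenario of costs jumping to a higher level and then declining again.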

Nonlinear Estimation includes a predefined breakpoint regression model that can be chosen as a dialog option. However, unlike the model shown above, that option fits different regression models to different ranges of the dependent variable y.

Comparing groups. The method described here for estimating different regression equations in different domains of the independent variable can also be used to distinguish between groups. For example, suppose that in the example above there are three different plants; to simplify the example, let us ignore the breakpoint for now. If we coded the three plants in a grouping variable using the values 1, 2, and 3, we could simultaneously estimate three different regression equations by specifying:

y = (xp=1)*(b10 + b11*x) + (xp=2)*(b20 + b21*x) + (xp=3)*(b30 + b31*x)

In this equation, xp denotes the grouping variable containing the codes that identify each plant, b10, b20, and b30 are the three different intercepts, and b11, b21, and b31 are the slope parameters (regression coefficients) for each plant. To determine which model is more appropriate, you could compare the fit of this model with that of a common regression model that ignores the groups (plants).
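The group model decouples the same way: the conditions (xp=1), (xp=2), and (xp=3) partition the observations, so its least-squares fit is one simple regression per plant. One standard way to compare it with the common model (which is nested in it, with two parameters instead of six) is the extra-sum-of-squares F test, assuming independent, normally distributed errors with equal variance. The source does not prescribe a particular test; the function names and data below are hypothetical.

```python
from collections import defaultdict

# Compare a common regression line with separate lines per plant
# (illustrative sketch; names and data are hypothetical).

def simple_ols(xs, ys):
    """Closed-form least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(xs, ys, intercept, slope):
    """Residual sum of squares of a fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def compare_groups(xp, xs, ys):
    """Return (SSE of common model, SSE of per-group model, F statistic)."""
    # Common model: one intercept and one slope for all plants (2 parameters).
    a, b = simple_ols(xs, ys)
    sse_common = sse(xs, ys, a, b)
    # Group model: the indicators (xp=k) partition the data, so least squares
    # splits into one independent simple regression per plant.
    groups = defaultdict(lambda: ([], []))
    for g, x, y in zip(xp, xs, ys):
        groups[g][0].append(x)
        groups[g][1].append(y)
    sse_groups = 0.0
    for gx, gy in groups.values():
        a_g, b_g = simple_ols(gx, gy)
        sse_groups += sse(gx, gy, a_g, b_g)
    n, p_full, p_common = len(ys), 2 * len(groups), 2
    f_stat = ((sse_common - sse_groups) / (p_full - p_common)) / (sse_groups / (n - p_full))
    return sse_common, sse_groups, f_stat


# Three plants with different lines plus a small alternating disturbance.
true_lines = {1: (20.0, -0.02), 2: (25.0, -0.02), 3: (15.0, -0.01)}
xp, xs, ys = [], [], []
for plant, (b_0, b_1) in true_lines.items():
    for i, x in enumerate(range(100, 700, 100)):
        noise = 0.05 if i % 2 == 0 else -0.05
        xp.append(plant)
        xs.append(float(x))
        ys.append(b_0 + b_1 * x + noise)
sse_c, sse_g, f_stat = compare_groups(xp, xs, ys)
```

Compare f_stat with the critical value of the F distribution with 4 and n minus 6 degrees of freedom (for three plants); a large value indicates that separate regression lines fit substantially better than a single common line.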