In standard multiple regression we estimate the regression coefficients by "finding" those coefficients that minimize the residual variance (sum of squared residuals) around the regression line. Any deviation of an observed score from a predicted score signifies some loss in the accuracy of our prediction, for example, due to random noise (error). Therefore, we can say that the goal of least squares estimation is to minimize a loss function; specifically, this loss function is defined as the sum of the squared deviations about the predicted values (the term loss was first used by Wald, 1939). When this function is at its minimum, we get the same parameter estimates (intercept, regression coefficients) as we would in Multiple Regression; because of the particular loss function that yielded those estimates, we call them least squares estimates.
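As a minimal sketch of this idea, assuming a single predictor for brevity (the same reasoning carries over to several predictors), the Python below writes the least squares loss as an explicit function and computes the familiar closed-form estimates. Moving either estimate away from its least squares value increases the loss, which is what "at its minimum" means here. The names `sse_loss` and `least_squares_fit` and the small data set are illustrative assumptions, not part of any product API.

```python
def sse_loss(intercept, slope, xs, ys):
    """Least squares loss: sum of squared deviations of observed from predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_fit(xs, ys):
    """Closed-form least squares estimates for one predictor (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
    a, b = least_squares_fit(xs, ys)
    print(a, b, sse_loss(a, b, xs, ys))
    # Any perturbation of the estimates yields a larger loss.
    print(sse_loss(a + 0.01, b, xs, ys) > sse_loss(a, b, xs, ys))
```

Minimizing `sse_loss` with a general-purpose numerical optimizer would land on the same intercept and slope; the closed form is simply a shortcut available for this particular loss.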

Phrased in this manner, there is no reason why you cannot consider other loss functions. For example, rather than minimizing the sum of squared deviations, why not minimize the sum of absolute deviations? Indeed, this is sometimes useful in order to "de-emphasize" outliers. Because squaring grows faster than linearly, a large residual becomes disproportionately larger relative to all other residuals when squared, so a single extreme observation can pull the least squares line toward itself. If one only takes the absolute value of the deviations, however, the resulting regression line will usually be less affected by outliers.
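The effect can be illustrated with a small sketch in Python, assuming one predictor and a tiny hypothetical data set in which four points lie exactly on y = 2x and the fifth is an outlier. For a single predictor, there is always a least absolute deviations line that passes through at least two of the observations, so for small data sets one can simply try every pair of points and keep the line with the smallest sum of absolute deviations. This brute-force search (and the helper names `abs_loss`, `lad_fit`, `ols_fit`) is an illustrative choice, not the minimization method used by any particular program.

```python
from itertools import combinations


def abs_loss(intercept, slope, xs, ys):
    """Sum of absolute deviations of observed from predicted values."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


def lad_fit(xs, ys):
    """Least absolute deviations line by exhaustive search over pairs of points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as intercept + slope * x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = abs_loss(intercept, slope, xs, ys)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_fit(xs, ys):
    """Closed-form least squares line, for comparison."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 6, 8, 30]  # the last observation is an outlier
    print(ols_fit(xs, ys))  # → (-8.0, 6.0)
    print(lad_fit(xs, ys))  # → (0.0, 2.0)
```

The single outlier triples the least squares slope, whereas the absolute-deviation fit recovers the line through the other four points.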

Nonlinear Estimation contains several function minimization methods that can be used to minimize any kind of loss function. When the least squares loss function is used, however, the parameters of nonlinear models can often be estimated more efficiently, in particular with large data sets, by an algorithm designed specifically for that loss function. In that case, the Levenberg-Marquardt algorithm (see also User-Specified Regression, Least Squares) should be used.
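To show what makes the Levenberg-Marquardt algorithm specific to least squares, here is a minimal Python sketch. It assumes a hypothetical two-parameter model f(x) = a·exp(b·x), noise-free example data, and a simple damping schedule (divide λ by 10 after a successful step, multiply by 10 after a failed one); these choices are illustrative and not taken from any particular implementation. Each iteration uses only the model's first derivatives (the Jacobian J) and the residuals r: it solves the damped normal equations (JᵀJ + λ·diag(JᵀJ))·δ = Jᵀr, which behaves like a Gauss-Newton step when λ is small and like a short gradient-descent step when λ is large. That structure exists only because the loss is a sum of squares, which is why a general-purpose minimizer cannot exploit it.

```python
import math


def model(a, b, x):
    """Illustrative nonlinear model (an assumption): f(x) = a * exp(b * x)."""
    return a * math.exp(b * x)


def sse(a, b, xs, ys):
    """Least squares loss: sum of squared deviations from the predicted values."""
    return sum((y - model(a, b, x)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, a, b, lam=1e-3, max_iter=200):
    """Minimal Levenberg-Marquardt sketch for the two-parameter model above."""
    current = sse(a, b, xs, ys)
    for _ in range(max_iter):
        # Accumulate J^T J (symmetric 2x2) and J^T r, where J holds the
        # partial derivatives of the model and r the residuals y - f(x).
        j11 = j12 = j22 = 0.0
        g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            d_a = e          # df/da
            d_b = a * x * e  # df/db
            r = y - a * e    # residual
            j11 += d_a * d_a
            j12 += d_a * d_b
            j22 += d_b * d_b
            g1 += d_a * r
            g2 += d_b * r
        while True:
            # Damped normal equations (J^T J + lam * diag(J^T J)) step = J^T r,
            # solved for the two unknowns with Cramer's rule.
            m11 = j11 * (1.0 + lam)
            m22 = j22 * (1.0 + lam)
            det = m11 * m22 - j12 * j12
            da = (g1 * m22 - j12 * g2) / det
            db = (m11 * g2 - j12 * g1) / det
            trial = sse(a + da, b + db, xs, ys)
            if trial < current:
                # The step reduced the loss: accept it and move toward Gauss-Newton.
                a, b, current = a + da, b + db, trial
                lam = max(lam / 10.0, 1e-12)
                break
            # The step increased the loss: damp more (closer to gradient descent).
            lam *= 10.0
            if lam > 1e12:
                return a, b, current  # no further improvement is possible
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break  # accepted step is negligibly small: converged
    return a, b, current


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5]
    ys = [2.0 * math.exp(0.3 * x) for x in xs]  # hypothetical noise-free data
    print(levenberg_marquardt(xs, ys, a=1.5, b=0.2))  # start values are assumptions
```

Starting from a = 1.5, b = 0.2, the sketch should recover values very close to a = 2 and b = 0.3. With real, noisy data the loss at the minimum would of course stay above zero, and good start values matter more.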

For more information, see also:

Maximum likelihood and probit/logit models

Function Minimization Algorithms

Start Values, Step Sizes, Convergence Criteria

Penalty Functions, Constraining Parameters

Levenberg-Marquardt algorithm (for nonlinear least squares)

Hessian Matrix and Standard Errors