Distributions & Simulation Example

Overview. The Distributions & Simulation module is used to evaluate how well theoretical distributions fit observed data. You can also simulate data from the fitted distributions, optionally incorporating the correlation structure of the observed data. This lets you model the processes that generate the data and then simulate from those processes to evaluate the performance of a system.

For this example, assume that we are manufacturing a hinge that has four parts. If the sum of the widths of the first three parts is greater than the width of the fourth part, then the product is defective.

Instead of waiting to accrue the required data, we can fit theoretical distributions to the observed data, simulate from those distributions, and then draw conclusions from the simulation, for example, the percentage of defective products.

Specifying the Analysis. Open the SimulationRiskData.sta data file, and start the Distributions & Simulation module.

Ribbon bar. Select the Home tab. In the File group, click the Open arrow and select Open Examples to display the Open a STATISTICA Data File dialog box. The SimulationRiskData.sta data file is located in the Datasets folder. Then, select the Statistics tab. In the Base group, click More Distributions to display the Distributions & Simulation Startup Panel.

Classic menus. Open the data file by selecting Open Examples from the File menu; the data file is located in the Datasets folder. Then, from the Statistics menu, select Distributions & Simulation to display the Distributions & Simulation Startup Panel.

On the Quick tab, select Fit Distribution.

Click the OK button to display the Fit Distributions dialog box. On the Quick tab, click the Variables button, and in the variable selection dialog box, select variables 1-4 as the Continuous variables.

Click OK to close the variable selection dialog box.

In the Fit Distributions dialog box, select the Continuous variables tab to view the available distributions and to select which of them to fit to the observed data. For this example, we will fit all distributions to each variable (all are selected by default). Click the OK button to run the analysis.

Once the analysis is complete, the Fit Distributions Results dialog box is displayed. Select the Save Fit tab to view the results and see which distribution was considered the best fit for each selected variable. By default, Part1 is selected. According to the p value of the K-S test, the Johnson SB distribution is the best fit for Part1.

Click the >> button to scroll through the distributional fit results of Part2, Part3, and Part4. According to the K-S test, the Gaussian Mixture is the best fit for Part2 and Part3, and the Johnson SB distribution is the best fit for Part4.
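The ranking by K-S test can be illustrated outside STATISTICA with a minimal Python sketch. It is not the module's implementation: the data are hypothetical, and a normal and a uniform candidate are swapped in for the Johnson SB and Gaussian Mixture distributions to keep the example within the standard library. The sketch compares candidates by the K-S statistic D (the largest gap between the empirical and fitted CDFs); for a fixed sample size, a smaller D corresponds to a larger K-S p value.

```python
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical measurements standing in for one column of the data file.
rng = random.Random(0)
part = [rng.gauss(10.0, 0.2) for _ in range(500)]

# Candidate 1: normal distribution fitted by sample mean and standard deviation.
normal_fit = NormalDist(fmean(part), stdev(part))

# Candidate 2: uniform distribution over the observed range.
lo, hi = min(part), max(part)
def uniform_cdf(x):
    return (x - lo) / (hi - lo)

d_normal = ks_statistic(part, normal_fit.cdf)
d_uniform = ks_statistic(part, uniform_cdf)
print(f"K-S D, normal:  {d_normal:.4f}")
print(f"K-S D, uniform: {d_uniform:.4f}")
```

The candidate with the smaller D follows the observed data more closely, which is the sense in which one distribution is reported as the best fit.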

Select the Quick tab. This tab contains options to create graphs that can help you visualize the results of the analysis. In the Variables drop-down list, select Part1 if it is not already selected. From the Distribution drop-down list, select Johnson. Click the Empirical CDF plot button to display the Empirical Cumulative Distribution Function plot.

Next, click the Q-Q plot button to display the Quantile-Quantile plot.

Note that you can use the Interactive Graphics Controls at the bottom of the graph window to adjust the transparency of the markers.

Both of these plots show that the Johnson distribution is a good fit to the observed data for Part1. You can do the same for the remaining variables.
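For readers who want to see what these two plots contain, the following Python sketch computes the underlying points. It is an illustration under stated assumptions: the data are hypothetical, a fitted normal distribution stands in for the Johnson distribution, and the plotting position (i + 0.5)/n is one common convention that may differ from the one STATISTICA uses.

```python
import random
from statistics import NormalDist, fmean, stdev

# Hypothetical measurements, sorted so that index i is the i-th order statistic.
rng = random.Random(1)
part = sorted(rng.gauss(10.0, 0.2) for _ in range(200))
n = len(part)

# Stand-in fitted distribution (normal, not Johnson).
fit = NormalDist(fmean(part), stdev(part))

# Empirical CDF: the (i+1)-th smallest value has cumulative proportion (i+1)/n.
ecdf = [(x, (i + 1) / n) for i, x in enumerate(part)]

# Fitted CDF evaluated at the same points, for overlaying on the empirical CDF.
fitted_cdf = [(x, fit.cdf(x)) for x in part]

# Q-Q pairs: (theoretical quantile, observed value) at plotting position (i+0.5)/n.
qq = [(fit.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(part)]

# For a good fit, the Q-Q points lie close to the line y = x.
max_dev = max(abs(theoretical - observed) for theoretical, observed in qq)
print(f"Largest Q-Q deviation from y = x: {max_dev:.4f}")
```

In the empirical CDF plot, a good fit shows the fitted curve tracking the empirical steps; in the Q-Q plot, it shows the points falling along a straight line.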

For this example, we will continue with the analysis. Click the Run simulation button to display the Simulation Methods dialog box.

Since the four parts are not independent of one another, we want to incorporate the correlation structure. To do this, select Iman Conover as the simulation method. Set the Number of Samples to 100,000. Next, click the Simulate button.
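The idea behind this method can be sketched in a few lines of Python. This is a simplified two-variable illustration, not STATISTICA's implementation: the full Iman-Conover procedure builds a score matrix for all variables and targets a complete rank-correlation matrix, whereas the sketch below shows only the central rank-reordering step. The marginal distributions and the target correlation rho are hypothetical.

```python
import math
import random

def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r

def reorder_to_match(sample, reference):
    """Rearrange `sample` so that its rank order matches that of `reference`."""
    sorted_sample = sorted(sample)
    return [sorted_sample[r] for r in ranks(reference)]

def spearman(a, b):
    """Spearman rank correlation (no ties assumed)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

rng = random.Random(2)
n, rho = 5000, 0.8  # rho is a hypothetical target correlation

# Independent draws from two hypothetical marginal distributions.
a = [rng.uniform(9.5, 10.5) for _ in range(n)]
b = [rng.expovariate(1.0) for _ in range(n)]

# Reference scores: bivariate normal pairs with correlation rho.
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rho * u + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for u in z1]

# Reordering changes which values are paired, not the values themselves.
a_corr = reorder_to_match(a, z1)
b_corr = reorder_to_match(b, z2)

print(f"Rank correlation before: {spearman(a, b):.3f}")
print(f"Rank correlation after:  {spearman(a_corr, b_corr):.3f}")
```

Because the values in each column are only rearranged, each simulated variable keeps its fitted marginal distribution while the columns acquire the rank correlation of the reference scores.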

Once the results spreadsheet is displayed, select the Data tab (in STATISTICA) and in the Mode group, select the Input check box (ribbon bar), or select Input Spreadsheet from the Data menu (classic menus).

Since a defect is defined as an item where the sum of the first three parts is greater than the fourth part, we will create a new variable that captures this relationship, using the following spreadsheet formula: v4-v1-v2-v3.

Right-click on any of the variable headers in the spreadsheet, and from the shortcut menu select Add Variables.

In the Add Variables dialog box, double-click in the After edit box. Select Part4, and click OK. In the Add Variables dialog box, in the Long name (label or formula with Functions) edit box, enter the formula =v4-v1-v2-v3. Leave all other defaults, and click OK. A new variable called NewVar has been created in the spreadsheet.

To view the distribution of the new variable, we will create a histogram of the data.

Ribbon bar. Select the Graphs tab. In the Common group, click Histogram to display the 2D Histograms Startup Panel.

Classic menus. From the Graphs menu, select Histograms to display the 2D Histograms Startup Panel.

Click the Variables button, select NewVar, and click the OK button. Click OK in the 2D Histograms Startup Panel.

We can see that a small percentage of cases are defective, that is, have a NewVar value less than 0. (A defect is defined as the sum of the first three parts being greater than the fourth, v1+v2+v3>v4. Rearranging, this becomes v4-v1-v2-v3<0.) The percentage of defectives in the simulated data is about 10%. Further results can be derived from the simulated data, such as quality or process-related statistics (for example, Cpk). The simulated results can then be used to guide engineers in changing aspects of the production process.
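The defect calculation itself can be reproduced outside the spreadsheet. The Python sketch below is a minimal illustration with hypothetical, independently drawn normal part widths; it does not use the fitted distributions or the correlation structure from this example, so its defect rate will not match the figure reported above.

```python
import random

rng = random.Random(3)
n = 100_000  # same number of samples as in the example

def simulate_case():
    """One simulated hinge; all means and standard deviations are hypothetical."""
    v1 = rng.gauss(2.0, 0.05)
    v2 = rng.gauss(2.0, 0.05)
    v3 = rng.gauss(2.0, 0.05)
    v4 = rng.gauss(6.1, 0.05)
    return v1, v2, v3, v4

clearance = []
for _ in range(n):
    v1, v2, v3, v4 = simulate_case()
    # Same quantity as the spreadsheet formula =v4-v1-v2-v3.
    clearance.append(v4 - v1 - v2 - v3)

# A case is defective when the clearance is negative (v1+v2+v3 > v4).
defect_rate = sum(c < 0 for c in clearance) / n
print(f"Simulated defect rate: {defect_rate:.1%}")
```

The same list of clearance values could feed a histogram or further process statistics, in the same way NewVar is used in the spreadsheet.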