Data Miner Recipes - Options Tab

The Options tab of Data Miner Recipes is used to set global options for recipes. Since most of these options are applied to the Data preparation step, they should be set prior to starting work on a new recipe. Modifications to the values on this tab only apply to the current recipe unless the Save defaults button is clicked.

Data Miner Recipes automatically samples data for analyses on large data sets. Sampling is used when the file size is large or the number of variables times the number of cases is large. The options on this tab define what counts as a large data set.

Global settings.

Attach input file to Project Workbook on saving if file size is less than. When the project is saved, Statistica includes the data file used in the Data preparation step in the Report. By default, data files that are 20 kilobytes or smaller are included. The maximum file size that can be included is 10,000 kilobytes. The input file is attached to the Project Workbook on save, but is NOT embedded in the Project Workbook. To review the input file after opening the project, select View data file from the Report button drop-down list (located on the Steps tab).

Use default sampling. By default, large data sources are sampled when the file size is greater than 10 megabytes or the number of variables times the number of cases is greater than 200,000. Clear this check box if you want to define different values using the options described below.
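The default rule can be read as a simple either/or test. The following Python sketch is an illustration only and is not Statistica code; the function name needs_sampling and the constant names are hypothetical, and the conversion of 1 megabyte to 1024 * 1024 bytes is an assumption, since this page does not say how a megabyte is counted.

```python
# Illustrative sketch of the default sampling rule; not Statistica code.
DEFAULT_MAX_FILE_MB = 10      # default file-size threshold stated above
DEFAULT_MAX_CELLS = 200_000   # default variables-times-cases threshold stated above
BYTES_PER_MB = 1024 * 1024    # assumption: binary megabytes


def needs_sampling(file_size_bytes: int, n_variables: int, n_cases: int) -> bool:
    """Return True when either default threshold is exceeded."""
    too_big_on_disk = file_size_bytes > DEFAULT_MAX_FILE_MB * BYTES_PER_MB
    too_many_cells = n_variables * n_cases > DEFAULT_MAX_CELLS
    return too_big_on_disk or too_many_cells


# A 2 MB file with 50 variables and 5,000 cases has 250,000 cells.
print(needs_sampling(2 * BYTES_PER_MB, 50, 5_000))  # True
# The same file with 20 variables has 100,000 cells.
print(needs_sampling(2 * BYTES_PER_MB, 20, 5_000))  # False
```

Both comparisons are strict, matching the "greater than" wording above, so a data set that sits exactly on a threshold is not sampled in this sketch.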

Data size for efficient processing user defined.

Enable automatic sampling if file size exceeds. This option affects the recipe only when the Use default sampling check box (described above) is not selected. To change the file size at which a data set is treated as large for automatic sampling, enter the value here. The default is 10 megabytes. The value can range from 1 to 500 megabytes.

Enable automatic sampling if Number of Variables * Number of Cases exceeds. This option affects the recipe only when the Use default sampling check box is not selected. To change the number of data values (variables times cases) at which a data set is treated as large for automatic sampling, enter the value here. The default is 200,000. The value can range from 10,000 to 2,147,483,647.
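The interaction between the Use default sampling check box and the two user-defined thresholds can be sketched as follows. This Python example is an illustration only and is not Statistica code; the class SamplingOptions and its field names are hypothetical. Raising an error for an out-of-range value is an assumption made for the sketch, since this page does not describe how the dialog handles such input.

```python
# Illustrative sketch of the user-defined sampling thresholds; not Statistica code.
from dataclasses import dataclass

FILE_MB_RANGE = (1, 500)                # allowed file-size threshold, in megabytes
CELLS_RANGE = (10_000, 2_147_483_647)   # allowed variables-times-cases threshold
DEFAULT_FILE_MB = 10
DEFAULT_CELLS = 200_000


@dataclass
class SamplingOptions:
    use_default_sampling: bool = True
    max_file_mb: int = DEFAULT_FILE_MB
    max_cells: int = DEFAULT_CELLS

    def __post_init__(self) -> None:
        # Assumption: out-of-range values are rejected rather than clamped.
        if not FILE_MB_RANGE[0] <= self.max_file_mb <= FILE_MB_RANGE[1]:
            raise ValueError("file-size threshold must be 1 to 500 megabytes")
        if not CELLS_RANGE[0] <= self.max_cells <= CELLS_RANGE[1]:
            raise ValueError("cell threshold must be 10,000 to 2,147,483,647")

    def thresholds(self) -> tuple[int, int]:
        """Return the (megabytes, cells) thresholds that are in effect."""
        if self.use_default_sampling:
            # User-defined values are ignored while the check box is selected.
            return DEFAULT_FILE_MB, DEFAULT_CELLS
        return self.max_file_mb, self.max_cells


custom = SamplingOptions(use_default_sampling=False, max_file_mb=50, max_cells=1_000_000)
print(custom.thresholds())  # (50, 1000000)
```

With use_default_sampling left as True, thresholds() returns the defaults even if other values were entered, which mirrors the statement that the user-defined options have no effect while the check box is selected.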

Generate C/C++ code for models. Select this check box to create C/C++ code for the models that were generated in the Model building step. The code can be viewed in the Deployment step.

Reset defaults. Click this button to reset the values to their original default settings.

Save defaults. Click this button to retain the settings for future recipes.