Data Miner Recipes - Steps Tab

The Steps tab of the Data Miner Recipes dialog box is used to create new projects or edit existing ones. The upper-left pane, the Step-nodes pane, contains options and user configurations for creating data mining models, and the right pane contains tabs specific to the step node in use. An information panel in the lower-left corner of the dialog box displays instructions for the current step and explains how to proceed to the next step.

The information in this topic pertains to the Step-nodes panel and the buttons (i.e., Save recipe, Report, Next step, etc.) that are always available on the Steps tab regardless of which tab is selected. For topics on tab-related options, see Data Preparation - Data Preparation tab, Data Preparation - Advanced tab, and Data Miner - Annotations tab.

Step-nodes. The Step-nodes panel is located on the upper-left side of the Steps tab. It can have up to four major nodes: Data preparation, Data for analysis, Data redundancy, and Target variable. The Target variable node has a branching structure, with the parent node connecting to four child nodes: Important variables, Model building, Evaluation, and Deployment.

Each node (or step) can exist in one of three states at most (depending on whether its completion is arbitrary or not). Each state is represented by an icon: red, yellow, or green. The red icon indicates a wait state, meaning a step cannot be started because it depends on a previous step that has not been completed. The yellow icon indicates a ready state, meaning you can start the step because all previous steps have been completed. The green icon indicates a completed step. Note that you must click the Next step button to change a yellow icon (ready state) to a green icon (completed state). The change is made only if the step is complete.

The Data Miner Recipes steps are arranged in a logical, sequential order. Following this order ensures that all the information required to complete a given step is in place when the step is started. For example, in any model building task, data are used as examples from which the model learns the underlying process relating the input and target variables. Therefore, you can start the Target variable step only after you have successfully completed the Data preparation and Data for analysis steps. Below is a summary list of the Data Miner Recipes steps and the states in which they can exist. Note that not all steps have three states.

 

Step (parent node)   Step (child node)     Wait state   Ready state   Completed state
Data preparation     None                  No           Yes           Yes
Data for analysis    None                  Yes          Yes           Yes
Data redundancy      None                  Yes          Yes           Yes
Target variable      Important variables   No           Yes           Yes
                     Model building        Yes          Yes           Yes
                     Evaluation            Yes          Yes           Yes
                     Deployment            Yes          Yes           Yes

If required, the initial state of a step is the wait state (with the exception of the Data preparation step, which starts in the ready state). You can change the state from ready to completed by clicking the Next step button. If the step is complete, its status changes from ready to completed, and the following step changes from wait to ready. If you click the Next step button when the step is not complete, a message is displayed prompting you to complete the step.
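The state changes described above can be pictured as a small state machine. The following Python sketch is only an illustration under simplifying assumptions: it treats the steps as a single linear sequence, and the names Recipe, work_done, and next_step are hypothetical and are not part of the product or any API it exposes.

```python
from enum import Enum


class State(Enum):
    WAIT = "red"          # depends on an earlier step that is not completed
    READY = "yellow"      # can be started
    COMPLETED = "green"   # validated with the Next step button


class Recipe:
    """Toy model of the Step-nodes panel (hypothetical, not the product's API)."""

    def __init__(self, step_names):
        self.names = list(step_names)
        # The first step starts in the ready state; later steps start waiting.
        self.states = [State.READY] + [State.WAIT] * (len(self.names) - 1)
        # Hypothetical flag: has the user finished the work in each step?
        self.work_done = [False] * len(self.names)

    def next_step(self, index):
        """Mimic clicking Next step while the step at `index` is selected."""
        if self.states[index] is not State.READY:
            return "Step is not in the ready state."
        if not self.work_done[index]:
            return "Please complete the step."
        # Ready -> completed, and the following step moves wait -> ready.
        self.states[index] = State.COMPLETED
        if index + 1 < len(self.names):
            self.states[index + 1] = State.READY
        return "ok"


recipe = Recipe(["Data preparation", "Data for analysis", "Data redundancy"])
print(recipe.next_step(0))            # the step is not finished yet
recipe.work_done[0] = True
print(recipe.next_step(0))
print([state.name for state in recipe.states])
```

The sketch keeps the validation check separate from the state change, mirroring the behavior described above in which Next step only advances a step that is actually complete.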

Save recipe. Click this button to save the current project to a Data Miner project file (with extension *.dmrproj). When the project is saved, Data Miner stores a variety of options (for the completed steps) so that they can be reloaded when the project is next opened.

Report. Click the Report button to display a menu of commands for generating spreadsheets and graphs for viewing data and results, and for creating a report for the current project and its configurations.

View data file. Select View data file from this menu to generate the data spreadsheet for the analysis variables (i.e., the variables you selected in the Data preparation step).

Results, all steps. Select this command to display results for all the completed steps. Results for each step are described below.

Data preparation. Displays a spreadsheet of the variables selected for analysis. For each selected variable, the variable number, name, long name, and type (i.e., continuous or categorical) are reported.

Data for analysis. Displays a spreadsheet of basic statistics (e.g., mean, standard deviation, skewness, kurtosis, min, max) for all continuous inputs and targets. Also displays a spreadsheet with information about the inputs and targets specified for the analysis.

Data redundancy. Displays a correlation matrix for all inputs, and a spreadsheet that contains the names of the redundant variables, their roles, the redundancy criterion, and other related information.

Note: the following steps are completed for each target.

Target variable. Displays a summary spreadsheet of best predictors, an MD values graph and a spreadsheet with MD values for selected predictors, a matrix scatterplot, a spreadsheet of eliminated predictors, and a spreadsheet of predictors remaining at this step.

Summary report. Displays sensitivity spreadsheets for each of the models, a summary spreadsheet for neural network models, a summary report for the Model building steps, and a list of models spreadsheet (with statistics for validation sample).

Summary report. Select this command to generate a summary report for each completed step.
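To make the Data for analysis and Data redundancy results more concrete, the following Python sketch computes the same kinds of quantities with the standard library only. It is an illustration, not the software's method: the skewness and kurtosis here use simple moment-based formulas (the software may use bias-corrected estimators), and the 0.95 correlation threshold and all function names are hypothetical stand-ins for whatever redundancy criterion is configured.

```python
import math
import statistics


def basic_stats(values):
    """Basic statistics for one continuous variable."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments in population form (an assumption of this sketch).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std": statistics.stdev(values),   # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,    # excess kurtosis
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(inputs, threshold=0.95):
    """Flag input pairs whose absolute correlation meets a hypothetical threshold."""
    names = list(inputs)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(inputs[a], inputs[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


inputs = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],   # exact multiple of x1, so it is flagged
    "x3": [5, 1, 4, 2, 3],
}
print(basic_stats(inputs["x1"]))
print(redundant_pairs(inputs))
```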

Undo. Click this button to return to the state of the project at the point of run and validate (before you clicked the Next step button).

Redo. Click this button to reverse an Undo and return the project to its state at the last run and validate.

Clear step. Click this button to permanently erase all actions taken in a current step. To clear a step, select the step first (by clicking on the appropriate step name in the Step-node panel), and then click the Clear step button. Note that clearing a step will also invalidate (or clear) all the subsequent steps.
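The cascading effect of Clear step can be sketched as follows. This is a minimal illustration, assuming that the cleared step returns to the ready state and that every later step falls back to the wait state; the function name clear_step and the string state labels are hypothetical.

```python
def clear_step(states, index):
    """Return a new list of states after clearing the step at `index`.

    Assumed behavior (not confirmed by the product documentation):
    the cleared step becomes ready again and all subsequent steps
    are invalidated back to the wait state.
    """
    cleared = list(states[:index])          # earlier steps are untouched
    cleared.append("ready")                 # the selected step starts over
    cleared.extend("wait" for _ in states[index + 1:])
    return cleared


states = ["completed", "completed", "completed", "ready"]
print(clear_step(states, 1))
```

Returning a new list rather than modifying the input keeps the earlier state available, which is convenient if an undo history is kept alongside it.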

Next step. Click this button to validate the current step. If the current step is complete, the yellow icon changes to a green icon and you can move to the next step. If the current step is not complete, a message notifies you that you need to complete the step. Note that you cannot proceed from one step to another when the step-node shows a red icon.

You can click the down arrow button adjacent to the Next step button to display a menu of run and validate options. Select Run & validate to run and validate the current step. Select Run to completion to run the entire Data Miner project.