Can I select random subsets of data?

Yes, you can. Subsets of data can be created via both simple random sampling and systematic random sampling. Select Random Sampling from the Data menu to display the Create a Random Sample dialog.

Select the Simple random sampling option button on the Simple Sampling tab to obtain a subset via simple random sampling. You can specify the size of the subset in one of two ways: as a Percentage of cases in the original spreadsheet or as an Approximate number of cases. If you select the With replacement check box, each case selected for the subset is returned to the pool of available cases before the next draw, so an individual case can appear more than once in the resulting subset.
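
To illustrate the difference between sampling with and without replacement, here is a minimal Python sketch. It is not STATISTICA code; the function name simple_random_sample and its parameters are hypothetical, and it draws an exact number of cases, which may differ from how STATISTICA arrives at an approximate number.

```python
import random


def simple_random_sample(cases, percentage=None, n=None,
                         with_replacement=False, seed=None):
    """Draw a simple random sample from a list of cases.

    Hypothetical helper for illustration only. Exactly one of
    `percentage` (0-100) or `n` (a case count) must be given.
    """
    if (percentage is None) == (n is None):
        raise ValueError("specify exactly one of percentage or n")
    rng = random.Random(seed)
    if n is None:
        # Convert the percentage into a case count.
        n = round(len(cases) * percentage / 100)
    if with_replacement:
        # Each draw comes from the full pool, so repeats are possible.
        return [rng.choice(cases) for _ in range(n)]
    # Without replacement, every case appears at most once.
    return rng.sample(cases, n)
```

Without replacement, the requested size cannot exceed the number of cases; with replacement, it can, because selected cases return to the pool.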

Select the Systematic random sampling option button to obtain a subset via systematic random sampling. For instance, if you enter 5 in the K= box, STATISTICA will randomly select one of the first five cases and then complete the subset by selecting every fifth case in the spreadsheet after that starting case.
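
The K= behavior described above can be sketched in a few lines of Python. This is an illustration of the general technique, not STATISTICA's implementation; the function name systematic_sample is hypothetical.

```python
import random


def systematic_sample(cases, k, seed=None):
    """Pick a random start among the first k cases, then take every k-th case.

    Hypothetical helper for illustration only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    # Random starting position within the first k cases (0-based index).
    start = rng.randrange(k)
    # Slice with step k: start, start + k, start + 2k, ...
    return cases[start::k]
```

With 20 cases and k set to 5, the result always contains 4 cases spaced exactly 5 apart; only the starting case varies from run to run.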

Select the Split node random sampling option button to randomly divide the selected observations in the current data file into two data files. This option is particularly useful in data mining projects, where separate data sets are needed for training and testing predictive models.
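
A random split into two data sets can be sketched in Python as follows. This is an illustration only; the function name random_split and the train_fraction parameter are assumptions, since the text above does not state how the proportions of the two files are chosen.

```python
import random


def random_split(cases, train_fraction=0.5, seed=None):
    """Randomly divide cases into two non-overlapping lists.

    Hypothetical helper for illustration only; `train_fraction` is an
    assumed parameter controlling the size of the first list.
    """
    if not 0 <= train_fraction <= 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(cases)  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    # First part for training, remainder for testing.
    return shuffled[:cut], shuffled[cut:]
```

Every case lands in exactly one of the two lists, so a model tested on the second list never sees data it was trained on.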