Create a Random Sample - Simple Sampling Tab

Select the Simple Sampling tab of the Create a Random Sample dialog box to access options for simple random sampling from the input data (see also Probability Sampling and EPSEM samples), systematic random sampling, or split node random sampling (splitting the data file into two data files). Use the options on the Stratified Sampling tab to draw stratified samples using one or more stratification variables.

Note that regardless of the specific option selected on this tab, sampling will include only those cases and variables selected via the Cases and Variables buttons (these buttons are available at the top of the dialog box regardless of which tab is selected).

Options for simple sampling.

Simple random sampling. Select the Simple random sampling option button to create a probability sample (subset) via random sampling. The sampling fraction for drawing the sample can be determined in one of two ways: as a percentage of the cases in the original spreadsheet or as an approximate number of cases. Select the respective option (Calculate based on percentage of cases or Calculate based on approximate N) on the Options tab.

% =/N =. Specify the approximate percentage of cases or the approximate number of cases to be used when creating your subset according to the respective option (Calculate based on percentage of cases or Calculate based on approximate N) specified on the Options tab.
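
The relationship between the two ways of specifying the sampling fraction can be illustrated with a short Python sketch. This is not STATISTICA code; the function names are hypothetical, and drawing each case independently with probability equal to the sampling fraction is only one plausible way to obtain an "approximate" sample size.

```python
import random

def sampling_fraction(n_cases, percent=None, approx_n=None):
    """Return the sampling fraction from a percentage or from a target N."""
    if (percent is None) == (approx_n is None):
        raise ValueError("give exactly one of percent or approx_n")
    if percent is not None:
        return percent / 100.0
    return approx_n / n_cases

def approximate_sample(cases, fraction, seed=None):
    """Keep each case independently with probability `fraction`.

    Assumption: the resulting size is only close to fraction * len(cases),
    which is one way an 'approximate' percentage or N could arise.
    """
    rng = random.Random(seed)
    return [c for c in cases if rng.random() < fraction]

cases = list(range(1, 201))                      # 200 case numbers
f = sampling_fraction(len(cases), approx_n=50)   # same as percent=25
subset = approximate_sample(cases, f, seed=1)
```

Under this scheme, asking for 25% of 200 cases and asking for approximately 50 cases lead to the same sampling fraction.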

With replacement. If you select the With replacement check box, each case that is selected for the subset is placed back into the pool of available cases before the next selection is made (hence, an individual case can appear more than once in the resulting subset). Sampling with replacement also makes oversampling possible, that is, you can request more cases than exist in the input. For example, if the input contains 50 cases, you can request 75 cases (or, equivalently, 150% of the cases).
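
As an illustration of oversampling with replacement, the following Python sketch (not STATISTICA code) draws 150% of a 50-case input using the standard library's random.choices. The variable names are hypothetical.

```python
import random

rng = random.Random(0)
cases = list(range(1, 51))             # 50 input cases
k = round(len(cases) * 150 / 100)      # 150% of 50 cases = 75 draws

# Each draw is made from the full pool, so a case can be picked repeatedly.
sample = rng.choices(cases, k=k)
```

Because 75 cases are drawn from only 50 distinct ones, at least some cases necessarily appear more than once in the result.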

Exact. Select this check box to ensure that exactly the specified percentage of cases is returned.
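
The difference from an approximate draw can be sketched in Python as follows (not STATISTICA code). The function name is hypothetical, and rounding the target count to the nearest integer is an assumption; random.sample then returns exactly that many distinct cases.

```python
import random

def exact_sample(cases, percent, seed=None):
    """Return exactly round(percent% of the cases), without replacement."""
    rng = random.Random(seed)
    k = round(len(cases) * percent / 100)   # assumed rounding rule
    return rng.sample(cases, k)

subset = exact_sample(list(range(200)), 10, seed=3)   # always 20 cases
```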

Systematic random sampling. Select the Systematic random sampling option button to create the probability sample (subset) via systematic random sampling. For instance, if you enter 5 into the K= box, STATISTICA will randomly select a case within the first five cases and then complete the subset by selecting every fifth case in the spreadsheet after the originally selected case.
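
The procedure described above (a random start within the first K cases, then every Kth case) can be sketched in Python. This is an illustration rather than STATISTICA code, and the function name is hypothetical.

```python
import random

def systematic_sample(cases, k, seed=None):
    """Pick a random start among the first k cases, then take every k-th case."""
    rng = random.Random(seed)
    start = rng.randrange(k)    # 0-based position within the first k cases
    return cases[start::k]

cases = list(range(1, 21))      # case numbers 1..20
subset = systematic_sample(cases, 5, seed=2)
```

With 20 cases and K = 5, the subset always contains four cases spaced five apart; only the starting case varies with the random draw.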

Split node random sampling. Select this option button to randomly divide the selected observations in the current data file into two data files; this option is particularly useful in the context of data mining projects in order to create separate data sets for training and testing of models for predictive data mining.

% =/N =. Specify the approximate percentage of cases or the approximate number of cases for the first of the two subsets (when splitting the file), according to the respective option (Calculate based on percentage of cases or Calculate based on approximate N) specified on the Options tab.
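
A random split into two data sets, for example for training and testing, might be sketched in Python as follows. This is not STATISTICA code; the function name is hypothetical, and for simplicity the sketch produces an exact rather than an approximate split.

```python
import random

def split_cases(cases, percent_first, seed=None):
    """Randomly divide cases into two lists; the first gets percent_first%."""
    rng = random.Random(seed)
    n_first = round(len(cases) * percent_first / 100)   # assumed rounding rule
    chosen = set(rng.sample(range(len(cases)), n_first))
    # Preserve the original case order within each subset.
    first = [c for i, c in enumerate(cases) if i in chosen]
    second = [c for i, c in enumerate(cases) if i not in chosen]
    return first, second

train, test = split_cases(list(range(100)), 70, seed=0)
```

Every case ends up in exactly one of the two subsets, so the second subset is simply the complement of the first.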

Use casewise MD deletion. Select this check box to apply casewise deletion of missing data. Only observations (cases) that have no missing data for any of the variables selected into the subset variable list (via the Variables button at the top of the dialog box) will then be eligible for selection.
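
Casewise deletion can be illustrated with a small Python sketch (not STATISTICA code). The function name, the variable names, and the use of None to represent missing data are assumptions made for the example.

```python
def casewise_delete(rows, variables):
    """Keep only rows with no missing value (None) in any selected variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]

rows = [
    {"age": 34,   "income": 52000, "region": None},
    {"age": None, "income": 48000, "region": "N"},
    {"age": 51,   "income": 61000, "region": "S"},
]

# Only "age" and "income" are selected, so the missing "region" in the
# first row does not cause that row to be dropped.
complete = casewise_delete(rows, ["age", "income"])
```

Note that missing values matter only for the selected variables; a case with missing data in an unselected variable remains eligible.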