Text Free Import Options

Click the V7 Free Import button in the Advanced delimited importing options dialog box to display the Text Free Import Options dialog box.

Imported file. The name of the selected text file to import is displayed here.

File size. Under File size, the number of variables must be accurately specified. 

No. of vars. In this field, specify the number of variables in the ASCII (text) file that is being imported.

No. of cases. In this field, specify the number of cases in the ASCII (text) file that is being imported. If you are uncertain about the exact number of cases, you can overestimate; Statistica will detect the actual length of the file during import. Note that each line in the ASCII source file can be up to 4,000 characters in length. This limit applies only to the individual line length and not to the total length of a "case" of data: each imported case can be represented by many lines of data in the source file.

Import case names. Select this check box if you want case names to be obtained from the first field of each record in the ASCII file. If this field contains more than 20 characters, only the first 20 will be used as the case name. If the field contains numeric values instead of text, the case names will be created as "text images" of these values.
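The 20-character rule above can be sketched in a few lines of Python. This is an illustration only, not Statistica's implementation; the function name case_name and the trimming of surrounding blanks are assumptions made for the example.

```python
MAX_CASE_NAME = 20  # limit stated in the text above

def case_name(first_field):
    """Return the case name derived from the first field of a record.

    The field is read as text, so a numeric value such as 1024 is kept
    as its "text image" ("1024"). Stripping blanks is an assumption.
    """
    return str(first_field).strip()[:MAX_CASE_NAME]

print(case_name("Subject number one hundred"))
print(case_name(1024))
```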

Start import at row. In this field, specify the row of the text file at which to begin the import.

Format statement (to identify types of values). This option explicitly defines for Statistica the exact contents of the input ASCII file. (This is an important distinction: the format statement is a set of instructions for interpreting the structure of the input file, not a definition of the Statistica variables to be created.) The format statement is entered as a list of formats in the form nX where n is an integer multiplier indicating the number of times the format is to be repeated (no multiplier = 1) and X is the column type (e.g., 40F means 40 fields containing numeric (here float) values).

Column Type Specifiers:

A - Text (Alphanumeric)

F - Float (also R - Real)

D - Double Float (also DR - Double Real)

I - Integer (also NI - Normal Integer) - values ± 32767

S - Short Integer (also SI) - values ± 127

LI - Long Integer (also J) - values ± 2,140,000,000

L - Logical

A field whose format is specified as logical is expected to contain text designators of true and false. The following three conventions will be recognized by Statistica when the data are imported:

TRUE or FALSE if the field length is 5 or more.

YES or NO if it is 3 or 4 characters long.

Y, N, or T, F respectively, if it is 1 character long.

Values of TRUE, YES, T, or Y will be imported as 1; values of FALSE, NO, F, or N will be imported as 0.
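The mapping of logical designators to 1 and 0 can be illustrated with a short Python sketch. It is not Statistica's code: the function name logical_to_int is hypothetical, and the sketch assumes that designators are matched without regard to case, that surrounding blanks are ignored, and that the single-letter forms T and Y map to 1 and F and N map to 0. It does not model the field-length rules listed above.

```python
# Designators taken from the three conventions listed above.
TRUE_TOKENS = {"TRUE", "YES", "T", "Y"}
FALSE_TOKENS = {"FALSE", "NO", "F", "N"}

def logical_to_int(field):
    """Convert a logical text designator to 1 or 0.

    Assumption: matching is case-insensitive and ignores blanks.
    """
    token = field.strip().upper()
    if token in TRUE_TOKENS:
        return 1
    if token in FALSE_TOKENS:
        return 0
    raise ValueError("not a recognized logical designator: %r" % field)

print([logical_to_int(v) for v in ["TRUE", "NO", "Y", "F"]])
```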

Text fields (type A) must also include a length value from 1 to 255 immediately following the letter A, indicating the maximum possible length of text in this field.

The slash character (/) can be used to indicate that the remainder of the current line in the input file should be ignored (i.e., skip to the next line in the input file).

If the multiplier precedes a list of formats enclosed in parentheses, then the list of formats within the parentheses will be repeated the number of times specified by the multiplier. Example: 2(2L a5) is equivalent to L L A5 L L A5 and specifies two Logical variables followed by a Text variable that can hold up to five characters, then two more Logical variables and a final Text variable (again up to five characters).
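To make the multiplier and parenthesis rules concrete, the following Python sketch expands a format statement into a flat list of column specifiers. It is an illustration under stated assumptions, not the parser used by Statistica: the function name expand_format is hypothetical, specifiers are assumed to be case-insensitive (as the a5 in the example suggests), and any character other than digits, letters, parentheses, and the slash is simply ignored.

```python
import re

def expand_format(statement):
    """Expand e.g. '2(2L a5)' into ['L', 'L', 'A5', 'L', 'L', 'A5']."""
    # Tokens: a multiplier, a specifier (letters plus optional length),
    # a parenthesis, or the slash.
    tokens = re.findall(r"\d+|[A-Za-z]+\d*|[()/]", statement)
    pos = 0

    def parse_list():
        nonlocal pos
        out = []
        while pos < len(tokens) and tokens[pos] != ")":
            count = 1  # no multiplier means 1
            if tokens[pos].isdigit():
                count = int(tokens[pos])
                pos += 1
            if pos >= len(tokens) or tokens[pos] == ")" or tokens[pos].isdigit():
                raise ValueError("multiplier without a format")
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                group = parse_list()
                if pos >= len(tokens) or tokens[pos] != ")":
                    raise ValueError("unbalanced parentheses")
                pos += 1  # consume ')'
                out.extend(group * count)
            else:
                out.extend([tok.upper()] * count)
        return out

    result = parse_list()
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return result

print(expand_format("2(2L a5)"))
print(len(expand_format("40F")))
```

The recursive helper handles nested groups, so a statement such as 2(A8 3F) / I would expand to A8 F F F A8 F F F / I under the same assumptions.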

Separators. Under Separators, you can define the characters used in the input file as delimiters. (The final list of separators to be used will be the combination of the set of selected Basic and any Additional separators.)

Basic. In this drop-down list, select the type of delimiter used in the input file from four predefined sets of separators (CR stands for carriage return, LF stands for line feed, and FF stands for form feed).

undefined. This setting means that Statistica will not use a predefined set (see Additional button, below).

blank chars. This set includes: <space>, <tab>, <FF>, <LF>, and <CR/LF>.

standard set. This set includes: comma (,), semicolon (;), <space>, <tab>, and <CR/LF>.

non-numeric. This set includes all characters except: 0-9, period (.), minus (-), and plus (+).
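The effect of the non-numeric set can be sketched with a regular expression in Python. This is only an approximation for illustration: the name split_non_numeric is hypothetical, and the sketch assumes that runs of adjacent separators are treated as a single separator.

```python
import re

# Every character except 0-9, period, minus, and plus acts as a separator.
NON_NUMERIC = re.compile(r"[^0-9.+\-]+")

def split_non_numeric(text):
    """Split text into numeric-looking fields, dropping empty pieces."""
    return [piece for piece in NON_NUMERIC.split(text) if piece]

print(split_non_numeric("12.5abc-3|+7"))
```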

Additional. Click the Additional button to display the Text Free Separators dialog box, from which you can select the delimiter(s).

Treat multiple separators as MD. When you select this check box, Statistica will interpret each pair of adjacent separators as an occurrence of missing data (an absent value) and will place the default missing data value (-999999998) in the position between the adjacent separators. If this check box is cleared, then multiple adjacent separator characters are treated as one separator, and missing data must be explicitly coded into the ASCII file as a unique value (for instance, -999999998). Clearing this check box is particularly useful if, for instance, individual values in the data file are separated by a variable number of spaces. If the check box were selected in that case, each pair of adjacent spaces would be seen as an occurrence of missing data, and the resulting file would be full of missing values.
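The difference between the two settings can be shown with a small Python sketch. It is not Statistica's implementation; the function name split_fields and its parameters are hypothetical, and imported values are left as text for simplicity.

```python
import re

MISSING = -999999998  # default missing data value named above

def split_fields(line, separators=" ", multiple_as_md=False):
    """Split a line on any of the given separator characters.

    multiple_as_md=True  : each pair of adjacent separators yields MISSING.
    multiple_as_md=False : a run of separators counts as one separator.
    """
    sep_class = "[" + re.escape(separators) + "]"
    if multiple_as_md:
        return [MISSING if p == "" else p for p in re.split(sep_class, line)]
    return [p for p in re.split(sep_class + "+", line) if p != ""]

print(split_fields("1,,3", ",", multiple_as_md=True))
print(split_fields("1   2  3", " ", multiple_as_md=False))
print(split_fields("1   2  3", " ", multiple_as_md=True))
```

The last call shows why the check box should be cleared for space-aligned files: every extra space becomes a missing value.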

Use quotation marks as text boundaries. Select this check box if double (") or single (') quotation marks are used as text boundaries and the specified separator characters appear within the values of text variables in the input file (e.g., "John Jones, Ph.D.", uses the comma both as part of the text and as a separator after it). In this case, when Statistica imports the data, the quotation marks will be recognized only as boundaries around the text values, keeping the text values and the embedded separator character together (the quotation marks will not be included as part of the imported text values).

Note that if a text string is to contain quotation marks as part of the string itself (e.g., as in the titles of books such as "Moby Dick"), then two methods can be used to import them:

  • Select this option and enclose the entire text string within the alternate quotation mark (opposite of the embedded one); e.g., 'William Shakespeare, "King Lear", Act 2, Scene 1' will be imported as William Shakespeare, "King Lear", Act 2, Scene 1.

  • Select this option and then double the quotation mark wherever it is to be preserved; e.g., "William Shakespeare's ""King Lear"", ""Macbeth"" and ""Hamlet""" will be imported as William Shakespeare's "King Lear", "Macbeth" and "Hamlet".

If this option is cleared, then the separator character will be interpreted as a separator and not as part of the text value, and the quotation marks will be imported as part of the text; e.g., 'William Shakespeare, "King Lear", Act 2, Scene 1' will be imported as 'William Shakespeare in one column, "King Lear" in another column, Act 2 in a different column, and Scene 1' in another column.
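The quoting behavior described above closely resembles that of Python's standard csv module, which can be used to illustrate it. This is a sketch, not Statistica's parser: the function name parse_line is hypothetical, the comma is assumed to be the only separator, and csv.reader accepts a single quote character per call, whereas the option above recognizes both double and single quotation marks.

```python
import csv

def parse_line(line, use_quotes=True, quotechar='"'):
    """Split one line on commas, optionally honoring text boundaries."""
    if use_quotes:
        reader = csv.reader([line], delimiter=",",
                            quotechar=quotechar, doublequote=True)
    else:
        # Quotation marks are ordinary characters; commas always separate.
        reader = csv.reader([line], delimiter=",", quoting=csv.QUOTE_NONE)
    return next(reader)

# Embedded separator kept inside the text value.
print(parse_line('"John Jones, Ph.D.",42'))

# Method 1: enclose the string in the alternate quotation mark.
line = "'William Shakespeare, \"King Lear\", Act 2, Scene 1'"
print(parse_line(line, quotechar="'"))

# Method 2: double the quotation mark wherever it is to be preserved.
print(parse_line('"William Shakespeare\'s ""King Lear"", ""Macbeth"" and ""Hamlet"""'))

# Option cleared: the text is split at every comma, quotation marks kept.
print(parse_line(line, use_quotes=False))
```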

Trim leading spaces. Select this check box if the ASCII (text) file you are about to import contains leading blank spaces in some rows, such as shown in the example below where the rows starting with 9 and 8 are offset by a leading blank.

If the Trim leading spaces check box is selected, then leading blank spaces will not be (erroneously) interpreted as field separators, and in this example, the 4 (rows) by 3 (columns) data matrix will be properly imported.
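The effect of trimming can be sketched in Python as follows. The data rows here are hypothetical, chosen only to match the description of a 4 by 3 matrix in which the rows starting with 9 and 8 are offset by a leading blank. The function name split_row is also hypothetical, and the sketch assumes that a single space is the separator and that an untrimmed leading space produces an empty first field.

```python
def split_row(line, trim_leading=True):
    """Split a space-separated row, optionally trimming leading blanks."""
    if trim_leading:
        line = line.lstrip(" ")
    return line.rstrip("\n").split(" ")

# Hypothetical file contents (not taken from the original example).
rows = ["1 2 3", " 9 5 6", "4 7 2", " 8 1 0"]

for row in rows:
    print(split_row(row, trim_leading=False), split_row(row, trim_leading=True))
```

Without trimming, the offset rows yield four fields, the first of them empty; with trimming, every row yields three fields.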

File contents. The File contents field displays the contents of the text file to import.

OK. Click the OK button to accept the options selected and import the text file.

Cancel. Click the Cancel button to close the dialog box without importing the text file.