Import Delimited Text Files

Access these options through any one of the following steps:

  • Select the Delimited option button in the Importing file dialog box or the Text File Import Type dialog box and click OK to display the Import Delimited Text Files dialog box.

  • Select the Specifications tab of the HDFS Import Text node dialog box (accessible from the Feature Finder, the ribbon bar, or the Node Browser). In addition to these options, a File button is included. Click the File button to display the HDFS Browser dialog box, where you can browse to a Hadoop file to import.

  • Select the Downstream tab of the Big Data Analytics Model Execution dialog box to access these options.

Variable Delimiting. Under Variable Delimiting, specify the custom characters (or string) used in the input file as the delimiter. You can select the type of delimiter used from four predefined separators - Tab, Space, Comma, or Semicolon - or specify other separators by selecting the Others option button and entering the desired separator in the adjacent text box. For more extensive field separator options, see the Text Free Import Options topic.

Match all. This check box is available when the Others option button is selected. Select this check box to specify that Statistica should use all the characters in the Others text box as a single string representing one delimiter; otherwise, each character in the text box is treated as a separate delimiter. For example, if you enter "ENDCOLUMN" in the Others text box and select the Match all check box, then any occurrence of the word "ENDCOLUMN" in the file is treated as the end of the current column of text.

Ignore consecutive delimiters. Select this check box to specify that Statistica should recognize two or more consecutive delimiters as only one.
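The effect of the Match all and Ignore consecutive delimiters check boxes can be illustrated outside Statistica. The following Python sketch is only an approximation of the behavior described above, not Statistica's implementation; the function split_fields and its parameter names are hypothetical.

```python
import re

def split_fields(line, others, match_all=False, ignore_consecutive=False):
    """Split one row of text into fields (hypothetical helper, not a Statistica API)."""
    if not others:
        raise ValueError("at least one delimiter character is required")
    if match_all:
        # Match all selected: the whole string is one delimiter.
        pattern = "(?:" + re.escape(others) + ")"
    else:
        # Match all cleared: each character is a separate delimiter.
        pattern = "[" + re.escape(others) + "]"
    if ignore_consecutive:
        # Collapse a run of delimiters into a single one.
        pattern += "+"
    return re.split(pattern, line)

# Whole string as delimiter vs. each character as delimiter
print(split_fields("ageENDCOLUMNheight", "ENDCOLUMN", match_all=True))
print(split_fields("a;b,c", ";,"))
# Consecutive delimiters kept (empty field) vs. ignored
print(split_fields("a,,b", ","))
print(split_fields("a,,b", ",", ignore_consecutive=True))
```

With Ignore consecutive delimiters cleared, the empty string between the two commas becomes an empty field; with it selected, the two commas act as one separator.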

Import Options. Under Import Options, specify options related to row and data handling.

Take variable names from first row in the file. Select this check box to create variable names from the first row in the text file.

Number of leading cases to skip. Select this check box to specify the row of the text file at which to begin the import. When this check box is selected, the adjacent text box becomes available. In this text box, enter the row number from which you want to begin the import.

Maximum number of cases to import. Select this check box to specify how many rows of the text file to import. When this check box is selected, the adjacent text box becomes available. In this text box, enter the maximum number of rows you want to import. Note that when this check box is selected, the adjacent text box defaults to 1, which is also the smallest maximum you can specify.

Decimal separator character. Select this check box to specify the character used as the decimal separator in the text file (e.g., '.' or ','). When this check box is selected, the adjacent text box becomes available. In this text box, enter the character you want to use as the decimal separator.

Missing data text string. Select this check box to specify a custom character or string that you want to be imported as missing data. When this check box is selected, the text box adjacent to it becomes available. In this text box, enter the character or string of characters you want to be imported as missing data.

Skip blank lines. Select this check box to avoid importing empty rows.
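As a rough illustration of how the row and data handling options above interact, the following Python sketch applies them to a small list of lines. It is not Statistica code: the helpers select_rows and parse_value are hypothetical, the skip parameter is assumed to be a count of leading rows, and the order in which blank lines are dropped relative to skipping is likewise an assumption.

```python
def select_rows(lines, skip=0, max_cases=None, skip_blank=False):
    """Pick the rows to import (hypothetical helper, not a Statistica API)."""
    if skip_blank:
        # Assumption: blank lines are dropped before rows are counted.
        lines = [ln for ln in lines if ln.strip() != ""]
    rows = lines[skip:]            # skip the leading rows
    if max_cases is not None:
        rows = rows[:max_cases]    # cap the number of imported rows
    return rows

def parse_value(text, decimal_sep=".", missing_text=None):
    """Convert one field to a float, or None when it matches the missing data string."""
    if missing_text is not None and text == missing_text:
        return None
    return float(text.replace(decimal_sep, "."))

lines = ["header line", "", "1,5", "NA", "3,25"]
rows = select_rows(lines, skip=1, max_cases=2, skip_blank=True)
values = [parse_value(r, decimal_sep=",", missing_text="NA") for r in rows]
print(rows)
print(values)
```

Here the first row is skipped, the blank line is ignored, at most two rows are kept, a comma is read as the decimal separator, and "NA" is imported as missing data.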

Advanced. Click the Advanced button to display the Advanced delimited importing options dialog box. Use the options in this dialog box to define various escape characters you may want to use for the import.

Variable Options. Under Variable Options, enter the specifications for the variable(s). Note. First click a column in the preview window to select the variable to edit. Each column may be defined as an import variable. You can select multiple columns to specify the same options for all selected columns.

Name. Select this check box to specify a name for the selected variable. When this check box is selected, the text box adjacent to it becomes available. In the text box, enter the name to be used as the variable name for the selected column in the preview window.

Number. Select this check box to specify the variable order. When this check box is selected, the text box adjacent to it becomes available. In the text box, enter the number for the order of the selected column as it will appear in the spreadsheet. For example, if you select the first column and enter "2" in this field, then the column will be imported as the second variable.

Missing data value. Select this check box to specify the value to be used as missing data for the variable. When this check box is selected, the text box adjacent to it becomes available. In the text box, enter the value to be used.

Data type. Click the arrow on the right to display the drop-down list of data types. From the list, select the data type for the selected column.

Input format. If the Date data type has been selected for a variable, this check box becomes available. Select this check box to specify the date format of the column (e.g., "DD/MM/YYYY"). When this check box is selected, the adjacent text box becomes available. In the text box, enter the date format for the variable. Note. The Formatting button also becomes available when the Input format check box is selected. See Formatting below.
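To show what an input format such as "DD/MM/YYYY" implies for parsing, the following Python sketch maps that kind of pattern onto the standard library's strptime directives. The token table is an assumption covering only the three tokens in the example above; the full set of format tokens that Statistica accepts is not described here, and parse_date is a hypothetical helper.

```python
from datetime import datetime

# Assumed token mapping; only the tokens from the "DD/MM/YYYY" example are covered.
TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d"}

def parse_date(text, input_format):
    """Parse a date string using a DD/MM/YYYY-style pattern (hypothetical helper)."""
    fmt = input_format
    for token, directive in TOKENS.items():
        fmt = fmt.replace(token, directive)
    return datetime.strptime(text, fmt).date()

print(parse_date("31/12/2020", "DD/MM/YYYY"))
print(parse_date("2020-12-31", "YYYY-MM-DD"))
```

A value that does not fit the stated format (for example "12/31/2020" read as "DD/MM/YYYY") raises ValueError in this sketch, which shows why the input format has to match the layout of the column.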

Remove non-numeric characters. Select this check box to remove any text values from the column. Values that are entirely textual will be imported as missing data, and values that contain both numbers and text will have all text characters removed. Note. This check box is available only if the Double, Integer, or Byte data type is selected.
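The rule described above (purely textual values become missing data, mixed values lose their text characters) can be sketched in Python as follows. This is an illustration only: strip_non_numeric is a hypothetical helper, and the set of characters treated as numeric here (digits, sign, and a period) is an assumption rather than Statistica's documented definition.

```python
import re

def strip_non_numeric(text):
    """Return the numeric part of a field as a float, or None for missing data."""
    # Assumption: digits, '+', '-', and '.' count as numeric characters.
    cleaned = re.sub(r"[^0-9.+-]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        # Nothing numeric was left, so the value is treated as missing.
        return None

print(strip_non_numeric("12kg"))
print(strip_non_numeric("$1,200.50"))
print(strip_non_numeric("unknown"))
```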

Omit. Select this check box to omit the selected column from the import.

Formatting. Click the Formatting button to display the Input Date Formatting dialog box, which displays standard date formats to enter.

Case Delimiting. Under Case Delimiting, specify the custom characters (or string) used as the input file's row delimiter.

Carriage return. Select this option button to specify a carriage return as the end-of-line character.

Line feed. Select this option button to specify a line feed as the end-of-line character.

Others. Select this option button to specify a custom character or string as the end-of-line delimiter. When this option button is selected, the adjacent text box becomes available. In the text box, enter the custom character or string to use as the end-of-line delimiter.

Match all. When the Others option button is selected, this check box becomes available. Select this check box to specify that Statistica should use the text in the Others edit field as a single string representing a row delimiter; otherwise, each character in this edit field is treated as a separate delimiter. For example, if you enter "ENDL" in the Others edit field and select this option, then any occurrence of the word "ENDL" in the file is treated as the end of the current row of text.
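The Case Delimiting choices can be pictured as different ways of cutting the raw file text into rows. The Python sketch below is a simplified illustration under that reading, not Statistica's parser; split_cases and its parameters are hypothetical names.

```python
def split_cases(text, delimiter="\n", match_all=True):
    """Split raw file text into rows (hypothetical helper, not a Statistica API)."""
    if match_all:
        # The whole delimiter string marks the end of a row.
        return text.split(delimiter)
    # Otherwise each character of the delimiter ends a row on its own.
    rows, current = [], []
    for ch in text:
        if ch in delimiter:
            rows.append("".join(current))
            current = []
        else:
            current.append(ch)
    rows.append("".join(current))
    return rows

print(split_cases("1,2\r3,4", "\r"))                   # carriage return
print(split_cases("1,2\n3,4", "\n"))                   # line feed
print(split_cases("1,2ENDL3,4", "ENDL"))               # Others with Match all
print(split_cases("1,2|3,4;5,6", "|;", match_all=False))  # Others without Match all
```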

Preview. In this text box, enter the number of rows that you want to show in the preview window below the label Text File (filename.txt).

Preview all. Select this check box to preview all rows from the file in the preview window below the label Text File (filename.txt). If this check box is selected, the Preview text box is not available.

Format String. The Format String field will display the parsing string currently being used by the text importer.

Auto refresh. Select this check box to refresh the preview whenever an option is changed.

OK. Click the OK button to accept the options selected and import the text file.

Cancel. Click the Cancel button to close the dialog box without importing the text file.

Also see HDFS Import Text node example.