Statistica Extract, Transform, and Load (ETL) Overview

Overview of Functionality

Statistica ETL combines efficient extraction of data from standard databases (e.g., Microsoft SQL Server, Oracle) and specialized process databases (e.g., OSI PI) with Statistica's powerful data processing capabilities for data filtering, aggregation, alignment, and analysis.

Statistica ETL is the ideal solution for:

  • Building enterprise analysis platforms that integrate process historians with quality control and advanced process monitoring systems

  • Creating specialized data warehouses that align and validate time-stamped data (e.g., batch-time data, as commonly collected in various process industries) with outcome (e.g., assay) data

  • Building data warehouses for ad-hoc and automated root cause analysis for complex manufacturing processes (e.g., chemical or pharmaceutical manufacturing, power generation, mining, etc.)

  • Creating 21 CFR Part 11-compliant data warehouses for validated reporting on complex processes

  • Any data warehousing application that requires specialized data validation, pre-processing, aggregation, standardization, or merging of unconventional data, and thus cannot be built with off-the-shelf standard database tools

Extract Data

Statistica ETL provides a secure platform for efficiently managing multiple connections to various types of databases, including process databases (e.g., via the specialized Statistica OSI PI Connector). Statistica Enterprise stores the metadata describing the nature of the tables that are queried, such as control limits, specification limits, valid data ranges, etc. See Statistica Enterprise for more details.
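Statistica Enterprise handles connections and table metadata internally. As a generic illustration of the extract step and metadata-driven validation (not Statistica's actual API), the sketch below uses Python's built-in sqlite3 as a stand-in data source; the table, sensor names, and valid-range values are hypothetical:

```python
import sqlite3

# Stand-in data source; in a real deployment this would be a configured
# connection to e.g. SQL Server, Oracle, or OSI PI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("temp", 71.2), ("temp", 69.8), ("temp", 180.0)])

# Hypothetical metadata of the kind stored alongside each queried table:
# the valid data range for the 'temp' sensor.
valid_range = {"temp": (0.0, 120.0)}

# Extract, then validate each reading against its metadata range.
rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
lo, hi = valid_range["temp"]
valid = [(s, v) for s, v in rows if lo <= v <= hi]        # in-range readings
rejected = [(s, v) for s, v in rows if not lo <= v <= hi]  # flagged for review
```

Keeping the range limits in metadata rather than in the query itself means the same validation rule can be reused across ad-hoc and scheduled extractions.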

Transform Data

The Statistica ETL module provides unique capabilities for processing and merging data, in particular process data that are difficult to manage using standard database tools.

Statistica ETL's transformation tools can be used for:

  • Aligning time-stamped process data with other data sources, such as process data collected at different time intervals, or only collected once per part, ID, batch, etc.

  • Processing batch-time data to achieve equal batch "lengths", and unstacking such data to make them available for subsequent analyses and process monitoring of the maturation process (see also Statistica MSPC for details)

  • Aggregating and/or smoothing data, so that meaningful subsequent process monitoring methods (e.g., for change-point or trend detection) can be applied to robust or smoothed estimates of process averages within aggregated time intervals

  • Combining complex alignment tasks for multiple diverse data sources into a single ETL object that can be deployed to Statistica Enterprise and applied ad-hoc or as a scheduled Statistica ETL task, supporting a dedicated data warehouse that maintains validated and aligned data for comprehensive process monitoring and optimization
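The core alignment idea in the first bullet — attaching the most recent process reading to each outcome measurement, even when the two are collected at different intervals — can be sketched in plain Python. This is an illustrative "as-of" alignment only, not Statistica's implementation; all timestamps, batch IDs, and values are invented sample data:

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical sample data: process readings at irregular intervals,
# and one outcome (assay) measurement per batch.
t0 = datetime(2024, 1, 1)
readings = [(t0 + timedelta(minutes=m), 20.0 + 0.1 * m)
            for m in (0, 7, 16, 31, 44)]
outcomes = [(t0 + timedelta(minutes=30), "batch-A", 0.92),
            (t0 + timedelta(minutes=45), "batch-B", 0.88)]

def align_last_reading(readings, outcomes):
    """For each outcome, attach the most recent process reading taken
    at or before the outcome's timestamp (as-of alignment)."""
    times = [t for t, _ in readings]  # readings assumed sorted by time
    aligned = []
    for when, batch, assay in outcomes:
        i = bisect_right(times, when) - 1           # last reading <= outcome time
        value = readings[i][1] if i >= 0 else None  # None if nothing earlier
        aligned.append((batch, assay, value))
    return aligned

aligned = align_last_reading(readings, outcomes)
```

Here batch-A (measured at minute 30) picks up the reading taken at minute 16, the latest one available at that point; a real alignment task would add tolerances, interpolation, or per-batch aggregation on top of this.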

Load Data

The Statistica ETL solution automates the validation and alignment of multiple diverse data sources into a single source suitable for ad-hoc or automated analyses. When deployed inside the Statistica Enterprise framework, the results can be written back to dedicated database tables or to Statistica data tables, giving analysts and process engineers convenient access to real-time performance data without tedious preprocessing or cleaning before actionable information can be extracted.
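As a minimal sketch of the load step (again using sqlite3 as a stand-in for the warehouse database, with a hypothetical table layout and sample rows), the validated and aligned output is simply written back to a dedicated table:

```python
import sqlite3

# Hypothetical aligned output of the transform step:
# one row per batch, carrying the outcome and the aligned process value.
aligned = [("batch-A", 0.92, 21.6), ("batch-B", 0.88, 24.4)]

# Load step: write the result to a dedicated warehouse table so that
# downstream analyses query clean, pre-aligned data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (batch TEXT PRIMARY KEY, "
             "assay REAL, temp REAL)")
conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", aligned)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
```

In a scheduled deployment this write-back would run after each extraction and transformation pass, keeping the warehouse table current for ad-hoc and automated reporting.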

See also: Data Alignment: Time-Indexed vs. ID-Based, Statistica ETL: Time-Indexed - Startup Panel, Statistica ETL: ID-Based - Startup Panel