Data Alignment: Time-Indexed vs. ID-Based

STATISTICA ETL provides two menu options for aligning data:


  • Aggregates data from multiple data sources based on a date/time stamp variable

  • Aligned data by minute, hour, day, week, month, quarter, or year


  • Aggregates data from multiple data sources based on an identifier variable and an optional time variable

  • Optionally aligns data by N equal intervals or N user-specified intervals

Here are three scenarios illustrating these menu options:

1. Time-indexed alignment with a Date/Time variable

In power plants, a database historian (e.g., PI System) often stores data describing the continuous process of power generation. Meanwhile, a separate database (e.g., Oracle, Microsoft SQL Server) stores data describing soot-blowing and/or other operations. While much data is available, it is not easily accessible to the staff and engineers supervising a particular power plant. As a result, most engineers responsible for power plant operations do not have the tools to easily summarize data, generate reports, or perform basic calculations (e.g., root cause analysis. STATISTICA ETL can help engineers collect the information they need in an organized format. Moreover, this data becomes available for analysis, data mining, and reporting.

See also Example 1: Aligning Data from a Cyclone Furnace and Example 2: Comparing Historical Stock Prices.

2. ID-based alignment with an ID variable

In manufacturing industries (e.g., pharmaceutical), a standard merge/alignment of multiple tables by one or more ID variables is often required. STATISTICA ETL can perform different types of merges: One-to-One, One-to-Many, Many-to-One, and Many-to-Many (i.e., Cartesian Join). By default, it aggregates means. However, if the aggregation statistics type is set to "none", a Cartesian join results.

With this functionality, you can:

    • Perform multiple merges (of multiple data sources) in a single operation

    • Compute simple and robust aggregation statistics with data cleaning (e.g., filtering the data into a given range)

    • Perform range checking (for valid ranges) in a single pass

See also Example 3: Aligning Multiple Stages of Process Data.

3. ID-based alignment with an ID variable and a Time variable

This special ID-based scenario uses pre-processing of batch-time data to align with other batch-time or static data. For example, suppose you have batch data organized by batch ID, and each batch has different numbers of observations (e.g., some batches have 30 measurements, others have 29 or 31). STATISTICA ETL has enhanced features to pre-process batch-time data to: re-compute time variable into elapsed time values, and aggregate all batches into a fixed number of elapsed time intervals (normalize batch lengths). You can optionally unstack data to analyze time-dependent changes against static quality outcomes as with Multivariate Statistical Process Control (MSPC).

See also Example 3: Aligning Multiple Stages of Process Data.