Data Normalization

Data normalization converts the data in a given report table to match the data types specified by the standard data model. The supported data types are:

  • int (also downcasts int64)

  • float (also downcasts float64)

  • str

  • date

  • datetime

  • time

  • component

    • Converts component name string to standard component name using component_mapping_df

  • enumeration

    • Converts enumeration value string to standard enumeration value using enumeration_mapping_df
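The mapping-based conversions for component and enumeration can be pictured with a small sketch. This is only an illustration, not the project's implementation: the column names source_name and standard_name, the sample rows, and the helper map_to_standard are all assumptions, since the actual layout of component_mapping_df and enumeration_mapping_df is not described here. The sketch also assumes that values without a mapping entry are left unchanged.

```python
import pandas as pd

# Hypothetical mapping table. The real columns of component_mapping_df are
# not documented here, so "source_name" and "standard_name" are assumptions.
component_mapping_df = pd.DataFrame(
    {
        "source_name": ["Pump A", "pump-a", "Valve 1"],
        "standard_name": ["PUMP_A", "PUMP_A", "VALVE_1"],
    }
)


def map_to_standard(values: pd.Series, mapping_df: pd.DataFrame) -> pd.Series:
    """Replace known source names with standard names; keep unknown values."""
    lookup = dict(zip(mapping_df["source_name"], mapping_df["standard_name"]))
    # Series.map yields NaN for keys missing from the lookup, so fall back
    # to the original value in those positions.
    return values.map(lookup).fillna(values)


raw = pd.Series(["Pump A", "Valve 1", "Unknown Part"])
normalized = map_to_standard(raw, component_mapping_df)
```

The same pattern would apply to enumeration values with enumeration_mapping_df, again assuming a comparable two-column layout.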

Warning

There are multiple versions of normalization that should be reconciled into one for consistency. Normalization could also be combined with data type validation so that the validation checks stay consistent with the normalization process.

Warning

One current implementation (try_normalize_df_data_types in fdMapping) converts values where it can and leaves the original values where it cannot. Another function (normalize_table_data in fdMapping) uses errors='coerce' to force values to the correct type, replacing any value that cannot be converted with NaN. The stricter version was written to support loading data directly to the API, which would fail on an invalid type. Now that file output, SME review, and loading are separate steps, normalization should probably take the try-but-don’t-force approach.
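The difference between the two behaviours can be sketched for a numeric column as follows. This is a minimal illustration under stated assumptions, not the code of try_normalize_df_data_types or normalize_table_data: the helper names coerce_numeric and try_numeric are hypothetical, and the lenient variant assumes that "leaves the original values" applies value by value rather than to the whole column.

```python
import pandas as pd


def coerce_numeric(series: pd.Series) -> pd.Series:
    """Strict: values that cannot be parsed as numbers become NaN."""
    return pd.to_numeric(series, errors="coerce")


def try_numeric(series: pd.Series) -> pd.Series:
    """Lenient: convert what parses, keep the original value otherwise."""
    converted = pd.to_numeric(series, errors="coerce")
    # A value that was present before but is NaN afterwards failed to parse.
    failed = converted.isna() & series.notna()
    # Cast to object first so numbers and original strings can share a column.
    return converted.astype(object).where(~failed, series)


raw = pd.Series(["1", "2.5", "abc"])
strict = coerce_numeric(raw)   # "abc" is replaced by NaN
lenient = try_numeric(raw)     # "abc" is preserved for later review
```

With the lenient variant the unparseable value stays visible in the output file, so an SME can correct it during review instead of finding an unexplained NaN.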

Normalized data is stored in the table_data_normalized and header_data_normalized DataFrames.