Column Mapping
=================

Column mapping is the process of mapping column names in report tables to fields in the standard data model. This is done automatically during table prediction (either on file import or via user input) and can also be done in the `Map Columns` view of the `Process File` page.

.. note::
   The steps leading up to column mapping provide a curated data table (``selected_table_obj.table_data_edited``) that contains only the desired rows and columns for the selected table type.

Object Model
----------------

Internally, the mapping process uses the ``TableColumnMapping`` object, with the following fields:

.. list-table:: TableColumnMapping
   :widths: 15 40 15
   :header-rows: 1

   * - **Field**
     - **Description**
     - **Shown as**
   * - ``original_column``
     - The column name in the original curated data set.
     - not shown
   * - ``edited_column``
     - An edited version of the original column name. The user can rename or correct the original column name for clarity, though the change is saved only for the current table. The ``edited_column`` name is used only for display; the mapping process uses the ``original_column`` name.

       .. warning::
          An enhancement would be to remember the mapping between edited and original column names as part of term mapping, along with automatic updates.
     - `Report Column`
   * - ``predicted_column``
     - The standard data model field that the application predicts the ``original_column`` maps to (if any). The prediction is based on previously saved column mappings in the configuration cache (loaded from `pvt_column_mapping.csv`).

       Column mapping is done on a per-table-type basis and is not associated with a specific template. For example, once `Pressure` is mapped to `stepPressure` for CCE, it will always be predicted as `stepPressure` for subsequent CCE tables, unless the user manually specifies otherwise. In that case, the previous mapping is marked as rejected and the new mapping is used for future predictions. **This allows the app to learn from user actions.**

       A count is maintained for each unique column mapping that is used. Column mappings are stored in `pvt_column_mapping.csv` and can be viewed or modified in the `Manage Configuration` page.

       .. note::
          The rejection approach ensures that only the latest mapping of a column is used for future predictions. It does not affect previously mapped tables unless they are re-processed.

       .. warning::
          Automatically rejecting previous mappings might be too aggressive in some cases. Consider an opt-out feature for users who want to keep previous mappings.

       .. warning::
          There may be unleveraged value in associating column mapping with specific templates as an optional feature (to handle special cases).

       .. warning::
          If there is no previous mapping for this field, an enhancement would be to infer it from text similarity, patterns in the data, etc.
     - Shown as ✨ beside the `Standard Column` field if the mapped column matches the predicted column. Also shown as ✨ beside the predicted field in the standard column dropdown list.
   * - ``mapped_column``
     - The final mapping to the standard data model field, either made automatically or selected from a dropdown list.
     - `Standard Column`
   * - ``original_uom``
     - Unit of measure extracted from the original column name, if applicable.
     - `Report UOM`
   * - ``predicted_uom``
     - The standard UOM that the application predicts for the column, based on (in order):

       1. Previous UOM mappings between report and standard UOMs, saved in `pvt_uom_mapping.csv` (excludes rejected mappings).
       2. Previous column mappings, which also save the selected UOM.
       3. Default UOM for the column.
     - Shown as ✨ beside the `Standard UOM` field if the mapped UOM matches the predicted UOM. Also shown as ✨ beside the predicted UOM in the standard UOM dropdown list.
   * - ``mapped_uom``
     - The standard model UOM, either set automatically to the predicted UOM or selected from a dropdown list.
     - `Standard UOM`
   * - ``std_uom``
     - Not used
     - NA
   * - ``has_uom``
     - **Bool**: Should the column have an associated UOM? Populated from the ``uoms`` collection in the configuration cache.
     - Controls whether UOM fields are shown.
   * - ``uom_dimension``
     - The UOM dimension associated with the column, populated from the ``table_columns`` collection in the configuration cache.
     - Controls which UOMs are displayed in the Standard UOM dropdown list.
   * - ``default_uom``
     - Default UOM to use if none is specified or predicted. Populated from the ``table_columns`` collection in the configuration cache.
     - NA

The mapping object is stored as part of the report backup file whenever the report is saved, so mapping always defaults to previously mapped values for that report table.

Map Columns View
----------------

The `Map Columns` view allows the user to map the data table to the standard fields of the selected table type. Automatic column mapping runs when the view is opened and defaults the mapping to predicted values, unless a mapping has already been saved with the report. This gives the user predicted values as a starting point, which can be modified as needed.

The table type can be changed at any time, which updates the mapping options.

.. note::
   Changing the table type will **not** trigger any test-specific rules to reshape the data table. If this is required, the `Predict` button resets the table and reruns the template matching logic.

.. image:: ../../../../../media/column_mapping_eg.png
   :alt: Column Mapping Example
   :align: center

Above is an example of the `Map Columns` view, as presented to the user. As the user changes the column mapping, the data table widget updates with the new standard column names, prefixed with ✨. The user can edit or correct values in the table by selecting `show form` and making changes there. The form also includes all possible table fields for the test, in case the user wants to add data that was not available in the report import.

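
To make the link between the mapping object and the view concrete, the sketch below models ``TableColumnMapping`` as a plain dataclass and derives the values displayed for one column. It is a minimal illustration, not the application's actual code: the field names follow the Object Model table (the unused ``std_uom`` is omitted), while ``view_row``, the ``SPARKLE`` constant, and the exact placement of the ✨ marker are assumptions made for this example.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional

   SPARKLE = "✨"  # marker used when a mapped value equals the prediction


   @dataclass
   class TableColumnMapping:
       """Illustrative stand-in; field names follow the Object Model table."""
       original_column: str
       edited_column: Optional[str] = None
       predicted_column: Optional[str] = None
       mapped_column: Optional[str] = None
       original_uom: Optional[str] = None
       predicted_uom: Optional[str] = None
       mapped_uom: Optional[str] = None
       has_uom: bool = False
       uom_dimension: Optional[str] = None
       default_uom: Optional[str] = None


   def view_row(m: TableColumnMapping) -> dict:
       """Hypothetical helper: values shown for one column in the view."""

       def mark(mapped, predicted):
           # Prefix the marker only when the mapped value is the predicted one.
           if mapped is None:
               return ""
           return f"{SPARKLE} {mapped}" if mapped == predicted else mapped

       row = {
           "Report Column": m.edited_column or m.original_column,
           "Standard Column": mark(m.mapped_column, m.predicted_column),
       }
       if m.has_uom:  # UOM fields are shown only for columns that have a UOM
           row["Report UOM"] = m.original_uom or ""
           row["Standard UOM"] = mark(m.mapped_uom, m.predicted_uom)
       return row

Under these assumptions, a column whose user-selected UOM differs from the predicted UOM keeps the ✨ on `Standard Column` but loses it on `Standard UOM`.
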
As seen in the example image, there are four columns beneath the dataframe:

* **Report Column**: The column name from the curated data table.
* **Report UOM**: A unit of measure extracted from the report column name, if applicable.
* **Standard Column**: A dropdown list of possible fields in the standard data model. It is populated automatically if there is a predicted column mapping.
* **Standard UOM**: A dropdown list of possible UOMs, based on the UOM dimension of the column. It is populated automatically if there is a predicted UOM mapping.

The user can save the column mapping here, or wait and allow it to be saved automatically when the report table is saved. The saving process for this page is as follows:

1. Saves the mapping between the ``original_column`` and ``mapped_column``.
2. Marks any previous mapping for the ``original_column`` as rejected.
3. Saves the mapping between the ``original_uom`` and ``mapped_uom``.
4. Updates the mapping count for the column mapping (how many times this pair of ``original_column`` and ``mapped_column`` has been saved).
5. Marks any previous mapping for the ``original_uom`` as rejected.
6. Updates the column mapping and its associated UOM in `pvt_column_mapping.csv`.
7. Updates the UOM mapping in `pvt_uom_mapping.csv`.
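
The save-and-reject behaviour in the steps above can be sketched with two in-memory lists standing in for `pvt_column_mapping.csv` and `pvt_uom_mapping.csv`. This is only an illustration under stated assumptions: the function names, the record keys (``table_type``, ``count``, ``rejected``), and the choice to key UOM mappings by ``original_uom`` alone are hypothetical, and writing the CSV files (steps 6 and 7) is left out.

.. code-block:: python

   def save_column_mapping(column_rows, uom_rows, table_type,
                           original_column, mapped_column,
                           original_uom=None, mapped_uom=None):
       """Hypothetical sketch of the save steps on in-memory records."""
       # Steps 1, 2 and 4: save the column pair, reject other mappings for
       # the same report column and table type, and bump the usage count.
       current = None
       for row in column_rows:
           if (row["table_type"] == table_type
                   and row["original_column"] == original_column):
               if row["mapped_column"] == mapped_column:
                   current = row
               else:
                   row["rejected"] = True
       if current is None:
           current = {"table_type": table_type,
                      "original_column": original_column,
                      "mapped_column": mapped_column,
                      "mapped_uom": None, "count": 0, "rejected": False}
           column_rows.append(current)
       current["rejected"] = False
       current["count"] += 1
       current["mapped_uom"] = mapped_uom  # column mapping also keeps the UOM

       # Steps 3 and 5: save the UOM pair and reject other mappings for the
       # same report UOM.
       if original_uom and mapped_uom:
           found = False
           for row in uom_rows:
               if row["original_uom"] == original_uom:
                   row["rejected"] = row["mapped_uom"] != mapped_uom
                   found = found or not row["rejected"]
           if not found:
               uom_rows.append({"original_uom": original_uom,
                                "mapped_uom": mapped_uom, "rejected": False})


   def predict_column(column_rows, table_type, original_column):
       """Return the non-rejected mapping for this table type, if any."""
       for row in column_rows:
           if (row["table_type"] == table_type
                   and row["original_column"] == original_column
                   and not row["rejected"]):
               return row["mapped_column"]
       return None

In this sketch, saving `Pressure` as `stepPressure` for CCE twice leaves one record with a count of 2. Saving `Pressure` against a different standard field afterwards marks the first record as rejected, so later predictions for CCE return the newer field, while other table types are unaffected.
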