Skip to contents

This function implements a Slowly Changing Dimension Type 2 to merge new and current data while maintaining historical records. The function deactivates the old records and activates new ones, ensuring a history-preserving update strategy. Only the changing records are marked as not active and replaced by new active ones.

Usage

SCD2(newData, currentData, key = names(newData))

Arguments

newData

data.frame. The data frame containing new records.

currentData

data.frame. The data frame containing existing records.

key

character (vector). The columns to be used as key.

Value

A combined data frame with old data marked as not active and new data marked as active.

Details

The function:

  1. Separates active and inactive records from the current data.

  2. Gets the old records that are still present in the new data (i.e., the ones that can remain active).

  3. Gets the records present in new data but not present in still active current data (i.e., the records to activate) and activates them.

  4. Gets the current active records that are not present in the new data (i.e., the records to deactivate) and deactivates them.

Examples

currentData_ <- tibble::tribble(
  ~id, ~colA, ~colB, ~colC, ~IS_ACTIVE, ~START_DATE, ~END_DATE,
  1, "a1", "b1", "c1", TRUE, Sys.time(), as.Date(NA),
  2, "a2", "b2", "c2", TRUE, Sys.time(), as.Date(NA),
  3, "a3", "b3", "c3", TRUE, Sys.time(), as.Date(NA))

newData_ <- tibble::tribble(
  ~id, ~colA, ~colB, ~colC,
  1, "a1", "b1", "c1",
  2, "a2", "b2", "c20",
  3, "a4", "b4", "c4")

mergedData <- SCD2(newData = newData_, currentData = currentData_)