This function implements a Slowly Changing Dimension Type 2 to merge new and current data while maintaining historical records. The function deactivates the old records and activates new ones, ensuring a history-preserving update strategy. Only the changing records are marked as not active and replaced by new active ones.
Usage
SCD2(newData, currentData, key = names(newData))Details
The function:
Separates active and inactive records from the current data.
Gets the old records that are still present in the new data (i.e., the ones that can remain active).
Gets the records present in new data but not present in still active current data (i.e., the records to activate) and activates them.
Gets the current active records that are not present in the new data (i.e., the records to deactivate) and deactivates them.
Examples
currentData_ <- tibble::tribble(
~id, ~colA, ~colB, ~colC, ~IS_ACTIVE, ~START_DATE, ~END_DATE,
1, "a1", "b1", "c1", TRUE, Sys.time(), as.Date(NA),
2, "a2", "b2", "c2", TRUE, Sys.time(), as.Date(NA),
3, "a3", "b3", "c3", TRUE, Sys.time(), as.Date(NA))
newData_ <- tibble::tribble(
~id, ~colA, ~colB, ~colC,
1, "a1", "b1", "c1",
2, "a2", "b2", "c20",
3, "a4", "b4", "c4")
mergedData <- SCD2(newData = newData_, currentData = currentData_)
