
This function merges all the replicated columns of the specified data frame into a single deduplicated column and drops the originals.

Usage

removeReplicatedColumns(dataframe, prefix)

Arguments

dataframe

data.frame. The data frame from which to drop the replicated columns.

prefix

character (string). The prefix with which the names of the replicated columns start.

Value

The specified data frame with an additional deduplicated column and all columns converted to type character.

Details

All occurrences of "N/A", "NA", and empty strings (case insensitive) in the provided data frame are replaced with NAs of type character. Then, all and only the columns whose names start with the specified prefix are selected and united into a single column whose name ends with "_deduplicated". All empty entries in the new deduplicated column are replaced with NAs. Finally, the new column is bound to the other columns of the initial data frame.
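
The steps described above can be approximated with the following base R sketch. It is not the package's implementation: the helper name sketchRemoveReplicatedColumns is hypothetical, the uniting rule (keep the first non-missing value in each row) is an assumption, and so is the exact name of the new column, since the documentation only states that it ends with "_deduplicated".

```r
# Hypothetical helper illustrating the documented steps; not the package code.
sketchRemoveReplicatedColumns <- function(dataframe, prefix) {
  # Convert every column to character while keeping the column names.
  result <- dataframe
  result[] <- lapply(dataframe, as.character)

  # Replace "N/A", "NA", and empty strings (case insensitive) with NA.
  result[] <- lapply(result, function(column) {
    column[toupper(column) %in% c("N/A", "NA", "")] <- NA_character_
    column
  })

  # Select all and only the columns whose names start with the prefix.
  isReplicated <- startsWith(names(result), prefix)
  replicated <- result[, isReplicated, drop = FALSE]

  # Assumption: unite by keeping the first non-missing value in each row.
  united <- Reduce(
    function(left, right) ifelse(is.na(left), right, left),
    replicated
  )

  # Replace any remaining empty entries with NA.
  united[!is.na(united) & united == ""] <- NA_character_

  # Assumption: name the new column after the prefix, without trailing "_".
  newName <- paste0(sub("_+$", "", prefix), "_deduplicated")
  unitedColumn <- data.frame(united, stringsAsFactors = FALSE)
  names(unitedColumn) <- newName

  # Bind the new column to the columns that were not replicated.
  cbind(result[, !isReplicated, drop = FALSE], unitedColumn)
}
```

The sketch assumes that at least one column matches the prefix. Keeping the first non-missing value is only one possible way to unite replicated columns; concatenating the values with a separator would be another.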

Examples

irisTest_ <- iris
irisTest_$Species_1 <- irisTest_$Species
irisTest_$Species_2 <- irisTest_$Species
irisTest_$Species <- NULL

deduplicatedDataframe_ <- removeReplicatedColumns(
  dataframe = irisTest_,
  prefix = "Species_")