Combine multiple header rows into a column name for a text data frame

Some old text files spread their column labels over multiple header rows. In this case, we want to preserve those labels by combining them into a single column name.

e_read_df_header_span_rows(
  dat_this = NULL,
  skip = 0,
  row_header_span = 1,
  row_header_span_collapse = "_"
)

Arguments

dat_this: data.frame in which all columns are text (character)
skip: number of leading rows to skip that are not part of the header rows
row_header_span: number of rows that make up the header column names
row_header_span_collapse: character string used to separate the header rows when they are joined into a single column name

Value

dat_this: data.frame with updated column names

Details

When reading data from text, keep values as "text":
- utils::read.table(..., stringsAsFactors = FALSE)
When reading data from Excel, keep values as "text" and do not fix duplicate names:
- readxl::read_xlsx(..., col_types = "text", .name_repair = "minimal")
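As a minimal sketch of the text-file case, assuming base R only: `stringsAsFactors = FALSE` on its own still lets all-numeric columns be converted to numbers, so `colClasses = "character"` is used here to force every column to text. `check.names = FALSE` keeps duplicate names unchanged, which is roughly the base R counterpart of `.name_repair = "minimal"`. The inline CSV and the name `dat_text` are made up for illustration.

```r
# Assumption: a small inline CSV stands in for an old text file.
txt <- "X,X,Z
a1,b1,c1
1,2,3"

dat_text <-
  utils::read.csv(
    text             = txt
  , colClasses       = "character"  # read every column as text
  , check.names      = FALSE        # keep duplicate names; no ".1" suffix added
  , stringsAsFactors = FALSE
  )

# Every column is character and the duplicated name "X" is preserved
str(dat_text)
names(dat_text)
```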

Examples

# data should be text
dat_this <-
  read.csv(
    text = "
X,X,Z
a1,b1,c1
a2,b2,
a3,,
1,2,3
"
  , stringsAsFactors = FALSE
  )
dat_this %>% print()
#>    X X.1  Z
#> 1 a1  b1 c1
#> 2 a2  b2   
#> 3 a3       
#> 4  1   2  3

# return dataset as it is
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 1
  )
#> erikmisc::e_read_df_header_span_rows, returning data as is
#>    X X.1  Z
#> 1 a1  b1 c1
#> 2 a2  b2   
#> 3 a3       
#> 4  1   2  3
# no header row (first row is data); adverse effect when two values are the same
#   because utils::read.table adds a suffix of ".1", etc., to the value
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 0
  )
#> erikmisc::e_read_df_header_span_rows, possible adverse issues if data on first row has same values (if utils::read.table added ".1" suffix)
#>   V1  V2 V3
#> 1  X X.1  Z
#> 2 a1  b1 c1
#> 3 a2  b2   
#> 4 a3       
#> 5  1   2  3
# skip first row
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 1
  , row_header_span = 1
  )
#>   a1 b1 c1
#> 3 a2 b2   
#> 4 a3      
#> 5  1  2  3
# skip first row, combine the next three rows into a column header, collapse with underscore
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 1
  , row_header_span = 3
  , row_header_span_collapse = "_"
  )
#>   a1_a2_a3 b1_b2 c1
#> 5        1     2  3
# First row had the same value more than once, so ".1", ..., were appended;
#   the ".1" suffixes are removed first, then the header rows are joined together
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 4
  , row_header_span_collapse = "_"
  )
#>   X_a1_a2_a3 X_b1_b2 Z_c1
#> 5          1       2    3
# First row is data, so the original column names become row 1 and new column names are added
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 0
  )
#> erikmisc::e_read_df_header_span_rows, possible adverse issues if data on first row has same values (if utils::read.table added ".1" suffix)
#>   V1  V2 V3
#> 1  X X.1  Z
#> 2 a1  b1 c1
#> 3 a2  b2   
#> 4 a3       
#> 5  1   2  3
# Skip 3 rows; the first remaining row is data, so add new column names
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 3
  , row_header_span = 0
  )
#>   V1 V2 V3
#> 4 a3      
#> 5  1  2  3
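
The header-joining idea used in the examples above can be sketched in base R. This is only an illustration under stated assumptions, not the erikmisc implementation: the helper `collapse_header_rows()` and the data frame `dat_demo` are hypothetical, and the sketch handles neither `skip` nor the ".1" suffix removal.

```r
# Hypothetical helper (not part of erikmisc): join the first n_header rows
# of an all-text data frame into column names, ignoring blank or NA cells.
collapse_header_rows <- function(dat, n_header, collapse = "_") {
  stopifnot(n_header >= 1, n_header < nrow(dat))
  header_rows <- dat[seq_len(n_header), , drop = FALSE]
  new_names <-
    vapply(
      header_rows
    , function(x) paste(x[!is.na(x) & nzchar(x)], collapse = collapse)
    , character(1)
    )
  # Drop the header rows from the data and apply the combined names
  out <- dat[-seq_len(n_header), , drop = FALSE]
  names(out) <- unname(new_names)
  out
}

# Made-up data: three header rows followed by one data row
dat_demo <-
  data.frame(
    V1 = c("a1", "a2", "a3", "1")
  , V2 = c("b1", "b2", ""  , "2")
  , V3 = c("c1", ""  , ""  , "3")
  , stringsAsFactors = FALSE
  )

dat_out <- collapse_header_rows(dat_demo, n_header = 3)
names(dat_out)
```

Blank cells are dropped before pasting, so a label that spans fewer rows than the others does not pick up trailing separators.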