Some old text files spread their column labels over several header rows. This function preserves those labels by combining the header rows into a single name for each column.

e_read_df_header_span_rows(
  dat_this = NULL,
  skip = 0,
  row_header_span = 1,
  row_header_span_collapse = "_"
)

Arguments

dat_this

data.frame with all text columns

skip

number of rows to skip that are not part of header rows

row_header_span

number of rows that comprise the header column names

row_header_span_collapse

character used to separate the header rows when they are joined into a single column name

Value

dat_this data.frame with updated column names
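The combining step can be pictured with a small base R sketch. This is not the erikmisc implementation: `collapse_header_rows()` and `dat_demo` are hypothetical names used only for illustration, and dropping empty cells before joining is an assumption based on the example output (`b1_b2`, `c1`).

```r
# Hypothetical helper, not part of erikmisc: join the first n_rows rows of
# each column into one column name, dropping empty cells, then remove those
# rows from the data.
collapse_header_rows <- function(dat, n_rows, collapse = "_") {
  stopifnot(n_rows >= 1, n_rows <= nrow(dat))
  header_cells <- dat[seq_len(n_rows), , drop = FALSE]
  new_names <-
    vapply(
      header_cells
    , function(x) paste(x[!is.na(x) & nzchar(x)], collapse = collapse)
    , character(1)
    )
  out <- dat[-seq_len(n_rows), , drop = FALSE]
  names(out) <- unname(new_names)
  out
}

# Illustrative all-text data with three header rows followed by one data row
dat_demo <-
  data.frame(
    V1 = c("a1", "a2", "a3", "1")
  , V2 = c("b1", "b2", ""  , "2")
  , V3 = c("c1", ""  , ""  , "3")
  , stringsAsFactors = FALSE
  )

dat_out <- collapse_header_rows(dat_demo, n_rows = 3, collapse = "_")
names(dat_out)
# "a1_a2_a3" "b1_b2" "c1"
```

Each column contributes only its non-empty header cells, so columns whose label occupies fewer rows still get a clean name.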

Details

  • When reading data from text, keep values as "text"

    • utils::read.table(..., stringsAsFactors = FALSE)

  • When reading data from Excel, keep values as "text" and do not fix duplicate names

    • readxl::read_xlsx(..., col_types = "text", .name_repair = "minimal")
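As a minimal sketch of the "keep values as text" advice for text files: `stringsAsFactors = FALSE` stops strings from becoming factors, but a column that holds only numbers is still converted to numeric. The `colClasses = "character"` argument used here is an addition for illustration, not something the notes above prescribe, and the inline data is made up.

```r
# Illustrative inline data; column Z holds only numbers
dat_txt <-
  utils::read.csv(
    text             = "X,X,Z\na1,b1,7\na2,b2,8\n"
  , colClasses       = "character"   # assumption: force every column to text
  , stringsAsFactors = FALSE
  )

sapply(dat_txt, class)
# all three columns are "character"; the duplicate name X is read as X.1
```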

Examples

# data should be text
dat_this <-
  read.csv(
    text = "
X,X,Z
a1,b1,c1
a2,b2,
a3,,
1,2,3
"
  , stringsAsFactors = FALSE
  )
dat_this %>% print()
#>    X X.1  Z
#> 1 a1  b1 c1
#> 2 a2  b2   
#> 3 a3       
#> 4  1   2  3

# return dataset as it is
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 1
  )
#> erikmisc::e_read_df_header_span_rows, returning data as is
#>    X X.1  Z
#> 1 a1  b1 c1
#> 2 a2  b2   
#> 3 a3       
#> 4  1   2  3
# no header row (first row is data), adverse effect when two values are the same
#   and utils::read.table adds a suffix of ".1", etc., to the value
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 0
  )
#> erikmisc::e_read_df_header_span_rows, possible adverse issues if data on first row has same values (if utils::read.table added ".1" suffix)
#>   V1  V2 V3
#> 1  X X.1  Z
#> 2 a1  b1 c1
#> 3 a2  b2   
#> 4 a3       
#> 5  1   2  3
# skip first row
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 1
  , row_header_span = 1
  )
#>   a1 b1 c1
#> 3 a2 b2   
#> 4 a3      
#> 5  1  2  3
# skip first row, combine first three rows into a column header, collapse with underscore
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 1
  , row_header_span = 3
  , row_header_span_collapse = "_"
  )
#>   a1_a2_a3 b1_b2 c1
#> 5        1     2  3
# First row had multiple of same value, so ".1", ..., were appended;
#   so first remove ".1", then join header rows together
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 4
  , row_header_span_collapse = "_"
  )
#>   X_a1_a2_a3 X_b1_b2 Z_c1
#> 5          1       2    3
# First row is data, so the header becomes row 1 and new column names are added
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 0
  )
#> erikmisc::e_read_df_header_span_rows, possible adverse issues if data on first row has same values (if utils::read.table added ".1" suffix)
#>   V1  V2 V3
#> 1  X X.1  Z
#> 2 a1  b1 c1
#> 3 a2  b2   
#> 4 a3       
#> 5  1   2  3
# Skip 3 rows and first row is data, so add new column names
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 3
  , row_header_span = 0
  )
#>   V1 V2 V3
#> 4 a3      
#> 5  1  2  3
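The examples above mention removing the ".1" suffixes that utils::read.table appends to repeated names. One way to picture that step is a regular expression over the names; this is an assumed approach for illustration, not necessarily how e_read_df_header_span_rows does it, and `nm` is a made-up vector.

```r
# Made-up names as utils::read.table would produce them for a header "X,X,Z,X"
nm <- c("X", "X.1", "Z", "X.2")

# Strip a trailing ".<digits>" suffix from each name
nm_clean <- sub("\\.[0-9]+$", "", nm)
nm_clean
# "X" "X" "Z" "X"
```

A pattern like this cannot tell an appended suffix from a label that genuinely ends in a dot and digits (for example "v1.2" would become "v1"), which is the kind of adverse effect the function's message warns about.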