Update the correspondence table between statistical classifications A and B when A has been updated to version A*.

updateCorrespondenceTable(
  A,
  B,
  AStar,
  AB,
  AAStar,
  CSVout = NULL,
  Reference = "none",
  MismatchToleranceB = 0.2,
  MismatchToleranceAStar = 0.2,
  Redundancy_trim = TRUE
)

Arguments

A

A string of the type character containing the name of a csv file that contains the original classification A.

B

A string of the type character containing the name of a csv file that contains classification B.

AStar

A string of the type character containing the name of a csv file that contains the updated version A*.

AB

A string of the type character containing the name of a csv file that contains the previous correspondence table A:B.

AAStar

A string of the type character containing the name of a csv file that contains the concordance table A:A*, which contains the mapping between the codes of the two versions of the classification.

CSVout

The preferred name for the output csv files that will contain the updated correspondence table and information about the classifications involved. The valid values are NULL or strings of type character. If the selected value is NULL, the default, no output file is produced. If the value is a string, then the output is exported into two csv files whose names contain the provided name (see "Value" below).

Reference

The reference classification among A and B. If a classification is the reference to the other, and hence hierarchically superior to it, each code of the other classification is expected to be mapped to at most one code of the reference classification. The valid values are "none", "A", and "B". If the selected value is "A" or "B", a "Review" flag column is included in the output (see "Explanation of the flags" below).

MismatchToleranceB

The maximum acceptable proportion of rows in the updated correspondence table which contain no code of the target classification B, among those which contain a code of A, of A*, or of both. The default value is 0.2. The valid values are real numbers in the interval [0, 1].

MismatchToleranceAStar

The maximum acceptable proportion of rows in the updated correspondence table which contain no code of the updated classification A*, among those which contain a code of A, of B, or of both. The default value is 0.2. The valid values are real numbers in the interval [0, 1].

Redundancy_trim

A logical argument that controls the trimming of redundant records. The valid values are TRUE and FALSE. The default value is TRUE, which removes all redundant records, replacing the values of Acode, Alabel and Asupp with the value ‘Multiple’ (to indicate that multiple A records are involved). If the multiple A records are the same, their value is not replaced. The other value is FALSE, which shows redundant records together with the redundancy flag.
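
The trimming described for Redundancy_trim = TRUE can be pictured with a small base R sketch. It is only an illustration on made-up records, not the package's internal code; the column names AStarcode and Bcode and the "|" key separator are assumptions.

```r
# Toy records; AStarcode and Bcode are hypothetical column names.
tab <- data.frame(
  Acode     = c("A1", "A2", "A3", "A3"),
  AStarcode = c("S1", "S1", "S2", "S2"),
  Bcode     = c("B1", "B1", "B2", "B2"),
  stringsAsFactors = FALSE
)

# One key per A*:B combination.
key <- paste(tab$AStarcode, tab$Bcode, sep = "|")

# Keep the first record of every combination.
keep <- !duplicated(key)
trimmed <- tab[keep, ]

# Count the distinct A codes behind each combination.
n_distinct_a <- tapply(tab$Acode, key, function(x) length(unique(x)))

# Replace the A code only where several different A codes were merged.
merged <- as.vector(n_distinct_a[key[keep]] > 1)
trimmed$Acode[merged] <- "Multiple"

trimmed
```

In this toy table the S1:B1 combination is reached from two different A codes, so its A code becomes "Multiple", while S2:B2 is reached twice from the same A code and keeps it.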

Value

updateCorrespondenceTable() returns a list with two elements, both of which are data frames.

  • The first element is the updated correspondence table A*:B augmented with flags "CodeChange", "Review" (if applicable), "Redundancy", "NoMatchToAStar", "NoMatchToB", "NoMatchFromAStar", "NoMatchFromB", "LabelChange", and with all the additional columns of the A, B, AStar, AB and AAStar files.

  • The second element contains the names of the original classification A, the target classification B, and the updated version A*, as read from the top left-hand side cell of the respective input files.

  • If the value of argument CSVout is a string of type character, the elements of the list are exported into files of csv format. The name of the file for the first element is the value of argument CSVout and the name of the file for the second element is classificationNames_CSVout. For example, if CSVout = "updateCorrespondenceTable.csv", the elements of the list are exported into "updateCorrespondenceTable.csv" and "classificationNames_updateCorrespondenceTable.csv", respectively.
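
The naming rule for the two output files can be sketched as follows. The helper output_file_names() is hypothetical and not part of the package. How the package treats a CSVout value that includes a directory is not stated here, so the sketch assumes the "classificationNames_" prefix is applied to the file name part only.

```r
# Hypothetical helper, not part of the package.
output_file_names <- function(CSVout) {
  if (is.null(CSVout)) {
    return(NULL)  # default: no output files are written
  }
  second <- paste0("classificationNames_", basename(CSVout))
  # Assumption: keep the second file in the same directory as the first.
  if (dirname(CSVout) != ".") {
    second <- file.path(dirname(CSVout), second)
  }
  c(table = CSVout, names = second)
}

output_file_names("updateCorrespondenceTable.csv")
```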

Details

File and file name requirements:

  • The files that correspond to arguments A, B, AStar, AB and AAStar must be in csv format with comma as delimiter. If full paths are not provided, these files must be available in the working directory. The file names provided must all be different.

  • If either of the two files where the output will be stored is read protected (for instance because it is open elsewhere), an error message is reported and execution is halted.

Classification table requirements:

  • The files that correspond to arguments A, B and AStar must contain at least one column and at least two rows. The first column contains the codes of the respective classification. The first row contains column headers. The name of the first column is the name of the respective classification (e.g., "CN 2021").

  • The classification codes contained in a classification file (expected in its first column as mentioned above) must be unique. No two identical codes are allowed in the column.

  • If any of the files that correspond to arguments A, B and AStar has additional columns, the first of these additional columns is taken to contain the labels of the respective classification codes.
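
As an illustration of these requirements, the sketch below writes a minimal classification file and checks it with base R. The codes, labels and the "Label" header are made up; only the "CN 2021" header is taken from the text above.

```r
# Toy classification: first column holds the codes and is named after the
# classification, the second column holds the labels.
classification <- data.frame(
  "CN 2021" = c("01", "02", "03"),
  "Label"   = c("Crops", "Forestry", "Fishing"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv(classification, path, row.names = FALSE)

# Read everything as character so codes such as "01" keep their leading zero.
read_back <- read.csv(path, check.names = FALSE, colClasses = "character")

has_codes    <- ncol(read_back) >= 1 && nrow(read_back) >= 1
unique_codes <- anyDuplicated(read_back[[1]]) == 0
class_name   <- names(read_back)[1]

unlink(path)
```

Here has_codes and unique_codes are both TRUE and class_name is the classification name taken from the first column header.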

Correspondence and concordance table requirements:

  • The files that correspond to arguments AB and AAStar must contain at least two columns and at least two rows. The first column of the file that corresponds to AB contains the codes of classification A. The second column contains the codes of classification B. Similar requirements apply to the file that corresponds to AAStar. The first row of each of these files contains column headers. The names of the first two columns are the names of the respective classifications.

  • The pairs of classification codes contained in the concordance and the correspondence table files (expected in their first two columns as mentioned above) must be unique. No two identical pairs of codes are allowed in the first two columns.
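
A quick way to check the uniqueness requirement before calling the function is to look for repeated pairs in the first two columns. The sketch below uses a made-up A:B table; the column names are placeholders.

```r
# Toy correspondence table A:B with one repeated pair (A1, B1).
ab <- data.frame(
  "CLASS A" = c("A1", "A1", "A2", "A1"),
  "CLASS B" = c("B1", "B2", "B1", "B1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows, here the code pairs.
dup_rows <- duplicated(ab[, 1:2])

# Rows that would violate the uniqueness requirement.
ab[dup_rows, ]
```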

Interdependency requirements:

  • At least one code of classification A must appear in both the file of concordance table A:A* and the file of correspondence table A:B.

  • At least one code of classification A* must appear in both the file of classification A* and the file of concordance table A:A*.

  • At least one code of classification B must appear in both the file of classification B and the file of correspondence table A:B.
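
The three interdependency requirements amount to non-empty intersections between code sets. A minimal sketch with made-up codes follows; shares_code() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: TRUE if the two code vectors share at least one code.
shares_code <- function(x, y) length(intersect(x, y)) > 0

# Made-up code sets standing in for the relevant columns of the input files.
a_in_ab         <- c("A1", "A2", "A3")   # A codes in the A:B table
a_in_aastar     <- c("A2", "A3", "A4")   # A codes in the A:A* table
astar_in_astar  <- c("S1", "S2")         # codes in the A* classification
astar_in_aastar <- c("S2", "S3")         # A* codes in the A:A* table
b_in_b          <- c("B1", "B2")         # codes in the B classification
b_in_ab         <- c("B9")               # B codes in the A:B table

checks <- c(
  A     = shares_code(a_in_ab, a_in_aastar),
  AStar = shares_code(astar_in_astar, astar_in_aastar),
  B     = shares_code(b_in_b, b_in_ab)
)
checks
```

In this toy case the B check fails, because no B code of the correspondence table appears in the B classification.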

Mismatch tolerance:

  • The ratio that is compared with MismatchToleranceB has as numerator the number of rows of the updated correspondence table which contain a code for A, for A*, or for both, but no code for B and as denominator the number of rows which contain a code for A, for A*, or for both (regardless of whether there is a code for B or not). If the ratio exceeds MismatchToleranceB the execution of the function is halted.

  • The ratio that is compared with MismatchToleranceAStar has as numerator the number of rows of the updated correspondence table which contain a code for A, for B, or for both, but no code for A* and as denominator the number of rows which contain a code for A, for B, or for both (regardless of whether there is a code for A* or not). If the ratio exceeds MismatchToleranceAStar the execution of the function is halted.
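
The two ratios can be worked through on a small made-up table. This is a sketch of the arithmetic only; the column names and the use of NA for a missing code are assumptions.

```r
# Toy updated table; NA stands for a missing code (assumption).
updated <- data.frame(
  A     = c("A1", "A2", NA,   "A3", NA),
  AStar = c("S1", NA,   "S3", "S4", NA),
  B     = c("B1", "B2", NA,   NA,   "B5"),
  stringsAsFactors = FALSE
)

# TRUE where a code is present.
has <- function(x) !is.na(x) & nzchar(x)

# Ratio compared with MismatchToleranceB:
# rows with A or A* but no B, over rows with A or A*.
base_b  <- has(updated$A) | has(updated$AStar)
ratio_b <- sum(base_b & !has(updated$B)) / sum(base_b)

# Ratio compared with MismatchToleranceAStar:
# rows with A or B but no A*, over rows with A or B.
base_astar  <- has(updated$A) | has(updated$B)
ratio_astar <- sum(base_astar & !has(updated$AStar)) / sum(base_astar)

# With the default tolerance of 0.2 both ratios would be too high.
exceeds <- c(B = ratio_b > 0.2, AStar = ratio_astar > 0.2)
```

Four rows contain a code for A or A*, and two of them have no B code, so ratio_b is 2/4. The same count applies to ratio_astar.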

Explanation of the flags

  • For each row of the updated correspondence table, the value of "CodeChange" is equal to 1 if the code of A (or A*) contained in this row maps, in this or any other row of the table, to a different code of A* (or A); otherwise "CodeChange" is equal to 0. The value of "CodeChange" is empty if either the code of A, or the code of A*, or both are missing.

  • The "Review" flag is produced only if argument Reference has been set equal to "A" or "B". For each row of the updated correspondence table, if Reference = "A" the value of "Review" is equal to 1 if the code of B maps to more than one code of A*, and 0 otherwise. If Reference = "B" the value of "Review" is equal to 1 if the code of A* maps to more than one code of B, and 0 otherwise. The value of the flag is empty if either the code of A*, or the code of B, or both are missing.

  • For each row of the updated correspondence table, the value of "Redundancy" is equal to 1 if the row contains a combination of codes of A* and B that also appears in at least one other row of the updated correspondence table. The value of the flag is empty if both the code of A* and the code of B are missing.

  • When "Redundancy_trim" is equal to FALSE, the "Redundancy_keep" flag is created to identify with value 1 the records that would be kept if trimming were performed.

  • For each row of the updated correspondence table, the value of "NoMatchToAStar" is equal to 1 if there is a code for A, for B, or for both, but no code for A*. The value of the flag is 0 if there are codes for both A and A* (regardless of whether there is a code for B or not). Finally, the value of "NoMatchToAStar" is empty if neither A nor B have a code in this row.

  • For each row of the updated correspondence table, the value of "NoMatchToB" is equal to 1 if there is a code for A, for A*, or for both, but no code for B. The value of the flag is 0 if there are codes for both A and B (regardless of whether there is a code for A* or not). Finally, the value of "NoMatchToB" is empty if neither A nor A* have a code in this row.

  • For each row of the updated correspondence table, the value of "NoMatchFromAStar" is equal to 1 if the row contains a code of A* that appears in the table of classification A* but not in the concordance table A:A*. The value of the flag is 0 if the row contains a code of A* that appears in both the table of classification A* and the concordance table A:A*. Finally, the value of the flag is empty if the row contains no code of A* or if it contains a code of A* that appears in the concordance table A:A* but not in the table of classification A*.

  • For each row of the updated correspondence table, the value of "NoMatchFromB" is equal to 1 if the row contains a code of B that appears in the table of classification B but not in the correspondence table A:B. The value of the flag is 0 if the row contains a code of B that appears in both the table of classification B and the correspondence table A:B. Finally, the value of the flag is empty if the row contains no code of B or if it contains a code of B that appears in the correspondence table A:B but not in the table of classification B.

  • For each row of the updated correspondence table, the value of "LabelChange" is equal to 1 if the labels of the codes of A and A* are different, and 0 if they are the same. Finally, the value of "LabelChange" is empty if either of the labels, or both labels, are missing. Lower and upper case are considered the same, and punctuation characters are ignored when comparing code labels.

  • The argument "Redundancy_trim" is used to remove redundancies whose mapping is correct. If the analysis concludes that the A*code / Bcode mapping is correct for all cases involving redundancies, the redundant records should be removed. If the selected value is TRUE, all redundant records are removed and only one record is kept for each unique combination. For the retained record, the Acode, Alabel and Asupp information is replaced with ‘multiple’. If the multiple A records are the same, their value is not replaced. If the selected value is FALSE, no trimming is performed, so redundant records are shown together with the redundancy flag.
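
Three of the flags above ("Redundancy", "Review" with Reference = "A", and "LabelChange") can be reproduced on a toy table with base R. This is a sketch of the stated rules, not the package's implementation; the column names are placeholders and the rows contain no missing codes or labels.

```r
tab <- data.frame(
  A          = c("A1", "A2", "A3", "A4"),
  AStar      = c("S1", "S1", "S2", "S3"),
  B          = c("B1", "B1", "B1", "B2"),
  Alabel     = c("Crop production.", "Crops", "Forestry", "Fishing"),
  AStarlabel = c("crop production", "Crop production",
                 "Forestry and logging", "FISHING"),
  stringsAsFactors = FALSE
)

# Redundancy: the A*:B combination also appears in at least one other row.
key <- paste(tab$AStar, tab$B, sep = "|")
redundancy <- as.integer(duplicated(key) | duplicated(key, fromLast = TRUE))

# Review with Reference = "A": the B code maps to more than one A* code.
n_astar_per_b <- tapply(tab$AStar, tab$B, function(x) length(unique(x)))
review <- as.integer(n_astar_per_b[tab$B] > 1)

# LabelChange: compare labels ignoring case and punctuation.
normalise <- function(x) tolower(gsub("[[:punct:]]", "", x))
label_change <- as.integer(normalise(tab$Alabel) != normalise(tab$AStarlabel))

cbind(tab[, c("A", "AStar", "B")], redundancy, review, label_change)
```

Rows 1 and 2 share the combination S1:B1 and are therefore redundant. B1 maps to both S1 and S2, so the first three rows are flagged for review. The labels in rows 1 and 4 differ only in case and punctuation, so they do not count as a label change.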

Sample datasets included in the package

Running browseVignettes("correspondenceTables") in the console opens an html page in the user's default browser. Selecting HTML from the menu, users can read information about the use of the sample datasets that are included in the package. If they wish to access the csv files with the sample data, users have two options:

  • Option 1: Unpack into any folder of their choice the tar.gz file into which the package has arrived. All sample datasets may be found in the "inst/extdata" subfolder of this folder.

  • Option 2: Go to the "extdata" subfolder of the folder in which the package has been installed in their PC's R library. All sample datasets may be found there.
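
For Option 2, the installed "extdata" folder can also be located from within R with system.file(), which is the same call the example below uses for individual files. The sketch returns an empty vector if the package is not installed.

```r
# Path of the installed extdata folder ("" if the package is not installed).
extdata_dir <- system.file("extdata", package = "correspondenceTables")

# Names of the sample csv files found there.
sample_files <- if (nzchar(extdata_dir)) {
  list.files(extdata_dir, pattern = "\\.csv$")
} else {
  character(0)
}

sample_files
```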

Examples

 {
 ## Application of function updateCorrespondenceTable() with NAICS 2017 being the
 ## original classification A, NACE being the target classification B, NAICS 2022
 ## being the updated version A*, NAICS 2017:NACE being the previous correspondence
 ## table A:B, and NAICS 2017:NAICS 2022 being the A:A* concordance table. The desired
 ## name for the csv file that will contain the updated correspondence table is
 ## "updateCorrespondenceTable.csv", there is no reference classification, and the
 ## maximum acceptable proportions of unmatched codes between the original
 ## classification A and the target classification B, and between the original
 ## classification A and the updated classification A* are 0.5 and 0.3, respectively.

 tmp_dir<-tempdir()
 A <- system.file("extdata", "NAICS2017.csv", package = "correspondenceTables")
 AStar <- system.file("extdata", "NAICS2022.csv", package = "correspondenceTables")
 B <- system.file("extdata", "NACE.csv", package = "correspondenceTables")
 AB <- system.file("extdata", "NAICS2017_NACE.csv", package = "correspondenceTables")
 AAStar <- system.file("extdata", "NAICS2017_NAICS2022.csv", package = "correspondenceTables")

 UPC <- updateCorrespondenceTable(A,
                                  B,
                                  AStar,
                                  AB,
                                  AAStar,
                                  CSVout = file.path(tmp_dir, "updateCorrespondenceTable.csv"),
                                  Reference = "none",
                                  MismatchToleranceB = 0.5,
                                  MismatchToleranceAStar = 0.3,
                                  Redundancy_trim = FALSE)

 summary(UPC)
 head(UPC$updateCorrespondenceTable)
 UPC$classificationNames
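 ## Remove the csv files written to the temporary directory. full.names = TRUE
 ## is needed so that unlink() receives paths inside tmp_dir.
 csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
 if (length(csv_files) > 0) unlink(csv_files)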
    }