Update the correspondence table between two classifications when one of them has been updated.
updateCorrespondenceTable(
  A,
  B,
  AStar,
  AB,
  AAStar,
  Reference = "none",
  MismatchToleranceB = 0.2,
  MismatchToleranceAStar = 0.2,
  Redundancy_trim = TRUE
)
A: A data frame containing the original classification A. The first column holds the codes; optional subsequent columns may hold labels and supplementary information.
B: A data frame containing the target classification B. Same structure expectations as A.
AStar: A data frame containing the updated version A*. Same structure expectations as A.
AB: A data frame containing the previous correspondence table A:B (at least two columns: A-codes, B-codes).
AAStar: A data frame containing the concordance table A:A* (at least two columns: A-codes, A*-codes).
Reference: The reference classification among A and B. Valid values: "none", "A", "B". If "A" or "B", a "Review" flag column is included (see "Explanation of the flags").
MismatchToleranceB: Maximum acceptable proportion of rows in the updated table with no code of B among those that have a code for A and/or A*. Default 0.2, range [0,1].
MismatchToleranceAStar: Maximum acceptable proportion of rows in the updated table with no code of A* among those that have a code for A and/or B. Default 0.2, range [0,1].
Redundancy_trim: Logical. If TRUE (default), trims redundant records that map correctly, collapsing A-side fields to "Multiple" when appropriate; if FALSE, keeps redundancies and adds a Redundancy_keep indicator.
updateCorrespondenceTable() returns a list with two data frames:
updateCorrespondenceTable: the updated correspondence A*:B with flags "CodeChange", "Review" (if applicable),
"Redundancy", "NoMatchToAStar", "NoMatchToB", "NoMatchFromAStar", "NoMatchFromB", "LabelChange",
plus any additional columns coming from A, B, AStar, AB, AAStar.
classificationNames: the names of the classifications (A, B, A*) read from the first column headers.
*Input data frame requirements*
A, B, AStar: at least 1 column and at least 1 row. The first column contains the codes; the first row is the header.
AB, AAStar: at least 2 columns and at least 1 row; the first two columns contain code pairs; the first row is the header.
Codes in the first column of A, B, AStar must be unique (no duplicates).
Pairs (first two columns) in AB and AAStar must be unique (no duplicate pairs).
If additional columns are present in classification or correspondence tables, the second column is treated as a label.
All columns are coerced to character.
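The requirements above can be checked before calling the function. The following is a minimal sketch in base R; the helper names `check_classification` and `check_correspondence` are hypothetical and are not part of the package, which may report these problems differently.

```r
# Hypothetical helpers (not part of the package) that mirror the documented rules.
check_classification <- function(df) {
  # At least one column and one row; codes live in the first column.
  stopifnot(is.data.frame(df), ncol(df) >= 1, nrow(df) >= 1)
  codes <- as.character(df[[1]])
  if (anyDuplicated(codes) > 0) {
    stop("duplicate codes in the first column")
  }
  invisible(TRUE)
}

check_correspondence <- function(df) {
  # At least two columns and one row; code pairs live in the first two columns.
  stopifnot(is.data.frame(df), ncol(df) >= 2, nrow(df) >= 1)
  if (anyDuplicated(df[, 1:2]) > 0) {
    stop("duplicate code pairs in the first two columns")
  }
  invisible(TRUE)
}

# Toy inputs, invented for illustration only.
A_toy  <- data.frame(A = c("a1", "a2"), Label = c("One", "Two"),
                     stringsAsFactors = FALSE)
AB_toy <- data.frame(A = c("a1", "a1"), B = c("b1", "b2"),
                     stringsAsFactors = FALSE)

check_classification(A_toy)
check_correspondence(AB_toy)
```

Both calls pass silently here: the A code "a1" appears twice in `AB_toy`, but each (A, B) pair is unique, which is all the rule requires.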
*Minimum interdependencies*
At least one code of A must appear in both AAStar and AB.
At least one code of A* must appear in both AStar and AAStar.
At least one code of B must appear in both B and AB.
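As an illustration of the three conditions above, the overlaps can be inspected with `intersect()`. This is a sketch on invented toy data, assuming codes sit in the first column of each classification table and in the first two columns of each correspondence table; it is not the package's own check.

```r
# Toy tables, invented for illustration only.
AStar  <- data.frame(AStar = c("s1", "s2"), stringsAsFactors = FALSE)
B      <- data.frame(B = c("b1", "b9"), stringsAsFactors = FALSE)
AB     <- data.frame(A = c("a1", "a2"), B = c("b1", "b2"),
                     stringsAsFactors = FALSE)
AAStar <- data.frame(A = c("a1", "a3"), AStar = c("s1", "s3"),
                     stringsAsFactors = FALSE)

# Codes shared between the tables named in each condition.
shared_A     <- intersect(AAStar[[1]], AB[[1]])      # A codes in AAStar and AB
shared_AStar <- intersect(AStar[[1]], AAStar[[2]])   # A* codes in AStar and AAStar
shared_B     <- intersect(B[[1]], AB[[2]])           # B codes in B and AB

# Each overlap must contain at least one code.
interdependencies_ok <- length(shared_A) > 0 &&
  length(shared_AStar) > 0 &&
  length(shared_B) > 0
```

Inspecting the three `shared_*` vectors before running the update also gives a quick sense of how much of each table can actually be linked.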
*Mismatch tolerance*
If the share of rows with NoMatchToAStar == 1 exceeds MismatchToleranceAStar, the function stops with an error.
If the share of rows with NoMatchToB == 1 exceeds MismatchToleranceB, the function stops with an error.
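The comparison against the tolerance can be sketched as follows. The snippet assumes that flags are stored as character values with "" for an empty flag, and that the share is taken over rows with a non-empty flag (in line with the argument description); both points are assumptions about the internals rather than confirmed behaviour.

```r
# Toy flag column; "" stands for an empty flag (assumed representation).
no_match_to_B <- c("1", "0", "0", "0", "")
tolerance_B   <- 0.2  # plays the role of MismatchToleranceB

# Assumption: only rows with a non-empty flag enter the denominator.
share   <- sum(no_match_to_B == "1") / sum(no_match_to_B != "")
exceeds <- share > tolerance_B
```

With these toy values one of four eligible rows has no B code, so the share is 0.25 and the default tolerance of 0.2 would be exceeded; raising `MismatchToleranceB` is the documented way to let such a run proceed.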
*Explanation of the flags*
For each row of the updated correspondence table, the value of "CodeChange" is equal to 1 if the code of A (or A*)
contained in this row maps, in this or any other row of the table, to a different code of A* (or A); otherwise
"CodeChange" is equal to 0. The value of "CodeChange" is empty if either the code of A, or the code of A*, or both are missing.
The "Review" flag is produced only if argument Reference has been set equal to "A" or "B".
For each row of the updated correspondence table, if Reference = "A" the value of "Review" is equal to
1 if the code of B maps to more than one code of A*, and 0 otherwise. If Reference = "B" the
value of "Review" is equal to 1 if the code of A* maps to more than one code of B, and 0 otherwise. The value
of the flag is empty if either the code of A*, or the code of B, or both are missing.
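The rule for Reference = "B" can be illustrated on a toy table. This is a sketch of the documented rule only, assuming missing codes and empty flags are represented by ""; the column names and the data are invented, and the package's own computation may differ.

```r
# Toy updated table; "" stands for a missing code (assumed representation).
tab <- data.frame(
  AStar = c("s1", "s1", "s2", ""),
  B     = c("b1", "b2", "b2", "b3"),
  stringsAsFactors = FALSE
)

# Rows where both codes are present; the flag stays empty elsewhere.
complete <- tab$AStar != "" & tab$B != ""

# Number of distinct B codes linked to each A* code.
n_B <- tapply(tab$B[complete], tab$AStar[complete],
              function(x) length(unique(x)))

# Reference = "B": flag rows whose A* code maps to more than one B code.
review <- rep("", nrow(tab))
review[complete] <- ifelse(as.vector(n_B[tab$AStar[complete]]) > 1, "1", "0")
tab$Review <- review
print(tab)
```

For Reference = "A" the same pattern applies with the roles swapped: count the distinct A* codes per B code instead.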
For each row of the updated correspondence table, the value of "Redundancy" is equal to 1 if the row contains
a combination of codes of A* and B that also appears in at least one other row of the updated correspondence table. The
value of the flag is empty if both the code of A* and the code of B are missing.
When "Redundancy_trim" is equal to FALSE, the "Redundancy_keep" flag is created; it marks with value 1
the records that would be kept if trimming were performed.
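The "Redundancy" rule amounts to detecting repeated A*/B combinations. A minimal sketch with `duplicated()`, on invented data and assuming "" represents a missing code or an empty flag:

```r
# Toy updated table; "" stands for a missing code (assumed representation).
tab <- data.frame(
  A     = c("a1", "a2", "a3", "a4"),
  AStar = c("s1", "s1", "s2", ""),
  B     = c("b1", "b1", "b2", ""),
  stringsAsFactors = FALSE
)

pairs <- tab[, c("AStar", "B")]

# TRUE for every row whose (A*, B) combination occurs in some other row as well.
repeated <- duplicated(pairs) | duplicated(pairs, fromLast = TRUE)

# Empty flag when both codes are missing, otherwise "1" or "0".
both_missing <- tab$AStar == "" & tab$B == ""
redundancy   <- ifelse(both_missing, "", ifelse(repeated, "1", "0"))
```

Combining the forward and backward `duplicated()` calls flags every member of a repeated group, not only the later occurrences.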
For each row of the updated correspondence table, the value of "NoMatchToAStar" is equal to 1 if there is a
code for A, for B, or for both, but no code for A*. The value of the flag is 0 if there are codes for both A and
A* (regardless of whether there is a code for B or not). Finally, the value of "NoMatchToAStar" is empty if neither A nor B
has a code in this row.
For each row of the updated correspondence table, the value of "NoMatchToB" is equal to 1 if there is a code
for A, for A*, or for both, but no code for B. The value of the flag is 0 if there are codes for both A and B
(regardless of whether there is a code for A* or not). Finally, the value of "NoMatchToB" is empty if neither A nor
A* has a code in this row.
For each row of the updated correspondence table, the value of "NoMatchFromAStar" is equal to 1 if the row
contains a code of A* that appears in the table of classification A* but not in the concordance table A:A*. The value of
the flag is 0 if the row contains a code of A* that appears in both the table of classification
A* and the concordance table A:A*. Finally, the value of the flag is empty if the row contains no code of A* or if it
contains a code of A* that appears in the concordance table A:A* but not in the table of classification A*.
For each row of the updated correspondence table, the value of "NoMatchFromB" is equal to 1 if the row
contains a code of B that appears in the table of classification B but not in the correspondence table A:B. The value of
the flag is 0 if the row contains a code of B that appears in both the table of classification B and the
correspondence table A:B. Finally, the value of the flag is empty if the row contains no code of B or if it contains a code
of B that appears in the correspondence table A:B but not in the table of classification B.
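The three outcomes of "NoMatchFromB" can be sketched with `%in%`; the same pattern would apply to "NoMatchFromAStar" with the A* classification and the concordance table A:A*. The vectors below are invented, and "" is assumed to represent a missing code or an empty flag.

```r
# Toy code lists, invented for illustration only.
codes_in_B  <- c("b1", "b2", "b3")   # first column of classification B
codes_in_AB <- c("b1", "b2", "b9")   # B codes in correspondence table A:B

# B codes found in the rows of the updated table; "" means no B code.
row_B <- c("b1", "b3", "b9", "")

in_B  <- row_B %in% codes_in_B
in_AB <- row_B %in% codes_in_AB

# "1": in B only; "0": in both; "": no code, or in A:B but not in B.
no_match_from_B <- ifelse(in_B & !in_AB, "1",
                          ifelse(in_B & in_AB, "0", ""))
```

Here "b3" is flagged because it exists in classification B but was never linked in A:B, while "b9" gets an empty flag because it is absent from classification B.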
For each row of the updated correspondence table, the value of "LabelChange" is equal to 1 if the labels of
the codes of A and A* are different, and 0 if they are the same. Finally, the value of "LabelChange" is empty if
either of the labels, or both labels, are missing. Lower and upper case are considered the same, and punctuation characters
are ignored when comparing code labels.
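The label comparison can be approximated by normalising both labels before comparing them. This sketch uses `tolower()` and the POSIX class `[[:punct:]]`; the helper `normalize_label` is hypothetical, the labels are invented, and how the package treats whitespace or accented characters is not stated above, so none of that is handled here.

```r
# Hypothetical helper: drop punctuation, then ignore case.
normalize_label <- function(x) tolower(gsub("[[:punct:]]", "", x))

# Toy labels; "" stands for a missing label (assumed representation).
label_A     <- c("Crop production", "Mining, quarrying", "Fishing", "")
label_AStar <- c("CROP PRODUCTION", "Mining quarrying", "Aquaculture", "Forestry")

missing_label <- label_A == "" | label_AStar == ""
differs       <- normalize_label(label_A) != normalize_label(label_AStar)

# Empty flag when a label is missing, otherwise "1" (changed) or "0" (same).
label_change <- ifelse(missing_label, "", ifelse(differs, "1", "0"))
```

The first two pairs differ only in case or punctuation and therefore count as unchanged.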
The argument "Redundancy_trim" controls the removal of redundancies that map correctly. If the analysis
concludes that the A* code / B code mapping is correct for all cases involving redundancies, the redundancies should be
removed. If the selected value is TRUE, redundant records are removed so that only one record is kept for each unique
combination. For the retained record, the A codes, the A label and the A supplementary information are replaced with ‘multiple’. If the multiple
A records are the same, their value is not replaced. If the selected value is FALSE, no trimming is performed, so redundant
records are shown together with the redundancy flag.
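The effect of trimming can be sketched with `aggregate()`. This is an illustration of the described behaviour, not the package's implementation: the helper `collapse_A` is hypothetical, the data are invented, and the placeholder is written as "Multiple", which is an assumption about its exact spelling.

```r
# Toy updated table with a redundant A*/B combination (s1, b1).
tab <- data.frame(
  A     = c("a1", "a2", "a3", "a3"),
  AStar = c("s1", "s1", "s2", "s2"),
  B     = c("b1", "b1", "b2", "b2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the value if all A entries agree, else use a placeholder.
collapse_A <- function(x) if (length(unique(x)) == 1) x[1] else "Multiple"

# One record per unique (A*, B) combination, with the A side collapsed.
trimmed <- aggregate(tab["A"], by = tab[c("AStar", "B")], FUN = collapse_A)
print(trimmed)
```

The (s1, b1) group collapses to the placeholder because it came from two different A codes, whereas the (s2, b2) group keeps "a3" because its A records are identical.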
if (FALSE) { # \dontrun{
  # Read CSVs outside, pass data frames in:
  A_df <- utils::read.csv(system.file("extdata/test", "NAICS2017.csv",
                                      package = "correspondenceTables"),
                          sep = ",", header = TRUE, check.names = FALSE,
                          colClasses = "character", encoding = "UTF-8")
  AStar_df <- utils::read.csv(system.file("extdata/test", "NAICS2022.csv",
                                          package = "correspondenceTables"),
                              sep = ",", header = TRUE, check.names = FALSE,
                              colClasses = "character", encoding = "UTF-8")
  B_df <- utils::read.csv(system.file("extdata/test", "NACE.csv",
                                      package = "correspondenceTables"),
                          sep = ",", header = TRUE, check.names = FALSE,
                          colClasses = "character", encoding = "UTF-8")
  AB_df <- utils::read.csv(system.file("extdata/test", "NAICS2017_NACE.csv",
                                       package = "correspondenceTables"),
                           sep = ",", header = TRUE, check.names = FALSE,
                           colClasses = "character", encoding = "UTF-8")
  AAStar_df <- utils::read.csv(system.file("extdata/test", "NAICS2017_NAICS2022.csv",
                                           package = "correspondenceTables"),
                               sep = ",", header = TRUE, check.names = FALSE,
                               colClasses = "character", encoding = "UTF-8")
  UPC <- updateCorrespondenceTable(
    A = A_df, B = B_df, AStar = AStar_df, AB = AB_df, AAStar = AAStar_df,
    Reference = "none", MismatchToleranceB = 0.5, MismatchToleranceAStar = 0.3,
    Redundancy_trim = FALSE
  )
  summary(UPC)
  head(UPC$updateCorrespondenceTable)
  UPC$classificationNames
} # }