R/updateCorrespondenceTable.R
updateCorrespondenceTable.Rd
Update the correspondence table between statistical classifications A and B when A has been updated to version A*.
updateCorrespondenceTable(
A,
B,
AStar,
AB,
AAStar,
CSVout = NULL,
Reference = "none",
MismatchToleranceB = 0.2,
MismatchToleranceAStar = 0.2,
Redundancy_trim = TRUE
)
A: A character string containing the name of a csv file that contains the original classification A.
B: A character string containing the name of a csv file that contains classification B.
AStar: A character string containing the name of a csv file that contains the updated version A*.
AB: A character string containing the name of a csv file that contains the previous correspondence table A:B.
AAStar: A character string containing the name of a csv file that contains the concordance table A:A*, which maps the codes of the two versions of the classification.
CSVout: The preferred name for the output csv files that will contain the updated correspondence table and information about the classifications involved. Valid values are NULL or a character string. If the value is NULL (the default), no output file is produced. If the value is a string, the output is exported into two csv files whose names contain the provided name (see "Value" below).
Reference: The reference classification among A and B. If one classification is the reference for the other, and hence hierarchically superior to it, each code of the other classification is expected to be mapped to at most one code of the reference classification. Valid values are "none", "A" and "B". If the selected value is "A" or "B", a "Review" flag column is included in the output (see "Explanation of the flags" below).
MismatchToleranceB: The maximum acceptable proportion of rows in the updated correspondence table that contain no code of the target classification B, among those that contain a code of A, of A*, or of both. The default value is 0.2. Valid values are real numbers in the interval [0, 1].
MismatchToleranceAStar: The maximum acceptable proportion of rows in the updated correspondence table that contain no code of the updated classification A*, among those that contain a code of A, of B, or of both. The default value is 0.2. Valid values are real numbers in the interval [0, 1].
Redundancy_trim: A logical argument that controls the trimming of redundant records. Valid values are TRUE and FALSE. The default value, TRUE, removes all redundant records and replaces the values of Acode, Alabel and Asupp with the value ‘Multiple’ (to indicate that multiple A records are involved). If the multiple A records are the same, their values are not replaced. The other value, FALSE, shows the redundant records together with the redundancy flag.
Value:
updateCorrespondenceTable() returns a list with two elements, both of which are data frames.
The first element, updateCorrespondenceTable, is the updated correspondence table A*:B. It is augmented with the flags "CodeChange", "Review" (if applicable), "Redundancy", "NoMatchToAStar", "NoMatchToB", "NoMatchFromAStar", "NoMatchFromB" and "LabelChange" (plus "Redundancy_keep" when Redundancy_trim = FALSE). It also contains all the additional columns of the A, B, AStar, AB and AAStar files.
The second element, classificationNames, contains the names of the original classification A, the target classification B and the updated version A*, as read from the top left-hand cell of the respective input files.
If the value of argument CSVout is a character string, the elements of the list are exported into csv files. The file for the first element is named after the value of CSVout. The file for the second element is named classificationNames_ followed by the value of CSVout. For example, if CSVout = "updateCorrespondenceTable.csv", the elements of the list are exported into "updateCorrespondenceTable.csv" and "classificationNames_updateCorrespondenceTable.csv", respectively.
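The naming rule for the two output files can be sketched as follows. classificationNamesFile() is a hypothetical helper and not a function of the package. For a CSVout value that includes a directory (as in the example at the end of this page), the sketch assumes that the prefix is added to the base name and the directory is kept.

```r
# Hypothetical helper, not part of correspondenceTables: builds the name of the
# file that receives the classification names from the value of CSVout.
# Assumption: the prefix goes on the base name and any directory is kept.
classificationNamesFile <- function(CSVout) {
  prefixed <- paste0("classificationNames_", basename(CSVout))
  dir <- dirname(CSVout)
  if (identical(dir, ".")) prefixed else file.path(dir, prefixed)
}

classificationNamesFile("updateCorrespondenceTable.csv")
# returns "classificationNames_updateCorrespondenceTable.csv"
```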
File and file name requirements:
The files that correspond to arguments A, B, AStar, AB and AAStar must be in csv format with a comma as delimiter. If full paths are not provided, these files must be available in the working directory. No two of the provided file names may be identical.
If either of the two files where the output will be stored cannot be written (for instance, because it is open in another application), an error message is reported and execution is halted.
Classification table requirements:
The files that correspond to arguments A, B and AStar must contain at least one column and at least two rows. The first column contains the codes of the respective classification. The first row contains column headers. The name of the first column is the name of the respective classification (e.g., "CN 2021").
The classification codes contained in a classification file (expected in its first column, as mentioned above) must be unique. No two identical codes are allowed in the column.
If any of the files that correspond to arguments A, B and AStar has additional columns, the first of them is considered to contain the labels of the respective classification codes.
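The classification table requirements can be checked before calling the function. The following is a minimal sketch using base R's read.csv(). checkClassificationFile() is a hypothetical helper and not part of the package. Reading all columns as character is an assumption made so that codes such as "01" keep their leading zeros.

```r
# Hypothetical validator, not part of correspondenceTables, for the
# classification table requirements (files A, B and AStar).
checkClassificationFile <- function(path) {
  # Read all columns as character so codes such as "01" keep leading zeros;
  # check.names = FALSE keeps a header such as "CN 2021" unchanged.
  tab <- read.csv(path, colClasses = "character", check.names = FALSE)
  if (ncol(tab) < 1 || nrow(tab) < 1) {
    stop("the file needs a header row and at least one row of codes")
  }
  codes <- tab[[1]]
  if (anyDuplicated(codes) > 0) {
    stop("duplicated codes: ",
         paste(unique(codes[duplicated(codes)]), collapse = ", "))
  }
  list(name   = names(tab)[1],                           # classification name
       codes  = codes,                                   # first column
       labels = if (ncol(tab) >= 2) tab[[2]] else NULL)  # first additional column
}

f <- tempfile(fileext = ".csv")
writeLines(c("NAICS 2017,Label", "111,Crop Production", "112,Animal Production"), f)
info <- checkClassificationFile(f)
info$name  # "NAICS 2017"
```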
Correspondence and concordance table requirements:
The files that correspond to arguments AB and AAStar must contain at least two columns and at least two rows. In the file that corresponds to AB, the first column contains the codes of classification A and the second column contains the codes of classification B. Similar requirements apply to the file that corresponds to AAStar.
The first row of each of these files contains column headers. The names of the first two columns are the names of the respective classifications.
The pairs of classification codes contained in the concordance and correspondence table files (expected in their first two columns, as mentioned above) must be unique. No two identical pairs of codes are allowed in the first two columns.
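The pair-uniqueness rule differs from the rule for classification files. A single code may repeat within a column, but a pair of codes may not repeat. The following sketch shows this with base R's duplicated() applied to the first two columns. checkCorrespondenceFile() is a hypothetical helper and not part of the package.

```r
# Hypothetical check, not part of correspondenceTables, for the AB and AAStar
# files: at least two columns, at least one data row, unique code pairs.
checkCorrespondenceFile <- function(path) {
  tab <- read.csv(path, colClasses = "character", check.names = FALSE)
  if (ncol(tab) < 2 || nrow(tab) < 1) {
    stop("the file needs two code columns, a header row and at least one data row")
  }
  dup <- duplicated(tab[, 1:2])  # a row is a duplicate only if both codes repeat
  if (any(dup)) {
    stop("duplicated code pairs in data rows: ", paste(which(dup), collapse = ", "))
  }
  names(tab)[1:2]  # names of the two classifications
}

# "111" appears twice in the first column, but every pair is unique.
f <- tempfile(fileext = ".csv")
writeLines(c("NAICS 2017,NACE", "111,A01", "111,A02", "112,A01"), f)
checkCorrespondenceFile(f)  # returns c("NAICS 2017", "NACE")
```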
Interdependency requirements:
At least one code of classification A must appear in both the file of concordance table A:A* and the file of correspondence table A:B.
At least one code of classification A* must appear in both the file of classification A* and the file of concordance table A:A*.
At least one code of classification B must appear in both the file of classification B and the file of correspondence table A:B.
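Each interdependency requirement asks for a non-empty overlap between two sets of codes. The following sketch expresses this with base R's intersect(). checkInterdependency() and its argument names are hypothetical and not part of the package.

```r
# Hypothetical sketch, not part of correspondenceTables: each requirement asks
# for a non-empty overlap between two sets of codes. Arguments are character
# vectors of codes taken from the respective files.
checkInterdependency <- function(A_in_AAStar, A_in_AB,
                                 AStar_in_AStar, AStar_in_AAStar,
                                 B_in_B, B_in_AB) {
  c(A     = length(intersect(A_in_AAStar, A_in_AB)) > 0,
    AStar = length(intersect(AStar_in_AStar, AStar_in_AAStar)) > 0,
    B     = length(intersect(B_in_B, B_in_AB)) > 0)
}

res <- checkInterdependency(
  A_in_AAStar = c("111", "112"), A_in_AB = c("112", "113"),
  AStar_in_AStar = c("111", "114"), AStar_in_AAStar = "114",
  B_in_B = c("A01", "A02"), B_in_AB = "B05")
res  # A and AStar are TRUE; B is FALSE, so the requirement for B is not met
```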
Mismatch tolerance:
The ratio that is compared with MismatchToleranceB is defined as follows. The numerator is the number of rows of the updated correspondence table that contain a code for A, for A*, or for both, but no code for B. The denominator is the number of rows that contain a code for A, for A*, or for both (regardless of whether there is a code for B). If the ratio exceeds MismatchToleranceB, the execution of the function is halted.
The ratio that is compared with MismatchToleranceAStar is defined as follows. The numerator is the number of rows of the updated correspondence table that contain a code for A, for B, or for both, but no code for A*. The denominator is the number of rows that contain a code for A, for B, or for both (regardless of whether there is a code for A*). If the ratio exceeds MismatchToleranceAStar, the execution of the function is halted.
Explanation of the flags:
For each row of the updated correspondence table, the value of "CodeChange" is equal to 1 if the code of A (or A*) in this row maps, in this or any other row of the table, to a different code of A* (or A). Otherwise, "CodeChange" is equal to 0. The value of "CodeChange" is empty if the code of A, the code of A*, or both are missing.
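The following sketch shows one reading of the "CodeChange" rule. It assumes that "a different code" means a code string that differs from the code it is paired with in the other version, and that NA marks a missing code. codeChange() is hypothetical and not the package's implementation.

```r
# Hypothetical sketch, not part of correspondenceTables.
codeChange <- function(Acode, AStarcode) {
  both <- !is.na(Acode) & !is.na(AStarcode)
  differs <- both & (Acode != AStarcode)
  changedA     <- unique(Acode[differs])      # A codes mapped to another A* code
  changedAStar <- unique(AStarcode[differs])  # A* codes mapped from another A code
  flag <- as.integer(Acode %in% changedA | AStarcode %in% changedAStar)
  flag[!both] <- NA_integer_                  # empty when a code is missing
  flag
}

# A code "11" maps to "11" in row 1 but to "14" in row 2, so both rows get 1.
codeChange(c("11", "11", "12", "13", NA), c("11", "14", "12", NA, "15"))
# 1 1 0 NA NA
```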
The "Review" flag is produced only if argument Reference has been set equal to "A" or "B".
For each row of the updated correspondence table, if Reference = "A", the value of "Review" is equal to 1 if the code of B maps to more than one code of A*, and 0 otherwise. If Reference = "B", the value of "Review" is equal to 1 if the code of A* maps to more than one code of B, and 0 otherwise. The value of the flag is empty if the code of A*, the code of B, or both are missing.
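The "Review" rule amounts to counting distinct partner codes per code. The following is a minimal sketch, assuming that NA marks a missing code and that only rows with both codes present take part in the count. reviewFlag() is hypothetical and not part of the package.

```r
# Hypothetical sketch, not part of correspondenceTables.
reviewFlag <- function(AStarcode, Bcode, Reference = c("A", "B")) {
  Reference <- match.arg(Reference)
  ok <- !is.na(AStarcode) & !is.na(Bcode)
  # Distinct A*/B pairs among rows where both codes are present
  pairs <- unique(data.frame(AStar = AStarcode[ok], B = Bcode[ok]))
  if (Reference == "A") {
    counts <- table(pairs$B)      # number of distinct A* codes per B code
    flag <- Bcode %in% names(counts)[counts > 1]
  } else {
    counts <- table(pairs$AStar)  # number of distinct B codes per A* code
    flag <- AStarcode %in% names(counts)[counts > 1]
  }
  ifelse(ok, as.integer(flag), NA_integer_)
}

AStarcode <- c("a", "b", "c", "c", NA)
Bcode     <- c("x", "x", "y", "z", "y")
reviewFlag(AStarcode, Bcode, "A")  # 1 1 0 0 NA
reviewFlag(AStarcode, Bcode, "B")  # 0 0 1 1 NA
```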
For each row of the updated correspondence table, the value of "Redundancy" is equal to 1 if the row contains a combination of codes of A* and B that also appears in at least one other row of the updated correspondence table. The value of the flag is empty if both the code of A* and the code of B are missing.
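Combining duplicated() in both directions marks every row whose A*/B combination occurs more than once, not just the later copies. The following is a minimal sketch. It assumes that NA marks a missing code and that the flag is 0 for combinations that occur only once (the text above does not state this). redundancyFlag() is hypothetical.

```r
# Hypothetical sketch, not part of correspondenceTables.
redundancyFlag <- function(AStarcode, Bcode) {
  combos <- data.frame(AStar = AStarcode, B = Bcode)
  # TRUE for every row whose A*/B combination occurs in at least one other row
  dup <- duplicated(combos) | duplicated(combos, fromLast = TRUE)
  flag <- as.integer(dup)  # 0 for unique combinations (an assumption)
  flag[is.na(AStarcode) & is.na(Bcode)] <- NA_integer_
  flag
}

redundancyFlag(c("a", "a", "b", NA, NA), c("x", "x", "y", NA, NA))  # 1 1 0 NA NA
```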
When Redundancy_trim is equal to FALSE, the "Redundancy_keep" flag is created. It marks with value 1 the records that would be kept if trimming were performed.
For each row of the updated correspondence table, the value of "NoMatchToAStar" is equal to 1 if there is a code for A, for B, or for both, but no code for A*. The value of the flag is 0 if there are codes for both A and A* (regardless of whether there is a code for B). Finally, the value of "NoMatchToAStar" is empty if neither A nor B has a code in this row.
For each row of the updated correspondence table, the value of "NoMatchToB" is equal to 1 if there is a code for A, for A*, or for both, but no code for B. The value of the flag is 0 if there are codes for both A and B (regardless of whether there is a code for A*). Finally, the value of "NoMatchToB" is empty if neither A nor A* has a code in this row.
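The two "NoMatchTo" rules above can be sketched together, assuming that NA marks a missing code. The text does not describe a row that has codes for A* and B but not for A. The sketch leaves that case empty (NA), which is an assumption. noMatchTo() is hypothetical.

```r
# Hypothetical sketch, not part of correspondenceTables.
noMatchTo <- function(Acode, AStarcode, Bcode) {
  hasA <- !is.na(Acode)
  hasAStar <- !is.na(AStarcode)
  hasB <- !is.na(Bcode)
  toAStar <- rep(NA_integer_, length(Acode))  # empty unless a rule applies
  toAStar[(hasA | hasB) & !hasAStar] <- 1L
  toAStar[hasA & hasAStar] <- 0L
  toB <- rep(NA_integer_, length(Acode))
  toB[(hasA | hasAStar) & !hasB] <- 1L
  toB[hasA & hasB] <- 0L
  data.frame(NoMatchToAStar = toAStar, NoMatchToB = toB)
}

res <- noMatchTo(Acode     = c("111", "112", NA,    NA),
                 AStarcode = c("111", NA,    "114", NA),
                 Bcode     = c(NA,    "A01", "A02", "A03"))
res  # NoMatchToAStar: 0 1 NA 1; NoMatchToB: 1 0 NA NA
```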
For each row of the updated correspondence table, the value of "NoMatchFromAStar" is equal to 1 if the row contains a code of A* that appears in the table of classification A* but not in the concordance table A:A*. The value of the flag is 0 if the row contains a code of A* that appears in both the table of classification A* and the concordance table A:A*. Finally, the value of the flag is empty if the row contains no code of A* or if it contains a code of A* that appears in the concordance table A:A* but not in the table of classification A*.
For each row of the updated correspondence table, the value of "NoMatchFromB" is equal to 1 if the row contains a code of B that appears in the table of classification B but not in the correspondence table A:B. The value of the flag is 0 if the row contains a code of B that appears in both the table of classification B and the correspondence table A:B. Finally, the value of the flag is empty if the row contains no code of B or if it contains a code of B that appears in the correspondence table A:B but not in the table of classification B.
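"NoMatchFromAStar" and "NoMatchFromB" follow the same rule with different inputs. For the first, the inputs are the codes of the A* file and the codes of the A:A* table. For the second, they are the codes of the B file and the codes of the A:B table. The following is a minimal sketch of that shared rule, assuming that NA marks a missing code. noMatchFrom() is hypothetical.

```r
# Hypothetical sketch, not part of correspondenceTables.
noMatchFrom <- function(codes, classificationCodes, tableCodes) {
  present <- !is.na(codes)
  inClass <- codes %in% classificationCodes
  inTable <- codes %in% tableCodes
  flag <- rep(NA_integer_, length(codes))    # empty by default
  flag[present & inClass & !inTable] <- 1L   # only in the classification file
  flag[present & inClass & inTable] <- 0L    # in both files
  flag
}

# NoMatchFromAStar: codes of the A* file versus codes of the A:A* table
noMatchFrom(c("111", "999", NA, "555"),
            classificationCodes = c("111", "999"),
            tableCodes = c("111", "555"))  # 0 1 NA NA
```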
For each row of the updated correspondence table, the value of "LabelChange" is equal to 1 if the labels of the codes of A and A* are different, and 0 if they are the same. Finally, the value of "LabelChange" is empty if either label or both labels are missing. The comparison treats lower and upper case as the same and ignores punctuation characters.
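The following sketch normalizes the labels before comparing them. It lower-cases each label and removes punctuation with a POSIX character class. Two points are assumptions: empty strings count as missing labels, and no other normalization (such as collapsing spaces) takes place. labelChange() is hypothetical.

```r
# Hypothetical sketch, not part of correspondenceTables.
normalizeLabel <- function(x) gsub("[[:punct:]]", "", tolower(x))

labelChange <- function(Alabel, AStarlabel) {
  # Assumption: NA and "" both count as a missing label
  absent <- is.na(Alabel) | is.na(AStarlabel) | Alabel == "" | AStarlabel == ""
  flag <- as.integer(normalizeLabel(Alabel) != normalizeLabel(AStarlabel))
  flag[absent] <- NA_integer_
  flag
}

labelChange(c("Crop production", "Crop Production.", "A", NA, ""),
            c("crop production", "Crop, production", "B", "x", "y"))
# 0 0 1 NA NA
```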
The argument Redundancy_trim is used to remove redundant records whose mapping is correct. If the analysis concludes that the A* code / B code mapping is correct in all cases involving redundancies, the redundancies should be removed. If the selected value is TRUE, all redundant records are removed and only one record is kept for each unique combination of A* and B codes. In the retained record, the Acode, Alabel and Asupp values are replaced with ‘multiple’. If the multiple A records are the same, their values are not replaced. If the selected value is FALSE, no trimming is executed, so redundant records are shown together with the redundancy flag.
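The trimming step can be sketched as follows. The column names Acode, Alabel and Asupp come from the description of Redundancy_trim. AStarcode and Bcode are hypothetical names. The sketch makes two further assumptions: it keeps the first record of each group, and it checks "the same" separately for each column. trimRedundancy() is hypothetical and not the package's implementation.

```r
# Hypothetical sketch, not part of correspondenceTables.
trimRedundancy <- function(df, marker = "Multiple") {
  key <- paste(df$AStarcode, df$Bcode, sep = "\t")
  # Row indices grouped by A*/B combination, in order of first appearance
  groups <- split(seq_len(nrow(df)), factor(key, levels = unique(key)))
  keep <- vapply(groups, function(i) i[1], integer(1))  # first record per group
  out <- df[keep, , drop = FALSE]
  for (col in c("Acode", "Alabel", "Asupp")) {
    nDistinct <- vapply(groups, function(i) length(unique(df[[col]][i])), integer(1))
    out[[col]][nDistinct > 1] <- marker  # identical values are left unchanged
  }
  rownames(out) <- NULL
  out
}

df <- data.frame(Acode = c("1", "2", "3"), Alabel = c("l1", "l2", "l3"),
                 Asupp = c("s", "s", "t"), AStarcode = c("a", "a", "b"),
                 Bcode = c("x", "x", "y"), stringsAsFactors = FALSE)
out <- trimRedundancy(df)
out  # two rows; row 1 has Acode and Alabel "Multiple" and keeps Asupp "s"
```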
Running browseVignettes("correspondenceTables") in the console opens an html page in the user's default browser. By selecting HTML from the menu, users can read information about the use of the sample datasets that are included in the package.
To access the csv files with the sample data, users have two options:
Option 1: Unpack the tar.gz file in which the package was delivered into any folder of their choice. All sample datasets may be found in the "inst/extdata" subfolder of this folder.
Option 2: Go to the "extdata" subfolder of the folder in which the package has been installed in the R library on their computer. All sample datasets may be found there.
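Option 2 can also be carried out from the R console. The following sketch uses base R's system.file() and list.files(). It assumes only that the package is installed. If the package is not installed, system.file() returns "" and nothing is listed.

```r
# Folder of the sample datasets in the installed package;
# system.file() returns "" when the package is not installed.
extdata <- system.file("extdata", package = "correspondenceTables")
if (nzchar(extdata)) {
  sampleFiles <- list.files(extdata, pattern = "\\.csv$")
  print(sampleFiles)
}
```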
{
## Application of function updateCorrespondenceTable() with NAICS 2017 being the
## original classification A, NACE being the target classification B, NAICS 2022
## being the updated version A*, NAICS 2017:NACE being the previous correspondence
## table A:B, and NAICS 2017:NAICS 2022 being the A:A* concordance table. The output
## is written to "updateCorrespondenceTable.csv" in a temporary folder, there is no
## reference classification, the maximum acceptable proportions of rows without a
## code of B and of rows without a code of A* are 0.5 and 0.3, respectively, and
## redundant records are not trimmed.
library(correspondenceTables)
tmp_dir <- tempdir()
A <- system.file("extdata", "NAICS2017.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "NAICS2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "NACE.csv", package = "correspondenceTables")
AB <- system.file("extdata", "NAICS2017_NACE.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "NAICS2017_NAICS2022.csv", package = "correspondenceTables")
UPC <- updateCorrespondenceTable(A = A,
                                 B = B,
                                 AStar = AStar,
                                 AB = AB,
                                 AAStar = AAStar,
                                 CSVout = file.path(tmp_dir, "updateCorrespondenceTable.csv"),
                                 Reference = "none",
                                 MismatchToleranceB = 0.5,
                                 MismatchToleranceAStar = 0.3,
                                 Redundancy_trim = FALSE)
summary(UPC)
head(UPC$updateCorrespondenceTable)
UPC$classificationNames
## Remove the csv files written to the temporary folder (full paths are required)
csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
if (length(csv_files) > 0) unlink(csv_files)
}