Create a candidate correspondence table between two classifications based on their correspondences with intermediate classifications
newCorrespondenceTable(
Tables,
Reference = "none",
MismatchTolerance = 0.2,
Redundancy_trim = TRUE,
Progress = TRUE
)
The name of a csv file, given as a character string, which contains the names of the files that contain the classifications and the intermediate correspondence tables, OR a list of vectors with the names of the data frames that contain the classifications and the intermediate correspondence tables (see "Details" below).
The reference classification among A and B. If a classification is the reference to the other, and hence
hierarchically superior to it, each code of the other classification is expected to be mapped to at most one code
of the reference classification. The valid values are "none", "A", and "B". If the selected value
is "A" or "B", a "Review" flag column (indicating the records violating this expectation) is included
in the output (see "Explanation of the flags" below).
The maximum acceptable proportion of rows in the candidate correspondence table which contain
no code for classification A or no code for classification B. The default value is 0.2. The valid values are
real numbers in the interval [0, 1].
A logical value (TRUE or FALSE) that controls the trimming of redundant records.
The default value is TRUE, which removes all redundant records.
The other value is FALSE, which shows redundant records together with the redundancy flag.
A logical value, TRUE (default) or FALSE.
Used to switch ON (when TRUE) or OFF (when FALSE) the progress bar that displays the creation process of the new table in the console output.
newCorrespondenceTable() returns a list with two elements, both of which are data frames.
The first element is the candidate correspondence table A:B, including the codes of all "pivot" classifications, augmented with flags "Review" (if applicable), "Redundancy", "Unmatched", "NoMatchFromA", "NoMatchFromB" and with all the additional columns of the classification and intermediate correspondence table files.
The second element contains the names of classification A, the "pivot" classifications and classification B as read from the top left-hand side cell of the respective input files.
File and file name requirements:
The file that corresponds to argument Tables, and the files to which the contents of Tables
lead, must be in csv format with comma as delimiter. If full paths are not provided, then these files must
be available in the working directory. No two of the provided filenames may be identical.
The file that corresponds to argument Tables must contain filenames, and nothing else, in
a \((k+2)\) × \((k+2)\) table, where \(k\), a positive integer, is the number of "pivot" classifications.
The cells in the main diagonal of the table provide the filenames of the files which contain, in this order,
the classifications \(A, C_1\), \(\ldots\), \(C_k\) and \(B\). The off-diagonal directly above the main
diagonal contains the filenames of the files that contain, in this order, the correspondence tables
\(A:C_1\), {\(C_i\):\(C_{i+1}\), \(1 \le i \le k-1\)} and \(B:C_k\). All other cells of the table
must be empty.
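As an illustration of the layout described above, the following sketch builds the table for a single pivot classification (\(k = 1\)) and writes it to a temporary csv file. All filenames are hypothetical, and writing the file without header or row names is an assumption based on the requirement that it contain filenames and nothing else.

```r
# Hypothetical filenames, for illustration only; k = 1 pivot classification.
k <- 1
n <- k + 2
layout <- matrix("", nrow = n, ncol = n)

# Main diagonal: classifications A, C1 and B, in this order.
layout[cbind(1:n, 1:n)] <- c("A.csv", "C1.csv", "B.csv")

# Off-diagonal directly above the main diagonal: tables A:C1 and B:C1.
layout[cbind(1:(n - 1), 2:n)] <- c("A_C1.csv", "B_C1.csv")

# All other cells stay empty. Write with comma as delimiter and
# without header or row names (assumption, see the text above).
tables_file <- file.path(tempdir(), "names.csv")
write.table(layout, tables_file, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The path in tables_file could then be passed as the Tables argument, provided the five files it names actually exist.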
Classification table requirements:
Each of the files that contain classifications must contain at least one column and at least two rows. The first column contains the codes of the respective classification. The first row contains column headers. The header of the first column is the name of the respective classification (e.g., "CN 2021").
The classification codes contained in a classification file (expected in its first column as mentioned above) must be unique. No two identical codes are allowed in the column.
If any of the files that contain classifications has additional columns, the first one of them is assumed to contain the labels of the respective classification codes.
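Before calling the function, a classification table can be checked against these requirements with a few lines of base R. The sketch below uses a made-up table (codes and labels are invented for illustration) and is not the validation code of the function itself.

```r
# Made-up classification table: codes in the first column, whose header
# is the classification name; labels in the second column.
classification <- data.frame(
  "CN 2021" = c("0001", "0002", "0003"),
  Label = c("First item", "Second item", "Third item"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

codes <- classification[[1]]

# At least one column and at least one data row below the header row.
has_min_shape <- ncol(classification) >= 1 && nrow(classification) >= 1

# anyDuplicated() returns 0 when all codes are unique.
codes_unique <- anyDuplicated(codes) == 0
```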
Correspondence table requirements:
The files that contain correspondence tables must contain at least two columns and at least two rows. The first column of the file that contains A:\(C_1\) contains the codes of classification A. The second column contains the codes of classification \(C_1\). Similar requirements apply to the files that contain \(C_i\):\(C_{i+1}\), \(1 \le i \le k-1\) and B:\(C_k\). The first row of each of the files that contain correspondence tables contains column headers. The names of the first two columns are the names of the respective classifications.
The pairs of classification codes contained in a correspondence table file (expected in its first two columns as mentioned above) must be unique. No two identical pairs of codes are allowed in the first two columns.
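The uniqueness requirement for pairs of codes can be checked in a similar way. In this sketch the correspondence table and its codes are invented for illustration; duplicated() applied to the first two columns returns the repeated pairs.

```r
# Made-up correspondence table A:C1; the fourth row repeats the first pair.
correspondence <- data.frame(
  A = c("a1", "a1", "a2", "a1"),
  C1 = c("c1", "c2", "c2", "c1"),
  stringsAsFactors = FALSE
)

# Only the first two columns take part in the uniqueness check.
pairs <- correspondence[, 1:2]
duplicated_pairs <- pairs[duplicated(pairs), , drop = FALSE]
pairs_unique <- nrow(duplicated_pairs) == 0
```

Here pairs_unique is FALSE, so this table would not satisfy the requirement until the repeated pair is removed.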
Interdependency requirements:
At least one code of classification A must appear in both the file of classification A and the file of correspondence table A:\(C_1\).
At least one code of classification B must appear in both the file of classification B and the file of correspondence table B:\(C_k\), where \(k\), \(k\ge 1\), is the number of pivot classifications.
If there is only one pivot classification, \(C_1\), at least one code of it must appear in both the file of correspondence table A:\(C_1\) and the file of correspondence table B:\(C_1\).
If the pivot classifications are \(k\) with \(k\ge 2\) then at least one code of \(C_1\) must appear in both the file of correspondence table A:\(C_1\) and the file of correspondence table \(C_1\):\(C_2\), at least one code of each of the \(C_i\), \(i = 2, \ldots, k-1\) (if \(k\ge 3\)) must appear in both the file of correspondence table \(C_{i-1}\):\(C_i\) and the file of correspondence table \(C_i\):\(C_{i+1}\), and at least one code of \(C_k\) must appear in both the file of correspondence table \(C_{k-1}\):\(C_k\) and the file of correspondence table B:\(C_k\).
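For the case of a single pivot classification, the interdependency requirements amount to three non-empty intersections of code sets. The following sketch checks them on invented data; the object names and the helper share_code() are hypothetical.

```r
# Invented codes, for illustration only (k = 1).
class_A <- c("a1", "a2", "a3")
class_B <- c("b1", "b4")
corr_A_C1 <- data.frame(A = c("a1", "a2"), C1 = c("c1", "c2"),
                        stringsAsFactors = FALSE)
corr_B_C1 <- data.frame(B = c("b1", "b2"), C1 = c("c2", "c3"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if the two code sets share at least one code.
share_code <- function(x, y) length(intersect(x, y)) > 0

checks <- c(
  A_in_AC1 = share_code(class_A, corr_A_C1$A),        # A and A:C1
  B_in_BC1 = share_code(class_B, corr_B_C1$B),        # B and B:C1
  C1_links = share_code(corr_A_C1$C1, corr_B_C1$C1)   # A:C1 and B:C1
)
```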
Mismatch tolerance:
The ratio that is compared with MismatchTolerance has as numerator the number of rows in the candidate
correspondence table which contain no code for classification A or no code for classification B and as denominator
the total number of rows of this table. If the ratio exceeds MismatchTolerance, the execution of the function
is halted.
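The ratio can be sketched as follows on an invented candidate table. Representing a missing code as NA is an assumption made for this example; the function may represent missing codes differently.

```r
# Invented candidate table; NA stands for "no code" (assumption).
candidate <- data.frame(
  A = c("a1", "a2", NA, "a3", "a4"),
  B = c("b1", NA, "b2", "b3", "b4"),
  stringsAsFactors = FALSE
)
MismatchTolerance <- 0.2

# Numerator: rows with no code for A or no code for B.
# Denominator: total number of rows.
missing_code <- is.na(candidate$A) | is.na(candidate$B)
ratio <- sum(missing_code) / nrow(candidate)

# TRUE means the tolerance is exceeded.
exceeds_tolerance <- ratio > MismatchTolerance
```

In this example two of the five rows lack a code, so the ratio is 0.4 and the default tolerance of 0.2 is exceeded.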
If any of the conditions required of the arguments is violated, an error message is produced and execution is stopped.
The "Review" flag is produced only if argument Reference has been set equal to "A" or "B". For
each row of the candidate correspondence table, if Reference = "A" the value of "Review" is equal to
1 if the code of B maps to more than one code of A, and 0 otherwise. If Reference = "B"
the value of "Review" is equal to 1 if the code of A maps to more than one code of B, and 0 otherwise.
The value of the flag is empty if the row does not contain a code of A or a code of B.
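The rule for "Review" can be sketched on an invented table as follows. This is an illustration of the rule as described, not the code of the function; an empty flag is represented here as NA, which is an assumption.

```r
# Invented candidate table; NA stands for "no code" (assumption).
candidate <- data.frame(
  A = c("a1", "a2", "a3", NA),
  B = c("b1", "b1", "b2", "b3"),
  stringsAsFactors = FALSE
)
Reference <- "A"

# Codes of the reference classification and of the other classification.
ref   <- if (Reference == "A") candidate$A else candidate$B
other <- if (Reference == "A") candidate$B else candidate$A

# Rows that contain both a code of A and a code of B.
complete <- !is.na(candidate$A) & !is.na(candidate$B)

# Number of distinct reference codes for each code of the other classification.
n_ref <- tapply(ref[complete], other[complete],
                function(x) length(unique(x)))

# 1 if the other code maps to more than one reference code, 0 otherwise,
# and NA (empty) for rows lacking a code of A or of B.
Review <- rep(NA_integer_, nrow(candidate))
Review[complete] <- as.integer(n_ref[other[complete]] > 1)
```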
For each row of the candidate correspondence table, the value of "Redundancy" is equal to 1 if the row
contains a combination of codes of A and B that also appears in at least one other row of the candidate
correspondence table.
When "Redundancy_trim" is equal to FALSE, the "Redundancy_keep" flag is created to identify with value 1
the records that would be kept if trimming were performed.
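A minimal sketch of the "Redundancy" flag on invented data is given below. Setting the flag to 0 for non-redundant rows is an assumption of this example, and the "Redundancy_keep" flag is not reproduced because the text does not state which record is kept.

```r
# Invented candidate table with one pivot classification.
candidate <- data.frame(
  A  = c("a1", "a1", "a2"),
  C1 = c("c1", "c2", "c3"),
  B  = c("b1", "b1", "b2"),
  stringsAsFactors = FALSE
)

# A row is redundant if its combination of codes of A and B also
# appears in at least one other row. Scanning from both ends marks
# every member of a repeated combination, not only the later ones.
ab <- candidate[, c("A", "B")]
candidate$Redundancy <- as.integer(
  duplicated(ab) | duplicated(ab, fromLast = TRUE)
)
```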
For each row of the candidate correspondence table, the value of "Unmatched" is equal to 1 if the row
contains a code of A but no code of B or if it contains a code of B but no code of A. The value of the flag is
0 if the row contains codes for both A and B.
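The "Unmatched" rule reduces to an exclusive OR on the presence of the two codes, as in this sketch on invented data (NA again stands for a missing code, which is an assumption).

```r
# Invented candidate table; NA stands for "no code" (assumption).
candidate <- data.frame(
  A = c("a1", NA, "a2"),
  B = c("b1", "b2", NA),
  stringsAsFactors = FALSE
)

has_A <- !is.na(candidate$A)
has_B <- !is.na(candidate$B)

# 1 if exactly one of the two codes is present, 0 if both are present.
candidate$Unmatched <- as.integer(xor(has_A, has_B))
```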
For each row of the candidate correspondence table, the value of "NoMatchFromA" is equal to 1 if the row
contains a code of A that appears in the table of classification A but not in correspondence table A:\(C_1\). The
value of the flag is 0 if the row contains a code of A that appears in both the table of classification A and
correspondence table A:\(C_1\). Finally, the value of the flag is empty if the row contains no code of A or if it
contains a code of A that appears in correspondence table A:\(C_1\) but not in the table of classification A.
For each row of the candidate correspondence table, the value of "NoMatchFromB" is equal to 1 if the row
contains a code of B that appears in the table of classification B but not in correspondence table B:\(C_k\). The
value of the flag is 0 if the row contains a code of B that appears in both the table of classification B and
correspondence table B:\(C_k\). Finally, the value of the flag is empty if the row contains no code of B or if it
contains a code of B that appears in correspondence table B:\(C_k\) but not in the table of classification B.
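The three possible values of "NoMatchFromA" can be sketched as below; "NoMatchFromB" follows the same pattern with classification B and correspondence table B:\(C_k\). The data are invented, and an empty flag is represented as NA by assumption.

```r
# Invented codes, for illustration only.
class_A      <- c("a1", "a2", "a3")   # codes in the table of classification A
corr_A_codes <- c("a1", "a2", "a9")   # codes of A in correspondence table A:C1
candidate_A  <- c("a1", "a3", "a9", NA)  # codes of A in the candidate table

in_class <- candidate_A %in% class_A
in_corr  <- candidate_A %in% corr_A_codes

# Empty (NA) if there is no code of A or the code is not in the table of
# classification A; otherwise 0 if the code is also in A:C1, and 1 if not.
NoMatchFromA <- ifelse(!in_class, NA_integer_,
                       ifelse(in_corr, 0L, 1L))
```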
The argument "Redundancy_trim" is used to remove redundant records, i.e. records whose mapping is correct but which repeat a combination of codes of A and B that already appears in another record.
The valid values for this argument are the logical values TRUE and FALSE.
If the selected value is TRUE, all redundant records are removed and exactly one record is kept for each unique combination.
For this retained record, the codes, the labels and the supplementary information of the pivot classifications are replaced with
'multiple'. If the values of the pivot classifications are the same across the redundant records, they are not replaced.
If the selected value is FALSE, no trimming is executed, so redundant records are shown together with the redundancy flag.
If the logical value is missing, the execution of the function is stopped.
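The effect of trimming can be approximated with aggregate() on an invented table with two pivot classifications. This is a sketch of the described behaviour under stated assumptions, not the implementation used by the function; the helper collapse() is hypothetical.

```r
# Invented candidate table with two pivot classifications, C1 and C2.
candidate <- data.frame(
  A  = c("a1", "a1", "a2"),
  C1 = c("c1", "c2", "c3"),
  C2 = c("d1", "d1", "d2"),
  B  = c("b1", "b1", "b2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a value shared by all records of a group,
# otherwise replace it with 'multiple'.
collapse <- function(x) if (length(unique(x)) == 1) x[1] else "multiple"

# One record per unique combination of codes of A and B.
trimmed <- aggregate(candidate[c("C1", "C2")],
                     by = candidate[c("A", "B")],
                     FUN = collapse)
```

For the combination a1 and b1, the C1 codes differ and become 'multiple', while the C2 code d1 is identical in both records and is kept.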
If users wish to access the csv files with the sample data, they have two options:
Option 1: Unpack into any folder of their choice the tar.gz file into which the package has arrived. All sample datasets may be found in the "inst/extdata" subfolder of this folder.
Option 2: Go to the "extdata" subfolder of the folder in which the package has been installed in their PC's R
library. All sample datasets may be found there.
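For Option 2, the installed "extdata" folder can be located from within R with system.file(). The package name used below is an assumption, since it is not stated in this text; replace it with the actual name of the package.

```r
# Assumed package name (not stated in the text above); adjust as needed.
pkg <- "correspondenceTables"

# system.file() returns "" if the package or the folder cannot be found.
extdata_dir <- system.file("extdata", package = pkg)

if (nzchar(extdata_dir)) {
  # List the sample csv files shipped with the package.
  print(list.files(extdata_dir, pattern = "\\.csv$"))
} else {
  message("Package '", pkg, "' is not installed or has no extdata folder.")
}
```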