Implementation_of_sample_datasets.Rmd
This vignette shows how to apply the correspondenceTables package to the sample datasets included in it.
ATTENTION: please use as working directory a folder other than the one in which the package is installed. The examples in this vignette write their output to a temporary directory, set up for instance as follows:
library(correspondenceTables)
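tmp_dir <- tempdir()
# Remove CSV files left over from earlier runs; full.names = TRUE makes unlink() act inside tmp_dir
csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
if (length(csv_files) > 0) unlink(csv_files)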
LOCAL COPIES OF THE SAMPLE DATA
Users who wish to can copy the sample datasets to a local folder of their choice. There are two ways of doing this:
Unpack the tar.gz file in which the package was distributed into any folder of your choice. All sample datasets are in the “inst/extdata” subfolder of the unpacked folder.
Copy the sample datasets from the “extdata” subfolder of the folder in which the package is installed in your R library.
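The second option can also be done from within R. The following is a minimal sketch, not part of the package: the helper name copy_csv_files and the target folder "sample_data" are hypothetical, and the usage part runs only when correspondenceTables is installed.

```r
# Copy every CSV file from one folder to another and return the new paths.
copy_csv_files <- function(src_dir, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  csv_files <- list.files(src_dir, pattern = "\\.csv$", full.names = TRUE)
  copied <- file.copy(csv_files, dest_dir, overwrite = TRUE)
  # Report only the files that were actually copied
  file.path(dest_dir, basename(csv_files))[copied]
}

# Usage: runs only when the package is installed.
if (requireNamespace("correspondenceTables", quietly = TRUE)) {
  extdata_dir <- system.file("extdata", package = "correspondenceTables")
  local_dir <- file.path(tempdir(), "sample_data")  # hypothetical target folder
  head(copy_csv_files(extdata_dir, local_dir))
}
```

Replace the hypothetical target folder with any folder outside the R library.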
ACCESSING SAMPLE DATASETS FROM WITHIN THE PACKAGE
Application of function updateCorrespondenceTable().
Case 1
Execute the following code in order to get the path of the required input files.
A <- system.file("extdata", "CN2021.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "CN2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "CPA21.csv", package = "correspondenceTables")
AB <- system.file("extdata", "CN2021_CPA21.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "CN2021_CN2022.csv", package = "correspondenceTables")
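Because system.file() returns an empty string when a file cannot be found, it can be worth checking the five paths before passing them on. A minimal sketch, assuming a hypothetical helper named check_input_paths that is not part of the package:

```r
# Stop with an informative message if any path is empty or does not exist.
check_input_paths <- function(paths) {
  missing <- !nzchar(paths) | !file.exists(paths)
  if (any(missing)) {
    stop("Input file(s) not found: ",
         paste(names(paths)[missing], collapse = ", "))
  }
  invisible(paths)
}

# Usage with the objects defined above:
# check_input_paths(c(A = A, AStar = AStar, B = B, AB = AB, AAStar = AAStar))
```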
Execute the following code line to apply function updateCorrespondenceTable() on these data. Redundant records are removed, so that exactly one record is kept for each unique combination.
UPC <- updateCorrespondenceTable(A, B, AStar, AB, AAStar, file.path(tmp_dir,"updateCorrespondenceTableCase1.csv"),
"B", 0.4, 0.4, TRUE)
print(UPC[[1]][1:10, 1:7])
#>     CN 2021  CN 2022  CPA 2.1 CodeChange Review Redundancy NoMatchToAStar
#>  1 Multiple 03099000 10.20.34          1      1          1              0
#>  2 Multiple 16021000 10.89.19          1      1          1              0
#>  3 Multiple 16029099 10.89.19          1      1          1              0
#>  4 Multiple 24041100 12.00.19          1      0          1              0
#>  5 Multiple 24041200 20.59.59          1      0          1              0
#>  6 Multiple 24041990 20.59.59          1      0          1              0
#>  7 Multiple 24049900 20.59.59          1      0          1              0
#>  8 Multiple 28444110 20.13.13          1      0          1              0
#>  9 Multiple 28444210 20.13.13          1      0          1              0
#> 10 Multiple 28444320 20.13.13          1      0          1              0
print(UPC[[2]])
#> Classification: Name
#> 1 A: CN 2021
#> 2 B: CPA 2.1
#> 3 AStar: CN 2022
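The flag columns in the first list element can be used to select records for closer inspection. The sketch below rebuilds by hand the ten rows printed above and keeps those whose Review flag is set. The helper rows_to_review is hypothetical, and the comparison with 1 assumes the flags are stored as 0/1 values, which may differ in the actual output.

```r
# First ten rows of UPC[[1]] as printed above, rebuilt by hand for illustration.
upc_head <- data.frame(
  "CN 2021" = rep("Multiple", 10),
  "CN 2022" = c("03099000", "16021000", "16029099", "24041100", "24041200",
                "24041990", "24049900", "28444110", "28444210", "28444320"),
  "CPA 2.1" = c("10.20.34", "10.89.19", "10.89.19", "12.00.19", "20.59.59",
                "20.59.59", "20.59.59", "20.13.13", "20.13.13", "20.13.13"),
  CodeChange = 1,
  Review = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  Redundancy = 1,
  NoMatchToAStar = 0,
  check.names = FALSE, stringsAsFactors = FALSE
)

# Keep only the rows whose Review flag is set (assumes 0/1 flags).
rows_to_review <- function(tbl) tbl[tbl$Review == 1, , drop = FALSE]

rows_to_review(upc_head)[, c("CN 2022", "CPA 2.1")]
```

With the real result, the same helper would be called as rows_to_review(UPC[[1]]).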
Case 2
Execute the following code in order to get the path of the required input files.
A <- system.file("extdata", "CN2021.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "CN2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "PRODCOM2021.csv", package = "correspondenceTables")
AB <- system.file("extdata", "CN2021_PRODCOM2021.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "CN2021_CN2022.csv", package = "correspondenceTables")
Execute the following code line to apply function updateCorrespondenceTable() on these data. Redundant records are removed, so that exactly one record is kept for each unique combination.
UPC <- updateCorrespondenceTable(A, B, AStar, AB, AAStar, file.path(tmp_dir,"updateCorrespondenceTableCase2.csv"), "A", 0.4, 0.3, TRUE)
Case 3
Execute the following code in order to get the path of the required input files.
A <- system.file("extdata", "NAICS2017.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "NAICS2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "NACE.csv", package = "correspondenceTables")
AB <- system.file("extdata", "NAICS2017_NACE.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "NAICS2017_NAICS2022.csv", package = "correspondenceTables")
Execute the following code line to apply function updateCorrespondenceTable() on these data. Redundant records are removed, so that exactly one record is kept for each unique combination.
UPC <- updateCorrespondenceTable(A, B, AStar, AB, AAStar, file.path(tmp_dir,"updateCorrespondenceTableCase3.csv"), "none", 0.5, 0.3, TRUE)
Case 4
Execute the following code in order to get the path of the required input files.
A <- system.file("extdata", "CN2021.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "CN2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "NST2007.csv", package = "correspondenceTables")
AB <- system.file("extdata", "CN2021_NST2007.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "CN2021_CN2022.csv", package = "correspondenceTables")
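Execute the following code line to apply function updateCorrespondenceTable() on these data. The final argument controls trimming: with TRUE, as here, redundant records are removed so that exactly one record is kept for each unique combination; with FALSE, redundant records are shown together with the redundancy flag.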
UPC <- updateCorrespondenceTable(A, B, AStar, AB, AAStar, file.path(tmp_dir,"updateCorrespondenceTableCase4.csv"), "B", 0.4, 0.3, TRUE)
Case 5
Execute the following code in order to get the path of the required input files.
A <- system.file("extdata", "CN2021.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "CN2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "SITC4.csv", package = "correspondenceTables")
AB <- system.file("extdata", "CN2021_SITC4.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "CN2021_CN2022.csv", package = "correspondenceTables")
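Execute the following code line to apply function updateCorrespondenceTable() on these data. The final argument controls trimming: with TRUE, as here, redundant records are removed so that exactly one record is kept for each unique combination; with FALSE, redundant records are shown together with the redundancy flag.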
UPC <- updateCorrespondenceTable(A, B, AStar, AB, AAStar, file.path(tmp_dir,"updateCorrespondenceTableCase5.csv"), "B", 0.3, 0.7, TRUE)
Case 6
Execute the following code in order to get the path of the required input files.
A <- system.file("extdata", "CN2021.csv", package = "correspondenceTables")
AStar <- system.file("extdata", "CN2022.csv", package = "correspondenceTables")
B <- system.file("extdata", "BEC4.csv", package = "correspondenceTables")
AB <- system.file("extdata", "CN2021_BEC4.csv", package = "correspondenceTables")
AAStar <- system.file("extdata", "CN2021_CN2022.csv", package = "correspondenceTables")
Execute the following code line to apply function updateCorrespondenceTable() on these data. When no trimming is executed, redundant records are shown, together with the redundancy flag.
UPC <- updateCorrespondenceTable(A, B, AStar, AB, AAStar, file.path(tmp_dir,"updateCorrespondenceTableCase6.csv"), "B", 0.3, 0.6, FALSE)
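The six cases above differ only in their input files and in the last four arguments, so they can be summarised in one table and run through a single helper. This is a sketch under stated assumptions, not the package's own interface: the names cases, case_files, run_case and the column names reference, tol1, tol2 and trim are hypothetical labels for the positional arguments used above.

```r
# One row per case; values copied from the six calls above.
cases <- data.frame(
  A = c("CN2021", "CN2021", "NAICS2017", "CN2021", "CN2021", "CN2021"),
  AStar = c("CN2022", "CN2022", "NAICS2022", "CN2022", "CN2022", "CN2022"),
  B = c("CPA21", "PRODCOM2021", "NACE", "NST2007", "SITC4", "BEC4"),
  reference = c("B", "A", "none", "B", "B", "B"),
  tol1 = c(0.4, 0.4, 0.5, 0.4, 0.3, 0.3),
  tol2 = c(0.4, 0.3, 0.3, 0.3, 0.7, 0.6),
  trim = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Build the five input file names for one case (one row of `cases`).
case_files <- function(case) {
  c(A = paste0(case$A, ".csv"),
    B = paste0(case$B, ".csv"),
    AStar = paste0(case$AStar, ".csv"),
    AB = paste0(case$A, "_", case$B, ".csv"),
    AAStar = paste0(case$A, "_", case$AStar, ".csv"))
}

# Resolve the paths and call the function with the same positional
# arguments as in the cases above.
run_case <- function(i, out_dir = tempdir()) {
  case <- cases[i, ]
  p <- vapply(case_files(case), function(f) {
    system.file("extdata", f, package = "correspondenceTables")
  }, character(1))
  out <- file.path(out_dir, sprintf("updateCorrespondenceTableCase%d.csv", i))
  correspondenceTables::updateCorrespondenceTable(
    p[["A"]], p[["B"]], p[["AStar"]], p[["AB"]], p[["AAStar"]], out,
    case$reference, case$tol1, case$tol2, case$trim)
}

# Usage: runs Case 1 only when the package is installed.
if (requireNamespace("correspondenceTables", quietly = TRUE)) {
  UPC <- run_case(1)
}
```

Keeping the settings in one table makes it easier to see at a glance how the cases differ, for example that only Case 6 passes FALSE as the final argument.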
Application of function newCorrespondenceTable().
The function fullPath is used in all cases in order to get the path of the required input files.
fullPath <- function(CSVraw, CSVappended) {
  # Read the names file shipped with the package
  NamesCsv <- system.file("extdata", CSVraw, package = "correspondenceTables")
  A <- read.csv(NamesCsv, header = FALSE, sep = ",")
  # Replace every non-empty file name with its full path inside the package
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(A))) {
      if (A[i, j] != "") {
        A[i, j] <- system.file("extdata", A[i, j], package = "correspondenceTables")
      }
    }
  }
  # Write the table of full paths to the temporary directory
  write.table(x = A, file = file.path(tmp_dir, CSVappended), row.names = FALSE, col.names = FALSE, sep = ",")
  return(A)
}
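To see what the cell-by-cell replacement in fullPath does, the following sketch applies the same idea to a toy table. Everything here is illustrative: resolve_cells is a hypothetical vectorised helper, the file names are made up, and "/hypothetical/extdata" stands in for the folder that system.file() would return.

```r
# Replace every non-empty cell with the value returned by `resolver`,
# leaving empty cells untouched (vectorised version of the double loop).
resolve_cells <- function(names_df, resolver) {
  m <- as.matrix(names_df)
  filled <- !is.na(m) & m != ""
  m[filled] <- vapply(m[filled], resolver, character(1), USE.NAMES = FALSE)
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Toy input with made-up file names; the real names files ship with the package.
toy <- data.frame(V1 = c("X.csv", "Y.csv"), V2 = c("X_Y.csv", ""),
                  stringsAsFactors = FALSE)
resolve_cells(toy, function(f) file.path("/hypothetical/extdata", f))
```

The empty cell in the second row stays empty, while every other cell is replaced by a full path.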
Case 1
fullPath("names1.csv", "names.csv")
Execute the following code to apply function newCorrespondenceTable() on these data. When no trimming is executed, redundant records are shown, together with the redundancy flag.
system.time(NCT <- newCorrespondenceTable(file.path(tmp_dir,"names.csv"), file.path(tmp_dir,"newCorrespondenceTableCase1.csv"), "A", 0.5, FALSE))
#> Percentage of codes of ISIC Rev. 4 processed:
#>
#> Percentage of codes of CPA 2.1 processed:
#>
print(NCT[[1]][1:10, 1:6])
#> ISIC Rev. 4 CPC 2.1 CPA 2.1 Review Redundancy Redundancy_keep
#> 1 01 0 0
#> 2 01.1 0 0
#> 3 01.11 0 0
#> 4 01.11.1 0 0
#> 5 01.11.2 0 0
#> 6 01.11.3 0 0
#> 7 01.11.4 0 0
#> 8 01.11.5 0 0
#> 9 01.11.6 0 0
#> 10 01.11.7 0 0
print(NCT[[2]])
#> Classification: Name
#> 1 A: ISIC Rev. 4
#> 2 C1: CPC 2.1
#> 3 B: CPA 2.1
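The result is also written to the CSV file given as second argument. When reading such a file back, it helps to keep every column as text so that codes such as "01" do not lose their leading zeros. A minimal sketch, assuming the output is a comma-separated file with a header row; read_codes_csv is a hypothetical helper and the toy file stands in for a real output file.

```r
# Read a correspondence table CSV with every column kept as text.
read_codes_csv <- function(path) {
  read.csv(path, colClasses = "character", check.names = FALSE)
}

# Toy file standing in for a real output file.
toy_path <- tempfile(fileext = ".csv")
write.csv(data.frame("CPA 2.1" = c("01", "01.1"), Review = c("0", "1"),
                     check.names = FALSE),
          toy_path, row.names = FALSE)
read_codes_csv(toy_path)
```

With the real output, the call would be read_codes_csv(file.path(tmp_dir, "newCorrespondenceTableCase1.csv")). Setting check.names = FALSE keeps column names that contain spaces, such as "CPA 2.1", unchanged.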
Case 2
fullPath("names2.csv", "names.csv")
Execute the following code to apply function newCorrespondenceTable() on these data. When no trimming is executed, redundant records are shown, together with the redundancy flag.
system.time(NCT <- newCorrespondenceTable(file.path(tmp_dir,"names.csv"), file.path(tmp_dir,"newCorrespondenceTableCase2.csv"), "B", 0.5, FALSE))
#> Percentage of codes of CN 2022 processed:
#>
#> Percentage of codes of NACE Rev. 2 processed:
#>
Case 3
fullPath("names3.csv", "names.csv")
Execute the following code to apply function newCorrespondenceTable() on these data. Redundant records are removed, so that exactly one record is kept for each unique combination.
system.time(NCT <- newCorrespondenceTable(file.path(tmp_dir,"names.csv"), file.path(tmp_dir,"newCorrespondenceTableCase3.csv"), "B", 0.5, TRUE))
#> Percentage of codes of NACE Rev. 2 processed:
#>
#> Percentage of codes of ISIC Rev. 4 processed:
#>
Case 4
fullPath("names4.csv", "names.csv")
Execute the following code to apply function newCorrespondenceTable() on these data. Redundant records are removed, so that exactly one record is kept for each unique combination.
system.time(NCT <- newCorrespondenceTable(file.path(tmp_dir,"names.csv"), file.path(tmp_dir,"newCorrespondenceTableCase4.csv"), "none", 0.96, TRUE))
#> Percentage of codes of NACE Rev. 2 processed:
#>
#> Percentage of codes of SITC4 processed:
#>