get_eurostat_data(
id,
filters = NULL,
lang = "en",
exact_match = TRUE,
date_filter = NULL,
label = FALSE,
select_freq = NULL,
cache = TRUE,
update_cache = FALSE,
cache_dir = NULL,
compress_file = TRUE,
stringsAsFactors = TRUE,
keep_flags = FALSE,
cflags = FALSE,
check_toc = FALSE,
local_filter = TRUE,
force_local_filter = FALSE,
mode = "xml",
verbose = FALSE,
...
)
A code name for the dataset of interest.
See search_eurostat_toc
for details how to get an id.
a string, a character vector or named list containing words to filter by the different concepts or geographical location.
If filter applied only part of the dataset is downloaded through the API. The words can be
any word, Eurostat variable code, and value which are in the DSD search_eurostat_dsd
.
If a named list is used, then the name of the list elements should be the concepts from the DSD and the provided values will be used to filter the dataset for the given concept.
The default is NULL
, in this case the whole dataset is returned via the bulk download. To filter by time see date_filter
below.
If after filtering still the dataset has more observations than the limit per query via the API, then the raw download is used to retrieve the whole dataset and apply the filter on the local computer. This option can be disabled with the local_filter=FALSE
parameter.
a character string either en
, de
or fr
to define the language version for the DSD to search in for the filters
. The default is en
- English.
a boolean with the default value TRUE
, if the strings provided in filters
shall be matched exactly as it is or as a pattern.
a vector which can be numeric or character containing dates to filter the dataset. If date is defined as character string it should follow the format yyyy[-mm][-dd], where the month and the day part is optional.
If date filter applied only part of the dataset is downloaded through the API.
The default is NULL
, in this case the whole dataset is returned via the bulk download.
If after filtering still the dataset has more observations than the limit per query via the API, then the raw download is used to retrieve the data and apply the filter on the local computer.
This option can be disabled with the local_filter=FALSE
parameter.
a boolean with the default FALSE
. If it is TRUE
then the code values are replaced by the name from the Data Structure Definition (DSD) get_eurostat_dsd
.
For example instead of "D1110A", "Raw cows' milk from farmtype" is used or "HU32" is replaced by "Észak-Alföld".
a character symbol for a time frequency when a dataset has multiple time
frequencies. Possible values are:
A = annual, S = semi-annual, H = half-year, Q = quarterly, M = monthly, W = weekly, D = daily.
The default is NULL
as most datasets have just one time
frequency and in case there are multiple frequencies, then only the most common frequency kept.
If all the frequencies needed the get_eurostat_raw
can be used.
a logical whether to do caching. Default is TRUE
. Affects
only queries without filtering. If filters
or date_filter
is used then there is no caching.
a logical with a default value FALSE
, whether to update the data in the cache. Can be set also with
options(restatapi_update=TRUE)
a path to a cache directory. The NULL
(default) uses the memory as cache.
If the folder cache_dir
directory does not exist it saves in the 'restatapi' directory
under the temporary directory from tempdir()
. Directory can also be set with
option(restatapi_cache_dir=...)
.
a logical whether to compress the
RDS-file in caching. Default is TRUE
.
if TRUE
(the default) the non-numeric columns are
converted to factors. If the value FALSE
they are returned as characters.
a logical whether the observation status (flags) - e.g. "confidential",
"provisional", etc. - should be kept in a separate column or if they
can be removed. Default is FALSE
. For flag values see:
https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/codelist/ESTAT/OBS_STATUS/?compressed=false&format=TSV&lang=en.
a logical whether the missing observations with flag 'c' - "confidential"
should be kept or not. Default is FALSE
, in this case these observations dropped from the dataset. If this parameter
TRUE
then the flags are kept and the parameter provided in keep_flags
is not taken into account.
a boolean whether to check the provided id
in the Table of Contents (TOC) or not. The default value
FALSE
, in this case the base URL for the download link is retrieved from the configuration file.
If the value is TRUE
then the TOC is downloaded and the id
is checked in it. If it found then the download link
is retrieved form the TOC.
a boolean whether do the filtering on the local computer or not in case after filtering still the dataset has more observations
than the limit per query via the API would allow to download. The default is TRUE
, in this case if the response footer contains information
that the result cannot be downloaded becuase it is too large, then the whole raw dataset is downloaded and filtered on the local computer.
a boolean with the default value FALSE
. In case, if there are existing filter conditions, then it will do the filtering on the local
computer and not requesting through the REST API. It can be useful, if the values are not numeric as these are provided as NaN (Not a Number) through the REST API,
but it is fully listed in the raw dataset.
defines the format of the dataset response from the API. It can be
csv
for SDMX-CSV or xml
for the SDMX-ML version.
A boolean with default FALSE
, so detailed messages (for debugging) will not printed.
Can be set also with options(restatapi_verbose=TRUE)
further arguments to the for search_eurostat_dsd
function, e.g.: ignore.case
or name
.
The ignore.case
has the default value FALSE
, then the strings provided in filters
are matched as is,
otherwise the case of the letters is ignored. If the name=FALSE
then the pattern(s) provided in the filters
argument is only searched in the code column of the DSD, and the names of the codes will not be searched.
a data.table with the following columns:
freq | A column for the frequency of the data in case there are multiple frequencies, for single frequency this columns is dropped from the data table |
dimension names | One column for each dimension in the data |
time | A column for the time dimension |
values | A column for numerical values |
flags | A column for flags if the keep_flags=TRUE or cflags=TRUE otherwise this column
is not included in the data table |
The data.table does not include all missing values. The missing values are dropped if the value and flag are missing on a particular time.
In case the provided filters
can be found in the DSD, then it is used to query the API or applied locally. If the applied filters
with combination of date_filter
and select_freq
has no observation in the data set then the fucntion returns the data.table with 0 row.
In case none of the provided filters
, date_filter
or select_freq
can be parsed or found in the DSD then the whole dataset downloaded through the bulk download with a warning message.
In case the id
is not exist then the function returns the value NULL
.
Data sets are downloaded from the Eurostat Web Services
SDMX API if there is a filter otherwise the
the Eurostat bulk download facility is used.
If only the table id
is given, the whole table is downloaded from the
bulk download facility. If also filters
or date_filter
is defined then the SDMX REST API is
used. In case after filtering the dataset has more rows than the limitation of the SDMX REST API (1 million values at one time) then the bulk download is used to retrieve the whole dataset .
By default all datasets cached as they are often rather large.
The datasets cached in memory (default) or can be stored in a temporary directory if cache_dir
or option(restatpi_cache_dir)
is defined.
The cache can be emptied with clean_restatapi_cache
.
The id
, is a value from the code
column of the table of contents (get_eurostat_toc
), and can be searched
for with the search_eurostat_toc
function. The id value can be retrieved from the Eurostat database
as well. The Eurostat database gives codes in the Data Navigation Tree after every dataset in parenthesis.
Filtering can be done by the codes as described in the API documentation providing in the correct order and connecting with "." and "+".
If we do not know the codes we can filter based on words or by the mix of the two putting in a vector like c("AT$","Belgium","persons","Total")
.
Be careful that the filter is case sensitive, if you do not know the code or label exactly you can use the option ignore.case=TRUE
and exact_match=FALSE
,
but in this case the results may include unwanted elements as well. In the filters
parameter regular expressions can be used as well.
We do not have to worry about the correct order of the filter, it will be put in the correct place based on the DSD.
The date_filter
shall be a string in the format yyyy[-mm][-dd]. The month and the day part is optional, but if we use the years and we have monthly frequency then all the data for the given year is retrieved.
The string can be extended by adding the "<" or ">" to the beginning or to the end of the string. In this case the date filter is treated as range, and the date is used as a starting or end date. The data will include the observation of the start/end date.
A single date range can be defined as well by concatenating two dates with the ":", e.g. "2016-08:2017-03-15"
. As seen in the example the dates can have different length: one defined only at year/month level, the other by day level.
If a date range is defined with ":", it is not possible to use the "<" or ">" characters in the date filter.
If there are multiple dates which is not a continuous range, it can be put in vector in any order like c("2016-08",2013:2015,"2017-07-01")
. In this case, as well, it is not possible to use the "<" or ">" characters.
load_cfg()
eu<-get("cc",envir=restatapi::.restatapi_env)
# \donttest{
if (!(grepl("amzn|-aws|-azure ",Sys.info()['release']))) options(timeout=2)
head(get_eurostat_data("NAMA_10_GDP"))
#> There is a warning by the download of the TSV file. Run the same command with verbose=TRUE option to get more info on the issue.
#> The file download was not successful. Try again later.
#> NULL
head(get_eurostat_data("htec_cis3",update_cache=TRUE,check_toc=TRUE,verbose=TRUE))
#> get_eurostat_data - footer code option value at start:
#> get_eurostat_data - footer code option value after reset:
#> get_eurostat_data - id: htec_cis3
#>
#> get_eurostat_toc - API version:2 - number of cores:2
#> get_eurostat_toc - Downloading https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/xml
#> get_eurostat_toc - class(xml_leafs): xml_nodeset
#> get_eurostat_toc - number of nodes: 9688
#> get_eurostat_toc - number of cores: 2
#> get_eurostat_toc - The TOC was cached in memory ('toc.xml.en' in '.restatapi_env').
#> get_eurostat_data - data TOC rows: 7253
#> bulk url from TOC: https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/HTEC_CIS3?format=tsv&compressed=true
#> data rowcount in TOC: 1006
#> get_eurostat_data - select_freq:
#>
#> get_eurostat_bulk - API version:2
#>
#> get_eurostat_toc - API version:2 - number of cores:2
#>
#> get_eurostat_cache - The 'toc.xml.en' was loaded from 'toc.xml.en' in '.restatapi_env'.
#> get_eurostat_toc - The TOC was cached previously in memory ('toc.xml.en' in '.restatapi_env'). It remained unchanged.
#> get_eurostat_bulk - TOC rows: 7253
#> get_eurostat_bulk - tsv bulk url from TOC: https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/HTEC_CIS3?format=tsv&compressed=true
#> get_eurostat_bulk - ndata rowcount in TOC: 1006
#> get_eurostat_bulk - class of id, cache, update_cache, cache_dir, compress_file, stringsAsFactors, keep_flags, check_toc, melt, verbose:
#> character - logical -logical - NULL - logical - logical - logical - logical - logical - logical
#>
#> get_eurostat_raw - API version:2
#>
#> get_eurostat_toc - API version:2 - number of cores:2
#>
#> get_eurostat_cache - The 'toc.xml.en' was loaded from 'toc.xml.en' in '.restatapi_env'.
#> get_eurostat_toc - The TOC was cached previously in memory ('toc.xml.en' in '.restatapi_env'). It remained unchanged.
#> get_eurostat_raw - raws of TOC: 7253
#> get_eurostat_raw - txt bulk url from TOC:https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/HTEC_CIS3?format=tsv&compressed=true
#> get_eurostat_raw - txt bulk url from cfg:https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/htec_cis3?format=TSV&compressed=true
#> get_eurostat_raw - data rowcount in TOC: 1006
#> get_eurostat_raw - class(raw): data.tabledata.frame
#> get_eurostat_raw - caching in raw: FALSE
#> get_eurostat_raw - local filter: FALSE
#> get_eurostat_raw - called from: restatapi::get_eurostat_rawidtxtcacheupdate_cachecache_dircompress_filestringsAsFactorskeep_flagscheck_tocTRUEverbose
#> get_eurostat_raw - get_eurostat_raw in sys.call(): FALSE
#> get_eurostat_data - The data was downloaded with bulk download.
#> get_eurostat_data - bulk restat - nrow:1006; ncol:6; colnames:indic_in/nace_r1/unit/geo/time/values
#> get_eurostat_data - cflags:FALSE
#> get_eurostat_data - restat - nrow:1006; ncol:6; colnames:indic_in/nace_r1/unit/geo/time/values
#> get_eurostat_data - processed restat - nrow:867; ncol:6; colnames:indic_in/nace_r1/unit/geo/time/values
#> get_eurostat_data - The data was cached in memory ('b_htec_cis3-2013.04.12-0-0' in '.restatapi_env').
#> Key: <indic_in, nace_r1, unit, geo>
#> indic_in nace_r1 unit geo time values
#> <fctr> <fctr> <fctr> <fctr> <fctr> <num>
#> 1: CO2_YES C-E PC AT 2000 7.44
#> 2: CO2_YES C-E PC BE 2000 11.72
#> 3: CO2_YES C-E PC DE 2000 10.74
#> 4: CO2_YES C-E PC DK 2000 18.91
#> 5: CO2_YES C-E PC EL 2000 4.94
#> 6: CO2_YES C-E PC ES 2000 3.17
head(get_eurostat_data("agr_r_milkpr",cache_dir="/tmp",cflags=TRUE))
#> Key: <milkitem, dairyprod, geo>
#> milkitem dairyprod geo time values flags
#> <fctr> <fctr> <fctr> <fctr> <num> <fctr>
#> 1: PRO D1110A AL 2021 859.38
#> 2: PRO D1110A AL 2022 824.94
#> 3: PRO D1110A AL 2023 765.35
#> 4: PRO D1110A AL0 2021 859.38
#> 5: PRO D1110A AL0 2022 824.94
#> 6: PRO D1110A AL0 2023 765.35
options(restatapi_update=FALSE)
options(restatapi_cache_dir=file.path(tempdir(),"restatapi"))
head(get_eurostat_data("avia_gonc",select_freq="A",cache=FALSE))
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> None of the filter could be applied. The whole dataset will be retrieved through bulk download.
#> Key: <unit, tra_meas, geo>
#> unit tra_meas geo time values
#> <fctr> <fctr> <fctr> <fctr> <num>
#> 1: FLIGHT CAF_FRM AT 1993 11
#> 2: FLIGHT CAF_FRM AT 1996 2
#> 3: FLIGHT CAF_FRM AT 1998 16
#> 4: FLIGHT CAF_FRM AT 1999 23
#> 5: FLIGHT CAF_FRM AT 2000 213
#> 6: FLIGHT CAF_FRM AT 2001 48
head(get_eurostat_data("agr_r_milkpr",date_filter=2008,keep_flags=TRUE))
#> geo milkitem dairyprod time values flags
#> <fctr> <fctr> <fctr> <fctr> <num> <fctr>
#> 1: AT PRO D1110A 2008 3195.95
#> 2: AT1 PRO D1110A 2008 675.95
#> 3: AT11 PRO D1110A 2008 31.15
#> 4: AT12 PRO D1110A 2008 644.80
#> 5: AT13 PRO D1110A 2008 0.00 n
#> 6: AT2 PRO D1110A 2008 700.63
dt<-get_eurostat_data("avia_par_me",
filters="BE$",
exact_match=FALSE,
date_filter=c(2016,"2017-03","2017-07-01"),
select_freq="Q",
label=TRUE,
name=FALSE)
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. No label is applied.
dt<-get_eurostat_data("agr_r_milkpr",
filters=c("BE$","Ungarn"),
lang="de",
date_filter="2007-06<",
keep_flags=TRUE)
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
dt<-get_eurostat_data("nama_10_a10_e",
filters=c("Annual","EU28","Belgium","AT","Total","EMP_DC","person"),
date_filter=c("2008",2002,2013:2018))
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
#> 404 - No data retrived with the given filter(s)
dt<-get_eurostat_data("vit_t3",
filters=c("EU28",eu$EA15,"HU$"),
date_filter=c("2015",2007))
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
dt<-get_eurostat_data("avia_par_me",
filters="Q...ME_LYPG_HU_LHBP+ME_LYTV_UA_UKKK",
date_filter=c("2016-08","2017-07-01"),
select_freq="M")
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
dt<-get_eurostat_data("htec_cis3",
filters="lu",
ignore.case=TRUE)
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> None of the filter could be applied. The whole dataset will be retrieved through bulk download.
dt<-get_eurostat_data("bop_its6_det",
filters=list(bop_item="SC",
currency="MIO_EUR",
partner="EXT_EU28",
geo=c("EU28","HU"),
stk_flow="BAL",
time="2015:2020"),
date_filter="2010:2012",
select_freq="A",
label=TRUE,
name=FALSE)
#> get_eurostat_dsd - There is a warning by the download of the DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> get_eurostat_dsd - There is an error by the reading of the downloaded DSD file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Could not download the DSD. The filter is ignored
#> There is a warning by the download of the XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> There is an error by the reading of the downloaded XML file. Run the same command with verbose=TRUE option to get more info on the issue.
#> Problem by the extraction of the footer information from the xml_file.
#> 404 - No data retrived with the given filter(s)
clean_restatapi_cache("/tmp",verbose=TRUE)
#>
#> clean_restatapi_cache - All objects (outside of 'cfg', 'rav', 'cc' and 'dmethod') are removed from '.restatapi_env'.
#> clean_restatapi_cache - Deleted all .rds files from /tmp; /tmp/Rtmp7EeCja/restatapi
options(timeout=60)
# }