Syntax
quantile
: Compute empirical quantiles of a variable with sample data corresponding to given probabilities.
Common parameterisation
Some arguments are common to the implementations in the different languages:
probs
: (option) list of probabilities with values in [0,1]; the smallest observation corresponds to a probability of 0 and the largest to a probability of 1; default: `probs` is set to the sequence `0 0.25 0.5 0.75 1`, so as to match the default values `seq(0, 1, 0.25)` used in R `quantile`;

type
: (option) an integer between 1 and 11 selecting one of the 9+1+1 quantile algorithms discussed in Hyndman and Fan’s, Cunnane’s and Filliben’s articles (see references) and detailed below:

| type | description |
|---|---|
| 1 | inverted empirical CDF |
| 2 | inverted empirical CDF with averaging at discontinuities |
| 3 | observation numbered closest to qN (piecewise linear function) |
| 4 | linear interpolation of the empirical CDF |
| 5 | Hazen’s model (piecewise linear function) |
| 6 | Weibull quantile |
| 7 | interpolation points divide sample range into n-1 intervals |
| 8 | unbiased median (regardless of the distribution) |
| 9 | approximate unbiased estimate for a normal distribution |
| 10 | Cunnane’s definition (approximately unbiased) |
| 11 | Filliben’s estimate |

default: `type=7` (as in R `quantile`);

method
: (option) choice of the implementation of the quantile estimation method; this can be either:
  - `INHERIT` for an estimation based on an already existing implementation in the given language,
  - `DIRECT` for a canonical implementation based on the direct transcription of the various quantile estimation algorithms (see below) into the given language;

  default: `method=DIRECT`;

na_rm
: (option) logical; if true, any NA and NaN values are removed from `x` before the quantiles are computed.
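To make the role of `type` more concrete, here is a minimal pure-Python sketch of what a `DIRECT`-style estimation could look like for the interpolating definitions (types 4 to 10). It is not the code shipped with this project: the names `direct_quantile` and `PLOTTING_POSITIONS` are hypothetical, and the `(alphap, betap)` pairs are assumed from the plotting-position convention documented for `scipy.stats.mstats.mquantiles`. Types 1 to 3 (discontinuous) and 11 (Filliben) are left out of this sketch.

```python
import math

# Assumed (alphap, betap) plotting-position parameters per type,
# following the convention documented for scipy.stats.mstats.mquantiles.
PLOTTING_POSITIONS = {
    4: (0.0, 1.0),      # linear interpolation of the empirical CDF
    5: (0.5, 0.5),      # Hazen
    6: (0.0, 0.0),      # Weibull
    7: (1.0, 1.0),      # default in R quantile
    8: (1 / 3, 1 / 3),  # median-unbiased
    9: (3 / 8, 3 / 8),  # approximately unbiased for normal data
    10: (0.4, 0.4),     # Cunnane
}

def direct_quantile(x, probs=(0.0, 0.25, 0.5, 0.75, 1.0), type=7):
    """Hypothetical helper: interpolate between two order statistics.

    The argument is called `type` to mirror the documented parameter,
    which shadows the Python builtin inside this function.
    """
    data = sorted(float(v) for v in x)
    n = len(data)
    if n < 2:
        raise ValueError("this sketch needs at least two observations")
    alphap, betap = PLOTTING_POSITIONS[type]
    quantiles = []
    for p in probs:
        m = alphap + p * (1.0 - alphap - betap)
        aleph = n * p + m                         # fractional (1-based) rank
        k = int(math.floor(min(max(aleph, 1), n - 1)))
        gamma = min(max(aleph - k, 0.0), 1.0)     # interpolation weight
        quantiles.append((1.0 - gamma) * data[k - 1] + gamma * data[k])
    return quantiles

print(direct_quantile([3, 1, 5, 2, 4]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With five equally spaced observations and the default `type=7`, the default probabilities land exactly on the order statistics, which makes the result easy to check by hand.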
SAS
macro
%quantile(var, probs=, type=7, method=DIRECT, names=, _quantiles_=,
idsn=, odsn=, ilib=WORK, olib=WORK, na_rm = YES);
Arguments
var
: data whose sample quantiles are estimated; this can be either:
  - the name of the variable in a dataset storing the data; in that case, the parameter `idsn` (see below) should be set;
  - a list of (blank separated) numeric values;

idsn
: (option) when input data is passed as a variable name, `idsn` represents the dataset to look for the variable `var` (see above);

ilib
: (option) name of the input library; by default: empty, i.e. `WORK` is used if `idsn` is set;

olib
: (option) name of the output library (see `names` below); by default: empty, i.e. `WORK` is also used when `odsn` is set.
Returns
Return estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in `var` at probabilities in `probs`, following the quantile estimation algorithm defined by `type`. The output sample quantiles are stored either in a list or as a table, through:

_quantiles_
: (option) name of the output numeric list where quantiles are stored in increasing `probs` order; incompatible with parameters `odsn` and `names` below;

odsn, names
: (option) respective names of the output dataset and variable where quantiles are stored; if both `odsn` and `names` are set, the quantiles are saved in the `names` variable of the `odsn` dataset; if just `odsn` is set, then they are stored in a variable named `QUANT`; if instead only `names` is set, then the dataset will also be named after `names`.
Notes
probs
: (see above) in the case `method=INHERIT` (see below), these values are multiplied by 100 in order to be used by `PROC UNIVARIATE`;

method
: (see above) in the case `method=INHERIT`, the macro uses the `PROC UNIVARIATE` procedure already implemented in SAS; this is incompatible with `type` other than `(1,2,3,4,6)` since `PROC UNIVARIATE` does not actually support the other quantile definitions (see table above); in the case `type=5`, `7`, `8`, or `9`, `method` is then set to `DIRECT`;

type
: (see above) note the (non bijective) correspondence between the different algorithms and the currently available methods in `PROC UNIVARIATE` (through the use of the `PCTLDEF` parameter):

| type | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PCTLDEF | 3 | 5 | 2 | 1 | n.a. | 4 | n.a. | n.a. | n.a. | n.a. | n.a. |

na_rm
: (see above) true is `yes`, false is `no`.
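The interplay between `type`, `method` and `probs` described in these notes can be summarised with a small illustrative sketch. It is written in Python for readability and is not the macro's SAS code; the helper name `resolve_univariate_call` is hypothetical. The fallback to `DIRECT` for types 10 and 11 is an assumption, since the notes only mention types 5, 7, 8 and 9 explicitly.

```python
# type -> PCTLDEF, transcribed from the correspondence table; None stands for "n.a."
PCTLDEF_BY_TYPE = {
    1: 3, 2: 5, 3: 2, 4: 1, 5: None, 6: 4,
    7: None, 8: None, 9: None, 10: None, 11: None,
}

def resolve_univariate_call(probs, type=7, method="DIRECT"):
    """Hypothetical helper returning (method, pctldef, probabilities).

    When PROC UNIVARIATE has no matching PCTLDEF, the method falls back
    to DIRECT (assumed here to also hold for types 10 and 11).
    """
    pctldef = PCTLDEF_BY_TYPE[type]
    if method == "INHERIT" and pctldef is None:
        method = "DIRECT"
    if method == "INHERIT":
        # PROC UNIVARIATE expects percentiles in [0, 100]
        return method, pctldef, [100.0 * p for p in probs]
    return method, None, list(probs)

print(resolve_univariate_call([0.25, 0.5], type=2, method="INHERIT"))
# ('INHERIT', 5, [25.0, 50.0])
print(resolve_univariate_call([0.25, 0.5], type=7, method="INHERIT"))
# ('DIRECT', None, [0.25, 0.5])
```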
Python
method
>>> q = quantile(x, probs, na_rm = False, type = 7,
method='DIRECT', limit=(0,1))
Arguments
x
: input 1D (vector) data (`numpy.array`, `pandas.DataFrame`, or `pandas.Series`); 2D arrays are also accepted;

limit
: (option) tuple/list of (lower, upper) values; values of `x` outside this open interval are ignored.
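As an illustration of how `na_rm` and `limit` could act on the input before any quantile is computed, here is a minimal sketch using only the standard library. The helper name `prefilter` is hypothetical, and the choice to let NaN values pass through the `limit` filter when `na_rm` is false is an assumption, since that behaviour is not specified here.

```python
import math

def prefilter(x, na_rm=False, limit=None):
    """Hypothetical helper: drop NaN values and values outside an open interval."""
    values = [float(v) for v in x]
    if na_rm:
        values = [v for v in values if not math.isnan(v)]
    if limit is not None:
        lower, upper = limit
        # Open interval: the bounds themselves are excluded.
        # NaN values are left untouched here (assumption): only na_rm removes them.
        values = [v for v in values if math.isnan(v) or lower < v < upper]
    return values

print(prefilter([0.2, float("nan"), 1.5, 0.9, 0.0], na_rm=True, limit=(0, 1)))
# [0.2, 0.9]
```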
Returns
q
: 1D vector of quantiles returned as a `numpy.array`.
Notes
probs
: (see above) the following codes: 2 or `M2`, 3 or `T3`, 4 or `Qu4`, 5 or `Q5`, 6 or `S6`, 10 or `D10`, 12 or `Dd12`, 20 or `V20`, and 100 or `P100` can be used to compute common specialised quantiles (median, terciles, quartiles, quintiles, sextiles, deciles, duo-deciles, ventiles and percentiles resp.);

method
: (see above) in the case `method=INHERIT`, the `scipy::mquantiles` function is used to estimate quantiles; this case is incompatible with `type<4` (see below);

type
: (see above) methods 4 to 11 are available in the original `scipy::mquantiles` function;

na_rm
: (see above) true is `True`, false is `False`.
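The shorthand codes accepted by `probs` can be pictured as follows. This is only a sketch: the helper name `expand_probs_code` is hypothetical, and it assumes that a code `n` expands to the interior cut points `k/n` for `k = 1, ..., n-1`, excluding 0 and 1. The actual implementation may expand the codes differently.

```python
# Codes listed in the notes, mapped to the number of equal-probability groups.
PROBS_CODES = {
    "M2": 2, "T3": 3, "Qu4": 4, "Q5": 5, "S6": 6,
    "D10": 10, "Dd12": 12, "V20": 20, "P100": 100,
}

def expand_probs_code(code):
    """Hypothetical helper: turn a code (integer or string) into probabilities."""
    n = PROBS_CODES.get(code, code)
    if n not in PROBS_CODES.values():
        raise ValueError("unknown probs code: %r" % (code,))
    # Assumption: interior cut points only, endpoints 0 and 1 excluded.
    return [k / n for k in range(1, n)]

print(expand_probs_code("Qu4"))  # [0.25, 0.5, 0.75]
print(expand_probs_code(2))      # [0.5]
```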
R
method
> q <- quantile(x, data = NULL, probs=seq(0, 1, 0.25), na.rm=FALSE,
type=7, method="DIRECT", names= FALSE)
Arguments
x
: a numeric vector or a value (character or integer) providing the sample data; when `data` is not null, `x` provides the name (`char`) or the position (`int`) of the variable of interest in the table;

data
: (option) input table, defined as a dataframe, whose column defined by `x` is used as sample data for the estimation; if passed, then `x` should be defined as a character or an integer; default: `data=NULL` and input sample data should be passed as a numeric vector in `x`;

probs
: (option) numeric vector giving the probabilities with values in [0,1]; default: `probs=seq(0, 1, 0.25)` as in the original `stats::quantile` function;

na.rm, names
: (option) logical flags; if `na.rm=TRUE`, any NA and NaN values are removed from `x` before the quantiles are computed; if `names=TRUE`, the result has a names attribute; these two flags follow exactly the original implementation of `stats::quantile`; default: `na.rm=FALSE` and `names=FALSE`.
Notes
method
: (see above) in the case `method=INHERIT`, the `stats::quantile` function is used to estimate quantiles; this case is incompatible with `type>9` (see below);

type
: (see above) methods 1 to 9 are available in the original `stats::quantile` function.
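The `INHERIT` restrictions stated in the notes for each language can be gathered into one lookup. The following sketch is illustrative only; the names `SUPPORTED_TYPES` and `inherit_supported` are hypothetical and not part of any of the three implementations.

```python
# Types available with method=INHERIT, as stated in the notes of each section.
SUPPORTED_TYPES = {
    "SAS": {1, 2, 3, 4, 6},       # PROC UNIVARIATE, through PCTLDEF
    "Python": set(range(4, 12)),  # scipy mquantiles: types 4 to 11
    "R": set(range(1, 10)),       # stats::quantile: types 1 to 9
}

def inherit_supported(language, type):
    """Hypothetical helper: can method=INHERIT serve this type in this language?"""
    return type in SUPPORTED_TYPES[language]

print(inherit_supported("R", 9))       # True
print(inherit_supported("R", 10))      # False
print(inherit_supported("Python", 3))  # False
```

Only types 4 and 6 are covered by the existing implementations in all three languages, which is one reason a `DIRECT` transcription is useful for cross-language comparisons.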
Returns
q
: 1D vector of quantiles returned as a numeric vector.