Syntax
quantile: Compute empirical quantiles of a variable with sample data corresponding to given probabilities.
Common parameterisation
Some arguments are common to the implementations in the different languages:
probs: (option) list of probabilities with values in [0,1]; the smallest observation corresponds to a probability of 0 and the largest to a probability of 1; default: probs is set to the sequence 0 0.25 0.5 0.75 1, so as to match the default values seq(0, 1, 0.25) used in R quantile;
type: (option) an integer between 1 and 11 selecting which of the 9+1+1 quantile algorithms discussed in Hyndman and Fan’s, Cunnane’s and Filliben’s articles (see references) and detailed below is used:
type | description
1 | inverted empirical CDF
2 | inverted empirical CDF with averaging at discontinuities
3 | observation number closest to qN (piecewise linear function)
4 | linear interpolation of the empirical CDF
5 | Hazen’s model (piecewise linear function)
6 | Weibull quantile
7 | interpolation points divide sample range into n-1 intervals
8 | unbiased median (regardless of the distribution)
9 | approximate unbiased estimate for a normal distribution
10 | Cunnane’s definition (approximately unbiased)
11 | Filliben’s estimate
default: type=7 (likewise R quantile);
method: (option) choice of the implementation of the quantile estimation method; this can be either:
- INHERIT for an estimation based on the use of an already existing implementation in the given language,
- DIRECT for a canonical implementation based on the direct transcription of the various quantile estimation algorithms (see below) into the given language;
default: method=DIRECT;
na_rm: (option) logical; if true, any NA and NaN’s are removed from x before the quantiles are computed.
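The common parameters above can be illustrated with a minimal Python sketch of the continuous estimators (types 4 to 9). It is not the implementation documented here: the function name quantile_sketch is hypothetical, and the (alpha, beta) plotting-position constants are the standard Hyndman and Fan values, which this page does not list. Types 1 to 3, 10 and 11 are left out, and at least two observations are assumed.

```python
import numpy as np

# Plotting-position parameters (alpha, beta) for the continuous
# Hyndman and Fan estimators; assumed standard values, not taken from this page.
_ALPHA_BETA = {
    4: (0.0, 1.0),
    5: (0.5, 0.5),
    6: (0.0, 0.0),
    7: (1.0, 1.0),
    8: (1.0 / 3.0, 1.0 / 3.0),
    9: (3.0 / 8.0, 3.0 / 8.0),
}


def quantile_sketch(x, probs=(0, 0.25, 0.5, 0.75, 1), type=7, na_rm=False):
    """Hypothetical sketch: sample quantiles for types 4 to 9 (needs n >= 2)."""
    x = np.asarray(x, dtype=float).ravel()
    if na_rm:
        x = x[~np.isnan(x)]          # drop NaN before estimating
    x = np.sort(x)
    n = x.size
    alpha, beta = _ALPHA_BETA[type]
    p = np.asarray(probs, dtype=float)
    m = alpha + p * (1.0 - alpha - beta)
    pos = n * p + m                  # fractional (1-based) order-statistic position
    k = np.floor(np.clip(pos, 1, n - 1)).astype(int)
    gamma = np.clip(pos - k, 0.0, 1.0)
    # Interpolate between the two neighbouring order statistics.
    return (1.0 - gamma) * x[k - 1] + gamma * x[k]


print(quantile_sketch([5, 1, 4, 2, 3]))                   # default probs, type 7
print(quantile_sketch([5, 1, 4, 2, 3], [0.25], type=6))   # Weibull quantile
```

With type=7 this sketch should agree with numpy.quantile under its default linear interpolation, which gives a quick sanity check.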
SAS macro
%quantile(var, probs=, type=7, method=DIRECT, names=, _quantiles_=,
idsn=, odsn=, ilib=WORK, olib=WORK, na_rm = YES);
Arguments
var: data whose sample quantiles are estimated; this can be either:
- the name of the variable in a dataset storing the data; in that case, the parameter idsn (see below) should be set;
- a list of (blank separated) numeric values;
idsn: (option) when input data is passed as a variable name, idsn represents the dataset to look for the variable var (see above);
ilib: (option) name of the input library; by default: empty, i.e. WORK is used if idsn is set;
olib: (option) name of the output library (see names below); by default: empty, i.e. WORK is also used when odsn is set.
Returns
Return estimates of the underlying distribution quantiles, based on one or two order statistics from
the supplied elements in var, at the probabilities in probs, following the quantile estimation algorithm
defined by type. The output sample quantiles are stored either in a list or as a table, through:
_quantiles_: (option) name of the output numeric list where quantiles are stored in increasing probs order; incompatible with parameters odsn and names below;
odsn, names: (option) respective names of the output dataset and variable where quantiles are stored; if both odsn and names are set, the quantiles are saved in the names variable of the odsn dataset; if just odsn is set, then they are stored in a variable named QUANT; if instead only names is set, then the dataset will also be named after names.
Notes
probs: (see above) in the case method=INHERIT (see below), these values are multiplied by 100 in order to be used by PROC UNIVARIATE;
method: (see above) in the case method=INHERIT, the macro uses the PROC UNIVARIATE procedure already implemented in SAS; this is incompatible with type other than (1,2,3,4,6) since PROC UNIVARIATE does not support the other quantile definitions (see the correspondence table below); in the case type=5,7,8, or 9, method is then set to DIRECT;
type: (see above) note the (non bijective) correspondence between the different algorithms and the currently available methods in PROC UNIVARIATE (through the use of the PCTLDEF parameter):
type | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
PCTLDEF | 3 | 5 | 2 | 1 | n.a. | 4 | n.a. | n.a. | n.a. | n.a. | n.a. |
na_rm: (see above) true is yes, false is no.
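The notes above amount to a small dispatch rule, sketched below in Python for illustration only (the macro itself is written in SAS). The function name univariate_settings and its return layout are hypothetical. The type to PCTLDEF mapping is copied from the table above. Falling back to DIRECT for every type without a PCTLDEF equivalent, including 10 and 11, is an assumption: the notes only state the fallback for types 5, 7, 8 and 9.

```python
# type -> PCTLDEF, copied from the correspondence table in the notes
PCTLDEF_BY_TYPE = {1: 3, 2: 5, 3: 2, 4: 1, 6: 4}


def univariate_settings(probs, type=7, method="DIRECT"):
    """Hypothetical helper: return (method, pctldef, percentiles).

    For INHERIT with a supported type, the probabilities are rescaled to
    percentages, as required by PROC UNIVARIATE. Otherwise the method
    falls back to DIRECT (assumed here for all unsupported types).
    """
    pctldef = PCTLDEF_BY_TYPE.get(type)
    if method == "INHERIT" and pctldef is not None:
        return "INHERIT", pctldef, [100 * p for p in probs]
    return "DIRECT", None, None


print(univariate_settings([0.25, 0.5], type=2, method="INHERIT"))
print(univariate_settings([0.25, 0.5], type=7, method="INHERIT"))
```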
Python method
>>> q = quantile(x, probs, na_rm = False, type = 7,
method='DIRECT', limit=(0,1))
Arguments
x: input 1D (vector) data (numpy.array, pandas.DataFrame, or pandas.Series); 2D arrays are also accepted;
limit: (option) tuple/list of (lower, upper) values; values of x outside this open interval are ignored.
Returns
q: 1D vector of quantiles returned as a numpy.array.
Notes
probs: (see above) the following codes: 2 or M2, 3 or T3, 4 or Qu4, 5 or Q5, 6 or S6, 10 or D10, 12 or Dd12, 20 or V20, and 100 or P100 can be used to compute common specialised quantiles (median, terciles, quartiles, quintiles, sextiles, deciles, duo-deciles, ventiles and percentiles resp.);
method: (see above) in the case method=INHERIT, the scipy::mquantiles function is used to estimate quantiles; this case is incompatible with type<4 (see below);
type: (see above) methods 4 to 11 are available in the original scipy::mquantiles function;
na_rm: (see above) true is True, false is False.
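As an illustration of the probs codes listed above, the sketch below expands a code into a list of probabilities. The helper expand_probs is hypothetical and not part of the documented function. It assumes that a code n (or its named form) stands for the interior cut points k/n, k = 1, ..., n-1; the actual implementation may also include the end points 0 and 1.

```python
# Named codes and their numeric equivalents, as listed in the notes.
QUANTILE_CODES = {
    "M2": 2, "T3": 3, "Qu4": 4, "Q5": 5, "S6": 6,
    "D10": 10, "Dd12": 12, "V20": 20, "P100": 100,
}


def expand_probs(code):
    """Hypothetical helper: turn a code such as 4 or 'Qu4' into probabilities.

    Assumption: only the interior cut points k/n are returned.
    """
    n = QUANTILE_CODES[code] if isinstance(code, str) else int(code)
    if n not in QUANTILE_CODES.values():
        raise ValueError("unsupported quantile code: %r" % (code,))
    return [k / n for k in range(1, n)]


print(expand_probs("M2"))    # median
print(expand_probs(4))       # quartiles
```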
R method
> q <- quantile(x, data = NULL, probs=seq(0, 1, 0.25), na.rm=FALSE,
type=7, method="DIRECT", names= FALSE)
Arguments
x: a numeric vector or a value (character or integer) providing the sample data; when data is not null, x provides the name (char) or the position (int) of the variable of interest in the table;
data: (option) input table, defined as a dataframe, whose column defined by x is used as sample data for the estimation; if passed, then x should be defined as a character or an integer; default: data=NULL and input sample data should be passed as a numeric vector in x;
probs: (option) numeric vector giving the probabilities with values in [0,1]; default: probs=seq(0, 1, 0.25) like in the original stats::quantile function;
na.rm, names: (option) logical flags; if na.rm=TRUE, any NA and NaN’s are removed from x before the quantiles are computed; if names=TRUE, the result has a names attribute; these two flags follow exactly the original implementation of stats::quantile; default: na.rm=FALSE and names=FALSE.
Notes
method: (see above) in the case method=INHERIT, the stats::quantile function is used to estimate quantiles; this case is incompatible with type>9 (see below);
type: (see above) methods 1 to 9 are available in the original stats::quantile function.
Returns
q: 1D vector of quantiles returned as a numeric vector.
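The method=INHERIT restrictions stated in the SAS, Python and R notes can be gathered in one lookup, sketched below. The names INHERIT_TYPES and inherit_supported are hypothetical; the sets of types only restate what the notes say about PROC UNIVARIATE, scipy::mquantiles and stats::quantile.

```python
# Types available with method=INHERIT in each language, per the notes.
INHERIT_TYPES = {
    "SAS": {1, 2, 3, 4, 6},         # PROC UNIVARIATE, through PCTLDEF
    "Python": set(range(4, 12)),    # scipy::mquantiles: types 4 to 11
    "R": set(range(1, 10)),         # stats::quantile: types 1 to 9
}


def inherit_supported(language, type):
    """Hypothetical helper: is method=INHERIT usable for this type?"""
    return type in INHERIT_TYPES[language]


# Types for which INHERIT is usable in all three languages.
common = set.intersection(*INHERIT_TYPES.values())
print(sorted(common))
```

Under these sets, the default type=7 can be computed with INHERIT in Python and R but not in SAS, where the DIRECT implementation is used instead.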