Common Descriptive Analytics for Security Data 3: Quartile Analysis

Quartile analysis is similar to cross-sectional analysis: both require the analyst to select a collection of attributes to examine and then identify a suitable grouping and aggregation strategy. In both cases, insight springs directly from the contrasts uncovered by comparing the results attribute by attribute.
Instead of weighting all records in the aggregation equally, quartile analysis takes an extra step: it ranks each aggregated result (e.g., a risk score) into one of four “quartiles”.
– The first quartile represents the best 25%.
– The second quartile covers the next 25%; its upper boundary (the median) cuts off the best 50%.
– The third quartile covers the next 25%; its upper boundary cuts off the best 75%.
– The fourth quartile represents the worst 25%.
Quartile analysis is very powerful and easy to understand.
The differences in aggregated statistics between the 1st and 4th quartiles are usually dramatic and revealing.
Consider this simple sample of 16 random numbers. The figure shows the unsorted set on the left and, on the right, the same set after ranking into quartiles.
By ranking records into quartiles for each attribute, the analyst gains a broad understanding of which “bucket” each record falls into: best, worst, or somewhere in between.
Again, this table was excerpted from Andrew Jaquith’s book “Security Metrics”.
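To make the ranking step concrete, here is a minimal Python sketch that assigns each of 16 numbers to a quartile. It assumes lower values are better (as with vulnerability counts or BAR scores) and that ties keep their original order; the function name rank_into_quartiles is hypothetical, and the seeded random sample stands in for the figure's data, which is not reproduced here.

```python
import random


def rank_into_quartiles(values, lower_is_better=True):
    """Return (value, quartile) pairs in input order; quartile 1 is the best 25%.

    Hypothetical helper: ranks by value and splits the ranks into four
    nearly equal buckets. Ties keep their original relative order.
    """
    n = len(values)
    if n == 0:
        return []
    # Indices sorted from best to worst (Python's sort is stable, even when reversed).
    order = sorted(range(n), key=lambda i: values[i], reverse=not lower_is_better)
    quartile = [0] * n
    for rank, idx in enumerate(order):
        # Ranks 0..n-1 map onto buckets 1..4; with n = 16 each bucket gets 4 values.
        quartile[idx] = rank * 4 // n + 1
    return list(zip(values, quartile))


if __name__ == "__main__":
    random.seed(7)  # arbitrary seed so the sample is reproducible
    sample = [random.randint(1, 100) for _ in range(16)]
    print("unsorted:", sample)
    for value, q in sorted(rank_into_quartiles(sample), key=lambda pair: (pair[1], pair[0])):
        print(f"Q{q}: {value}")
```

The `rank * 4 // n` mapping is one simple choice; when the record count is not a multiple of four, the lower buckets absorb the extra records.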

The following table, taken from a study of network vulnerabilities, shows quartile summary data.
Notice how the table nicely summarizes how the four quartiles of clients performed relative to each other for both vulnerability counts and BAR index scores.
Now that is macro-level behavior.
Quartile summary statistics “cluster” performers with similar characteristics into a small, discrete set of buckets. As such, quartile statistics possess a great deal of analytical power and play an important role in benchmarking.
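A quartile summary table of this kind can be produced by ranking records on one attribute and then averaging every attribute within each quartile. The sketch below assumes each record is a dict, that lower values of the ranking attribute are better, and that the mean is the aggregate of interest; the field names vulns and bar and the sample records are invented for illustration and are not taken from the study.

```python
from statistics import mean


def quartile_summary(records, rank_key, attributes):
    """Return {quartile: {attribute: mean}} with quartile 1 = best 25% by rank_key.

    Hypothetical helper: lower rank_key values are treated as better.
    """
    n = len(records)
    ranked = sorted(records, key=lambda rec: rec[rank_key])
    buckets = {1: [], 2: [], 3: [], 4: []}
    for rank, rec in enumerate(ranked):
        buckets[rank * 4 // n + 1].append(rec)
    # Aggregate each attribute within each non-empty bucket.
    return {
        q: {attr: mean(rec[attr] for rec in recs) for attr in attributes}
        for q, recs in buckets.items()
        if recs
    }


# Invented sample: eight clients with a vulnerability count and a BAR score.
clients = [
    {"vulns": 2, "bar": 10}, {"vulns": 40, "bar": 300}, {"vulns": 5, "bar": 25},
    {"vulns": 18, "bar": 120}, {"vulns": 3, "bar": 12}, {"vulns": 25, "bar": 180},
    {"vulns": 9, "bar": 60}, {"vulns": 12, "bar": 75},
]
for q, stats in quartile_summary(clients, rank_key="bar", attributes=["vulns", "bar"]).items():
    print(f"Q{q}: {stats}")
```

Swapping `mean` for `statistics.median` or `sum` changes the aggregation strategy without touching the ranking logic.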
Sometimes, though, management does not want to analyze all the data; they simply want to make decisions.
First-versus-fourth-quartile analysis (that is, discarding the middle two quartiles) permits extremely rapid conclusions to be drawn from the data at hand.
Discarding the middle quartiles sharpens the contrast significantly by focusing on the outliers; the message comes back reflected as if through a fun-house mirror.
For example, the data in this table reveal vast disparities between the first and fourth quartiles: the fourth quartile has nearly twelve times the network vulnerabilities of the first, and an even larger differential in business-adjusted risk.
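The first-versus-fourth comparison amounts to discarding the middle half of the ranked data and comparing the two extremes. The minimal sketch below assumes lower values are better and reports the fourth-quartile mean as a multiple of the first-quartile mean; the function name is hypothetical, and when the count is not divisible by four it simply takes floor(n/4) values from each end.

```python
from statistics import mean


def first_vs_fourth(values, lower_is_better=True):
    """Return (best_mean, worst_mean, worst_to_best_ratio), ignoring the middle two quartiles."""
    if len(values) < 4:
        raise ValueError("need at least four values to form quartiles")
    ordered = sorted(values, reverse=not lower_is_better)  # best first
    k = len(ordered) // 4  # size of each extreme bucket
    best_mean = mean(ordered[:k])
    worst_mean = mean(ordered[-k:])
    # How many times larger the worst quartile's mean is than the best's.
    ratio = worst_mean / best_mean if best_mean else float("inf")
    return best_mean, worst_mean, ratio


best, worst, ratio = first_vs_fourth(list(range(1, 17)))
print(f"1st quartile mean {best}, 4th quartile mean {worst}, ratio {ratio:.1f}x")
```

Running the same function over six or ten attributes and listing the ratios side by side gives the kind of first-and-fourth presentation described next.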
When presenting first-versus-fourth analyses across a large number of attributes (for example, six, ten, or more), root causes pop out almost immediately. In almost all cases, healthy discussions (and action plans) follow the presentation.
A quartile time series chart uses quartile information from data sets to show broader measures of performance over time. Time series charts are a very popular presentation technique for security metrics and are among the easiest information graphics to understand.
This time series chart simply graphs the 1st, 2nd, and 3rd quartiles. The horizontal axis is time; the vertical axis is the BAR (business-adjusted risk) score.
The thick line represents the median values that separate the second and third quartiles, and the thin line above the median separates the third quartile from the fourth. Based on the positions of the lines, you can quickly identify the quartile that any other data point falls into.
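The lines on a quartile time series chart are the 25th, 50th, and 75th percentiles of each period's scores. The sketch below computes them with Python's `statistics.quantiles` (its default "exclusive" method; other tools may interpolate slightly differently) and then places a single score into a quartile by comparing it against those lines. It assumes lower BAR scores are better and that a score lying exactly on a line goes to the better quartile; the function names and sample data are hypothetical.

```python
from statistics import quantiles


def quartile_lines(scores_by_period):
    """Return {period: (q1, median, q3)}, the three lines plotted per period."""
    return {
        period: tuple(quantiles(scores, n=4))  # needs at least two scores per period
        for period, scores in sorted(scores_by_period.items())
    }


def which_quartile(score, lines):
    """Place a score into quartile 1 (best) .. 4 (worst), assuming lower is better."""
    q1, median, q3 = lines
    if score <= q1:
        return 1
    if score <= median:
        return 2
    if score <= q3:
        return 3
    return 4


# Hypothetical BAR scores for two periods.
history = {
    2000: [120, 340, 80, 510, 260, 190, 430, 150],
    2001: [60, 170, 40, 250, 130, 95, 215, 75],
}
lines = quartile_lines(history)
for period, (q1, med, q3) in lines.items():
    print(f"{period}: Q1={q1:.1f} median={med:.1f} Q3={q3:.1f}")
print("A score of 100 in 2001 falls in quartile", which_quartile(100, lines[2001]))
```

The same `which_quartile` call answers the “How did we do?” question for any individual item once the period's lines are known.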
The period from 2000 to 2001 showed the most dramatic improvement (a 50% drop in median scores). Since 2001, median scores have stayed fairly flat.
The worst applications (4th quartile) demonstrated continuous improvement through all periods.
All quartiles appear to be converging, which means that application security BAR scores are generally improving across the board (specifically, the difference between the 1st and 3rd quartile lines decreases over time).
However, the 1st quartile has worsened in the most recent year relative to the previous one.
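The convergence claim can be checked numerically by tracking the distance between the 1st and 3rd quartile lines (the interquartile range) from period to period; a shrinking range means the best and worst performers are drawing closer together. This is a minimal sketch using `statistics.quantiles` and hypothetical data, and treating "converging" as "strictly decreasing every period" is a deliberately simple assumption.

```python
from statistics import quantiles


def interquartile_spread(scores_by_period):
    """Return {period: q3 - q1} in period order."""
    spread = {}
    for period in sorted(scores_by_period):
        q1, _median, q3 = quantiles(scores_by_period[period], n=4)
        spread[period] = q3 - q1
    return spread


def is_converging(spread):
    """True if the interquartile spread shrinks in every successive period."""
    values = [spread[p] for p in sorted(spread)]
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Hypothetical BAR scores for three periods.
history = {
    2000: [120, 340, 80, 510, 260, 190, 430, 150],
    2001: [60, 170, 40, 250, 130, 95, 215, 75],
    2002: [55, 140, 45, 190, 110, 90, 160, 70],
}
spread = interquartile_spread(history)
print(spread, "converging:", is_converging(spread))
```

A looser test (for example, comparing only the first and last periods) would tolerate a single bad year like the one noted above.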
The chart is very useful for answering a common question from management about a particular item (“How did we do?”): plot additional data points to show which quartile they fall into, combining the chart with a scatter plot of the scores for selected data points in the set.
It can also serve as a “You are here” benchmarking chart by adding a horizontal line representing the score for the particular data point being benchmarked.

Reference: Tong Sun,
Adjunct Professor of Computing Security
Rochester Institute of Technology

Author: McPeters Joseph

Joseph McPeters is a Security Researcher. He specializes in network and web application penetration testing. Contact:
