For the purpose of this discussion, we use a simple data set (excerpted
from Andrew Jaquith's book Security Metrics).
The table contains application-defect data collected during a recent risk analysis of the organization's in-house-developed applications.
The goal is to analyze the level of security of all assessed applications.
Each row is a record that embodies a core security event or concept,
such as a security defect, privacy violation, open port, or detected virus.
The core event we are recording is an "application defect,"
with the following attributes:
- Name of application
- Department of business owner
- Name of defect
- Exploitability (index score; for example, based on CVSS, a cross-industry proposed standard that provides a consistent method of "scoring" the seriousness of computer security vulnerabilities; the higher the number, the easier the defect is to exploit)
- Impact if exploited (index score based on asset value)
- Business-adjusted risk, or BAR (index score; the product of exploitability and impact)
- Cost to remediate (estimated engineering hours)
The table shows a sample data set containing 27 records from a single quarter, Q1 2005.
First, let's look at a grouping and aggregation example.
Non-numeric attributes, such as "Application," "Owner," and "Defect," are aggregated by counting them.
We can also group the records by "Department";
that would produce three sets of aggregate results (one for each group).
For numeric attributes, we can generate summary statistics, such as the mean,
median, and standard deviation, as well as boxplot diagrams.
For a small data set, an Excel PivotTable works fine.
For larger data sets, simple formulas will not do; consider more sophisticated tools,
such as R, Python, SAS, SPSS, or a business analysis tool combined
with relational database storage.
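As a minimal sketch of the grouping-and-aggregation step, the following Python snippet groups a few hypothetical defect records (the record values here are illustrative, not taken from the book's table) by department, counts the non-numeric attributes, and computes summary statistics for the numeric BAR scores:

```python
from statistics import mean, median

# Hypothetical sample records mirroring the data set's attributes.
records = [
    {"app": "Payroll",  "dept": "Operations", "defect": "XSS",       "bar": 25, "hours": 10},
    {"app": "Payroll",  "dept": "Operations", "defect": "SQLi",      "bar": 30, "hours": 40},
    {"app": "CRM",      "dept": "Sales",      "defect": "Weak auth", "bar": 12, "hours": 8},
    {"app": "Intranet", "dept": "IT",         "defect": "Open port", "bar": 18, "hours": 6},
]

# Group the records by the "Department" attribute.
groups = {}
for r in records:
    groups.setdefault(r["dept"], []).append(r)

# Aggregate each group: count records and distinct applications,
# and compute summary statistics over the numeric BAR attribute.
summary = {}
for dept, rows in groups.items():
    bars = [r["bar"] for r in rows]
    summary[dept] = {
        "defects": len(rows),                   # count of defect records
        "apps": len({r["app"] for r in rows}),  # distinct applications
        "mean_bar": mean(bars),
        "median_bar": median(bars),
    }
```

The same shape of computation is what a PivotTable, an R `aggregate()` call, or a SQL `GROUP BY` would produce.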
Time series analysis, quartile analyses, and correlation analysis rely heavily
on grouping and aggregation techniques.
Time series analysis refers to the technique of attempting
to understand how a set of data behaves over time.
More specifically, a time series contains a series of observations
of a particular attribute, measured at regular intervals.
Time series are generally grouped and aggregated by a desired analysis interval (e.g., hourly,
daily, weekly, or monthly), depending on the data frequency
and the granularity of the analysis effort.
The analysis interval should be precise enough to lend insight,
but not so detailed that it overwhelms the reader.
For most security metrics worth measuring, a typical analysis interval is monthly
or quarterly.
1) Group and aggregate all records within the desired time interval;
here we use a monthly interval.
2) Sort the aggregated result set by date,
generally in ascending order so that earlier entries appear first.
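The two steps above can be sketched in a few lines of Python; the dated observations below are hypothetical stand-ins for the quarter's defect records, not the book's actual figures:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (date, hours-to-remediate) observations for defects
# found during the quarter; dates are ISO "YYYY-MM-DD" strings.
observations = [
    ("2005-01-15", 12), ("2005-01-20", 18),
    ("2005-02-03", 20), ("2005-02-27", 30),
    ("2005-03-10", 28),
]

# Step 1: group and aggregate records within the monthly interval.
by_month = defaultdict(list)
for date, hours in observations:
    by_month[date[:7]].append(hours)   # "YYYY-MM" is the month key

# Step 2: sort the aggregated result set by date, ascending.
series = sorted(
    (month, len(vals), mean(vals)) for month, vals in by_month.items()
)
for month, count, avg_hours in series:
    print(month, count, round(avg_hours, 1))
```

Each row of the resulting series (month, defect count, mean hours to fix) is one point in the time series; plotting the columns against the month gives the trend charts discussed next.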
Key observations from this time series analysis: the number of security defects declined from 14 to 5, suggesting that application security improved overall during
that period. Although the overall number of defects decreased,
those detected in later periods were of higher impact and were slightly easier
to exploit. The amount of engineering time required to fix defects increased
over time (from 14.9 to 28.2 hours), as did the standard deviations of the estimates.
This in turn suggests that easy-to-fix problems were likely fixed early,
while estimates for the remaining, harder problems varied more over time.
Time series analysis is an essential tool in the security analyst’s bag of tricks.
It provides the foundation for other types of analysis.
When combined with cross-sectional analysis and quartile analysis,
it provides the basis for benchmarking.
If time-series analysis attempts to understand how an attribute varies over time,
cross-sectional analysis asks how the data vary over a cross section of comparable observations.
If we slice a set of records by a particular attribute, what happens to the other attributes?
Cross-sectional analysis involves three steps.
First, the analyst selects an attribute to use for creating the cross section; that is, an attribute to slice with.
Typically, textual attributes such as department, industry, or categories make good cross-sectional attributes.
After selecting a suitable attribute, the analyst groups and aggregates the data.
Finally, the real fun begins: analyzing the results.
For instance, suppose we want to analyze the performance of each department.
We use the owner's department as the cross-sectional attribute, then apply a suitable grouping-and-aggregation strategy for the cross-sectional analysis.
Two summary statistics:
- Mean defects per application
- Mean BAR per application
Both of these metrics help us understand how well each department writes code.
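A minimal Python sketch of this cross section follows; the (department, application, BAR) triples are hypothetical stand-ins for the 27-record sample, chosen only to show the mechanics:

```python
# Hypothetical records: (department, application, BAR score) triples.
records = [
    ("Operations", "Payroll",  30), ("Operations", "Payroll",  25),
    ("Operations", "Billing",  40),
    ("Sales",      "CRM",      10), ("Sales",      "CRM",      14),
    ("IT",         "Intranet", 20),
]

# Slice by the cross-sectional attribute (department), then aggregate
# the other attributes within each slice.
cross_section = {}
for dept in {r[0] for r in records}:
    rows = [r for r in records if r[0] == dept]
    apps = {r[1] for r in rows}                 # distinct applications owned
    cross_section[dept] = {
        "mean_defects_per_app": len(rows) / len(apps),
        "mean_bar_per_app": sum(r[2] for r in rows) / len(apps),
    }
```

Each entry of `cross_section` is one row of the comparison table analyzed below: a department (the slice) and its two per-application summary statistics.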
Key observations: Operations (application group) appears to be worst off,
based on the number of defects per application (11)
and the risk carried by each (BAR score of 128).
Compare this to Sales, whose applications contained just three defects (on average)
and a BAR score 80% lower (24.5).
In addition to having fewer defects per application overall,
Sales defects tended to be consistently less serious.
Note how the Sales department's mean BAR per defect (8.2) was considerably less than those of Operations (11.6) and IT (13.4), and had a lower standard deviation (2.5, versus 6.1 and 8.9).
Although IT sits between Operations and Sales with respect to average defects and BAR scores, its defects tended to require the least amount of engineering time to fix (8 hours, compared to 29.8 and 11).
This suggests that IT has more “quick hit” opportunities and may have more potential to improve its score in the future relative to its peers.
Trivial though this example may seem, it should demonstrate to you
that cross-sectional techniques provide a powerful way to compare security effectiveness by organization, security topic, or other attributes.
Suppose the sample data set contained an attribute that classified each security defect (such as authentication, encryption, and user input validation).
A cross-sectional analysis might compare the incidence rate by department for each type of defect.
Managers could use knowledge of the “trouble spots” to better target developer training.
Departments could also “compare notes” and share knowledge of best practices.
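Assuming the data set did carry such a category attribute, the incidence-rate comparison amounts to a two-attribute cross section; a minimal sketch with hypothetical classified defects:

```python
from collections import Counter

# Hypothetical classified defects: (department, defect category) pairs,
# assuming a category attribute like the one suggested above.
classified = [
    ("Operations", "input validation"), ("Operations", "input validation"),
    ("Operations", "authentication"),
    ("Sales", "encryption"),
    ("IT", "authentication"), ("IT", "input validation"),
]

# Count incidence per (department, category) cell of the cross section.
incidence = Counter(classified)
for (dept, category), count in sorted(incidence.items()):
    print(dept, category, count)
```

The resulting counts form the department-by-category grid in which a manager would look for "trouble spots."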
Reference: Tong Sun,
Adjunct Professor of Computing Security
Rochester Institute of Technology