The Fraught Utility Of Vulnerability Disclosure Databases

Do we need vulnerability databases? Are the ones we have working? Should we do something else? How can we improve our overall approach to the “WTF is going on?” problem?

My strong bias is toward the scientific method — which requires open inquiry and easy access to knowledge — and against unreliable or false metrics. I also strongly resist any make-work or boondoggling that is not directly relevant to understanding how software works and fails, and making it work more goodlier.

So, I tried to think about vulnerability databases, and what all we might want from them. First, some definitions.

Definitions

Developer: An organization which (or lone hacker who) develops software.
Developer communications: Communications from developers about vulnerabilities, including bug trackers, release notes, Knowledge Base articles, code review and CI/CQ, etc.
Researcher: An organization which (or lone hacker who) hunts for vulnerabilities in software.
Researcher communications: Communications from researchers about vulnerabilities, including bug trackers, advisories, blog posts, exploits, and Twitter threads.
Deployer: An organization or person who is using some software to achieve a goal.
Vulnerability database program (VDB): An organization that tracks, describes, and/or issues alerts for vulnerabilities.

Vulnerability Databases

What might we want in a VDB?

I made a rough comparison of 4 sources of vulnerability information:

| Source | Information | Searchability | Authoritativeness | Alert Quality | Overhead |
|---|---|---|---|---|---|
| CVE | Poor | Good | Low, due to poor information | Highly varying [1] | High |
| OVE | None; provides only IDs | Good | None; provides only IDs | None; provides only IDs | epsilon |
| Developer communications | Highly varying [2] | Good | ‘Should be’ ideal but varies with information quality [3] | Good | None beyond what is inherently necessary |
| Researcher communications | Varying; often good | Poor | Varying; sometimes good | Varying; sometimes good | None beyond what is inherently necessary |

From this I observe a few things:

Developer communications have the best ability to meet all our requirements: developers (should) have the best knowledge about the software they create, full information about the nature of the bug, full information about the fix, and full information about remediation. Sometimes developers do meet our requirements, and that is great. Ideally, they always would. All too often, they don’t, and keeping communications high quality requires constant effort and skill from program managers.

Researcher communications have a great ability to meet our information requirements in particular. Sometimes they do, and that is great. Sometimes, they can be more authoritative than reticent developers.

CVE’s clearest benefit seems to be an authoritative source of unique ID numbers, plus whatever information the developer might provide (usually very little). But in my experience the coordination cost is high for developers, and as a result developers often minimize their use of CVE. Hence OVE: the argument goes that if all CVE reliably does for us is make numbers, well, we can do that far more cheaply.

We might benefit if program managers of VDBs stopped accepting poor, late, and un-actionable information from developers. The CVE program, as the most widely recognized VDB, has an opportunity to raise the bar across the industry by calling out such reticent developers, citing the needs of and benefits to the public and basic science.

For example, imagine if a VDB tagged entries with a message like “Developer declined to provide meaningful information” when the vendor provided a meaningless description of the vulnerability. That might exert some salutary pressure on developers.

We would benefit if VDBs made it easier for the developer to commit current information to the database. For example, CVE-2022-2294, which is currently being exploited in the wild, is documented in Chrome’s 4 July 2022 release notes, but the CVE entry as of 8 July contains no information, saying only:

** RESERVED ** This candidate has been reserved by an organization or individual that will use it when announcing a new security problem. When the candidate has been publicized, the details for this candidate will be provided.

Perhaps the CVE entry will be eventually consistent with the Chrome release notes, hopefully including a link to the bug tracker. (Chrome policy is to make security bugs public 14 weeks after the fix has shipped, so a link to the bug tracker will become valuable in time.)

Prioritizing Vulnerability Response

What do we want in a vulnerability ‘scoring’ system? (Do we want a vulnerability scoring system?)

As an experiment, I imagined a hypothetical easy-to-use, network-based denial of service (DoS, not DDoS) attack, and tried to score it with CVSS. I assumed there is an existing exploit that doesn’t completely take down a service, but causes it to consume lots of time and/or space.

For example, imagine a database query that, for some reason, is slow in a given database engine. For some reason it is also remotely triggerable, by something less than SQL injection but more than just using the site normally. Perhaps an attacker can make some unauthenticated web request that invokes this expensive query, and it’s expensive only because the query planner has a bug; normally, the query would be efficient [4].

The CVSS vector I got is

AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L/E:F/RL:U/RC:C/CR:X/IR:X/AR:H/MAV:N/MAC:L/MPR:N/MUI:N/MS:U/MC:N/MI:N/MA:L

which scores 6.0 in NIST’s CVSS 3.1 calculator.
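To see where a number like 6.0 comes from, here is a minimal Python sketch of the CVSS 3.1 arithmetic, following the published specification's formulas and weights as best I can reproduce them. It handles only Scope: Unchanged vectors, which is all this example needs. The names `WEIGHTS`, `roundup`, and `score` are illustrative and do not come from any official library.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values only).
_CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
_REQ = {"X": 1.0, "H": 1.5, "M": 1.0, "L": 0.5}
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": _CIA, "I": _CIA, "A": _CIA,
    "E": {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91},
    "RL": {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95},
    "RC": {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92},
    "CR": _REQ, "IR": _REQ, "AR": _REQ,
}


def roundup(x):
    """Round up to one decimal place, using the spec's integer-based method."""
    i = int(round(x * 100000))
    if i % 10000 == 0:
        return i / 100000.0
    return (i // 10000 + 1) / 10.0


def score(vector):
    """Return (base, temporal, environmental) scores for a Scope: Unchanged vector."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m["S"] != "U" or m.get("MS", "X") not in ("U", "X"):
        raise ValueError("this sketch handles Scope: Unchanged only")

    def base(key):
        return WEIGHTS[key][m[key]]

    def modified(key):
        # A modified metric that is absent or "X" falls back to its base value.
        value = m.get("M" + key, "X")
        return WEIGHTS[key][m[key] if value == "X" else value]

    def optional(key):
        # Temporal metrics and security requirements default to "X" (Not Defined).
        return WEIGHTS[key][m.get(key, "X")]

    def combine(w, cr=1.0, ir=1.0, ar=1.0, cap=1.0):
        iss = min(1 - (1 - cr * w("C")) * (1 - ir * w("I")) * (1 - ar * w("A")), cap)
        impact = 6.42 * iss
        exploitability = 8.22 * w("AV") * w("AC") * w("PR") * w("UI")
        return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

    temporal_factor = optional("E") * optional("RL") * optional("RC")
    base_score = combine(base)
    temporal_score = roundup(base_score * temporal_factor)
    environmental_score = roundup(
        combine(modified, optional("CR"), optional("IR"), optional("AR"), cap=0.915)
        * temporal_factor
    )
    return base_score, temporal_score, environmental_score


VECTOR = ("AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L/E:F/RL:U/RC:C/CR:X/IR:X/AR:H/"
          "MAV:N/MAC:L/MPR:N/MUI:N/MS:U/MC:N/MI:N/MA:L")
print(score(VECTOR))  # (5.3, 5.2, 6.0)
```

The 6.0 is the environmental score. The same bug has a base score of 5.3, and the difference comes entirely from setting the Availability Requirement to High.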

Since that string is just noise, let’s look at a graphic:

[Figure: A screenshot of the NIST CVSS calculator showing the hypothetical bug with a score of 6.0.]
More readable, though not more informative.

What does “6.0” mean? Is it high enough to call the on-call SRE? Do we only get out of bed for 7.5 or higher? (Why 7.5?) Is this bug bad enough to call the vendor to complain — or sue?

Some people might use CVSS to make that kind of decision. It is, after all, a score telling you how severe a problem is.

But there is no single threat model, so there can be no single score that suits all audiences. Not everyone who uses the DoS-able database engine makes that kind of inefficient query. Not all deployers of vulnerable platforms need to worry, even if the bug is present — maybe their servers are overprovisioned relative to their load.

But what about a shopping site? What about a shopping site during the winter holiday season? Such a deployer can put a concrete dollar value on the cost of downtime, and that cost changes from quarter to quarter. In turn, that will change how the deployer prioritizes different vulnerabilities.

Nor do the scores map to real-world costs and risks — will a vulnerability with a CVSS score of 10.0 cost you twice as much as (or, say, 5 orders of magnitude more than) one with a score of 5.0? The question is nonsensical because nobody has the same cost model, either.

The CVSS people are aware of these problems, and have tried to address them. From the CVSS User Guide:

2.1. CVSS Measures Severity, not Risk

The CVSS Specification Document has been updated to emphasize and clarify the fact that CVSS is designed to measure the severity of a vulnerability and should not be used alone to assess risk.

Concerns have been raised that the CVSS Base Score is being used in situations where a comprehensive assessment of risk is more appropriate. The CVSS v3.1 Specification Document now clearly states that the CVSS Base Score represents only the intrinsic characteristics of a vulnerability which are constant over time and across user environments. The CVSS Base Score should be supplemented with a contextual analysis of the environment, and with attributes that may change over time by leveraging CVSS Temporal and Environmental Metrics. More appropriately, a comprehensive risk assessment system should be employed that considers more factors than simply the CVSS Base Score. Such systems typically also consider factors outside the scope of CVSS such as exposure and threat.

CVSS 3.0 and greater were (presumably) devised to address the problem of nonsensical scores, such as that Heartbleed — a bug that lets unauthenticated internet attackers read secrets out of a server’s memory — scored only 5.0 at the time. (Click the CVSS Version 2.0 button to see it.) At least in the case of Heartbleed, CVSS 3 results in scores that seem more ‘intuitively accurate’ — to those of us assuming a particular class of threat model.

However, I find the section quoted above to be a bit of a cop-out, given how people have told me they use CVSS: they are using it to make operational decisions. It also feels insufficient: it’s not just that risk is different for different people at different times, it’s that severity can vary too!

Imagine that a hypothetical shopping app deployer has deployed the DoS-able database such that each query runs in a sandboxed and resource-limited process. The deployers have tuned and tested their sandbox resource limits such that 99.99% of true shopping queries succeed, while queries that exceed the memory limit or use more than some number of milliseconds of compute time are killed. For this deployer, the severity of the bug goes way down, nearly to zero, even though the cost of a successful attack has stayed the same. This deployer has effectively mitigated the bug. (This deployment strategy can mitigate many potential bugs, and can change how the deployer prioritizes a wide variety of vulnerabilities.)
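As a rough illustration of that deployment strategy, here is a minimal Python sketch that runs a piece of work in a child process with CPU-time and memory limits. It assumes a Unix-like system where the standard `resource` module is available. The helper name `run_limited` and the specific limits are hypothetical. Note that `RLIMIT_CPU` works in whole seconds, so a real deployment wanting millisecond budgets would need a finer-grained mechanism.

```python
import resource
import subprocess
import sys


def run_limited(argv, cpu_seconds=1, memory_bytes=1024 ** 3, wall_seconds=5):
    """Run argv in a child process under resource limits (Unix only).

    Returns the child's return code (negative if a signal killed it),
    or None if it exceeded the wall-clock timeout.
    """
    def apply_limits():
        # Runs in the child just before exec: cap CPU time and address space.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    try:
        proc = subprocess.run(
            argv,
            preexec_fn=apply_limits,
            capture_output=True,
            timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode


# A well-behaved "query" finishes normally.
print(run_limited([sys.executable, "-c", "print('ok')"]))
# A pathological "query" that spins forever is killed by the kernel.
print(run_limited([sys.executable, "-c", "while True: pass"]))
```

The limits do not fix the query planner bug. They only bound how much damage a single pathological request can do, which is why the severity drops for this deployer and not for others.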

Another problem with CVSS is its false precision. If you look at the calculator, you’ll see that the ‘measurements’ you can make about a vulnerability are of very coarse ‘precision’, e.g. None – Low – High, or (for Exploit Code Maturity) Not Defined – Unproven – PoC – Functional – High. The measurements are in tertiles, quartiles, and quintiles, yet the calculator produces results purporting to have 2 significant figures (e.g. 6.2). This is an illusion produced by the arithmetic of the CVSS scoring procedure, not actual measurements of real bug severity.
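One way to see the illusion is to enumerate every combination of the coarse base metrics and look at what the arithmetic can produce. The sketch below does that in Python for Scope: Unchanged vectors only, using the weights and rounding rule from the CVSS 3.1 specification as best I can reproduce them. The function names are illustrative.

```python
import itertools
import math

# CVSS v3.1 base metric weights (Scope: Unchanged only).
AV = (0.85, 0.62, 0.55, 0.2)   # Network, Adjacent, Local, Physical
AC = (0.77, 0.44)              # Low, High
PR = (0.85, 0.62, 0.27)        # None, Low, High
UI = (0.85, 0.62)              # None, Required
CIA = (0.56, 0.22, 0.0)        # High, Low, None


def roundup(x):
    """Round up to one decimal place, using the spec's integer-based method."""
    i = int(round(x * 100000))
    if i % 10000 == 0:
        return i / 100000.0
    return (i // 10000 + 1) / 10.0


def base_score(av, ac, pr, ui, c, i, a):
    """Base score for a Scope: Unchanged vulnerability."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * av * ac * pr * ui
    return roundup(min(impact + exploitability, 10))


# Every possible set of answers to the seven coarse questions.
combos = list(itertools.product(AV, AC, PR, UI, CIA, CIA, CIA))
scores = sorted({base_score(*combo) for combo in combos})

print(len(combos), "input combinations")
print(len(scores), "distinct scores, from", scores[0], "to", scores[-1])
```

Every output is a deterministic function of seven multiple-choice answers. The decimal digit reflects which boxes were ticked and how the formula weights them, not a finer measurement of the bug.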

I have my doubts about whether the severity of vulnerabilities can be scored at all, especially without lots and lots of deployer-specific context. Even with that context, you still also need a well-grounded cost model, and it is very difficult to get one. Not all users that a given deployer is serving will share a given model, so you may need many cost models. And then you need a way to balance the concerns of all your constituents, which is another complex and hard-to-ground model. In real life, people make risk decisions much more qualitatively than we or they would like to believe.

That doesn’t mean we shouldn’t strive for well-grounded quantitative models! Just that we need to be prepared to act without them, and that CVSS is not one.

If we do away with the spurious numbers and just treat CVSS as purely qualitative — which it is, and which is fine! — we’d have a more honest and safer-to-use system. (Nobody is really worrying about the difference between 7.6 and 7.4 anyway. At least I hope not.) The basic qualities that CVSS encompasses are all important and useful, and account for many of the desiderata at the top of this post.

Conclusion

The combination of CVE + CVSS gives us some of what we want, and we could have more of it at lower cost if any of a few magical things happened:

However, it will never be possible to beat the information richness, searchability, or authoritativeness of a well-run developer communications program. (This is especially true for projects that are open source as well as being well-run.) Also great are well-run researcher communications programs — Taxonomy Of In-The-Wild Exploitation was only possible because so many researchers wrote so many great blog posts and PoCs. (Thank you!)

Additionally, there will always be vulnerabilities that are known and fixed but which don’t get VDB entries. In my experience, the majority of vulnerabilities go un-numbered, and for those vulnerabilities, this whole discussion is moot. This is not a fault of any VDB program: although reducing the friction of working with the program would help, we will always need to prepare for vulnerabilities that aren’t announced or tracked. You never get perfect global coordination, no matter how low the friction. And sometimes developers don’t even realize (or want to admit) that they are fixing a vulnerability (as opposed to just a regular bug). And sometimes their own bug trackers are already easier to use and more useful than a global database.

Therefore no VDB can ever be the sole trigger for action on the part of deployers. The only reliable way to get all the available fixes is to track the latest stable version. No matter how good any VDB gets, that will always be true — and deployers who do so will be insulated from gaps and mistakes on the part of developers and of VDB programs.

For the sake of public safety — and, honestly, just for the pride of engineering excellence — we must improve the quality and discoverability of vulnerability information, and reduce the cost of providing and getting it. There’s a lot of room for us to do a lot better as a community. The status quo is not working.


1. When CVE alerts are of low quality, it is not typically the ‘fault’ of the CVE program itself. Software development organizations must provide timely, relevant, and actionable information; if they don’t, there’s not much the CVE program can do.

2. Some developer bug trackers are great and have most or all of the properties we want. This is typically, but neither inherently nor historically only, seen in the trackers for open source projects. But other developers provide very little information, intentionally hide information, or don’t even have bug trackers at all.

3. When the developer’s bug tracker is information-poor, then researchers’ bug trackers, advisories, and blogs become more authoritative.

4. I once really had a bug like this, on a security review engagement years ago. The client told me that their biggest fear was that one tenant in their multi-tenant platform would starve other tenants of resources, so they had strict quotas around CPU time and memory allocation. I was able to make the string allocation routine in their language runtime go quadratic in a way that their quota system couldn’t see, but which I could and did time remotely. I fired off a few pathological requests, and the development server became unresponsive. So I wrote it up and went to lunch, since I couldn’t test any more. For this client, it was the highest-priority bug to fix; but for others, it might not matter as much or at all.