Science Forums

Software Risk Analysis


Pyrotex

Recommended Posts

I can calculate the POC to a reasonable level of confidence. Without ever seeing the code. :)

 

That's what I was saying! :)

 

The other approach I was thinking about was just assuming that you could use those industry metrics on bugs in software (per line of code, per hour of programmer effort, per angel dancing on the head of the CPU, etc.) and then simply come up with a methodology for determining the probability that the *hardware* providing the *inputs* to the software would produce values that were catastrophic to the software.

 

That pushes the whole thing back into the hardware realm.... ;)

 

I just drive 'em, I don't know what makes 'em go, :hihi:

Buffy


That's what I was saying! :)...
You are sooooo close that I can see the freckles on your nose. :)

 

I cannot give details, but basically, my method depends on the fact that for 30 years, detailed metrics were kept on number of bugs found at each stage of development, testing, simulation and flight-testing; and on the severity of those bugs; and under what conditions of nominality they occurred.

 

With a little sleuthing, one can develop a model that feeds the total number of bugs expected in the entire system in one end, and at the other end outputs the chance that an undiscovered bug of highest severity gets executed under conditions which are not recoverable.

 

...after every other possibility has been eliminated, whatever is left, however improbable it may be, must be the solution. :)

 

And that's all I'm gonna say about that! :hihi:


And that's all I'm gonna say about that! :)
Obviously a massive government/NASA conspiracy to sap and impurify all of our precious bodily fluids! :)

 

 

The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents, :hihi:

Buffy


Obviously a massive government/NASA conspiracy to sap and impurify all of our precious bodily fluids! :)...

Make up your cotton-pickin' mind, Buffster. :)

 

Do you want your bodily fluids sapped or impurified?

 

Either one can be arranged. :) :hihi: :eek_big:

 

Pyro the Government/NASA Toadie


  • 1 month later...

Thank you muchly, Buffster_1. :hihi:

 

So basically this is what's up. The target software (TSW) was originally designed and built 30 years ago, and has gone through about 50 versions since. Detailed statistics were kept on errors found at various stages of development: source inspection, system and subsystem testing, full integrated testing with full-up hardware systems, and real-time simulation with people in the loop (let's call them "pilots"). These errors were classified by severity, 1, 2 or 3 -- with "1" meaning total catastrophe.

Error statistics were also kept for actual "usages" (let's call them "flights") of the TSW--that is, errors that occurred during real "flights".

As it turns out, the probability that an error will make it through all the testing and finally execute during "flight" can be correlated to the probabilities that errors will execute during simulations and during integrated testing.

We know: the probability that a source line of code will contain an error;

the probability that it will be caught at various stages of development/testing;

the number of new or modified lines of code at each version;

the probability that a "flight" will be nominal, thereby NOT executing those portions of TSW intended for handling bad events.

Shuffle this all together, and for a particular version, you can calculate the probability of having a Real Bad Day (RBD) in an actual "flight".
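The "shuffle this all together" step can be sketched roughly in code. This is only an illustration of how the known pieces could chain into a per-flight probability; the function name and every numeric rate below are invented, not the actual model or its data:

```python
# Hypothetical sketch of chaining per-SLOC bug rates into a per-"flight"
# Real Bad Day (RBD) probability. All numbers are invented for the sketch.

def p_real_bad_day(new_sloc, p_bug_per_sloc, p_escape_testing,
                   p_execute_offnominal, p_severity_1, p_backup_fails):
    """Probability that at least one latent severity-1 bug executes
    unrecoverably during a single flight, assuming independent bugs."""
    # Expected number of latent bugs surviving all ground testing
    latent = new_sloc * p_bug_per_sloc * p_escape_testing
    # Per-flight chance that one given latent bug fires catastrophically
    p_fire = p_execute_offnominal * p_severity_1 * p_backup_fails
    # P(at least one of the latent bugs fires) = 1 - P(none fire)
    return 1.0 - (1.0 - p_fire) ** latent

# e.g. 20,000 new/modified lines in this version, with invented rates
print(p_real_bad_day(20_000, 0.02, 0.05, 0.10, 0.05, 0.5))
```

Note the version-to-version dependence: only new or modified lines feed the `new_sloc` term, which is why each version gets its own RBD number.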

Now we're looking at the degrees of uncertainty in our calculations.

 

Pyro the Probable


  • 1 month later...

Brief update --

Game Over, I Win. :naughty:

 

I did in fact come up with a SW Probabilistic Risk Assessment model using 30 years of quality management data and statistics. It is a sequence of seven failure events:

 

1. Probability that a single line of code (SLOC) contains a bug

2. Probability that visual code inspection fails to find the bug

3. Probability that module integration and testing fails to find the bug

4. Probability that full-up system simulations fail to find the bug

5. Probability that the bug executes while in flight

6. Probability that the bug is a critical-1 failure (the worst kind)

7. Probability that backup procedures fail to correct

 

We managed to find enough historical data to calculate most of these event probabilities directly. For events 5 and 7, we found "proxy" data in the training-simulation QM data sufficient to generate close approximations.
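Since the seven events form a chain, the per-line hazard is just their product. A minimal sketch, with event names following the list above and every numeric value invented purely for illustration:

```python
# Hypothetical sketch of the seven-event chain as a product of
# conditional probabilities. Every value here is invented.

EVENTS = {
    "1_bug_in_sloc":        0.02,  # line of code contains a bug
    "2_inspection_misses":  0.40,  # visual inspection fails to find it
    "3_integration_misses": 0.30,  # module integration/testing misses it
    "4_simulation_misses":  0.20,  # full-up simulations miss it
    "5_executes_in_flight": 0.10,  # bug actually executes in flight
    "6_is_critical_1":      0.05,  # bug is a critical-1 failure
    "7_backup_fails":       0.50,  # backup procedures fail to correct
}

def per_sloc_catastrophe_probability(events):
    """Probability that one line of code causes an unrecoverable
    critical-1 failure in flight: the product of all seven events."""
    p = 1.0
    for value in events.values():
        p *= value
    return p

print(per_sloc_catastrophe_probability(EVENTS))
```

Multiplying the chain out per line and then scaling by lines of code is what makes the historical per-event data so valuable: each event can be estimated from a different stage of the QM records.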

 

I gave our pitch to the tip-top muckity-mucks and sahibs last Monday. And it went extremely well. Our results interested a lot of folks and there were several calls asking for the detailed report we published. Plans are being made to take our pitch to the companies who are building the next generation software. Mucho visibility. Mucho praise from unexpected quarters. :hihi:

 

Now... where do I go from here???


Game Over, I Win. :smart:
That was a foregone conclusion! :)
Now... where do I go from here???

Find Critical Event Number 8 that reduces the standard deviation by 50% and patent it...

 

There is always one more feature, :alien_dance:

Buffy


Have you published it in a publicly available place (other than this hypography thread, of course)? If not, if you have the personal bandwidth, and no IP/trade secret entanglements, I’d recommend that as your next goal. In short, satisfy the wikipedia notability criteria – then, of course, craft a nice wikipedia article for it. :smart:

 

It’s a nice model, reminding me of the Drake equation, and having, I think, the potential to be as useful and as well-known.

 

I’ll do my personal part to promote it Monday, when I have a couple of meetings with project test teams, who are always hungry for tools to quantify risk.

 

PS: :alien_dance: In the past 2 weeks, I’ve discovered a couple of appalling data points contributing to my personal event 1 probability, including one in code that has been touched over a hundred million times a day for the past 3 years, with the potential to affect the survival of over half a million people! Fortunately, its failure mode didn’t occur until the recent test of some new software, where our apparently high event 4 probability made me aware of it, so the live world was never at risk.


That was a foregone conclusion! :)...

There is always one more feature...

Thank you muchly, dear Buffster. Your confidence in me is touching.

 

The one thing folks ARE requesting is a sensitivity/uncertainty analysis. We start Monday on discussing the ways and means of doing that.

 

The big deal of our analysis and pitch was in finding the degree to which "risk" is determined by the "maturity" of the ground test and verification processes: the odds of flight SW causing a Real Bad Day can vary from 1 in 100 for the first coupla flights -- to 1 in 2000 after 20 years of getting one's act thoroughly together. That really raised some eyebrows.
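One rough way the requested sensitivity/uncertainty analysis could be approached (this is an assumed method, not the team's actual plan) is a Monte Carlo over the seven event probabilities: give each event an uncertainty band, draw repeatedly, and report the spread of the resulting risk. All bounds below are invented:

```python
import random

# Rough Monte Carlo sketch of an uncertainty analysis over the
# seven-event chain. Every uncertainty bound here is invented.

# (low, high) bounds expressing uncertainty in each event probability
BOUNDS = [(0.01, 0.03), (0.3, 0.5), (0.2, 0.4), (0.1, 0.3),
          (0.05, 0.15), (0.02, 0.08), (0.3, 0.7)]

def sample_risk(rng):
    """One Monte Carlo draw of the seven-event product."""
    p = 1.0
    for low, high in BOUNDS:
        p *= rng.uniform(low, high)
    return p

def risk_spread(n=10_000, seed=42):
    """Crude 90% uncertainty band: 5th and 95th percentile of n draws."""
    rng = random.Random(seed)
    draws = sorted(sample_risk(rng) for _ in range(n))
    return draws[n // 20], draws[n - n // 20 - 1]

low, high = risk_spread()
print(f"90% band: {low:.2e} .. {high:.2e}")
```

A band like this also shows which single event dominates the spread: widening one pair of bounds at a time and re-running is a cheap one-at-a-time sensitivity check.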

 

Some pressure is now being applied to the next-gen SW developers who had intended to "roll their own" ground test and verification process AFTER the SW was developed. You can't do that and keep risks down. The ground stuff must be totally in place and being used diligently from the first phase of SW development, or you face doing your first flight with a low-maturity system.

 

{shiver} :smart: :alien_dance:


Have you published it in a publicly available place (other than this hypography thread, of course)?

...with the potential to affect the survival of over half a million people! ....

No, I haven't. Currently, it is still an internal document. I won't think of publishing until the muckity-mucks and sahibs grant permission.

 

Half a million? :alien_dance: :smart: :) Craig! What do you DO?? Write SW for fast breeder reactors in the middle of Amarillo, Texas??? :lol:


…with the potential to affect the survival of over half a million people! …
Half a million? :eek_big: :D :eek_big: Craig! What do you DO?? Write SW for fast breeder reactors in the middle of Amarillo, Texas??? :doh:
Mostly, I make grandiosely exaggerated claims on the internet. ;)

 

My defective line of code of note was in a piece of basic string-manipulation code for an interface engine used by a 3-state region of a clinic and hospital system serving about half a million people. The event 6 probability - that the bug is a critical-1 failure - was pretty small, even had its failure mode occurred. Most of them would have resulted in XML messages being incorrectly rejected as malformed, causing some support person to curse the system, and, ultimately, me. Less likely would have been something like sending you to the wrong examining room at the wrong time. Worst case would be swapping patient identities to give you someone else’s lab test results, a medication to which you’re allergic, etc. Doctors, nurses and such being pretty well trained not to trust technology in life-or-death situations, their human contribution to event 7 - backup procedures fail to correct - keeps its probability pretty low.

 

In short, I exaggerated badly in claiming my code is critical to the survival of half a million people. Still, reliability is a big concern in medical software and hardware – causing injury or death of even one out of half a million people is not an experience any programmer or other medical IT person wants. Unlike in the technically sexier world of spaceflight software, I’m unaware of any initiatives like yours to systematically quantify the software risk component in healthcare (though there is a lot of study of injury- and death-causing errors in healthcare systems as a whole), but your rocket-explosion-predicting model seems well suited for adaptation to healthcare.

