
Software Risk Analysis


Pyrotex

Recommended Posts

NOTE: The software doesn't have to FAIL, necessarily. It could be doing exactly what it was designed to do. Remember, there are other dimensions: the Environment and the Operations of human beings.

 

Hi Pyrotex,

 

I've always had the philosophy in software development that if the user absolutely, positively doesn't have to provide input, then don't let them. ;)

 

(1) The Environment varies over time.

(2) Functions may be constant over time, or may vary with the environment (this one isn't clear yet).

(3) In certain environments machines cannot operate.

(4) In certain environments humans cannot operate.

(5) Human Operation varies with environment over time.

 

So the first step is to create a machine (or a set of rules for one) that can transport goods from A to B without damaging the computer(s) controlling all of the above, i.e. covering only points (1), (2) and (3) at the start.

 

Once this model is 'working' you can add all the other bits (perishable goods) and also have a good set of override rules for those pesky humans.

 

Also, the calculation of your risk number could either be on a fixed time basis or a fixed event basis (i.e. a combination of variant factors trigger the risk calculation).
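A minimal sketch of that trigger idea (the factor names, weights, interval and drift threshold are all made up purely for illustration): recalculate on a fixed schedule, or whenever a combination of monitored factors drifts far enough to matter.

import time

# Hypothetical factor weights for a toy risk score.
WEIGHTS = {"environment": 0.5, "function": 0.3, "operation": 0.2}

def compute_risk(factors):
    """Toy risk number: weighted sum of monitored factor levels (0..1)."""
    return sum(WEIGHTS[k] * v for k, v in factors.items())

def should_recalculate(last_time, last_factors, factors, interval=60.0, delta=0.1):
    """Fixed time basis OR fixed event basis: fire when the interval elapses,
    or when any factor has drifted by more than delta since the last run."""
    if time.time() - last_time >= interval:
        return True
    return any(abs(factors[k] - last_factors[k]) > delta for k in factors)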


Relative to Buffy's Black Arts reference...

 

Alternative Metaphors:

 

1) If I were out to assess whether any given tree was going to fall over or not, one good assessment would be to know its root structure. I can't necessarily know from the surface what a live tree's root structure looks like. I don't have x-ray glasses to look at its roots. But I might know, from experience digging up other trees, that when a tree looks like an oak, its root structure generally mimics its upper branches and is very solid.

 

By contrast if it looks like a palm tree then I know its root structure will likely be a ball of roots that is not all that deep.

 

By assessing enough tree types and figuring out identifying markers, I can use those markers to make a best guess as to what an unknown tree is like and therefore what its chances of root failure are.

 

Basically I am assessing the character of the tree.

 

If a tree falls in a forest and no one is around to hear it,

do the other trees still make fun of it?

 

2) A similar metaphor is that of assessing whether a house will develop structural cracks or not. In this case, assessing its foundation and the type of ground it was built on will go a long way toward assessing the character of the house. Was it built on rock, clay, or sand? Did they use rebar? What do samples of the cement say it was made from?

 

3) People can instantly size up the character of a person by the shoes they are wearing and their fingernails. Some other characteristics like hair style, clothing, how they carry themselves, etc. also factor in.

 

4) The Myers-Briggs can do a good job of assessing the archetype of a person - which is designed to be neutral in terms of the expected success or failure of the person. But assessing whether a person has an addiction tends to be a fairly reliable indicator of their... well... reliability. :rolleyes:

 

5) If I hear a person say "Well Number 1 I think ___ and B I think ____" I instantly begin to doubt their reliability and consistency. (I know that Buffy has had a few similar reasons to doubt my own reliability ;) )

 

6) I can look at a piece of source code and in about 5 seconds tell you the character of the programmer behind it.

  • Was he consistent?
  • Did she follow typical coding conventions?
  • Did he use variables like vstrq or vector_strength_quotient?
  • Did she use buffers, pointers, metadata tables, encryption, hashing?
  • Is it sophisticated or simplistic?
  • Does it look "tight" (like a well-practiced music band)?

 

If we would give it a score of "well written", then its reliability could be expected to be higher. If we would give it a score of "poorly written", then its reliability could be expected to be lower. Granted, these can be difficult to automatically assess. I'm just putting this out there for the total concept of assessing the character of things.

 

Now, considering that we should expect professional code to all be well written and tested, we would still be looking for subtle characteristics.

 

 

To reel all this back in, I think that recognizing basic syntax (If, Loop, Pointer reference) can help assess the character of a code module. Also, as Buffy said, the Function Points of how many inputs, outputs, files used, etc. help assess the character of the code.

 

So if these various syntax elements can be individually walked and assigned a "not risky" to "highly risky" score, then some level of summation of the risk for the module should be usable as a general assessment.
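A toy sketch of that kind of walk-and-sum (the syntax patterns and their weights are invented for illustration, not calibrated against any real defect data):

import re

# Invented per-element risk weights; a real scheme would calibrate these
# against historical defect data rather than pulling them from thin air.
ELEMENT_WEIGHTS = {
    r"\bif\b": 1,              # each conditional adds paths
    r"\bfor\b|\bwhile\b": 2,   # loops add iteration-dependent paths
    r"\bgoto\b": 5,            # unstructured jumps are harder to reason about
    r"\*\w+|->": 3,            # C-style pointer dereferences
}

def module_risk(source_text):
    """Sum a crude 'not risky' to 'highly risky' score over syntax elements."""
    return sum(weight * len(re.findall(pattern, source_text))
               for pattern, weight in ELEMENT_WEIGHTS.items())

example = "if (x > 0) { for (i = 0; i < n; i++) { *p = table[i]; } }"
print(module_risk(example))  # higher total -> flag the module for closer review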


So where are we going with all this?

 

Let's say a perfect software risk algorithm is created and we can assign nice neat scores to each module. If we discover high-risk modules, what will be done with them?

  • Will they be assigned additional scrutiny and/or testing?

  • Do they need to have backup code created and run in parallel, like the Shuttle's five-ish side-by-side decision makers?

 

Knowing what will be done about it can help reverse-engineer and tune what we are trying to score, and how.


Let's say a perfect software risk algorithm is created and we can assign nice neat scores to each module. If we discover high-risk modules, what will be done with them?

 

Knowing what will be done about it can help reverse-engineer and tune what we are trying to score, and how.

 

Hi Symbology,

 

At the moment the shuttle has been grounded because of a faulty fuel gauge that shows empty when it's nearly full. NASA knows that the fuel tank is nearly full because they have put so much fuel in without it leaking out, but the gauge disagrees. The temperature environment in the tank changed when the fuel started to be added. This probably has something to do with the problem.

 

This type of problem goes beyond software risk analysis, and the solution is a holistic combination of all the factors that directly contribute to the problem, before the problem develops (i.e. environment, function, operation, etc.).

 

I think this is what Pyro is after: a generic early-warning system for human intervention when the software stuffs up by going to or past its boundaries, because if it knew it was stuffing up it could fix the problem itself without any intervention from anybody (which is ideal, but not realistic given the nature of the problem(s)).


Hi, back from a mini-leavation (leavation: a leave in the vacation sense, only where you didn't really go anywhere and were working the entire time).

 

It's not even about FINDING bugs or flaws.

Right, that is what software vulnerability assessment is for, by various means... ;)

 

Ok, so thoughts as to how you can do Software Risk Analysis...

 

Firstly, I need to tell you that it is not something that can have concrete mathematics, nothing like probability, although some math may be needed. In all reality this is one of the topics I could write a thesis on, and although I don't know it extremely well, I already have lots and lots of thoughts as to what factors should contribute...

 

So, thoughts:

 

First of all, just how do you quantify software risk? It is a highly complex set of factors, and many things need to be looked at before you can begin... Remember, some of these pertain to an organization, so if you are auditing some software for somebody else, you may or may not apply some of the points below.

 

Application - what the application for the software is, what kinds of data it holds, what kinds of settings there are, permissions, and so forth... (on a scale of pointless data to high-risk stuff)

 

People - Who is responsible for the software, who has the ability to change data, who has the ability to view data, who has the ability to add data. Within all those factors, a person using this software may be the biggest determining factor for software risk... (more people, more chance of a mishap)

 

Importance - related to application, importance is more along the lines of: what will happen if a part or parts of this software stop functioning, and how damaging would it be to a business? For example, if the software stops displaying data, would it hinder the performance of a business using it, and if so, what amount of hindrance are we talking here? Remember to consider short-term and long-term cases... what if there is a miscalculation in the output... all that stuff. (a combination of numbers and verbal assessment is here)

 

Software security - yes, you need to look at this. If you are a high-class pro, you should assess how secure the software is; if you are not, look at the regularity of fixes administered and try to find patterns in them. Better yet, do your own assessment; there are various ways to do so, and it's best to try a few, like fuzzing... (on a scale of not secure to secure)

 

Costs - for every question under Importance, what would be the worst-case cost of anything happening in that category? Factor in the code and software security sections. Also remember that time is a cost, and the possibility of a lawsuit is another one. (you can get solid numbers out of this)

 

Data Security - determine where the data is stored, how it is stored, whether it's backed up, and the people that have access to those places, and determine the probability of data loss, including hardware security and natural processes: fire, thunderstorms, etc... (for some of this, you can do hard statistical analysis with complex math ;) )
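One crude way to roll those categories into a single number (the weights and the 0-10 scale here are pure assumptions for illustration; an auditor would tune them per project):

# Hypothetical category weights summing to 1.0.
WEIGHTS = {
    "application": 0.20,
    "people": 0.15,
    "importance": 0.25,
    "software_security": 0.15,
    "costs": 0.10,
    "data_security": 0.15,
}

def overall_risk(scores):
    """Weighted average of per-category scores, each rated 0 (low) to 10 (high)."""
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)

example = {"application": 7, "people": 4, "importance": 9,
           "software_security": 3, "costs": 6, "data_security": 2}
print(round(overall_risk(example), 2))  # -> 5.6 for this made-up case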

 

 

Look at this too: Software Metrics Program for Risk Assessment

 

And lastly this: Software Risk Assessment

 

Both of the links have some interesting info; some of it will repeat me, some of it will be new...

 

I hope some of my thoughts don't sound too crazy :)


No, it doesn't sound crazy at all. However, not all of it applies to my situation. Risk from hackers/crackers & vandalism is not an issue here. Analyzing the risk (probability) that a piece of software will "fail" in some way, leading to disaster ("crash!") is a very difficult problem that has been addressed only in limited cases.

 

One way of doing this is by Bayesian Updates. This can be summarized by the question, "Given that the probability of failure was initially F1, what is the probability of failure now (F2), now that the software has been used X times without a failure?"

 

--optionally--

 

"...now that the software has been used X times and it has only failed Z times?"

 

In certain very constrained cases, where there is a mammoth body of real-world experience with the software, this is a successful approach. You start off by showing that after integration testing, 5 lethal bugs were found, and we estimate (loosely) from this that the risk of disaster is 1 / 200.

 

After 1,000 actual runs of the software, we have 1 disaster. We do the math, and this yields a new risk of disaster of 1 / 465.

 

After another 1,000 actual runs, with no disasters, the math yields an updated risk of 1 / 970. And so on, and so on. Presumably, the longer this goes on, the more accurate the updated risk probability is.
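As an illustration of the flavor of such an update (a generic Beta-Binomial sketch, not the exact model behind the 1/465 and 1/970 figures above, which I don't know): treat the per-run failure probability as a Beta distribution and narrow it with each batch of runs.

# Beta-Binomial sketch: prior failure rate ~ Beta(a, b); each batch of runs
# with `failures` failures and `successes` successes updates the posterior.
def update(a, b, failures, successes):
    return a + failures, b + successes

def mean_failure_rate(a, b):
    return a / (a + b)

# A loose prior roughly centered on the post's initial 1/200 estimate.
a, b = 1.0, 199.0
a, b = update(a, b, failures=1, successes=999)     # 1,000 runs, 1 disaster
print(1 / mean_failure_rate(a, b))                 # updated "1 in N" estimate
a, b = update(a, b, failures=0, successes=1000)    # 1,000 more runs, no disasters
print(1 / mean_failure_rate(a, b))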

 

But with avionics software, say, like the stuff in the Shuttle, you just cannot run it often enough to generate the kind of disaster data you need.

 

Real-world execution of software in a challenging environment (guiding a jet with terrain surveillance radar, or guiding a re-entry vehicle into the atmosphere, for example--where "disaster" means Loss of Vehicle) operates in rapid interface with a highly dynamic environment, which cannot be totally predicted. The volume of input parameters attempts to model the external environment and vehicle state, but never can completely. What if the software is asked to handle 2 boundary problems ("edge of the model") at the same time? How rare an occurrence is that?

 

Testing HW essentially performs a massively parallel set of tests on every atom, molecule and component simultaneously--until one of them breaks. Testing SW is essentially a sequential process, testing each one of the millions of paths one at a time. A small compiler or command-and-control module could potentially have 10^15 paths. Or 10^20. Combinatorics make exhaustive testing impossible.

 

What if there is a pure math approach similar to Bayesian Update or Weibull Analysis that can take a body of code, analyze it for complexity (path density), and then use historic information to calculate the probable number of bugs still in there undetected?

 

Hmmmmmm, he said. :)
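I don't know of a settled formula for that, but as a rough sketch of the shape it might take (the defect rate per decade of path count and the logarithmic scaling are made-up assumptions, not established results): scale a historically observed defect rate by a complexity measure, then subtract what testing has already caught.

import math

def estimated_residual_defects(paths, historic_defects_per_decade, defects_found):
    """Toy estimate: assume total defects grow with the log of path count,
    scaled by a rate observed on past projects, minus defects already found."""
    expected_total = historic_defects_per_decade * math.log10(paths)
    return max(expected_total - defects_found, 0.0)

# A module with ~1e15 paths, a historic rate of 0.5 defects per decade of
# path count, and 5 defects already caught during integration testing:
print(estimated_residual_defects(1e15, 0.5, 5))  # -> 2.5 (toy number)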


Pyro, it ALWAYS is an issue, no matter how much you try to deny it; it always is and always will be for quite some time!

 

Only if you are connected to the Internet, and only if you have no physical security. If you are not connected to the Internet, then it's purely up to physical security. Which I bet is pretty good in this case.

 

But it can never hurt to be too careful. It only costs you time and resources.


Testing HW essentially performs a massively parallel set of tests on every atom, molecule and component simultaneously--until one of them breaks. Testing SW is essentially a sequential process, testing each one of the millions of paths one at a time. A small compiler or command-and-control module could potentially have 10^15 paths. Or 10^20. Combinatorics make exhaustive testing impossible.

 

Combinatorial Explosion is usually handled very well by an inference engine.

Using something like a Rete Network, the system can know which specific rules are eligible to fire (R1, R2, R3 etc) when an alpha variable is changed by a rule or a beta variable is changed by an external source such as an interrupt or I/O, instead of having to reevaluate the entire system. The trick here would be how to treat modules of code like rules in an inference engine.

 

Basically if there is a mapping between what variables a module can modify vs which variables are inputs to a module, then you know the dependencies.
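A minimal sketch of that mapping (module and variable names are invented): record which variables each module reads and writes, and when a variable changes, only the modules that read it become eligible for re-evaluation, much as a Rete network avoids re-matching every rule.

from collections import defaultdict

# Hypothetical modules with declared inputs (reads) and outputs (writes).
MODULES = {
    "nav_filter": {"reads": {"imu", "gps"},               "writes": {"position"}},
    "guidance":   {"reads": {"position", "target"},       "writes": {"attitude_cmd"}},
    "telemetry":  {"reads": {"position", "attitude_cmd"}, "writes": {"downlink"}},
}

# Build the reverse index once: variable -> modules that depend on it.
READERS = defaultdict(set)
for name, spec in MODULES.items():
    for var in spec["reads"]:
        READERS[var].add(name)

def affected_modules(changed_var):
    """Only these modules need re-evaluation when changed_var is updated."""
    return READERS[changed_var]

print(affected_modules("position"))  # -> {'guidance', 'telemetry'} (order may vary)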


What if there is a pure math approach similar to Bayesian Update or Weibull Analysis that can take a body of code, analyze it for complexity (path density), and then use historic information to calculate the probable number of bugs still in there undetected?

 

Hmmmmmm, and you could use the new analysis program to determine how many undetected bugs you had in the new analysis program.

 

Real-world execution of software in a challenging environment (guiding a jet with terrain surveillance radar, or guiding a re-entry vehicle into the atmosphere, for example--where "disaster" means Loss of Vehicle) operates in rapid interface with a highly dynamic environment, which cannot be totally predicted. The volume of input parameters attempts to model the external environment and vehicle state, but never can completely. What if the software is asked to handle 2 boundary problems ("edge of the model") at the same time? How rare an occurrence is that?

 

What's wrong with testing all combinations of boundary conditions beforehand to identify the really nasty combinations that should be tracked in real time? Also, otherwise benign (combinations of) boundary conditions may actually become critical at or after a certain time in a mission, i.e. after certain points, previously benign factors can become critical and previously critical ones can become benign.
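Enumerating those combinations ahead of time is cheap while the number of tracked boundary conditions stays modest (2^n combinations for n two-sided boundaries). A sketch, with invented parameter names and bounds:

from itertools import product

# Hypothetical parameters, each exercised only at its low/high boundary values.
BOUNDARIES = {
    "altitude_km":  (0.0, 120.0),
    "velocity_kms": (0.0, 7.8),
    "temp_c":       (-150.0, 150.0),
}

def boundary_combinations():
    names = list(BOUNDARIES)
    for values in product(*(BOUNDARIES[n] for n in names)):
        yield dict(zip(names, values))

# 2^3 = 8 combinations here; any that a model run flags as nasty could be
# tracked in real time, and the flagged set could change with mission phase.
for combo in boundary_combinations():
    print(combo)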


As this thread still seems open and brainstorm-y, and I’ve not seen mention of it yet, I’ll throw out a concept: provable correctness.

 

This was a largely hypothetical approach to software, popular, as best I can tell, mostly in the 1980s. Basically, it required that language compilers be what one book I read called “higher order”, so that one could prove that all the object/executable code they produced could function only within a formally well-defined domain, corresponding roughly to “bug free”.

 

A common acronym-oid for such software was/is ProCoS. A “popular” language of this kind was called AXIS. Here is a basically random web-search result referencing both.

 

Along with a couple of management folk, I tried to stir interest in ProCoS within my org in the early 1990s, with little success, encountering daunting “cultural” obstacles. To this day, I often find it difficult to successfully argue for much weaker applications of such an approach, such as the idea that very formally defined unit testing can substantially reduce the need for end-to-end integration and regression testing. In some projects in which I’ve had a personal stake, this what-I-perceive-as cultural barrier has, I’m convinced, incurred millions in unnecessary cost, and in the worst case, actually made resources (including me) unavailable for unit testing, resulting in significantly more undetected defects in deployed software.

 

While I suspect the culture at the JSC to be a bit … well, better than in my medical IT world – that discussions like these are even coming out of it appears to confirm my suspicion – the basic ProCoS concept – that special language and compiler design can effectively eliminate specific categories of errors, and to a great extent the need for testing – is, I suspect, not much more comfortably received in any IT community than in my own.
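As a much weaker, present-day relative of that idea (this has nothing to do with the actual ProCoS or AXIS tooling, which I haven't used; it's just design-by-contract style assertions in Python), at least the formally stated domain and range of each unit get enforced at runtime:

def commanded_gimbal_angle(error_deg):
    """Toy flight-style function with its domain and range stated as contracts."""
    # Precondition: the caller must stay inside the formally defined domain.
    assert -10.0 <= error_deg <= 10.0, "input outside defined domain"
    angle = 0.75 * error_deg  # hypothetical control law
    # Postcondition: the result stays inside the actuator's defined range.
    assert -7.5 <= angle <= 7.5, "output outside defined range"
    return angle

print(commanded_gimbal_angle(4.0))  # 3.0; out-of-domain calls fail loudly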


...Firstly, I need to tell you that it is not something that can have concrete mathematics, nothing like probability, although some math may be needed. In all reality this is one of the topics I could write a thesis on, and although I don't know it extremely well, I already have lots and lots of thoughts as to what factors should contribute...
Write that thesis! Can you have it finished and in my hands by April? :hihi: :) :)

I also have given this thought. I have made a list (exhaustive?) of all transformations that are sources of data-risk, and those that mitigate or even fix data-risk. What this produces is a complex "network" of data-risk-flow that yields, at the end, the probability that there will still be flaws or bugs in the Flight Load (the final step in my chart, so far). I still have to address what the probabilities are that 1) the bug will actually be executed, and 2) the executed bug will actually cause loss of mission.
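For those last two steps, the obvious decomposition (a sketch with made-up numbers, and with the simplifying assumption that the factors are independent) is just to chain the probabilities:

def p_loss_of_mission(p_flaw_in_flight_load, p_flaw_executed, p_execution_fatal):
    """Probability a residual flaw exists, gets executed, and that execution
    causes loss of mission -- treating the three factors as independent."""
    return p_flaw_in_flight_load * p_flaw_executed * p_execution_fatal

# Illustrative numbers only:
print(round(p_loss_of_mission(0.05, 0.10, 0.20), 6))  # -> 0.001, i.e. 1 in 1,000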

 

We already have real-world examples of bugs in Flight Loads that would have caused a crash, but the circumstances that would have triggered that code never came about. (phew!!!)

...Application - what the application for the software is, what kinds of data it holds, what kinds of settings there are, permissions, and so forth...

Dirt simple. Input parameters in, calculations on the parameters, outputs (commands) out -- to attitude controls, explosive bolts, gizmos and other hardware subsystems. No loops to speak of. No construction of data structures, no saving of history.

...People - Who is responsible for the software

parallel teams of professionals all located at the sw facility. No internet. No external networks. No modems or floppies or electronic contact with the world.

...Importance - related to application, importance is more towards the line of, what will happen if a part or parts of this software stop functioning

CRASH!!!! BOOM!!!! ARRRGGGGGHHHHHHHHHHH!!!!!

...Software security - yes you need to look at this

No internet. No external networks. No modems or floppies or electronic contact with the world. Guards at doors. Lots of "stinkin' badges". :shrug:

...Data Security...

Multiple backups, top-line Config Mngt, off-site copies, data dictionaries. The whole garbanzo.

 

What I have concluded is this: 1) The probability of software "failing" is essentially an analysis of HUMAN ERROR in all the various stages and transformations of the software.

 

2) The probability of a (given) software bug causing a CRASH!!!! BOOM!!!! ARRRGGGGGHHHHHHHHHHH!!!!! is essentially dependent on anomalous environmental conditions and events. "Edge of the Model" circumstances. Rare events or rare combinations of events.


...that special language and compiler design can effectively eliminate specific categories of errors, and to a great extent the need for testing – is, I suspect, not much more comfortably received in any IT community than in my own.
Some progress in this direction was made in the 70's and 80's here. I remember my Space Station Freedom days [am I THAT old?] where I got to see some actual flight code.

 

Loops were highly constrained to the point of non-existence. Conditionals consisted only of IF--THEN--fall through. There were no GO-TOs. The compiler produced flow-charts, and a number of other "high-level" analysis functions. IBM is no longer a big player at JSC but I have to assume that something similar is in place.


Combinatorial Explosion is usually handled very well by an inference engine....Basically if there is a mapping between what variables a module can modify vs which variables are inputs to a module, then you know the dependencies.
Let's talk about this off-line. I don't understand.

 

The question has been raised, "Why software PRA?" -- what do you do with the probability of catastrophic failure once you get it?

 

It's a design approach. Risk is assigned and parceled out to all components in the early stages of design. We want to do that to software, too. Therefore, we must have a way of confirming that our sw does indeed have no greater risk than that which was assigned to it. :shrug: Conversely, as we design and build the sw, the total assigned risk becomes one of the requirements for the sw.
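A toy picture of that parceling-out (the total budget and the allocation split are invented numbers): the total allowed probability of catastrophe is divided among subsystems, and the slice handed to software becomes a requirement the software PRA has to show is met.

# Hypothetical allocation of a total catastrophe-probability budget.
TOTAL_POC_BUDGET = 1 / 500          # overall allowed probability of catastrophe
ALLOCATION = {"propulsion": 0.40, "structures": 0.25,
              "avionics_hw": 0.20, "flight_software": 0.15}

software_poc_requirement = TOTAL_POC_BUDGET * ALLOCATION["flight_software"]
print(round(software_poc_requirement, 6))  # 0.0003 -> the sw must show POC <= this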


Write that thesis! Can you have it finished and in my hands by April?

Get in line; there are people that want me to write them a thesis on a multiple-architecture, non-centralized, dynamically scalable computing cluster with virtual processing nodes and a seamless interface on top (aka a gcc that will compile for this headache)... (mouthful, but trust me, I have discussed this subject for hours). You know, for someone going to a community college and working full time, there are quite a few doctorates that I could probably write... sadly enough...

 

Input parameters in, calculations on the parameters, outputs (commands) out

Yes, the simplest, most exploited, and most exploitable application... yippie.

 

parallel teams of professionals all located at the sw facility. No internet. No external networks. No modems or floppies or electronic contact with the world.

You don't have to have direct electronic contact with the world. City Bank thought they had no electronic contact with the world either, until they found out that they were being pwnt by a group of white hats that used some bank in Zimbabwe which stupidly plugged the internal bank network (running a proprietary protocol) and the outside world into the same switch.... It took months for the white hats to find, but from that point on, they could get into the bank network from that switch...

 

As to professionals: no matter what people you put behind the wheel, they are still a people factor, so they can enter wrong values; therefore, you should make sure that the software checks for this.

 

Lastly, if this is a locked-down facility, there could most certainly be interest in what happens inside by outside people/countries/companies. Whether or not that is the case, software security should always be a priority; no loops does not mean no holes.

 

CRASH!!!! BOOM!!!! ARRRGGGGGHHHHHHHHHHH!!!!!

sounds like a normal day at work....

 

Lots of "stinkin' badges".

You hire skunks? must really be high-sec application

 

Multiple backups, top-line Config Mngt, off-site copies, data dictionaries. The whole garbanzo.

sounds like its not a self-contained network after all.... so there are outside connections :eek:

 

Ooh ooh another questiundo, um, is there a database you deal with?

 

Lastly, if you say there is no need to test the physical security of the software, fine, then don't; just make sure the inputs are checked for valid data types, lengths, parameters, etc....

 

What I have concluded is this: 1) The probability of software "failing" is essentially an analysis of HUMAN ERROR in all the various stages and transformations of the software.

 

2) The probability of a (given) software bug causing a CRASH!!!! BOOM!!!! ARRRGGGGGHHHHHHHHHHH!!!!! is essentially dependent on anomalous environmental conditions and events. "Edge of the Model" circumstances. Rare events or rare combinations of events.

Yes, I think you nailed it there.... though I will think a little bit more about this, maybe I can think of something smart :eek2:


  • 3 weeks later...

Woof!

 

What a long, strange trip it's been :hihi:

 

I understand PRA now, and why they want to do it to software. Before the software does it to us. The whole idea is to assign a probability of catastrophe (POC) to flight software. Not in an attempt to find a bug but just to know where we stand. Just what risks ARE we taking? If the POC is too high for comfort, then decisions need to be made on where to spend money (and how much) to lower the POC--but that is someone else's problem.

 

And that's really it. Sounds simple, but nobody has done it.

 

Until now.

 

I've cracked it. :) I can calculate the POC to a reasonable level of confidence. Without ever seeing the code. ;)

 

And they said it couldn't be done. Ha.

