Science Forums

Standard Deviation Equation


thimble


If you know of a web page that explains the standard deviation formula for people that aren't very good with math ..
As Jay mentions, the wikipedia page is pretty good. I’m guessing you’re looking for a practical formula, though, and the wiki may be more than you want.

 

Size, mean and variance are the three foundation statistics of any finite population. The standard deviation is the square root of the variance. Whether to report variance or standard deviation is up to whoever's writing about it; all that matters is that everyone knows which of these two forms of the same statistic is being used.

 

A population is any collection of numbers. These numbers are called such things as members, elements, values, or variables. {1, 2, 3, 4, 5, 6, 7} is a population. A good term to use for a member of a population is [math]X_i[/math]. In the above population, [math]X_5=5[/math].

 

Size is such a well known statistic that few even consider it a statistic. It's the count of the number of members in a population. The count of the above population is 7. [math]n[/math] is a good term to use for size.

 

Mean is also a well known statistic. It's the sum of the members of a population, divided by the number of members, which can be written as the formula

[math]\mu = \frac{1}{n} \sum_{i=1}^n X_i[/math]

For the above population, [math]\mu = (1+2+3+4+5+6+7) \div 7 = 4[/math].

 

Variance is the first statistic not well known by most readers. There are several sensible definitions of it. One is that it’s the mean of the square of the difference between each member and the mean, which can be written by the formula

[math]\sigma^2 = \frac{1}{n} \sum_{i=1}^n (X_i-\mu)^2[/math]

Because the above formula contains [math]\mu[/math], it takes two passes over the population to calculate it (one to get the mean, one to sum the squared differences), so another definition of variance is more popular: the mean of the squares of the members, minus the square of the mean, which can be written

[math]\sigma^2 = \left(\frac{1}{n} \sum_{i=1}^n X_i^2 \right) - \mu^2[/math]

For the above population, [math]\sigma^2 = (1+4+9+16+25+36+49) \div 7 - 16 = 4[/math].

Using the first formula gives you the same answer, [math]\sigma^2 = (9+4+1+0+1+4+9) \div 7 = 4[/math].
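
For anyone wondering why the two formulas always agree, here is a short derivation sketch. It uses only the definitions of [math]\mu[/math] and [math]\sigma^2[/math] given above: expand the square, then substitute [math]\frac{1}{n} \sum_{i=1}^n X_i = \mu[/math].

[math]\sigma^2 = \frac{1}{n} \sum_{i=1}^n (X_i-\mu)^2 = \left(\frac{1}{n} \sum_{i=1}^n X_i^2 \right) - 2\mu \left(\frac{1}{n} \sum_{i=1}^n X_i \right) + \mu^2 = \left(\frac{1}{n} \sum_{i=1}^n X_i^2 \right) - 2\mu^2 + \mu^2 = \left(\frac{1}{n} \sum_{i=1}^n X_i^2 \right) - \mu^2[/math]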

 

Since the standard deviation is the square root of the variance, the standard deviation for the above population is [math]\sigma = \sqrt{4} = 2[/math]
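
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are my own, and it assumes the whole population fits in a list.

```python
import math

population = [1, 2, 3, 4, 5, 6, 7]

n = len(population)                 # size
mu = sum(population) / n            # mean

# First definition: mean of the squared differences from the mean (two passes).
var_two_pass = sum((x - mu) ** 2 for x in population) / n

# Second definition: mean of the squares minus the square of the mean.
var_one_pass = sum(x * x for x in population) / n - mu ** 2

sigma = math.sqrt(var_two_pass)     # standard deviation

print(n, mu, var_two_pass, var_one_pass, sigma)   # 7 4.0 4.0 4.0 2.0
```

Both definitions give the same variance, as expected.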

 

Statistics are usually calculated using computer programs. The usual way that [math]\mu[/math] and [math]\sigma[/math] are calculated in a program is by keeping running totals of the count [math]n[/math], the sum of the members S, and the sum of the squares of the members SS. Doing this with the above population looks something like this:

n       X       S      SS
0               0       0
1       1       1       1
2       2       3       5
3       3       6      14
4       4      10      30
5       5      15      55
6       6      21      91
7       7      28     140

From which we calculate:

[math]\mu= 28 \div 7 = 4[/math]

[math]\sigma^2 = 140 \div 7 -16 = 4[/math]

[math]\sigma = 2[/math]
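
The running-total table above might be coded roughly as follows. Treat it as a sketch rather than how any particular statistics package does it: the function name `running_stats` is hypothetical, and with floating-point data this one-pass formula can lose precision when the mean is large compared with the spread.

```python
import math

def running_stats(values):
    """One pass over the data, keeping the count n, the sum S and the sum of squares SS."""
    n = 0
    s = 0
    ss = 0
    for x in values:
        n += 1
        s += x
        ss += x * x
    if n == 0:
        raise ValueError("need at least one value")
    mu = s / n
    variance = ss / n - mu * mu
    # Guard against a tiny negative result caused by floating-point rounding.
    sigma = math.sqrt(max(variance, 0.0))
    return n, s, ss, mu, variance, sigma

print(running_stats([1, 2, 3, 4, 5, 6, 7]))   # (7, 28, 140, 4.0, 4.0, 2.0)
```

The first three numbers match the last row of the table, and the rest match the values calculated from it.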

 

The above just tells us how to calculate the standard deviation using all the members in a population. Statistics users usually want to do more complicated things, like estimate the mean and standard deviation from a small sample of a large population, or predict how likely a given member of the population is to be within some number range. There are many formulas for doing this, too many to summarize in a “what is” question thread like this.

Nice little normal curve, that exists, we are told, everywhere. Then you can be 1 -3 (4?) SDs (not STDs) away from the norm/average/whatever. :)

Having a mean and a standard deviation doesn’t guarantee that a population “follows a bell-shaped curve” (ie: is normally distributed). The example above doesn’t – its distribution “curve” is a straight horizontal line.

 

Every finite collection of numbers has a mean and a standard deviation, no matter what shape its distribution curve, and not everything in nature follows a normal distribution.


Every finite collection of numbers has a mean and a standard deviation, no matter what shape its distribution curve, and not everything in nature follows a normal distribution.

Thanks for the correction

It is a long time since I studied or used statistics (1970), or any other sums.

No computer programmes to speak of then. Punch cards were the in thing !

 

I always thought you had to have a normal distribution to get a SD.

 

The Normal Bellcurve Percentiles, Standard Scores, Standard Deviations


 

It always worried me that it seemed to be implied that given a big enough population a bell curve magically appeared. (Or maybe I was asleep at that point).

Distance from the mean given any shape of data sounds more feasible and intelligent..

Are there different assumptions/ formulas then for different patterns that data forms on a graph?


I always thought you had to have a normal distribution to get a SD.
Nope.

 

An everyday example is ordinary playing dice. If you throw lots of fair 6-sided dice, you’ll discover that the mean approaches [math]\mu=\frac{7}{2}=3.5[/math], the variance approaches [math]\sigma^2 = \frac{35}{12} \approx 2.92[/math], and the standard deviation approaches [math]\sigma \approx 1.71[/math]. This is an example of a discrete uniform distribution, for which the formula for variance is [math]\sigma^2 = \frac{n^2-1}{12}[/math], where [math]n[/math] is the number of possible equally likely values spaced 1 apart (eg: 1, 2, 3, 4, 5, 6).
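
The dice numbers can be checked both exactly and by simulation. The sketch below is illustrative only: the seed and the number of simulated throws are arbitrary choices of mine, and the simulated values will only be close to the exact ones, not equal to them.

```python
import math
import random
import statistics
from fractions import Fraction

faces = list(range(1, 7))          # the six equally likely outcomes
n = len(faces)

# Exact values, treating the six faces as a population.
mu = Fraction(sum(faces), n)                                 # 7/2
variance = Fraction(sum(f * f for f in faces), n) - mu ** 2  # 35/12
assert variance == Fraction(n * n - 1, 12)                   # agrees with (n^2 - 1)/12
sigma = math.sqrt(variance)

print(mu, variance, round(sigma, 2))   # 7/2 35/12 1.71

# Simulated throws (the seed is arbitrary, chosen only for repeatability).
rng = random.Random(2024)
rolls = [rng.choice(faces) for _ in range(100_000)]
print(statistics.fmean(rolls), statistics.pvariance(rolls))
```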

 

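Every possible population obeys Chebyshev's theorem, so given only the count, mean, and variance of some population, it’s possible to determine with complete certainty the range within which every member of the population must fall, as given by the formula:

[math]\mu -\sigma \sqrt{n} \le X_i \le \mu +\sigma \sqrt{n}[/math]

The reason is that the squared deviations [math](X_i-\mu)^2[/math] sum to [math]n\sigma^2[/math], so no single one of them can exceed that total.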

 

This is a pretty fun theorem to play with by trying to “beat” it, which you’ll discover is impossible (of course, as if it could be beaten, it wouldn’t be a proven theorem ;)).
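
For anyone who wants to try to “beat” the bound by machine, here is a small sketch. The helper name `max_deviation_and_bound` is made up for this example, and the comparison allows a tiny tolerance because the arithmetic is floating point.

```python
import math
import random
import statistics

def max_deviation_and_bound(population):
    """Return (largest |X_i - mu|, sigma * sqrt(n)) for a finite population."""
    n = len(population)
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population, mu)   # population standard deviation
    worst = max(abs(x - mu) for x in population)
    return worst, sigma * math.sqrt(n)

# A few hand-picked attempts, including one extreme outlier and a constant population.
for population in ([1, 2, 3, 4, 5, 6, 7], [0] * 99 + [1000], [5, 5, 5]):
    worst, bound = max_deviation_and_bound(population)
    print(worst, bound, worst <= bound + 1e-9)

# Random attempts (seed arbitrary): none should escape the bound.
rng = random.Random(1)
for _ in range(1000):
    population = [rng.uniform(-100, 100) for _ in range(rng.randint(1, 50))]
    worst, bound = max_deviation_and_bound(population)
    assert worst <= bound + 1e-9
```

Even the population with a single huge outlier stays inside the range, though only just.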

 

Though Chebyshev's theorem is absolutely guaranteed to be true, it gives such conservative statements that it’s less useful than less reliable functions for specific kinds of distributions. The normal distribution is one of the most useful ones, because a lot of commonplace populations, while not exactly following it, are close enough for practical purposes.

Are there different assumptions/ formulas then for different patterns that data forms on a graph?
Yes. The Wikipedia article “list of probability distributions” includes only the better known ones. Lesser known ones are invented all the time.

  • 2 weeks later...

Sorry to garble on but I just read this article and thought it might be sort'a germane.

 

Unfortunately, it is written by a mathematician so there is no point to the article, but for my untrained mathematical mind I found some of the things said were interesting.

 

Article from: The Australian

 

THERE is an age-old philosophical question: is mathematics created or discovered?

I think the former.

 

To me, mathematics is an artificial human construct, an excellent tool when we acknowledge its limitations in modelling complex phenomena.

. . .

Consider a standard box of matches. It is labelled as containing 50 matches, and most boxes do although some have 49 or 51 matches, a few have 48 or 52 matches, and the odd rare box will have 47 or 53 matches.

 

This situation could easily and appropriately be modelled using the Gaussian bell-shaped curve. It would not be impossible for a box to have, say 42 matches, although the likelihood of this is so low as to make it insignificant and, even if it did occur the mean number of matches per box would not change to any meaningful extent.

. . .

it is foolhardy to make decisions on statistically based claims

. . .

"Close to 99 per cent of the variations over the span of 20 years will be represented in one single day -- the day the European monetary system collapsed," he writes.

 

All macroeconomic data is affected by this phenomenon, Taleb argues. "If you look long enough, almost all the contributions in some classes of variables will come from rare events."

. . .

This propensity to suspend critical thinking and rely on studies of chance based on games and dice is what Taleb refers to as the ludic fallacy.

. . .

Modelling is a useful exercise. Humans, it seems, compulsively theorise and model to gain some foothold in the future.

. . .

Taleb writes, "anyone using variance or standard deviation (or worse, making models that make us take decisions based on them) is incompetent".

To stand a better chance of predicting the future in Extremistan, our sage points to the modern and still evolving fields of fractal geometry and chaos theory.

Critical thought beats blind faith in models | The Australian

I read somewhere that 50% of chaos theory specialist mathematicians/physicists were employed by the Stock exchange

 

So how did they stop one of the guys who set it up (Wall St Stock Exchange) robbing people of 50 Billion?

They make the Mafia look like amateurs.

Would an iconoclastic social psychologist have done any better?


Critical thought beats blind faith in models | The Australian

Unfortunately, it is written by a mathematician so there is no point to the article…

I’m perplexed why one would find it unfortunate that an essay/opinion statement about a mathematical financial analyst (Nassim Nicholas Taleb) was written by a mathematician (Peter Hodge), but this apparently being the case, I can offer the reassurance that Hodge is not what most would call a mathematician, but rather a secondary school Math teacher and freelance journalist. Nonetheless, he’s written a pretty good article about Taleb’s writing and influence, I think.

 

Like Hodge, Taleb is not what I’d call a mathematician, despite having been an adjunct professor of Math, but rather a business expert, having an MBA and a PhD in Management Science. He appears to be mathematically knowledgeable, though, and has been described by many as a polymath, someone with deep knowledge across many fields. He’s certainly both a capable businessperson and an influential academic. His writing, however, appears to be commentary on and interpretation of math within a fairly narrow context of business finance, rather than pure contributions to it.

 

Taleb’s basic message of caution in relying on the expertise of statisticians and claims that financial instruments are based on sound statistics is, I think, a good one. However, I don’t see in Taleb’s writing a general indictment of mathematical statistics, but rather an indictment of their misuse.

 

It’s easy, I think, to misinterpret this message. For example Michaelangelica quotes from Hodge’s article:

it is foolhardy to make decisions on statistically based claims

when the article’s full sentence is:

It is here [in the domain Taleb calls Extremistan], he [Taleb] argues, that it is foolhardy to make decisions on statistically based claims.

In short, Taleb is simply making the common sense observation that it’s foolish to make decisions based on bad statistics.

I read somewhere that 50% of chaos theory specialist mathematicians/physicists were employed by the Stock exchange

 

So how did they stop one of the guys who set it up (Wall St Stock Exchange) robbing people of 50 Billion?

Although my work as a statistician is in medicine, not finance, I think my experience can speak to this question.

 

As a statistician, you’re often paid an hourly rate to provide support for a conclusion your employer has already accepted. One of the more difficult parts of the job is when the best statistical methods reveal that conclusion to be resoundingly wrong (especially if your employer has based her PhD thesis on it). You’re not any sort of legally empowered police officer, able to forbid your employer to persist in her claim because your analysis shows it false. The best outcome you can hope for in such a situation is that your employer will reverse or modify her claim. Often, the best you get is the choice of whether to artfully fake the statistics to support bad claims and continue getting paid, or refuse, with the knowledge that she’ll simply pay a less capable or less ethical statistician to do so.

 

In the case of the financial industry, the purpose to which statisticians like Taleb were employed was to make their employers and themselves lots of money, so it seems to me the ethics of their job were far less well defined than mine. In short, far from being charged with stopping “robbing people”, they were assistants and accomplices in it.


I’m perplexed why one would find it unfortunate that an essay/opinion statement about a mathematical financial analyst (Nassim Nicholas Taleb) was written by a mathematician (Peter Hodge), but this apparently being the case, I can offer the reassurance that Hodge is not what most would call a mathematician, but rather a secondary school Math teacher and freelance journalist. Nonetheless, he’s written a pretty good article about Taleb’s writing and influence, I think.

Thanks that makes a lot more sense now. I had no idea what a "Talib" was.

 

I guess all modelling, including mathematics, is a simplification of reality. Made so we can try and grasp it with our tiny minds.

Is statistics any better than the "I Ching" or "tarot cards" for predicting the future?

 

I am a bit bemused about the Black Swan bit as all swans are black in Australia. They are anything but 'rare'.

 

 

He seems to be saying "What we don't know, we know, is more important than what we (think we) know"??

 

Perhaps humans have a desire to create meaning and order out of what is basically random??

Is predicting the future an impossible human dream?


I guess all modelling, including mathematics, is a simplification of reality. Made so we can try and grasp it with our tiny minds.
As I see it, the math of probability and statistics falls into two main “what is” areas:
  • Making absolutely certain statements about uncertain events
  • Attempting to match the former area to observed phenomena

In short, when what we use statistics to predict are the outputs of precise mathematical functions, we can in principle (though not always easily, and not for all functions) be absolutely certain in our predictions, even though they don’t predict specific values. When we attempt to map these exact functions to real world phenomena, things become less certain. Thus, it’s common for statistics to be considered an art, rather than a science.

Is statistics any better than the "I Ching" or "tarot cards" for predicting the future?
Yes, much better.

 

These things are, IMHO, in very different domains of human experience. If you built an engine using fortune telling cards/sticks in place of statistics, you’d do very poorly. If you read someone’s fortune using statistics, with rare exceptions, you’d do very poorly. Statistics allow us to make predictions. Fortune telling at its best is a sort of potentially very beneficial psychotherapy.

I am a bit bemused about the Black Swan bit as all swans are black in Australia. They are anything but 'rare'.
Bemusing indeed. Whatever Taleb’s discipline can be said to be (business, math, or philosophy), it’s clearly not zoology!
He seems to be saying "What we don't know, we know, is more important than what we (think we) know"??
I’ve not read much of Taleb’s writing, not even one of his books, but my impression of the message he’s trying to impress on the non-technical public, particularly people in financial businesses, is that it’s essentially two part.

 

First, beware bad statistics. Some of this is common sense. If someone claims to have rigorously statistically proven something that seems too good to be true, keep in mind the old maxim “if something sounds too good to be true, it probably is”. Statistics do not lie, but people do, and will use any means to do so convincingly, including misuse of statistics.

 

Second, don’t confuse the statistical function (generator) that seems to describe some measurement with what is actually causing the phenomenon you’re measuring. This point is more profound, more subtle, yet still common sense, if approached with sufficient clarity. It can be illustrated by the following example:

Say you measure a phenomenon that, after 1 million samples, has produced nearly equal counts of 6 values, 1, 2, 3, 4, 5, and 6, almost perfectly matching a uniform distribution. You would be justified in concluding that it will continue to do so. However, the actual phenomenon might, exactly once every 1,000,001 samples, produce a 7th value, 100. It’s important, then, that no matter how closely a collection of measurements matches a generator function, if the underlying mechanism of the phenomenon is unknown or poorly known, one should not conclude that because the generator function can never generate a particular value, the phenomenon can’t either. Statistical correlation, even with very high confidence, is not a substitute for knowing the underlying mechanism of a phenomenon.
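
To make the example concrete, here is a toy version in Python. It is purely illustrative: the `phenomenon` generator is hypothetical, and for simplicity it cycles through 1 to 6 in order rather than drawing them at random, which is my assumption and not part of the example above.

```python
from collections import Counter
from itertools import islice

def phenomenon(period=1_000_001, rare_value=100):
    """Yield 1..6 in rotation, except that every `period`-th sample is `rare_value`."""
    count = 0
    while True:
        count += 1
        if count % period == 0:
            yield rare_value
        else:
            yield (count - 1) % 6 + 1

source = phenomenon()
observed = Counter(islice(source, 1_000_000))   # the first million samples

print(sorted(observed))   # [1, 2, 3, 4, 5, 6]
print(next(source))       # 100
```

A million observations contain no hint of the seventh value, yet the very next sample produces it.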

 

 

Perhaps humans have a desire to create meaning and order out of what is basically random??
I think this is true, and reflected by our tendency to see faces in clouds, hear voices in the wind, etc. This would seem to be an artifact of the way our brains are “wired.”

 

A major social function of statistics in particular and science in general is, I think, to counteract this bias.

Is predicting the future an impossible human dream?
I don’t think so. However, the idea that such a prediction, concerning events and on time scales of immediate interest to humans – years, decades, and centuries - would describe a single outcome, or assign certain probabilities to a collection of possible outcomes is, I think, statistically naïve. These statistics are very likely chaotic – which is not to say they cannot be well described mathematically, but that rather than the math making simple predictions, it makes complicated ones.

  • 3 weeks later...

The variance formula was given with n in the denominator. That is the form of the formula to use with the population. When working with samples use n-1 in the denominator.

 

For example, suppose you want to know the average height of a tree in the forest. Then you want to know something about the variability of those tree heights, ie the spread of the data. Typically only some of the trees are measured. This becomes the sample. Use n-1 when calculating the variance. If all of the trees in the forest were measured, an overwhelming task in most cases, use n since everything was measured.

 

The difference is that the result from the sample is trying to say something about the entire forest even though only part of the forest was measured. The sample result is an estimate of the population result.
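
In symbols, the sample version is [math]s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i-\bar{X})^2[/math], where [math]\bar{X}[/math] is the mean of the sample. The sketch below compares the two denominators; the tree heights are made-up numbers for illustration, and the two helper functions are my own, checked against Python's standard `statistics` module.

```python
import math
import statistics

def population_variance(values):
    """Divide by n: use when every member of the population was measured."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Divide by n - 1: use when the values are a sample from a larger population."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

heights = [18.2, 21.5, 19.9, 24.1, 20.3]   # hypothetical tree heights, in metres

print(population_variance(heights), sample_variance(heights))

# The standard library offers both forms under these names.
assert math.isclose(population_variance(heights), statistics.pvariance(heights))
assert math.isclose(sample_variance(heights), statistics.variance(heights))
```

The sample variance always comes out a little larger than the population variance of the same numbers, by a factor of n/(n-1).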
