Science Forums

Statistical/probability issues in speciation


Biochemist


Per a suggestion in another thread, I am launching a discussion on the probabilities associated with mutation and speciation.

 

I built a simple model in an earlier thread here:

 

http://hypography.com/forums/biology/2756-punctuated-equilibria-theories-5.html?highlight=punctuated+equilibrium+theories

 

And I will excerpt part of that post for ease of reading:

 

Let me start by saying that we understand a relatively small portion of intracellular biochemistry. Like most sciences, the new things that we learn in biological sciences always raise more questions than they answer. This is dissimilar to physics, where leading physicists will actually openly discuss having a "theory of everything" and some will contend that string theory (even though still open to evaluation) is that theory.

 

There is no such position in biochemistry. Every additional material discovery surfaces other issues that bump up intracellular complexity by another order of magnitude. With that preamble, let me bore you with 12 steps of biochem 101 for a second, and then pull out some of the anomalies.

 

 

 

  1. DNA is a sequence of four nucleotides (guanine, cytosine, adenine, thymine)
  2. Three nucleotides in a sequence form a codon. Of the 64 possible codons, 61 "code" for one of 20 amino acids and the remaining 3 are stop signals.
  3. There are an infinite number of amino acids possible in chemistry. Only 20 are used in living systems- pretty much the same 20 irrespective of the life system. Amino acid anomalies are extremely rare.
  4. DNA only codes for proteins and RNA. Proteins are the little machines to do things. RNA pretty much only helps to "transcribe" the DNA to make the proteins. Everything that is done in the cell is either done by a protein, or done by something built by a protein.
  5. DNA is hence a little machine that builds machines (ribosomes) that build machines (proteins) that build machines (everything else). Since DNA builds itself, you could add at least one more generation on this sequence.
  6. A typical protein is about 300-400 amino acids. They range from probably about 50 to over 10,000, but 300 is a good average. The set of codons that code for a protein is a gene. Ergo, a typical gene has 300x3 DNA bases in it, or about a thousand.
  7. Most proteins are highly specific. In most proteins (that have been tested) most individual amino acid residues cannot be changed at all or the protein stops functioning. Most proteins have exactly one substrate, exactly one output, and several speed modulators that control the rate at which the protein functions. Proteins that are acting in this fashion are called enzymes. This is differentiated from proteins that are part of our mechanical structure.
  8. Proteins are manufactured in a single-thread long string, but this 300 amino acid residue string "folds up" into a ball. It has to be in exactly one ball shape. Most proteins (all?) could fold up into different ball shapes which would be dysfunctional. They usually don't because other proteins ("chaperone proteins") manage the fold-up of the new protein to keep it the correct shape. Some diseases are thought to be errors in fold-up (e.g., Alzheimer's, cystic fibrosis); more about that issue at: http://www.faseb.org/opar/protfold/protein.html
  9. Human DNA is about 3.6 billion nucleotide bases, but there are thought to be only 30,000-40,000 functional genes. Even if there were 100,000 functional genes, that would account for 100 million bases. The other 3.5 billion are just standing by. That is, the ratio of stand-by DNA to functional DNA is probably higher than 40:1. More on that here: http://www.biology.eku.edu/FARRAR/gen-prot.htm
  10. Most proteins do not act alone. They act in a defined sequence of actions. Glycolysis, the Krebs cycle, the urea cycle, beta oxidation of fats: all of these are multi-enzyme processes where the output of one enzyme is the input to the next. I will use the Krebs cycle as an 8-enzyme example (just because it is so famous). Picture here: http://www.bmb.leeds.ac.uk/illingwor...abol/krebs.htm
  11. Most protein systems need to be physically associated with each other to function. Hence, there are specific transport systems that transport proteins to their work site within the cell. These transport systems need to recognize the protein and "know" its appropriate location.
  12. Enzymes occasionally break, or need to fluctuate in quantity. When they do, the DNA is triggered to produce more of the enzyme. A typical human chromosome is about 78 million bases, and is folded at least ten times (into at least a thousand parallel threads of DNA). The DNA is triggered to "unfurl" just a small portion of base pairs, "unzip", and let the ribosomes zip along it to make a new protein. The new protein is then chaperoned into a ball, transported into location, and usually inserted into a specific location in the target machinery.
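
The size figures in items 6 and 9 can be checked with a few lines of arithmetic. This is only a sketch using the post's own round numbers (3.6 billion bases, 300 amino acids per protein, genes rounded up to about 1,000 bases); the function names are made up for illustration.

```python
# Back-of-envelope check of the figures quoted in items 6 and 9.
# All inputs are the post's own round numbers, not measured values.

AMINO_ACIDS_PER_PROTEIN = 300      # "300 is a good average"
BASES_PER_CODON = 3
GENOME_BASES = 3.6e9               # "about 3.6 billion nucleotide bases"
BASES_PER_GENE_ROUNDED = 1_000     # the post rounds 900 up to "about a thousand"


def coding_bases(gene_count: int, bases_per_gene: int = BASES_PER_GENE_ROUNDED) -> int:
    """Total bases occupied by genes of the given (rounded) size."""
    return gene_count * bases_per_gene


def standby_ratio(gene_count: int) -> float:
    """Ratio of non-gene ("stand-by") bases to gene bases."""
    coding = coding_bases(gene_count)
    return (GENOME_BASES - coding) / coding


bases_per_gene = AMINO_ACIDS_PER_PROTEIN * BASES_PER_CODON
print(f"bases per typical gene: {bases_per_gene}")

for genes in (30_000, 40_000, 100_000):
    print(f"{genes:>7} genes -> {coding_bases(genes):.1e} gene bases, "
          f"stand-by ratio about {standby_ratio(genes):.0f}:1")
```

With these inputs the ratio comes out near 35:1 for the generous 100,000-gene case and roughly 90:1 to 120:1 for 30,000-40,000 genes, which is where "probably higher than 40:1" comes from.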

Now the math:

 

Granted, the math I will present is related mostly to the human genome. Frankly, at the level of detail we are talking about, it would apply pretty well to bacteria as well. Bacteria don't have genomes quite so big, and have substantially less non-coding DNA (maybe 10% versus human 98%) but the numbers are still impressive.

 

  1. To get a functional protein by mutation: you would need at least 200 specific amino acids in specific sequence out of 300 in the protein (it is actually more like 260 on average, but I am making it simpler). This would be randomly 1 in 20^200, or about 1 in 10^260. Heck. To be conservative, let's make it a couple of trillion trillion trillion trillion times more likely, and make it 1 in 10^200.
  2. Proteins do not work alone, so figuring 5 enzymes in a sequence (being conservative, typical is 6 to 8), this gives us 1 in 10^1000.
  3. Keep in mind that there are thousands of separate interdependent enzyme systems and structural construction systems. I am not including the calculations for other logically required systems. Any additional required system would be a multiplier (yes, that would be 1 in 10^1,000,000). Sheesh.
  4. I have no idea how to calculate the odds of a chaperone protein, since we would have to know the odds of a specific protein folding incorrectly without one. Let's give this one a pass.
  5. I have no idea how to calculate the odds of recognition in the protein transport systems. We would have to know the requirement for transport, versus the degree of activity if the enzyme system was floating freely. Heck. Let's give this a pass too.
  6. I have no idea how to calculate the feedback loop for production of additional protein from DNA, but this one probably dwarfs all of the previous numbers. Remember that we have to expose the specific thread of DNA to let the ribosomes zip along it. The DNA unfurls on signal, and this means that one single loop of perhaps 1000-2000 codons out of maybe 70-80 million bases in a chromosome is exposed. I didn't mention that related enzymes are often associated in adjacent genes (called an "operon") and are transcribed as a set, rather than as a single enzyme. I already accidentally gave us a pass for the probability of 5 or 6 adjacent genes of 1000 codons being arranged together on a string of 80 million bases (about 25 million codons). But the real problem is that ANY mutation to the chromosome would tend to mess up this complex unfurling arrangement. So we have to allow for not just the 1 in 10^1000 problem of a mutation to create the enzyme system, but we have to make sure any of the series of mutations to establish the enzyme system does not mess up the feedback unfurling of several thousand OTHER genes on the same chromosome. No guess for the odds here.
  7. We have not yet discussed the "lysosome problem". Cells are remarkably efficient scavengers, in that they destroy useless junk routinely. This means that the lysosomes (or other scavenger pathways in lower lifeforms) recognize foreign from non-foreign chemicals. This means that a new random protein would likely get scavenged. If it didn't, the cell would be swamped in non-functional proteins. I can't find any information on the efficiency of the cell scavenger process, but certainly a minority of proteins in the cell is non-functional. Otherwise, an organism would spend most of its energy (and food consumption) on production of non-functional material. Clearly not the case. Even if we assume that every 1 in 10^6 mutations was functional (a ludicrously positive assumption) we have to assume that lysosomes destroy the vast majority of these. The lysosome has to recognize these as non-foreign to let them remain. For each enzyme in the sequence. This would be mandatory, or the house of cards falls apart.
  8. I brought up bacteria above, and that they have perhaps 10% non-functional DNA. Using them as examples of prokaryotes, it sure is odd that these archaic, simplistic systems are so efficient. Is it odd that the progenitors are so genomically efficient and yet the sophisticated, higher systems are not? If we have systems to reverse mutations in DNA (we do) and to eradicate foreign proteins (we do), why don't we have systems to eradicate nonfunctional DNA? My suggestion is that we probably do. I suggest this "non-coding" DNA is not nonfunctional. It is required.
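
The exponents in steps 1 to 3 can be reproduced in log10, since numbers like 20^200 overflow ordinary floating point. This is a minimal sketch of the post's arithmetic only. It takes the post's inputs at face value, reads "thousands" of systems as 1,000 (the value implied by the 10^1,000,000 figure), and, like the post, multiplies the odds together, which assumes each requirement is an independent random draw. The variable and function names are made up for illustration.

```python
import math

# Everything below is an exponent: "1 in 10^x" is stored as x.


def log10_odds_specific_sequence(required_residues: int, alphabet_size: int = 20) -> float:
    """log10 of alphabet_size ** required_residues (one exact sequence drawn at random)."""
    return required_residues * math.log10(alphabet_size)


# Step 1: 200 specified residues out of an alphabet of 20 amino acids.
step1_exact = log10_odds_specific_sequence(200)   # exponent for 20^200
step1_used = 200                                  # the post rounds down to 1 in 10^200

# Step 2: five such enzymes in one pathway, treated as independent draws,
# so the exponents add (equivalently, the odds multiply).
enzymes_in_pathway = 5
step2 = step1_used * enzymes_in_pathway

# Step 3: 1,000 such pathways, again treated as independent (assumed reading
# of "thousands", chosen because it reproduces the post's figure).
pathways = 1_000
step3 = step2 * pathways

print(f"step 1: 1 in 10^{step1_exact:.1f} (rounded down in the post to 10^{step1_used})")
print(f"step 2: 1 in 10^{step2}")
print(f"step 3: 1 in 10^{step3:,}")
```

Working with exponents keeps every step an ordinary addition or multiplication, and it makes the independence assumption visible: each multiplication of exponents stands for one "and also, by chance" in the argument.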

Too long an answer to your question. But anyone who wants to advocate improved morphology by mutation has to get the 1 in 10^1000 number (not to mention the 1 in 10^1,000,000 number) down to something like 1 in 10^6 to 1 in 10^8 to make it have any chance of playing a role in speciation. As a biochemist, I have no idea how to do that intelligently.

 

Let me offer a constrained question: Can anyone offer a reasonable mathematical model to explain how we could obtain a new enzyme system (through mutation) in a higher life form (like a mammal) in as little as 300 million years?


Per a suggestion in another thread, I am launching a discussion on the probabilities associated with mutation and speciation...Can anyone offer a reasonable mathematical model to explain how we could obtain a new enzyme system (through mutation) in a higher life form (like a mammal) in as little as 300 million years?

Yes, I am game. The game is afoot. Actually, the game has two feet.

And yes, I can explain how to do the above in as little as 300 million years: Do half of it in 150 MY and the other half in 150 MY. ;)

 

Pyro--with 1200 caliber log-log sliderule set on full auto!


I may have time only during lunch or at the end of the day, for a couple of weeks. Work load just went off scale. Having said that...

 

I will suggest some ground rules as we enter this jungle of mathematical iniquity. Here is one:

It is (or should be) pointless to discuss the "probability" at the chemical / codon / gene / protein / DNA level for "events" that we have no good reason to believe actually occur.

 

For example, there are reasons to believe that "mutations" within a stable species that produce a wholly unique and new chemical pathway do not occur. So it is questionable that we can reason intelligently on the probability of this happening. Furthermore, it is not obvious that we should assume that Evolution "requires" such an event if we have reason to believe that such events do not occur. This ground rule is subject to radical revision at any time. ;)


I really cannot even begin to discuss this on the level you outlined, but one thought is that the Earth is not a closed system... it is an open system, and we gain HUGE amounts of energy from that big thing we orbit called the sun. This source of power helps the flat probabilities everyone has discussed to be more likely.
I usually assume no energy constraints to keep the game positive. This assumption is a highly favorable one, but probably necessary to keep the mathematics within my humble calculus skill set.

I will suggest some ground rules as we enter this jungle of mathematical iniquity. Here is one:

It is (or should be) pointless to discuss the "probability" at the chemical / codon / gene / protein / DNA level for "events" that we have no good reason to believe actually occur....

I have no problem with offering credible ground rules. And I would like to STRENUOUSLY avoid any implications theological.

 

I would just like to have a reasoned discussion of a narrow problem set: How do we get a new enzyme system on line (as might be required for a new phylum). I am open to any assumptions, as long as you are open to critique of the reasonableness of the assumptions.

 

Bio


I really cannot even begin to discuss this on the level you outlined, but one thought is that the Earth is not a closed system....

Have no fear. We will be nibbling off the rough edges of the preamble for some time to come. It is a very good preamble, is it not? All the more so since it contains some very debatable assumptions, that may or may not be bogus to one extent or another. This may be where we begin this thread. The bogosities may not reveal themselves until we actually try to use the preamble items in serious speculations.

 

You might wish to pick one preamble item and ask a question or make a conjecture upon it. Or say more about "open system" dynamics.


For example, there are reasons to believe that "mutations" within a stable species that produce a wholly unique and new chemical pathway do not occur. So it is questionable that we can reason intelligently on the probability of this happening. Furthermore, it is not obvious that we should assume that Evolution "requires" such an event if we have reason to believe that such events do not occur. This ground rule is subject to radical revision at any time. ;)
I admit that I picked 300 million years as the window that appears to exist in the fossil record between invertebrates and the advent of mammals (in the Cambrian explosion). I do not know of an actual example (although I would be happy to search for one if you like) but I am pretty confident that there are a number of entirely new enzyme systems that arrive on the phylogenetic scene between those two fossil markers. Do you think I ought to do the research on it, or do you want to concede as a stipulation that at least one new enzyme system arose in those 300 million years?

the Earth is not a closed system... it is an open system, and we gain HUGE amounts of energy from that big thing we orbit called the sun. This source of power helps the flat probabilities everyone has discussed to be more likely.

FYI... I just threw this in here because it's quickest and most efficient way to disable the argument of those attacking evolution using the 2nd law of thermodynamics as support for their arguments.


I admit that I picked 300 million years as the window that appears to exist in the fossil record between invertebrates and the advent of mammals (in the Cambrian explosion). ...do you want to concede as a stipulation that at least one new enzyme system arose in those 300 million years?

Ahhh, prior to the CX! I will concede. However, I may question whether the new chemical pathway was 1) stable, 2) had positive survival value, and 3) was triggered by a single mutation, all simultaneously. I prefer sloooow developmental mutations.


Ahhh, prior to the CX! I will concede. However, I may question whether the new chemical pathway was 1) stable, 2) had positive survival value, and 3) was triggered by a single mutation, all simultaneously. I prefer sloooow developmental mutations.
Really, it is more like "during" the explosion than prior to it. I don't really care whether we assume the new enzyme system was favorable. We could presume it would have to be to produce a fossil (maybe). But it does appear that new enzyme systems arose, and it also appears that most of them did not have incremental phenotypical expression (another challengeable assumption). We can find some enzyme systems that appear to have interim phenotypical expression, but it looks like most of them do not, because they have fully separate physical genes in current species.

 

So, the mathematical question is: How can we credibly get a mature enzyme system in 300 million years when it does not look like there was an opportunity to "select" for it phenotypically in the 300 million year interim?

 

Please attack any unreasonable assumptions.


FYI... I just threw this in here because it's quickest and most efficient way to disable the argument of those attacking evolution using the 2nd law of thermodynamics as support for their arguments.
No problem.

 

And the intent here is not to attack evolution per se. My intent is to air an issue that I think has substance. I don't care about the implications. Us science folks are not supposed to let the implications of analysis affect the analysis itself.

 

It is true that this thread is clearly questioning some of the underlying fundamentals of speciation by mutation. But there is a lot more to evolution than that. That being said, I do think speciation by mutation is not the general speciation mechanism. This position makes me an outlier.

 

But I like bourbon better than scotch, too. Yet one more outlier issue.


Great word. But I am pretty sure it is "bogusities".

Oxford English Dictionary, 1998: "bogosities" from OE "boggonstinkums", literally, terrifically bad stenches that would bubble up without warning out of the bogs of western Britain."

 

However, I am very flexible. Let's go with "bogusities". ;)

 

Back to work! ;)


Really, it is more like "during" the explosion than prior to it. I don't really care whether we assume the new enzyme system was favorable. We could presume it would have to be to produce a fossil (maybe). ...

You're right. That is the most interesting epoch of life on Earth, from an evolutionary standpoint. I have "Wonderful Life" by Gould and "Climbing Mount Improbable" by Dawkins. The first is devoted to the CX and the amazing CX fossils found at one particular site. The second has at least one chapter, I believe, that addresses this issue of rapid evolution during the CX. I will reperuse them this weekend.


You're right. That is the most interesting epoch of life on Earth, from an evolutionary standpoint. I have "Wonderful Life" by Gould and "Climbing Mount Improbable" by Dawkins. The first is devoted to the CX and the amazing CX fossils found at one particular site. The second has at least one chapter, I believe, that addresses this issue of rapid evolution during the CX. I will reperuse them this weekend.
Given our current understanding of genetic drift, we have a pretty good handle on a couple of things:

 

1) Whenever a cataclysm occurs, we eradicate large portions of extant populations, and also sequester small populations

2) Small populations are far more likely to express recessive alleles, simply by arithmetic

3) Ergo, any population that has unexpressed recessive alleles is far more likely to express those phenotypically after a major population reduction or population sequestration, as with a cataclysm.
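
Point 2 is stated as "simply by arithmetic". One standard way to put numbers on it is the textbook Wright-Fisher expectation that heterozygosity shrinks by a factor of 1 - 1/(2N) each generation in a population of N breeding individuals. The sketch below assumes that idealized model (random mating, constant size, no selection, no new mutation), none of which is spelled out in the post; the population sizes, starting value, and generation count are arbitrary illustrative choices. As heterozygosity falls, a recessive allele that is still present sits more often in homozygotes, where it is expressed.

```python
def expected_heterozygosity(h0: float, population_size: int, generations: int) -> float:
    """Wright-Fisher expectation: H_t = H_0 * (1 - 1/(2N)) ** t."""
    return h0 * (1.0 - 1.0 / (2 * population_size)) ** generations


H0 = 0.5            # illustrative starting heterozygosity
GENERATIONS = 100   # illustrative time span

# Hypothetical sizes: a sequestered remnant versus a large pre-cataclysm population.
small = expected_heterozygosity(H0, population_size=50, generations=GENERATIONS)
large = expected_heterozygosity(H0, population_size=50_000, generations=GENERATIONS)

print(f"N = 50:     expected heterozygosity after {GENERATIONS} generations = {small:.3f}")
print(f"N = 50,000: expected heterozygosity after {GENERATIONS} generations = {large:.3f}")
```

Under these assumptions the small population loses most of its heterozygosity over the same span in which the large one loses almost none, which is the arithmetic behind points 2 and 3.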

 

The questions are:

 

1) If the fossil record is reasonably valid (challengeable), then at least some of those recessive alleles are reasonably complex genes, and were not previously selected even if they were ever expressed.

2) If any of these genes were reasonably complex (and many appear to be), how did those complex recessive alleles show up if they were not previously phenotypically expressed?

 

This is the basis for the math question of this thread. We can explain that a cataclysm would result in expression of a recessive allele. It is more problematic to explain how the recessive allele would show up at all if there was no selection pressure for the necessary biochemical intermediates.

 

We give a lot of air time to some rare cases where it looks like some protein fragments are reused in unrelated cell components. But the vast majority of the 300,000 or so mammalian proteins do not look like they have significant reused fragments. Further, any dysfunctional fragment that arrives in the cell is usually eradicated in the lysosome system, or in some other scavenger pathway. If this were not true, the majority of protein in any cell would be dysfunctional. This is not the case in any species that I know of.

 

Ergo, it looks like we are able to get viable, expressed recessive alleles that are major leaps forward (potentially resulting in a significant change in body plan) without interim selection for the half-dozen (or more) enzymatic components.

 

I have a hard time making this math work, hence the thread.
