Classification, Approaches to

Buffy · December 8, 2006

Actully I'm trying to solve a programming problem, but its so general, that I thought I'd throw it out an an interdisciplinary issue.

There are many ways to classify things, both in the sense that you can come up with different names underwhich you can group "things" as well as the fact that the way those groups are *structured* can take different forms.

The classification that most people with general science backgrounds will be familiar with is the Linnean Classification of living things. It provides lots of different pigeonholes for groups that are defined by specific physical traits, and these groups are structured into a tree-shaped hierarchy (its got only one root with branches) with well defined levels in the tree (Phylum, Class, Order, Species).

Recently, our ability to analyze DNA has resulted in a different mechanism for classification, and its "corrections" to the traditional Linnean taxonomy have caused some interesting conflicts. I don't want to get bogged down with what *should* be done about resolving these two classification mechanisms, but bring up the issue of how we think about multiple, different, and valid classifications of the same population of things.

In my programming problem, I'm trying to come up with a way to look at two or more classification schemes simultaneously. This is not too different than trying to use *both* the Linnean and DNA classifications at the same time, in order to gain insights about them: in living things, DNA gives very clear indications of linneage, whereas the parallel Linnean descriptions provide details of how those changes may have survived environmental selection.

It seems to me that--going back to the distinction posed at the beginning of this post--that there are two issues: defining groups that have varying levels of overlap (think Venn diagrams), as well as radically different linkages between these groups that can show insights through comparison and contrast of conceptual models and paths.

I've got to be honest, I'm not really even sure where I'm going with this, but I've never seen it discussed in any detail as an abstract problem and I'm hoping that some interesting insights might pop out of a random walk through the topic.

Please jump in anywhere!

Catagorically,

Buffy

Qfwfq · December 11, 2006

Tryin' to work out a tricky class hierarchy, eh?

I think the kind of ambiguity your concerned with can be fit into OO with miltihereditarity, which of course sucks in Java as ya gotta make do with interfaces. Hasta C++ siempre! :0005:

Suppose you need a chimp to be a kind of equine as well as a primate... MH: your directed graph doesn't have to be a tree! That's prolly not exactly what you need, you seem to be needing to have two trees superimposed... MH: your chimp is a primate but it's also one of the high-intelligence species (along with dolphins, elephants, mice), but isn't for example one of the predator or the necrophagic types.

I mean, you can have each concrete type inheriting from one abstract type from each of several trees (or directed graphs). Each of these could be simple or quite elaborate.

Eclogite · December 11, 2006

For some reason the expression two orthogonal matrices, with entities represented by intersecting nodes sprang to mind. I don't know if that's an interesting insight, but it certainly just popped out. :0005:

Buffy · December 12, 2006

Tryin' to work out a tricky class hierarchy, eh?

:) :) :)

Worse, I'm trying to implement it in a relational database! :)

I think the kind of ambiguity your concerned with can be fit into OO with miltihereditarity...

Yes of course, but actually, I've put implementation on the back burner to do research (which means I don't have to have a goal!), so I just want to think about the theoretical implications of looking at things from different points of view.

All discussion about shapes of relationships (tree, network, directed graph, etc) are all fair game. What I'd like to do is think about things like *mapping*, not just of the attributes of the "nodes" but of the links between them. If you've spent anytime with this, you know that its possible to turn graphs "inside-out" so that the attributes become the links and the links become attributes, but there are restrictions to such operations.

SO:

Suppose you need a chimp to be a kind of equine as well as a primate... MH: your directed graph doesn't have to be a tree! That's prolly not exactly what you need, you seem to be needing to have two trees superimposed... MH: your chimp is a primate but it's also one of the high-intelligence species (along with dolphins, elephants, mice), but isn't for example one of the predator or the necrophagic types.

...is exactly what I'm looking for! Expound here please, Q!

Nodes, they're not just for nerds anymore,

Buffy

Buffy · December 12, 2006

For some reason the expression two orthogonal matrices, with entities represented by intersecting nodes sprang to mind. I don't know if that's an interesting insight, but it certainly just popped out.:)

Very :) !

What made you think of it? What was it used for when you stumbled on it?

Driving connections,

Buffy

Qfwfq · December 12, 2006

Sorry Eclogite but I'm at a loss to make sense of "two orthogonal matrices, with entities represented by intersecting nodes". What exactly would the intersecting nodes be? :confused:

Worse, I'm trying to implement it in a relational database!

Which makes it tedious but not difficult. What I'm currently working on, electric distribution networks, are directed graphs represented in RDB and the implementaion I mostly work on is in pure C :hihi: which makes it less worthwhile to translate the joins into double linking.

I did imagine however that you're thinking less about implementation and more about the theoretical implications of looking at things from different points of view.

All discussion about shapes of relationships (tree, network, directed graph, etc) are all fair game. What I'd like to do is think about things like *mapping*, not just of the attributes of the "nodes" but of the links between them.

Infinitely many are the ways of mathematics.

Mappings... one of the most general concepts in mathematics, next to sets themselves! They take on semblances such as applications, functions, ordered n-ples, tables, graphs, forms, matrices, operators and younameit. But, when you mention types which may be a type of some other type, which may in turn be a type of some other type and the house that Jack built... and there may be more than one type of each type, but also each type may be a type of more than one other type, well, I see it naturallly as a directed graph. If anyone can think of another kind of mapping that suits the problem, throw it on the table...

If you've spent anytime with this, you know that its possible to turn graphs "inside-out" so that the attributes become the links and the links become attributes, but there are restrictions to such operations.

Right now I'm striving to continue a rather tricky task, which I began by first translating the original directed graph into a smaller one according to which branches are marked with a value of electric current. For each of these there is a node of the new graph, which also has children of a different type which are loads connected to some of the nodes of the original graph. The original graph can be considered a tree, of which the ancestor-descendant relation is altered but mainly contracted in an odd manner. A simple enough case of what you're after, though I did it in a nifty way that allowed me to allocate the new structs in simple C arrays, one for each type, and walk the old graph recursively but in such a way as to get the siblings all consecutive.

So, I now have two quite different representations of the same electric distribution network. In one, branches are the wires that cross the land at 10 or 15 kV and nodes are little buildings and may have connected loads. In the other, some nodes are some of the first one's branches, the other ones are loads and the branches are abstract.

Nerds are unfit for nodes...

IDMclean · December 12, 2006

I'm not sure how this helps (or hinders) but here it anyway.

humans have a base of {a, t, g, c} and {u}. u is special so it get's to be it's own set, in my mind. All of human DNA can be expressed by combonations of these bases. Now if I remember correctly (Chemistry and Genetics are not my strong points, yet), not all animals have this same set of bases.

So what would fall out of this, in my mind at least, is that any creature that shares a base element with another creature would be of greater similarity. Two creatures of identical base sets would be drasticly closer related than two creatures who shared one element of their base set, or no elements.

For making a classification scheme I see a Venn diagram and a bubble sort (or similar). Other attributes that could be used (though I am not sure how) are number of chromosomes. (IE Human c{46}, Horse c{64}, Donkey {62}, Mule{63}). The Venn Diagram can be generated from the relation of sets to one another. Of course this is thinking graphically. Which doesn't sound like what your looking for, Buffy. However that does give you something to go off, I think. Humans can be expressed as an array of arrays. Any creature can be. Humans{b{a, t, g, c} & {u}, c{46}}.

I imagine you could link this with the Linnean Classification scheme by further arrays(nodes?). Specific Epithet would be the termination point, while kingdom would be the beginning, and you could construct backwards. I am thinking of a sort of tree structure, I think. Though individual nodes, and particularly leaves, would have another level of relationships. That is a more complex geometry in my head. Which would amount to relationship of genetic commonalities.

I hope that helps some.

Buffy · December 12, 2006

This can all get so abstract, I'm going to take some shots at simple elements that will get the basic point across.

Here's the first one: Things we like to drink.

Classification Scheme 1: Components

Contains Sugar: kind, how much
Contains Alcohol: kind, how much
Fizzy!
Contains Fat
Contains Caffeine, Cocaine or other drugs

Classification Scheme 2: Manufacturing process

Mixing (Soda)
Milking (Milk)
Extraction (fruit juices, coffee)
Transformation (Beer!)

Classification Scheme 3: Provenance

Geographic location/cultural use
Discovery ("Ya see that animal over there with the big bag underneath it?")
Combination ("You spilled your chocolate milk in my coffee!")
Age (how long have humans been consuming it)

Classification Scheme 4: Usage

Health
Treat
Drug
Ritual

As you can see, even with a simple concept, there are so many different ways of looking at it. How do we think about not only the classifications, but the mappings between the classifications (sweet->treat). This one is even too simple to see some other attributes, so I'll have to think of more. But its a starting point to get you all thinking about it....

Classy,

Buffy

Pyrotex · December 12, 2006

Part of your problem may be the phrase "relational database".

For all of their power and ease of use (and I have used plenty), RD's have inherent constraints on what they can be made to do. For example, I tried to use an RD to model a very simple concept, the "exploded graph" of a typical Work Breakdown Structure. In a WBS, a cog is a "cog", but it also belongs to a "gearbox", and a "transmission", and an "elbow force module", and an "elbow mechanical assembly", and a "robotic arm", and a "mobile grasping system", and...

I couldn't do it. At first, I was tortured with self-doubt and angst. Then I demonstrated that it couldn't be done. I can't speak for all RD's, but the two RD's I had at hand simply couldn't kiss their own elbows in the right way necessary to model a WBS with "exploded graph".

However, young Jedi, there is a way, if you are prepared to take it.

You can build your own custom RD, with arrays of links, arrays of double-links, circular lists, multiply-connected lists, whatever you can imagine.

In 1985, I was presented with a dragon task requiring a RD of immense power, to catalog every task required to prepare and train a crew for a Shuttle Mission, and eventually calculate the maximum Shuttle launch rate, given the current (and proposed) resources within NASA's Johnson Space Center.

I built my own RD out of FORTRAN arrays, and could define ANY relationship, indeed, any multiple-complex-disjointed, overlapping relationships I could think of -- and in five months had the answers.

Buffy · December 12, 2006

Part of your problem may be the phrase "relational database"....For example, I tried to use an RD to model a very simple concept, the "exploded graph" of a typical Work Breakdown Structure...Then I demonstrated that it couldn't be done. ...I built my own RD out of FORTRAN arrays, and could define ANY relationship, indeed, any multiple-complex-disjointed, overlapping relationships I could think of -- and in five months had the answers.

Yes, say "bill-of-materials" to a relational database designer and watch her head explode. But it can be done as Q noted above, you just have to know *how* to do it and have enough horsepower to deal with it and a practical approach to programmatic access (both cursors and embedding within general programming languages) and limits of recursion and you can do amazing things. Just like you did in Fortran!

That's why its better to model first, *then* design.

But anyway, the thread's about the modeling part. I'll have fun with implementation later. "Its merely an implemenation problem..."

select x, (select y, (select z from t where y=...

Buffy

Rade · December 13, 2006

Perhaps this will help ? I do not have 10 posts so I cannot provide the link :reallyconfused: --you need to put these together. It is a Thesis called "Visualising Multiple Overlapping Classification Hierarchies" by Martin Graham. (put a dot after the first 4).

www

dcs

napier

ac

uk/

~marting/

phd/

GrahamThesisFinal.pdf

Buffy · December 13, 2006

While I've only had a chance to skim it, its a very interesting thesis! Although it focuses almost exclusively on the actual visualization of multiple heirarchies, its got some interesting stuff on combining hierarchies that's relevant here. It does not seem to deal with non-heirarchical relationships (although it shows how to take mulitiple heirarchies and turn them into more generalized networks), nor transformation of such networks as I've mentioned above, so I think there are other interesting topics to discuss here.

For those of you who don't want to type the URL, here it is:

http://www.dcs.napier.ac.uk/~marting/phd/GrahamThesisFinal.pdf

Thanks for the post Rade!

Cheers,

Buffy

Eclogite · December 13, 2006

What made you think of it? What was it used for when you stumbled on it?

It didn't derive from programming, but from how I think about the world and therefore lies quite deeply at the heart of my philosophy. Classification is artificial, yet essential; the particular class we place an entity or event within determines our reaction to it; therefore, proper reaction can only arise if we use multiple classifcations, since this will bring us closer to an appreciation of reality (whatever that is).

So, I imagine flat-form data bases, intersecting and thereby defining the key properties of entities, or events at the point of intersection. Its just a free form mental technique I have found helpful in making sense of the world. :phone:

I hoped that it might have a chance resonance for you that would suggest a possible solution to your specific challenge.

Qfwfq, I hope the above makes clear why my intersecting nodes were not clear to you.:doh:

Qfwfq · December 13, 2006

I can see what you mean about intersections, I was stuck on fitting them in with two orthogonal matrices.

I might rather say "nodes of intersection" or something, whatever, I see it as where more than one tree or graph "intersect" each other int he sense that an entity is more than one type (as defined by more than one graph).

BTW, don't think that you need to be a programmer in order to understand graph theory. We've been using its terminology here, in relation both to type hierarchies and to relations between instances of types, a distinction that shouldn't be overlooked.

Sign In

Classification, Approaches to

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation