Science Forums

Building an HPC Cluster...



So I am dedicating this sticky to that: I am currently putting together an HPC cluster to run at home. It will be used for other things too, but mainly for some project computation needs here at Hypo (please check out the Strange Numbers thread).

 

I figured out what I wanted to do: I am going to make custom installs of Gentoo on a few machines to start with. I will start with a 2-node cluster on machines I have at home, I hope to go to 4 or 5 nodes next week, and hopefully to 8 nodes in the coming months (actually, if I get a decent amount of money back from taxes, this may turn into a large number of nodes before the summer arrives).

 

current hardware:

 

master:

3.4 GHz P4 HT, 800 MHz FSB

2.5 GB DDR400

 

node 1:

3.0 GHz P4 HT, 800 MHz FSB

1.5 GB DDR400

 

network: a dedicated 8-port gigabit switch, Intel gigabit cards.

 

storage: not decided yet. The master node may get a storage facelift; I figure 1-2 TB should suffice for now. This will eliminate the use of the network to store the data.

 

as i physically get more nodes, i will post them here

 

first reference: Gentoo Linux Documentation -- High Performance Computing on Gentoo Linux

 

now i just need to make it happen software-wise

 

I decided to go with Gentoo only because it's a totally stripped-down distro that I can build up. I didn't want a live distro: it uses too much RAM, and they are usually inefficient too. I might eventually make a net-boot version, but for now I just want to get it going and start writing MPI code for it.

 

I'll post as i go along, on what happens, how it gets resolved, etc :shrug:

 

Once I have the master, new nodes will be allowed to use distcc to build their packages, so each subsequent node will have more processing power to compile with for bootstrapping etc.

Link to comment
Share on other sites

So I am dedicating this sticky to that, i am currently putting together an HPC cluster to run at home for use, including other things, but mainly for some project computation needs here at hypo (please check out the Strange Numbers thread)...

 

:) :eek: So dude, I prolly wouldn't know a cluster if it bit me in the backside, but I just happened to watch an interview today that Charlie Rose did for 40 minutes with Jen-Hsun Huang, CEO & co-founder of Nvidia. What does this have to do with fast processing, you ask? Well, it seems Mr. Huang and his gang are applying programmable GPUs to amp up the raw math processing usually left to the CPU, and getting performance as much as 100 times faster. :idea:

 

So, I can't explain it so well maybe, but here is the interview. This is as cutting edge as it gets as far as I can gather from it. Enjoy. :shrug:

 

Charlie Rose - A conversation with Jen-Hsun Huang, CEO Nvidia

Link to comment
Share on other sites

I know, I do research well, and well, I've been using Gentoo since 2005. You probably have only a vague idea of just how many installs of it I have done. Let's put it this way: we had weekends at college where my friends and I installed Gentoo for any advanced population that came around... that was neat...

 

Yeah, I am going to use Open MPI and MPICH, distcc... I've also been thinking about the DrQueue batch job manager, in case someone wants to do some rendering on it (speaking of the multitude of friends I have who do a lot of random things and might want to use the cluster).

Link to comment
Share on other sites

db, if you are interested I will post details on the installs. They are fairly similar, juuust a little different hardware. They are both compiling kernels right now, but that's beside the point... They are stage 3s that will eventually be recompiled... I am getting them as close to the core as I can, disabling almost everything to conserve power...

 

I will probably be ordering a couple of Gigabyte boards with Phenom II 920s and 8 GB of RAM each pretty soon. I have to come up with some sort of enclosure for them, but more than that, with MOHNEH (yes, that doesn't magically appear for anyone), which is currently a bit tight. For now, I am recycling my friend's no-longer-used equipment: as long as it's a P4 or a 3000+ or newer series AMD, I will use it :) (preferably HT, because I set MAKEOPTS to -j3, and with symmetric multiprocessing those procs give interesting stats on compiles).

 

base system is almost installed, they are finishing some essentials before first reboot

Link to comment
Share on other sites

I was going to work on it from work today. I did not forget to enable ssh on my Mac, since I was going to use it to ssh into the boxes that are being worked on, but I did forget to disable Little Snitch... which you can NOT disable from the command line... That pisses me off, but that's beside the point: you can't unload the kext, you can't shut down the service... useless...

 

So I will have to wait till later to finish the installs. I have a couple of things left (boot loader and some packages), but I mostly recompiled all the system libraries and binaries overnight, optimized for the processors they are running on: CFLAGS="-march=pentium4 -O2 -pipe -fomit-frame-pointer". I also disabled all kinds of USE flags so as not to compile sound support, or any X, KDE, GNOME or GTK libs, and a load of other crap that gets loaded on the system.
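A minimal sketch of how those settings might look in make.conf. The CFLAGS line and the -j3 job count are the ones mentioned in this thread; the CXXFLAGS line and the exact USE flag names are assumptions about what "no sound, X, KDE, GNOME or GTK" translates to, so check them against your own profile.

```shell
# make.conf sketch (the file uses shell-style variable assignments)
CFLAGS="-march=pentium4 -O2 -pipe -fomit-frame-pointer"   # from the post
CXXFLAGS="${CFLAGS}"                                      # assumption: reuse the C flags for C++
MAKEOPTS="-j3"                                            # from the post (HT P4)
# Assumed flag names for disabling X, desktop toolkits and sound:
USE="-X -kde -gnome -gtk -alsa -oss"
```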

 

Oh, also a note to anyone doing Gentoo installs from stage 2 or 3 (I use stage 3 for the speed and then optimize it):

 

If you get an error just after unpacking udev, do NOT panic. The problem is caused by the difference between the -march flags used to compile the binaries you are loading and the new -march flags you set in make.conf.

 

First, re-emerge gcc.

Second, run gcc-config and set the profile to the i686 one.

Third, source /etc/profile.

Fourth, re-emerge libtool.
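The four steps above could be wrapped up roughly like this. It is only a sketch: the function names `run` and `fix_toolchain` are made up for illustration, and the profile argument has to be whatever `gcc-config -l` lists on your box (a number or an i686 profile name), which is not given in this thread. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Recovery steps for the udev error, in the order given above.
fix_toolchain() {
    profile="$1"                   # pick from the output of: gcc-config -l
    run emerge --oneshot gcc       # 1. re-emerge gcc
    run gcc-config "$profile"      # 2. select the i686 profile
    run . /etc/profile             # 3. reload the environment
    run emerge --oneshot libtool   # 4. re-emerge libtool
}
```

For example, `DRY_RUN=1 fix_toolchain 1` shows what would be run for profile number 1 without touching the system.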

 

Also, if ss and com_err are blocking e2fsprogs and e2fsprogs-libs:

 

First, fetch everything you might need:

# emerge -vuDaf world

 

Second, back up ss and com_err:

# quickpkg ss com_err

 

Third, remove ss, com_err and e2fsprogs (if they are blocking themselves):

# emerge -C ss com_err e2fsprogs

 

Fourth, emerge the new ones:

# emerge -va --oneshot e2fsprogs-libs e2fsprogs

 

Finally, fix the breakage:

# revdep-rebuild

Link to comment
Share on other sites

The master and node1 are now almost there. I have set up distcc to compile using them, which means I can now set up new systems a few times faster than normal. I have also gotten 2 more P4 machines, one HT, one not (I think); they will be joining the cluster probably closer to the end of the week (I have to make some space for them near some power sockets).
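For reference, a sketch of the kind of settings a distcc setup like this usually involves. The host names and the job count are assumptions, not the actual values used on these boxes.

```shell
# make.conf additions (sketch)
FEATURES="distcc"                # tell Portage to compile through distcc
MAKEOPTS="-j5"                   # assumed job count for two HT machines
# Hosts that accept compile jobs (hypothetical names):
DISTCC_HOSTS="localhost node1"
# On Gentoo the same list is usually stored with:
#   distcc-config --set-hosts "localhost node1"
```

Each helper box also has to run the distccd daemon and allow connections from the machine doing the build.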

 

Plan for tonight is to get Open MPI working, as well as Torque and Maui (Gentoo restricts the download of the code, so I have to download it myself, but wget does not download it either). I was having a ton of fun compiling the kernel yesterday: I saw about a 60% increase in speed with just one more machine (granted, it was a 3.4 vs a 3.0 with nearly twice the memory).
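A first Open MPI smoke test on two machines might look something like the sketch below. The file name, host names and slot counts are assumptions (2 slots per box for the HT P4s).

```shell
# Write a two-machine Open MPI hostfile (names and slot counts are assumed).
cat > hosts.txt <<'EOF'
master slots=2
node1 slots=2
EOF
# Once Open MPI is installed on both boxes and passwordless ssh works
# between them, this should print each host name twice:
#   mpirun --hostfile hosts.txt -np 4 hostname
```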

 

Plan for the first cluster test is next week (probably render something in Blender using MPI) (do I hear anyone say MIT box?)

Link to comment
Share on other sites

Another update: I finally registered, downloaded and installed Maui, figured out my DynDNS update issue on the router, and forwarded port 22 to the master node (the router runs an IPS and a block-all-by-default policy; the box runs iptables that only allows port 22 and ICMP through from the outside)... I think I covered security fairly well...
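A rule set matching that description could be sketched as below, in iptables-restore format. Only "port 22 and ICMP from the outside" comes from the post; the loopback and established-connection rules and the file name are assumptions, and a real master node would also need a rule accepting traffic from the cluster's internal interface.

```shell
# Write an iptables-restore rules file: drop inbound by default,
# allow loopback, replies to our own traffic, ICMP and ssh.
cat > firewall.rules <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
EOF
# Load as root with: iptables-restore < firewall.rules
```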

 

Now it's only a matter of figuring out the configuration of the batch job manager, Open MPI, etc., working out what needs to be installed on the nodes, testing it with the one node, and finally throwing more nodes on :) Should be done in the next couple of days...
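Once Torque is configured, a job for it is just a shell script with #PBS directives at the top. A minimal sketch follows; the job name, the node and processor counts and the `./hello_mpi` binary are placeholders, not something from this setup.

```shell
# Write a small Torque job script (all names and counts are placeholders).
cat > hello.pbs <<'EOF'
#!/bin/sh
#PBS -N hello_mpi
#PBS -l nodes=2:ppn=2
#PBS -j oe
cd "$PBS_O_WORKDIR"
mpirun -np 4 --hostfile "$PBS_NODEFILE" ./hello_mpi
EOF
# Submit with: qsub hello.pbs
# Check the queue with: qstat
```

Torque hands the job the list of allocated nodes in the file named by PBS_NODEFILE, which is why the script passes it to mpirun as the hostfile.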

Link to comment
Share on other sites

Wow, you are building a cluster using Linux? I tried to build a cluster using a couple of really old computers at school, and I kept running into so many problems. I was using Windows Active Directory on Windows 2000.

 

How many years of experience have you had with Linux? I've played around with it, but I never once thought about switching because I could never figure out the entire OS (ugh, commands!).

 

Keep us updated :-)

Link to comment
Share on other sites

I've been using Linux on and off since 2005 actually, but I had good teachers. They helped me set up my first Gentoo box, then I started reading and working with it, and within 5 months I was managing a Linux lab. I've stuck with Unix-like OSes since then (in techno speak *nix, implying BSD, Linux, Minix, OS X). I can't say that I have a main OS; I run a lot of different distros and OSes. I run various flavors of Linux, generally Gentoo, Ubuntu or BackTrack, but I have done others: SUSE, CentOS, Slackware, Red Hat, Fedora, Yellow Dog, some small Linux distros, embedded Linux, and others. I use OS X as my main platform for doing graphics and audio work, and I am once again rebuilding my OpenBSD router (it's running on a dual UltraSPARC III machine with 768 megs of RAM and a quad-interface NIC on top of that, isa naas 1u).

 

I can't see why anyone would want to build a cluster on Windows. It's not that I am against Windows, it's a bang-for-the-buck thing: performance, cost and stability are 3 things that Linux excels in. I will just say this: most clusters and supercomputers today run Linux.

 

It's not as hard as it may seem. If you want some advice for starting out, take a system that you are no longer using and throw Ubuntu on it. It's easier to install than Windows, and you don't need the command line to manage it. Ubuntu also has a great community, and a lot of howtos and beginner guides are available. Just one thing: you have to do your own research to find out how to do things. It's an integral part of working with Linux. The solution is out there, you just need to find it...

 

Other than that, ask. There are a couple of people here who I know know Linux fairly well, and we will help. Use Ubuntu for a bit, and once you feel acquainted with the interface, try getting to know it better and learn some command line. Eventually, anyone can be a pro :)

Link to comment
Share on other sites

Well, my main computer is the only computer that can handle anything. I have three others: two are 12-year-old Dells, and the third is a Dell OptiPlex 270 with so little memory (256 MB) that it can't handle installing Sun's Solaris 10 (it does run XP, but poorly). If I get more memory for it and a couple more of them, then I might consider building a cluster. I would like a Linux box to mess around with and study. Know of any place I can get a Linux box or a no-OS box for $50-$100?

How are you building that router, and is it working out? I tried to use Windows because that was the one I am most familiar with, and the only Linux they had was a really old version of Ubuntu and Puppy Linux, as well as one that supposedly acts like Windows.

Windows Will always be my main operating system for better or for worse until WINE can run games, :-)

Link to comment
Share on other sites

btw those old boxes will run linux no probs....

 

By the way, a cluster is not what you think: it does not magically start using other computers and make them seem like one big computer... The parallel cluster I am building will run programs that are written specifically for parallel processing, with a specific set of libraries to manage the nodes, etc. It's not like you will get a giant supercomputer by doing this that you can play Crysis at 1000 fps on. It has a very specific use; it's not something you can build without knowing a thing or two about systems, and not something you can build without planning a purpose for it...
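To make "written specifically for parallel processing" a bit more concrete: every copy of a parallel program has to work out for itself which slice of the job is its own. A toy sketch of that bookkeeping follows; the function name `chunk_bounds` is made up for illustration. Open MPI exports each process's rank and the total process count as OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE, and the defaults here just let the sketch run on a single box.

```shell
# Split `total` work items as evenly as possible across `size` processes
# and print the half-open range "start end" owned by process `rank`.
chunk_bounds() {
    rank=$1; size=$2; total=$3
    base=$((total / size))    # items every rank gets
    rem=$((total % size))     # leftover items go to the first `rem` ranks
    if [ "$rank" -lt "$rem" ]; then
        start=$((rank * (base + 1)))
        count=$((base + 1))
    else
        start=$((rank * base + rem))
        count=$base
    fi
    echo "$start $((start + count))"
}

# Under mpirun each process sees its own rank; standalone this is rank 0 of 1.
chunk_bounds "${OMPI_COMM_WORLD_RANK:-0}" "${OMPI_COMM_WORLD_SIZE:-1}" 10
```

With 4 processes and 10 items, ranks 0 to 3 would get 0-3, 3-6, 6-8 and 8-10. Nothing does this splitting for you, which is why an ordinary program does not get any faster just because it is started on a cluster.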

Link to comment
Share on other sites

btw those old boxes will run linux no probs....

Yeah, but what can I do with them? If they don't have adequate speed and processing power, what good are they?

 

it's not like you will get a giant supercomputer by doing this that you can play crysis at 1000fps on, very specific use, not something you can build without knowing a thing or two about systems, not something you can build without planning a purpose for it...

Lol, I am well aware of that :-)

I just wanted to see what building a cluster was like. How will you be able to get a command to all the computers? Does Gentoo allow you to control them using its GUI, or is it all command based? I read that there was some software for Windows computers that linked them as a cluster, but it was command based.

Link to comment
Share on other sites
