I am usually pretty reserved with cash, but after working full-time for six months, I finally decided to spend some of my money on a new research development server. This upgrade was long overdue; what kept me from committing to it for so long was all of the new technology that had appeared since I built my last server. This “new technology” can be pretty confusing unless one specializes in computer architecture. I want to share what I have learned throughout this process, along with some background. These are only my opinions, and I may be wrong on some things, as I am not a hardware expert. I encourage you to read and learn more on your own.
The CPU/Processor
If you are reading this article, I probably do not need to explain what the CPU/processor does. For high-performance computing, you will want a CPU that is very “fast” and also has multiple cores. What counts as “fast” is in the eye of the beholder and typically involves more than just clock speed (GHz). In the constant war between AMD and Intel, I stick with Intel. AMD processors are powerful, but they seem to have more of a market with gamers. Intel is my preference, and I have not yet run into anyone who feels strongly about AMD for high-performance computing (HPC). There are two main processor lines under Intel: standard and Xeon. Standard processors are the run-of-the-mill CPUs found in consumer desktop machines. Xeon processors are designed for non-consumer server, workstation and embedded use. I do not consider researchers “consumers”; we are producers, so the Xeon family is better suited to our needs. On the other hand, you may find that a standard CPU fits the needs of your particular research or use case. Xeon processors typically have more cache and better multiprocessing capabilities…and they are a lot more expensive. For high-performance computing, I strongly suggest Intel Xeon.
After months of research, I have concluded that multiple Intel Xeon processors are better than one Intel Core i7. As of this writing, it seems that i7 processors cannot be paired up (or grouped in threes, etc.) on one motherboard the way Xeons can. Like AMD's chips, the i7 seems to be favored by gamers and those needing a richer multimedia experience.
In 2011, most CPUs in new systems have multiple cores. Each core can essentially run one process at a time, so a system with n cores can run n processes simultaneously. Many CPUs support hyperthreading, meaning that each core can actually run 2 threads simultaneously, bringing the total number of threads to 2n. But can’t a system already run multiple processes concurrently? We can run Firefox, TweetDeck, Thunderbird etc. at the same time, right? It certainly appears that the CPU is processing all of them simultaneously. If we could slow time down to the microsecond level, though, we would see that each core works on one process at a time, then performs a context switch to another process. Switching rapidly between processes gives the illusion that a single core is running them all simultaneously.
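The time slicing described above can be sketched as a round-robin scheduler running on one core. This is a deliberately simplified model, not how a real operating system schedules work (real schedulers are far more elaborate); the job names come from the paragraph, while the work amounts and the `quantum` (how many time units a job runs before being switched out) are made-up values for illustration.

```python
from collections import deque


def round_robin(jobs: dict, quantum: int) -> list:
    """Simulate a single core time-slicing between jobs.

    `jobs` maps a job name to the number of time units of work it needs.
    Returns the timeline: the job that held the core in each time unit.
    """
    if quantum < 1:
        raise ValueError("quantum must be at least 1 time unit")
    ready = deque(jobs.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)
        timeline.extend([name] * ran)  # the core works on this job alone
        remaining -= ran
        if remaining > 0:
            # Context switch: the unfinished job goes to the back of the queue.
            ready.append((name, remaining))
    return timeline


if __name__ == "__main__":
    # Hypothetical work amounts; the names come from the paragraph above.
    print(round_robin({"Firefox": 3, "TweetDeck": 2, "Thunderbird": 2}, quantum=1))
```

With a quantum of 1, the timeline alternates Firefox, TweetDeck, Thunderbird, Firefox, TweetDeck, Thunderbird, Firefox: at every instant only one program is running, yet all three make steady progress.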
While Intel makes great products, its inventory is a nightmare to navigate. There are several things that you must know to ballpark a particular CPU model.
- the model number (the most reliable!)
- the brand name, which specifies a group of CPU models satisfying similar use cases (Core [i3/i5/i7/i9], Core 2 Duo, Core 2 Quad, Pentium, Xeon).
- the architecture/subarchitecture, which specifies a type of processor within a brand, each containing many series (Nehalem, Westmere and Sandy Bridge are common ones these days)
- the chipset (not commonly referred to; examples: Tylersburg, Cougar Point, Panther Point)
- the platform, which refers to a set of models (e.g. Harpertown, Jasper Forest, Gainestown, Prescott, Gulftown). Models within a platform are typically differentiated only by clock speed (GHz).
- the socket type, which specifies the physical shape, size and pin layout of the CPU's connection to the motherboard. The CPU and the motherboard must have the same socket type (e.g. LGA1366, LGA775)
As if this were not confusing enough, each Intel Xeon model number is prefixed with a letter for different use cases. The letter distinguishes CPUs with differing thermal design power (TDP). (source)
- W stands for “Workstation” and is meant to be installed in pairs. This designation does not seem very common anymore. These CPUs typically run the fastest (clock speed) and the hottest, and they require significant cooling.
- E is “mainstream (rack mount)” and the standard model of CPU. Although it is “standard,” there is nothing wrong with it performance-wise, but it will run hot even when idle.
- X stands for “performance”; these are similar to E models but provide extra overclocking capabilities and have lower idle power draw.
- L stands for “power optimized”; these are low-voltage CPUs (60 W or less) that are typically used only in data centers or rack servers. They typically are not offered at the higher clock speeds.
For the Intel Xeon, model numbers indicate what configuration it is compatible with on the motherboard (source):
- 3xxx Xeons are designed to be used by themselves, as the only CPU on the motherboard.
- 5xxx Xeons are designed to be used in pairs; two CPUs on the motherboard.
- 7xxx Xeons are designed to be used in pairs, or in larger groups.
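The two naming rules above (letter prefix and leading digit) can be combined into a small lookup. This is a minimal sketch, assuming the one-letter-plus-four-digits scheme used by Xeons of this era such as the E5645; the function name `decode_xeon` and the table wording are hypothetical, and differently formatted names such as the E7-4870 are rejected rather than guessed at.

```python
import re

# Letter prefix -> use case, as described in the list above (2011-era Xeons).
PREFIX = {
    "W": "Workstation: fastest clock speeds, runs hottest",
    "E": "Mainstream (rack mount): the standard model",
    "X": "Performance: like E, with extra overclocking headroom",
    "L": "Power optimized: low voltage, 60 W or less",
}

# First digit of the model number -> intended CPU count on the motherboard.
SOCKETS = {
    "3": "one CPU on the motherboard",
    "5": "two CPUs on the motherboard",
    "7": "two or more CPUs on the motherboard",
}


def decode_xeon(model: str) -> tuple:
    """Decode an old-style Xeon model number such as "E5645"."""
    match = re.fullmatch(r"([WEXL])(\d)\d{3}", model.strip().upper())
    if match is None or match.group(2) not in SOCKETS:
        raise ValueError(f"not an old-style Xeon model number: {model!r}")
    letter, series = match.groups()
    return PREFIX[letter], SOCKETS[series]


if __name__ == "__main__":
    use_case, configuration = decode_xeon("E5645")
    print(f"E5645: {use_case}; {configuration}")
```

Raising an error on unrecognized names is a deliberate choice: given how many naming schemes Intel uses, a lookup that guesses would be more misleading than one that refuses.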
The two CPUs that I purchased are Intel Xeon E5645s. The E5645 is part of the Gulftown platform of the Xeon family. It uses the Westmere subarchitecture, which is the 32 nm shrink of the Nehalem architecture, and it plugs into the motherboard through socket LGA1366. (To make things more confusing, the i7-9xx series uses the same socket, and its top models share the same architecture.) The E means that it is a “mainstream” CPU. Since it is a 5xxx model, it is meant to be installed alongside a second, identical CPU on the same board.
The number of cores is important. Most chips in current desktops contain 2 or 4 cores. Higher-end systems and servers may have 6, 8 or 10 cores per chip. Xeons with 8 and 10 cores per chip debuted in Q2 of 2011 and are very expensive (about $2000 for 8 cores). They also require a different socket type (LGA1567), which means a new, expensive motherboard. A CPU with more cores lets an application split a task into several units of work that run at the same time; these processors raise throughput (the total amount of work done per unit of time) rather than the speed of any single unit of work.
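As a rough illustration of splitting one task into several units of work, the sketch below divides a sum of squares into one chunk per core and runs the chunks in separate processes with Python's `concurrent.futures.ProcessPoolExecutor`. The task and the function names are hypothetical. Processes are used rather than threads because, in CPython, threads running pure Python code do not execute in parallel.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional


def chunk_ranges(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(range(start, end))
        start = end
    return chunks


def sum_of_squares(chunk: range) -> int:
    """One unit of work: a CPU-bound loop over a single chunk."""
    return sum(i * i for i in chunk)


def parallel_sum_of_squares(n: int, workers: Optional[int] = None) -> int:
    """Run one chunk per worker process and add up the partial results."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunk_ranges(n, workers)))


if __name__ == "__main__":
    n = 2_000_000
    # Same answer as the closed form, but computed with one process per core.
    print(parallel_sum_of_squares(n) == (n - 1) * n * (2 * n - 1) // 6)
```

Each chunk still takes as long as it would on its own; what the extra cores buy is that all chunks are in progress at once.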
The clock speed (GHz) used to be the deciding factor for most people, until clock speeds stopped climbing in the mid-2000s because of heat and power limits, and manufacturers began adding cores instead. A higher clock speed generally lets a single-threaded process complete faster. Since games typically use a limited number of threads and need fast single-thread performance, a single i7 is a good choice for them: the i7 has multiple cores and also a very high clock speed.
Cache size and speed are also important. The cache gives very fast access to frequently used memory locations by copying their data from RAM into the CPU. Modern systems typically have three levels of cache: L1, L2 and L3. L1 cache is said to be the “closest” to the CPU, meaning the CPU checks the L1 cache first when performing a memory access. The L1 cache is the smallest and fastest. If the data is not there, the L2 and then the L3 caches are checked in turn; each level is larger and slower than the one before it, and only if all three miss does the CPU go out to RAM. Very simply put, CPUs with larger caches (especially L1) are better.
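The lookup order can be sketched as a toy simulation. It assumes each level is a simple least-recently-used (LRU) cache holding a handful of addresses, and that a miss copies the data into every level; real caches are organized into sets and lines and are far larger, so the capacities here (2, 4 and 8 addresses) are purely illustrative, as are the class and function names.

```python
from collections import OrderedDict
from typing import List


class LRUCacheLevel:
    """One cache level that evicts its least recently used address when full."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, address: int) -> bool:
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            return True
        return False

    def fill(self, address: int) -> None:
        self.entries[address] = True
        self.entries.move_to_end(address)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


def access(levels: List[LRUCacheLevel], address: int) -> str:
    """Check L1, then L2, then L3; fall back to RAM. Returns where the data was found."""
    found_in = "RAM"
    for level in levels:
        if level.lookup(address):
            found_in = level.name
            break
    # Simplification: copy the data into every level so the next access is an L1 hit.
    for level in levels:
        level.fill(address)
    return found_in


if __name__ == "__main__":
    # Illustrative capacities (in addresses), growing from L1 to L3.
    caches = [LRUCacheLevel("L1", 2), LRUCacheLevel("L2", 4), LRUCacheLevel("L3", 8)]
    print([access(caches, a) for a in [0, 0, 1, 2, 0]])
```

For the access pattern 0, 0, 1, 2, 0 this reports RAM, L1, RAM, RAM, L2: the repeated access hits L1, and by the last access address 0 has been pushed out of the tiny L1 but is still found in the larger L2.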
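Newer processors report the speed of the link between the CPU and the rest of the system (Intel's QuickPath Interconnect, or QPI, on Nehalem and Westmere chips) in gigatransfers per second (GT/s), which, like GHz, quantifies some measure of “speed.” Using GT/s, one can compute the number of bits the link can transfer per second by multiplying the transfer rate by the number of bits moved in each transfer:

$$\text{bits per second} = \text{transfer rate (GT/s)} \times 10^{9} \times \text{bits per transfer}$$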
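As a worked example of getting from GT/s to bandwidth, the sketch below multiplies the transfer rate by the bits moved per transfer and converts to gigabytes per second. The defaults are assumptions about a full-width QPI link: 16 data bits per transfer in each direction, with both directions counted, which is how Intel arrives at 25.6 GB/s for a 6.4 GT/s link. The 6.4 GT/s figure is only an example rate, and the function name is hypothetical.

```python
def link_bandwidth_gb_per_s(gigatransfers_per_s: float,
                            bits_per_transfer: int = 16,
                            directions: int = 2) -> float:
    """Peak bandwidth in gigabytes per second for a point-to-point link.

    bits per second = transfers per second x bits per transfer (x directions).
    The defaults assume a full-width QPI link: 16 data bits per transfer,
    counted in both directions.
    """
    bits_per_second = gigatransfers_per_s * 1e9 * bits_per_transfer * directions
    return bits_per_second / 8 / 1e9  # 8 bits per byte; report in GB/s


if __name__ == "__main__":
    # 6.4 GT/s is an example rate, not a figure taken from the post.
    print(f"{link_bandwidth_gb_per_s(6.4):.1f} GB/s")  # both directions combined
    print(f"{link_bandwidth_gb_per_s(6.4, directions=1):.1f} GB/s")  # one direction
```

The first line prints 25.6 GB/s and the second 12.8 GB/s; quoted bandwidth figures differ mainly in whether they count one direction or both.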
Think of the cores vs. clock speed decision as a highway. Suppose the clock speed is the speed limit on a single-lane highway. A faster CPU corresponds to a single-lane highway with a higher speed limit: you will get to your destination faster. Now compare a one-lane and a two-lane highway with identical speed limits. On the one-lane highway, when traffic is heavy you have to slow down and wait for the cars in front of you to move forward. On the two-lane highway, if one lane is too busy you can take the other; each extra core adds another lane to choose from. Switching lanes may or may not get you to your destination faster, but more driving gets done overall.
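The highway picture can be made quantitative with Amdahl's law, a standard model that the post itself does not use: if a fraction p of a program's work can be spread across N cores and the rest must run serially, the best-case speedup is 1 / ((1 - p) + p / N). The sketch assumes the parallel part splits perfectly, with no communication or scheduling overhead, so real speedups will be lower.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup on `cores` cores when `parallel_fraction` of the work parallelizes.

    Amdahl's law: 1 / ((1 - p) + p / N). Assumes the parallel part splits perfectly
    across cores with no communication or scheduling overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 0.99):
        print(f"p = {p:.2f}: {amdahl_speedup(p, 12):5.2f}x on 12 cores")
```

Purely single-threaded code (p = 0) gets no speedup at all from 12 cores, which is exactly where a higher clock speed helps instead, while code that is 90% parallel tops out at roughly 5.7× on 12 cores.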
So how much did this set you back?
About $2500, which isn’t too bad for me. A low-end Mac Pro would have been close to $3000. I was eyeing the Mac Pro…but I am kind of over Mac. I am still not settled on ALL of the hardware in this purchase. I’ve considered exchanging the CPUs for ones with a faster clock speed, or getting a different motherboard, but we will see.
I kinda read this as “just buy top of the line everything except maybe the motherboard and you’ll be good.” Any experience with tradeoffs? How would you prioritize CPU vs. RAM vs. disk I/O speed?
In my experience multiple cores don’t necessarily give a huge boost in HPC if you aren’t writing for it. I don’t tend to write multithreaded code because of the complexity and debugging issues that go with it. In that case, more cores won’t help my code finish faster; it’ll only help if there are multiple jobs running at once. Xeons are great if you’re switching between a boatload of threads like a web server might or you’re concerned about power/overheating, but in my experience they aren’t worth the extra cash you pay. In that situation, clock speed can be a lot more important than you give it credit for here.
Well, to be frank, this upgrade had been way overdue, and I did a bit of “throwing money” around. The purpose of the post was to document the decisions I made and what else was out there. Of course it will come across that way, because in an ideal world, if everyone could afford top-of-the-line hardware, we would all be good computing-wise. It would be too difficult to tailor this post to everyone’s needs.
In terms of priority, *for my research use* I put RAM as the most important, followed closely behind by the CPU and the number of threads it can run concurrently (not so much the clock speed), followed by disk I/O. I find that for my work investing in more or better RAM gives the best bang for the buck, followed by the CPU. Although a lot of my work is disk bound (crawling), faster disks are so much more expensive and are out of my budget. Because of this, I had to make the tradeoff to favor upgrading RAM over disks. At least there are some tricks that can be done with RAM to prevent overuse of disks.
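The comment above does not say which RAM tricks it has in mind. One common example, offered here only as an illustration and not necessarily what the author uses, is to collect crawled pages in memory and write them to disk in large batches instead of issuing one small write per page. The class name, the default threshold and the one-record-per-line file format below are all hypothetical.

```python
import os
import tempfile


class BufferedRecordWriter:
    """Hold records in RAM and append them to disk in large batches."""

    def __init__(self, path: str, flush_threshold_bytes: int = 64 * 1024 * 1024):
        self.path = path
        self.flush_threshold_bytes = flush_threshold_bytes
        self._buffer = []
        self._buffered_bytes = 0
        self.flush_count = 0  # how many times the disk was actually written

    def add(self, record: str) -> None:
        self._buffer.append(record)
        self._buffered_bytes += len(record.encode("utf-8"))
        if self._buffered_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # One large sequential append instead of many small writes.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write("\n".join(self._buffer) + "\n")
        self._buffer.clear()
        self._buffered_bytes = 0
        self.flush_count += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()  # write out whatever is still buffered
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pages.txt")
        with BufferedRecordWriter(path, flush_threshold_bytes=1024) as writer:
            for i in range(1000):
                writer.add(f"<html>page {i}</html>")
        print(writer.flush_count, "disk writes for 1000 records")
```

The more RAM the machine has, the larger the threshold can be, and the fewer (and more sequential) the disk writes become. The trade-off is that anything still buffered is lost if the process crashes before a flush.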
To me, “high performance computing” goes hand in hand with parallelism. Of course, if the code has not been written to take advantage of the cores, the extra cores are useless. I would expect that someone buying a multicore processor intends to write programs that use the cores. For my research, not programming to take advantage of these extra cores would *not* qualify as high performance computing. I never suggested that CPUs with more cores will make things run faster; just that more work is done at a time. Hadoop is the most trivial example I can give of code that takes advantage of multiple cores; OpenMP is another.
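To make the Hadoop example a little more concrete, here is the map/shuffle/reduce shape of a word count in plain Python. This is not Hadoop's API; it only mirrors the structure that lets such jobs spread across cores. The `mapper` parameter is an assumption of this sketch: it defaults to the built-in serial `map`, and because every `map_phase` call is independent, an executor's `map` method can be passed in instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(document: str) -> List[Pair]:
    """Map: emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped: Iterable[List[Pair]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted counts by word."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents: Iterable[str], mapper: Callable = map) -> Dict[str, int]:
    # Each map_phase call is independent, so `mapper` may run them on many cores.
    return reduce_phase(shuffle(mapper(map_phase, documents)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "the end"]
    print(word_count(docs))
```

To spread the map phase across cores, pass a process pool's `map` method, for example `with ProcessPoolExecutor() as pool: counts = word_count(docs, mapper=pool.map)`, run from under an `if __name__ == "__main__":` guard. The shuffle and reduce steps here stay serial; Hadoop distributes those as well.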
Everyone will have their own opinion, but for my use, based on experiences I have had using servers containing Xeons, and the issues I faced without a higher-end processor, the Xeons were the way to go without a question.
Why wouldn’t you drop that money on several cheap computers instead of one expensive one?
For $2500 you could have gotten 14 motherboards, 14 AMD Phenom quad-core CPUs and 14 x 1 GB of RAM.
Sigh. I’ve heard that. While a cluster would be cool, I tend to use AWS when I need a cluster setup. It is also just too much of a pain to build that many systems. I don’t spend money often, so I was OK with making a one-time large purchase ;). This machine really serves two purposes: developing jobs to be shipped to AWS (time = $), and saving me from having to use AWS for high-memory, high-speed applications.
Over the past 20 years, though, I have collected a lot of old machines that might make an OK cluster, just obviously not a high-end one.
Heya
Is SATA3 working on this motherboard?
I can't find any info saying that this motherboard supports SATA3; the official notes only mention SATA 2.
Let me know if you got SATA 3 to work.
Thanks bye
I got the D18 too; it works like a dream so far!
It may not support SATA III. Drives seem to lag far behind the interfaces, though; most drives can barely perform at SATA II interface speeds.
Btw, forgot to say: you might be better off with the EVGA server motherboard. It can overclock the CPUs, giving 12 cores at 4-5 GHz after overclocking…. That would be a lot of power…
I am having trouble receiving the D18. It seems to be a rare beast. It’s taken almost a month now, and I'm considering cancelling. If I do, I will probably go with the EVGA or Supermicro.
I can sell you my D18 and jump to EVGA, tbh. I could use some extra OC for my work :)
Which OS do you plan on running?
I finished the system :)
I put in the same hard drive that was running in my old server. I am running 64-bit Ubuntu 10.04 (Lucid). Runs great!
I'm considering upgrading to 11.04 or switching to CentOS, but I don’t really have much of a reason to switch to CentOS.