Two and Three GPUs rendering
Multi-GPU Aware Pro Apps
Posted Friday, July 11th, 2013 by rob-ART morgan, mad scientist
Over a month ago we posted some results showing how certain Pro Apps render faster with two (or more) GPUs installed in a Mac Pro. Well, we are back with more test results using even more examples of high-end GPUs including a pair of flashed GeForce GTX 770s loaned to us by MacVidCards.
Our After Effects CC test is a Ray-traced 3D project of an animated robot that uses CUDA-capable GPUs exclusively for rendering. After Effects CS6 and CC automatically use all NVIDIA GPUs to render the project -- assuming the model name of your GPU already exists in, or is added to, the AE whitelist of "raytracer_supported_cards." (FASTEST = the LOWEST time in minutes to the nearest hundredth.)
NOTE: The AE Ray-traced 3D animation we refer to as "robot" was provided courtesy of Juan Salvo and Danny Princz. Today, Danny has posted a compilation of render times featuring up to three GPUs.
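The whitelist step mentioned above can be sketched in a few lines of Python. This is only a rough illustration, not Adobe's procedure: it assumes the whitelist is a plain text file with one GPU model name per line, and the function name add_gpu_to_whitelist and the path you pass in are hypothetical. You would need to locate the actual file inside your own After Effects install and back it up first.

```python
from pathlib import Path


def add_gpu_to_whitelist(whitelist_path, gpu_name):
    """Append gpu_name to the whitelist file unless it is already listed.

    Assumes one GPU model name per line (an assumption, not confirmed
    by the article). Returns True if the name was added, False if it
    was already present.
    """
    path = Path(whitelist_path)
    name = gpu_name.strip()
    lines = path.read_text().splitlines() if path.exists() else []

    # Skip the write if the card is already whitelisted.
    if name in (line.strip() for line in lines):
        return False

    lines.append(name)
    path.write_text("\n".join(lines) + "\n")
    return True
```

Calling add_gpu_to_whitelist(path_to_your_whitelist, "GeForce GTX 770") a second time leaves the file unchanged, so it is safe to re-run. The name has to match the model name your system reports for the card.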
Octane Render is a "GPU only" standalone renderer that can process scenes created in and exported from Maya, ArchiCAD, Cinema 4D, etc. -- and does so in a fraction of the time it takes with a CPU based renderer. However, currently it only works with CUDA capable NVIDIA graphics cards. The DEMO comes with a scene called octane_benchmark.ocs. For our test we selected RenderTarget PT (Path Tracing). The render time is tracked and displayed in minutes and seconds -- which we convert to minutes to the nearest hundredth.
You must "tell" Octane to render with multiple GPUs by going to Preferences > CUDA devices and putting a check mark next to all GPUs (CUDA devices) you want it to use. (FASTEST = the LOWEST time in minutes to the nearest hundredth.)
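For anyone repeating the test, the conversion from Octane's minutes-and-seconds readout to minutes to the nearest hundredth can be sketched like this. The function name and the "m:ss" input format are assumptions for illustration, and the sample values in the test are made up, not measured results.

```python
def to_decimal_minutes(readout):
    """Convert an 'm:ss' render-time readout to decimal minutes.

    The 'm:ss' string format is an assumption for this sketch.
    The result is rounded to the nearest hundredth of a minute.
    """
    minutes_text, seconds_text = readout.split(":")
    minutes = int(minutes_text)
    seconds = int(seconds_text)
    if not 0 <= seconds < 60:
        raise ValueError("seconds must be between 0 and 59")
    return round(minutes + seconds / 60, 2)
```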
LuxMark OpenCL Benchmark
This is a benchmark that works with all GPUs that support OpenCL. Furthermore, the latest version, 2.1b2, supports multiple GPUs. We feature here the results from rendering the Room scene, which is extremely complex (2,000,000+ triangles) and is available only in the 64-bit executables. (FASTEST = HIGHEST number in thousands of samples per second.)
DaVinci Resolve 9.1.4 adds speed and power to color grading of HD video. It uses the GPU to apply the specified effects and play them back in real time -- no pre-rendering required. However, the more effect nodes created, the slower the playback. The full version supports noise reduction, which can seriously slow down playback unless you have multiple GPUs at work.
Our graph features a two minute 1920x1080 10-bit YUV 4:2:2 24fps video. We set the maximum playback framerate to 500 fps to force fastest playback speed of four nodes (including color correction, blur, and noise reduction). Results are average frames per second. (FASTEST = HIGHEST frames per second.)
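As a rough guide to how an average frames-per-second figure relates to the clip, here is a minimal sketch. It assumes the average is simply total frames divided by elapsed playback time; how Resolve computes its own readout is not stated here. The function names are hypothetical, and the elapsed time in the test is an invented value, not one of our results.

```python
def clip_frames(duration_seconds, native_fps):
    """Total frames in a clip, e.g. a two minute clip at 24fps."""
    return int(round(duration_seconds * native_fps))


def average_playback_fps(total_frames, elapsed_seconds):
    """Average playback rate, assuming frames / elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_frames / elapsed_seconds
```

The two minute 24fps test clip works out to 2,880 frames, so any average above 24 fps means the GPUs are processing the four nodes faster than real time.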
Dual G580C+Q4 = two GeForce GTX 580 Classifieds + Quadro 4000
Dual G770+Q4 = two GeForce GTX 770s + Quadro 4000
Dual G680+Q4 = two GeForce GTX 680s + Quadro 4000
Dual G570+Q4 = two GeForce GTX 570s + Quadro 4000
Dual G580C = two GeForce GTX 580 Classifieds
Dual G770 = two GeForce GTX 770s
Dual G680 = two GeForce GTX 680s
Dual G570 = two GeForce GTX 570s
Dual K5000 = two Quadro K5000s for Mac
One G690 = one GeForce GTX 690
One G580C = one GeForce GTX 580 Classified
One G770 = one GeForce GTX 770
One G680 = one GeForce GTX 680 Mac Edition
One G570 = one GeForce GTX 570
One K5000 = one Quadro K5000 for Mac
One R7950 = one Radeon HD 7950 Mac Edition
1. Two GPUs are definitely better than one in the 'multi-GPU aware' apps featured above. Since the pairs of high-end GPUs are 'two slots' wide, that leaves only the #4 PCIe slot open, so a third GPU had to be very thin. That's why we used a Quadro 4000 for Mac as the display GPU in some cases, to see what effect it might have.
In the case of Octane and LuxMark, the Quadro 4000 helped lower render time.
After Effects' ray-traced 3D render did not benefit from the Quadro 4000's presence. In some cases it slowed things down.
As for DaVinci Resolve, by being the 'display only' GPU, the Quadro 4000 freed up the matched pair of faster GPUs to concentrate on rendering the curves, blur and noise reduction nodes. When it participated in the rendering, it slowed things down.
The only way to get more than two matching, high-end, 'fat' GPUs functioning in a Mac Pro is to use an external PCIe Expansion box like the Cubix GPU-Xpander. With the soon-to-be-released 2013 Mac Pro, hopefully there will be support for full length 'fat' GPUs in external Thunderbolt 2.0 PCIe Expansion boxes.
2. If you don't have at least one CUDA capable GPU, OctaneRender will launch but will not render. Hopefully that will change by the time the new Mac Pro is released. We have already sent an email to the OctaneRender team imploring them to support OpenCL capable AMD GPUs like the FirePros in the new Mac Pro and Radeon HD 7950 in the current Mac Pro.
Currently, After Effects CS6 and CC will render Ray-traced 3D without CUDA enabled GPUs, but will use the CPU(s) instead of GPU. We tried it with a Radeon HD 5870 installed. It took 8 hours to render three-quarters of the 6 second Robot animation on a 6-core Mac Pro with 12 hyper threads. You get the picture. Unless Adobe modifies AE to render Ray-traced 3D with OpenCL, only CUDA capable NVIDIA GPUs need apply.
3. The $600 GeForce GTX 680 was faster than the $2200 Quadro K5000 -- which is being discounted to $1800, still triple the cost. Ditto for the lower-priced flashed GTX 570 and 770.
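The price comparison in point 3 is simple arithmetic, sketched here using only the prices quoted above. The function name is hypothetical.

```python
def price_ratio(price, baseline_price):
    """How many times more expensive 'price' is than 'baseline_price'."""
    if baseline_price <= 0:
        raise ValueError("baseline_price must be positive")
    return round(price / baseline_price, 2)


GTX_680_PRICE = 600        # price quoted in the article
K5000_LIST_PRICE = 2200    # price quoted in the article
K5000_DISCOUNTED = 1800    # discounted price quoted in the article
```

price_ratio(K5000_DISCOUNTED, GTX_680_PRICE) gives the 'triple' figure, while the undiscounted price works out closer to three and two-thirds times the cost of the GTX 680.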
4. Not all pro apps running under OS X utilize multiple GPUs for rendering. Some apps like Motion ignore all but the main display GPU. Others like Final Cut Pro X rely on the CPU(s) even more than the GPU when rendering video effects. A single GPU reported a 50% load according to OpenGL Driver Monitor. The CPU load was between 500% and 700% (or 5 to 7 cores) according to Activity Monitor.
Adding a second matched GPU didn't increase the total GPU load. Instead, it split the load at 25% each. The render times were the same whether running one or two GPUs. Hopefully this will change with the new Mac Pro and its dual FirePro GPUs -- plus an improved version of FCP and FCPX.
We were also disappointed with the speed bump provided by high-end GPUs and dual high-end GPUs when running Adobe Premiere Pro CS6 and CC.
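The load figures in point 4 can be read with a little arithmetic. This sketch assumes Activity Monitor's convention that 100% equals one fully busy core, and it assumes the GPU load divides evenly between matched cards, which is what we observed. Both function names are hypothetical.

```python
def cores_busy(cpu_percent):
    """Convert an Activity Monitor CPU percentage to busy cores.

    Assumes 100% corresponds to one fully loaded core.
    """
    return cpu_percent / 100


def per_gpu_load(total_gpu_percent, gpu_count):
    """Load on each GPU if a fixed total is split evenly between them."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return total_gpu_percent / gpu_count
```

With a fixed 50% total, per_gpu_load(50, 2) returns 25.0: the second GPU shares the same work rather than adding capacity, which fits the unchanged render times.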
5. Notice how well the Radeon HD 7950 did in the LuxMark test. That's the only test on this page that supports non-CUDA GPUs like those from AMD. Hopefully all the apps on this page will be modified to support the soon-to-be-released Mac Pro with its AMD FirePro GPUs that support OpenCL but not CUDA.
POWER can be TRICKY
Each GTX 570, 680, and 770 requires two power feeds (and the Mac Pro only provides two total). To run a second 'fat' GPU, we used an external ATX 500W power supply. The lower optical bay of the Mac Pro can be used to provide a third power source. MacVidCards recommends a Booster X5 auxiliary power supply that fits neatly in the lower optical bay.
The second GTX 680 in our testing was a loaner from Mac*Pro who sells this 4G model on eBay. Same clock speed as the Mac Edition -- just more VRAM.
Power gets crazier with the GTX 580 Classifieds. They each require THREE power feeds. To make sure I 'fed' both cards sufficient power, I used TWO external power supplies rated at 500W each.
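To plan the power feeds for a given combination of cards, a small tally like the following may help. It uses only the feed counts stated above (two per GTX 570, 680, or 770; three per GTX 580 Classified; two feeds available inside the Mac Pro). The names are hypothetical, and it counts connectors only; it says nothing about whether a given supply delivers enough wattage.

```python
# Power feeds each card needs, as stated in the article.
FEEDS_REQUIRED = {
    "GTX 570": 2,
    "GTX 680": 2,
    "GTX 770": 2,
    "GTX 580 Classified": 3,
}

# Feeds the Mac Pro provides internally, as stated in the article.
MAC_PRO_INTERNAL_FEEDS = 2


def extra_feeds_needed(cards, internal_feeds=MAC_PRO_INTERNAL_FEEDS):
    """Feeds that must come from an auxiliary or external supply."""
    total = sum(FEEDS_REQUIRED[card] for card in cards)
    return max(0, total - internal_feeds)
```

A pair of GTX 770s needs two feeds beyond what the Mac Pro supplies, and a pair of GTX 580 Classifieds needs four.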
Comments? Suggestions? Email rob-ART, mad scientist.
Follow me on Twitter @barefeats