-
Upgraded to Ubuntu 16.04
04/24/2018 at 19:53
Well, I started again and upgraded my box to Ubuntu 16.04 LTS. I wanted to be more up to date and to get everything consistent. Since starting this project I have also learned a heck of a lot about Linux, so I am not struggling quite so much to understand the installation.
I still don't actually have 4 GPU cards in there, because their prices have skyrocketed with the demand from cryptocurrency mining. Mostly I have been making sure the software is consistent and that things work OK on the setup I have. I do hope to expand it soon.
I also upgraded the motherboard firmware (BIOS). That means downloading it from the manufacturer's site, putting it on a blank USB stick, getting into the settings on startup (a challenge, because the machine boots so fast) and finding the BIOS upgrade options.
I'll write more about other installation stuff later.
-
Success
06/27/2016 at 23:11
I finally got TensorFlow compiled from source and running with Python 3, cuDNN version 5 and CUDA Toolkit 7.5 on the single GTX 970 that I currently have in this box.
The results are pretty awesome: the convolutional-net training example on MNIST runs more than 25 times faster than it does on my MacBook Pro.
Now I am trying to decide whether to just add a bunch of GTX 970s for the moment, because the 1080 is still very pricey.
-
Progress
06/27/2016 at 22:24
Well, I got past the Bazel problem (I think) by downloading and installing the binary.
The next issue was running the TensorFlow configure script. It asks for a bunch of information that is tricky to find, the worst being the location of the cuDNN library. It turns out that the headers are installed in /usr/include and the libraries in /usr/lib/x86_64-linux-gnu/, and the script can't handle that. I had to copy the header to /usr/local/cuda/include and the libraries to /usr/local/cuda/lib64. I also had to add the libraries to the search path by adding the library path to /etc/ld.so.conf.d/cuda.conf and running sudo ldconfig, which is something I hadn't seen before.
I'm compiling TensorFlow from source now. I can't believe how many horrible warning messages this code generates.
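Before answering the configure script's questions, it can help to confirm where the cuDNN files actually ended up. The sketch below is a hypothetical helper, not part of TensorFlow; the candidate directories are assumptions taken from the locations mentioned above, and `cudnn.h` / `libcudnn.so*` are the file names the cuDNN packages normally install.

```python
from pathlib import Path

# Candidate locations (assumed): where the cuDNN .deb packages put files,
# and the directories under the CUDA toolkit that the files were copied into.
HEADER_DIRS = ["/usr/include", "/usr/local/cuda/include"]
LIB_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]


def find_cudnn(header_dirs=HEADER_DIRS, lib_dirs=LIB_DIRS):
    """Return (header_dir, lib_dir): the first directory holding cudnn.h and the
    first holding a libcudnn.so* file, or None where nothing was found."""
    header_dir = next(
        (d for d in header_dirs if (Path(d) / "cudnn.h").is_file()), None
    )
    lib_dir = next(
        (d for d in lib_dirs
         if Path(d).is_dir() and any(Path(d).glob("libcudnn.so*"))),
        None,
    )
    return header_dir, lib_dir


if __name__ == "__main__":
    inc, lib = find_cudnn()
    print("cudnn.h found in:     ", inc)
    print("libcudnn.so* found in:", lib)
    if lib is not None and not lib.startswith("/usr/local/cuda"):
        # Per the log above, the configure script could not handle this layout,
        # so copying into /usr/local/cuda/{include,lib64} was the workaround.
        print("cuDNN is outside /usr/local/cuda")
```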
-
Unable to install Bazel dependency
06/27/2016 at 20:02
I am trying to install TensorFlow from source on Ubuntu 14.04, and installing the dependencies is the problem. I was able to install the Python-related dependencies and then moved on to Bazel. The first step was to install Java 8, and that went OK.
The next step was to install Bazel itself. When doing the apt-get update I just get this error:
"W: Failed to fetch http://storage.googleapis.com/bazel-apt/dists/stable/InRelease Unable to find expected entry 'jdk1.8/binary-i386/Packages' in Release file (Wrong sources.list entry or malformed file) / E: Some index files failed to download. They have been ignored, or old ones used instead."
and then I am lost. I can't find anything useful about this error on the web. I might try installing the binary directly.
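The error mentions `binary-i386`, which suggests apt is also requesting 32-bit package indexes (for example because i386 is enabled as a foreign architecture via multiarch), while the Bazel repository only publishes amd64 ones. A commonly used fix, not tested on this machine, is to restrict the entry with an `[arch=amd64]` option. The hypothetical helper below rewrites a one-line sources.list entry that way; the entry itself is reconstructed from the URL and component in the error message.

```python
# Entry reconstructed from the error message (assumed, not copied from the original file).
BAZEL_ENTRY = "deb http://storage.googleapis.com/bazel-apt stable jdk1.8"


def restrict_to_amd64(line):
    """Return a one-line sources.list 'deb' entry with an [arch=amd64] option added.

    Lines that are not plain 'deb' entries, or that already carry [options],
    are returned unchanged.
    """
    stripped = line.strip()
    if not stripped.startswith("deb "):
        return line
    rest = stripped[len("deb "):].lstrip()
    if rest.startswith("["):
        return line
    return "deb [arch=amd64] " + rest


if __name__ == "__main__":
    # Writing the result into the apt source file (assumed to be something like
    # /etc/apt/sources.list.d/bazel.list) needs root; this only prints it.
    print(restrict_to_amd64(BAZEL_ENTRY))
```

After changing the entry, `sudo apt-get update` should no longer ask the Bazel repository for i386 indexes.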
-
TensorFlow and cuDNN versions.
06/27/2016 at 19:28
It turns out I have to install TensorFlow from source, because the binary version only works with cuDNN version 4. I had installed cuDNN version 5 and was going to roll back to v4, but then I found out that version 4 does not work with the GTX 1080 cards, which need version 5. So now I am trying to uninstall TensorFlow and reinstall it from source.
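To check which cuDNN version is actually installed, one option is to read the version macros from the header. cuDNN releases of this era define `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` in `cudnn.h` (later releases moved them to `cudnn_version.h`). The sketch below assumes the header is at /usr/include/cudnn.h, where the deb packages place it.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the #define lines of a cudnn.h header.

    Returns None if any of the three macros is missing.
    """
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


if __name__ == "__main__":
    header = Path("/usr/include/cudnn.h")  # assumed location
    if header.is_file():
        print("cuDNN version:", cudnn_version(header.read_text()))
    else:
        print("cudnn.h not found at", header)
```

A major version of 4 would suit the prebuilt TensorFlow binary described here; 5 means building from source.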
-
SLI
06/05/2016 at 03:08
One of the things I assumed early on was that I would want to use SLI, but this is not true. SLI is useful for games, where you want multiple graphics cards to act like a single card. With TensorFlow or cuDNN, you just specify which GPU device each part of the code should run on, so there's no need to make the cards look like a single device. So I didn't bother buying any SLI bridges or whatever.
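As a rough illustration of choosing a device per part of the code, the hypothetical helper below spreads named parts of a model across GPUs round-robin and returns TensorFlow-style device strings such as `/gpu:0`. Round-robin is only the simplest placement, assumed here for illustration; a real model would be balanced by memory and compute.

```python
def assign_devices(part_names, num_gpus):
    """Map each model part to a device string, cycling through the GPUs in order."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {name: f"/gpu:{i % num_gpus}" for i, name in enumerate(part_names)}


if __name__ == "__main__":
    for part, device in assign_devices(["conv1", "conv2", "fc1", "softmax"], 4).items():
        print(part, "->", device)
```

Each string would then be passed to `tf.device(...)` in a `with` block around the ops for that part. Each card stays a separate device, which is why no SLI bridge is involved.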
-
GTX 1080
06/05/2016 at 03:06
Some notes about the GTX 1080. I definitely want to use four of these cards, but I'm waiting for the price to come down. Currently the Founders Edition is available on Amazon, but at a huge markup of around $850. I'm not sure how long it takes for these cards to settle at a more reasonable price. But I'm happy to get things going with the GTX 970 for now, since it's better than training on my laptop and there's a lot of setup to do customizing the environment.
-
Installing packages
06/04/2016 at 05:03
The next tasks were:
- Ensure all packages were up to date with apt-get
- Ensure that Python 3 was OK
- Install SciPy and any other relevant packages
- Install CUDA Toolkit 7.5
- Install cuDNN
- Install pip3, because TensorFlow needs it
- Install TensorFlow
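A quick way to see which of these steps have taken effect is to ask the Python 3 interpreter what it can find. This is a hypothetical check script, not part of the TensorFlow instructions. Note that `nvcc` lives in /usr/local/cuda/bin by default, which may not be on PATH even when the toolkit is installed.

```python
import importlib.util
import shutil


def check_setup():
    """Report which pieces of the toolchain are visible from this Python 3 interpreter."""
    return {
        # nvcc may be installed but not on PATH (default: /usr/local/cuda/bin).
        "nvcc (CUDA toolkit)": shutil.which("nvcc") is not None,
        "pip3": shutil.which("pip3") is not None,
        # find_spec returns None when a top-level module cannot be imported.
        "scipy": importlib.util.find_spec("scipy") is not None,
        "tensorflow": importlib.util.find_spec("tensorflow") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print("ok     " if ok else "MISSING", item)
```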
I installed scipy using 'sudo apt-get install python3-scipy'.
I installed the CUDA Toolkit from the deb (network) package: first with dpkg, then with apt-get.
I then tried to install cuDNN and chose the newest version (5). This turned out to be a mistake. The cuDNN installation is also confusing. In the end I installed it from the two deb packages rather than manually. They put the header files and shared libraries in the usual system locations, but not in the same place as the CUDA Toolkit.
Then I installed pip3, because this is the only way to get TensorFlow for Linux. I followed the instructions on the TensorFlow site for Linux with Python 3 and a GPU, and everything installed OK.
I needed to set LD_LIBRARY_PATH so that python3 could find the CUDA libraries.
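The exact value used isn't recorded above. Assuming the CUDA libraries are in /usr/local/cuda/lib64 (the toolkit's usual default), the sketch below builds an LD_LIBRARY_PATH with that directory added once. One subtlety: the dynamic loader reads LD_LIBRARY_PATH when a process starts, so it has to be set in the shell (or in a child process's environment) before python3 is launched, not from inside the running interpreter.

```python
import os

CUDA_LIB_DIR = "/usr/local/cuda/lib64"  # assumed default CUDA toolkit library directory


def with_lib_dir(env, lib_dir=CUDA_LIB_DIR):
    """Return a copy of env with lib_dir added to the front of LD_LIBRARY_PATH,
    unless it is already listed. The input mapping is not modified."""
    new_env = dict(env)
    parts = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


if __name__ == "__main__":
    # Print a line for ~/.bashrc; alternatively pass the dict as env= to subprocess.run.
    print("export LD_LIBRARY_PATH=" + with_lib_dir(os.environ)["LD_LIBRARY_PATH"])
```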
I ran the tests on the TensorFlow page and everything seemed OK. But when I tried to run the MNIST training example, it crashed.
Eventually I found that this crash happens because TensorFlow needs cuDNN version 4, not version 5. So now I have to go back and screw with the installation, trying to remove cuDNN, install the older one, and possibly rebuild TensorFlow.
-
Installing the OS
06/04/2016 at 04:50
I first installed Ubuntu 16.04 from a DVD ISO using an external DVD drive. This went OK. Next I formatted the new data drive:
- Install gksudo using apt-get
- Install GParted and launch it with gksudo
- Use GParted to partition and format the drive with an MS-DOS partition table
Then I opened the Ubuntu software update GUI and found that the machine was using an open-source driver for the installed GTX 970. It gave me the option of switching to the proprietary NVIDIA driver, which I did.
Then I attempted to install CUDA Toolkit 7.5 and hit my first snag: there is no version available for Ubuntu 16.04. If you try to install the deb package anyway, it will not install and complains about the signing key being too short. This is because Ubuntu 16.04 has deprecated the package validation that NVIDIA is using.
So I started again and installed Ubuntu 14.04.4 LTS.
Then I ran into another problem: a UI-related crash on startup (unity-settings-daemon). However, when I switched to the NVIDIA display driver again, this seemed to stop happening.
-
Getting the hardware running
06/04/2016 at 04:33
The build was not too hard and kind of fun. The components look impressive.
One hassle with putting this together was fitting the water-cooling radiator to the provided fan and attaching it to the case. I had to tie it in place with zip ties so that I could put the bolts through and fit the washers without everything falling apart and losing small parts in the case.
It's not obvious which holes to use for screwing the PSU to the wall of the case. Some of the holes are not threaded, so it was easy to cross-thread the screws into them. Inspect carefully to find the properly threaded holes.
The mains cable is very thick. With a 1600 W PSU, the machine basically needs its own 15 A circuit. However, the four GTX 1080s are not too power hungry, so I think the PSU is probably over-specified.
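The circuit and PSU claims can be sanity-checked with some arithmetic. The figures below are assumptions except where noted: 120 V US mains, the 15 A circuit and 1600 W rating from the text, 90% PSU efficiency, NVIDIA's 180 W rated board power per GTX 1080, and hypothetical allowances for the CPU and everything else.

```python
MAINS_VOLTS = 120.0      # assumed US mains voltage
CIRCUIT_AMPS = 15.0      # circuit rating mentioned in the text
PSU_RATING_W = 1600.0    # PSU rating from the text
PSU_EFFICIENCY = 0.90    # assumed; wall draw = DC load / efficiency
GTX_1080_W = 180.0       # NVIDIA's rated board power for one GTX 1080
CPU_W = 140.0            # hypothetical CPU allowance
OTHER_W = 100.0          # hypothetical allowance for board, drives, fans and pump


def wall_amps(dc_watts, efficiency=PSU_EFFICIENCY, volts=MAINS_VOLTS):
    """Current drawn from the outlet to deliver dc_watts to the components."""
    return dc_watts / efficiency / volts


def estimated_load_w(num_gpus=4):
    """Rough DC load of the build with num_gpus GTX 1080 cards."""
    return num_gpus * GTX_1080_W + CPU_W + OTHER_W


if __name__ == "__main__":
    print(f"PSU at full rating: {wall_amps(PSU_RATING_W):.1f} A of {CIRCUIT_AMPS:.0f} A")
    load = estimated_load_w()
    print(f"Estimated load: {load:.0f} W, about {wall_amps(load):.1f} A at the wall")
```

With these numbers it prints 14.8 A for the PSU at full rating and an estimated load of 960 W (about 8.9 A). A fully loaded PSU would nearly fill a 15 A circuit, while the expected load leaves plenty of headroom.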
Another hassle was plugging all the cables into the motherboard when it's actually quite dark in there (I used a flashlight). It was also easy to bend the fine pins on the USB board interconnects.
There are four fans and a water pump in the cooling system. I plugged the fan that is on the radiator into the CPU_FAN board connector and the water pump into the CPU_OPT connector. There are three other fan connectors distributed around the board which I used for the case fans.
It's important to put the RAM in the right location. This system uses 4 DDR4 modules to make up 32GB. These go in the gray RAM sockets.
I was pleased to find that it booted up fine.
I upgraded the BIOS using the Q-Flash tool with a USB drive containing the BIOS image from the Gigabyte site. Initially I got an error trying to upgrade, but eventually I realized that I was using the X99-SLI BIOS instead of the one for the X99P-SLI.
The big problem, which stopped me for almost a week, was that the BIOS could not see the M.2 drive, so I could not install any OS. I looked everywhere online for solutions and went through every setting in the BIOS. In the end I replaced both the M.2 drive and the motherboard, and then it worked OK, so I think there was a problem with the motherboard. Note that this motherboard does not support SATA M.2 drives; they must be PCIe.