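I had previously installed CUDA 7.0 and Caffe on Ubuntu 14.04 x64, with everything configured and Caffe running correctly.
Recently, however, I needed Torch, which requires CUDA 8.0, so I decided to upgrade CUDA to meet Torch's requirements.
The latest Caffe also supports CUDA 8.0.
Without further ado, let's install CUDA 8.0.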
Hardware and software:
Graphics card: GeForce GTX TITAN X
OS: Ubuntu 14.04 (x86_64)
CUDA: cuda_8.0.61_375.26_linux.run
cuDNN: cudnn-8.0-linux-x64-v5.1.tgz
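1. Use CUDA 8.0 for the GeForce GTX TITAN X (Torch requires it). Download the CUDA runfile from NVIDIA's download page. Whatever you do, do not download the deb package; otherwise countless pitfalls await you later.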
 
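2. A GTX 1080 card must use cuDNN 8.0-v5.1. Otherwise, when you train models with Caffe, accuracy is normal on the CPU or on the GPU, but as soon as cuDNN mode is enabled the accuracy (acc) immediately drops to about 0.1 and the loss becomes very large. Download cuDNN from NVIDIA's cuDNN page; registration is required. It is best to register an account and choose the matching version yourself rather than use the ready-made packages offered by other tutorials online, which are very likely to cause problems.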
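Once the CUDA 8.0 toolkit itself is installed (the steps below), the cuDNN archive still has to be copied into the CUDA tree. A minimal sketch, assuming the tgz unpacks to cuda/include and cuda/lib64 and that the toolkit is reachable at /usr/local/cuda (the symlink the installer offers to create); install_cudnn is a hypothetical helper name, not part of any NVIDIA tool:

```shell
#!/bin/sh
# Sketch: copy cuDNN v5.1 into an existing CUDA install.
# Assumptions: the tgz unpacks to cuda/include and cuda/lib64, and CUDA
# lives under /usr/local/cuda. install_cudnn is a hypothetical helper;
# call it from a root shell (e.g. after `sudo -i`) when writing to /usr/local.
install_cudnn() {
    archive=$1
    cuda_dir=${2:-/usr/local/cuda}
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    [ -d "$cuda_dir/include" ] && [ -d "$cuda_dir/lib64" ] ||
        { echo "no CUDA toolkit under $cuda_dir" >&2; return 1; }
    workdir=$(mktemp -d) || return 1
    # Unpack into a scratch directory, then copy the header and libraries.
    if tar -xzf "$archive" -C "$workdir" &&
        cp "$workdir/cuda/include/cudnn.h" "$cuda_dir/include/" &&
        cp -P "$workdir"/cuda/lib64/libcudnn* "$cuda_dir/lib64/"; then
        # Make the header and libraries readable by all users.
        chmod a+r "$cuda_dir/include/cudnn.h" "$cuda_dir"/lib64/libcudnn*
        status=$?
    else
        status=1
    fi
    rm -rf "$workdir"
    return $status
}
# Usage: install_cudnn cudnn-8.0-linux-x64-v5.1.tgz /usr/local/cuda
```

The cp -P flag keeps entries such as libcudnn.so.5 as symlinks; without it cp follows them and each link becomes a separate full copy of the library.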

Note: when the installer asks whether to install the NVIDIA driver, answer no. For every other prompt, answer yes or accept the default.
Do you accept the previously read EULA? accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26? (y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit? (y)es/(n)o/(q)uit: y
Enter Toolkit Location [ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda? (y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples? (y)es/(n)o/(q)uit: y
Enter CUDA Samples Location [ default is /home/zhou ]:
Installing the CUDA Toolkit in /usr/local/cuda-8.0 …
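The prompts above appear when the runfile is launched from a terminal. A minimal sketch of that step, assuming the runfile named in the list above is in the current directory; install_cuda_toolkit is a hypothetical wrapper that only adds an existence check before calling the installer:

```shell
#!/bin/sh
# Sketch: run the CUDA 8.0 runfile installer interactively.
# install_cuda_toolkit is a hypothetical wrapper around `sudo sh <runfile>`.
install_cuda_toolkit() {
    runfile=${1:-cuda_8.0.61_375.26_linux.run}
    if [ ! -f "$runfile" ]; then
        echo "installer not found: $runfile" >&2
        return 1
    fi
    # Answer n to the driver prompt and y (or Enter) to the others.
    sudo sh "$runfile"
    # Non-interactive alternative described in NVIDIA's Linux installation
    # guide (toolkit and samples only, no driver):
    #   sudo sh "$runfile" --silent --toolkit --samples
}
# Usage: install_cuda_toolkit cuda_8.0.61_375.26_linux.run
```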
When it finishes, you will see the following summary:
Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /home/startag, but missing recommended libraries
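Besides the lines above, the installer's summary asks you to make sure PATH includes /usr/local/cuda-8.0/bin and LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64. One way to do that, assuming bash reads ~/.bashrc and the toolkit is at the default location; add_cuda_env and its marker comment are hypothetical:

```shell
#!/bin/sh
# Sketch: append CUDA 8.0 paths to a shell startup file exactly once.
# Assumption: the toolkit is at the default /usr/local/cuda-8.0.
add_cuda_env() {
    rcfile=$1
    cuda_home=${2:-/usr/local/cuda-8.0}
    marker='# added for CUDA 8.0'
    # Skip if an earlier run already added the block.
    if [ -f "$rcfile" ] && grep -qF -- "$marker" "$rcfile"; then
        return 0
    fi
    {
        echo "$marker"
        echo "export PATH=$cuda_home/bin\${PATH:+:\$PATH}"
        echo "export LD_LIBRARY_PATH=$cuda_home/lib64\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
    } >> "$rcfile"
}
# Usage: add_cuda_env ~/.bashrc && . ~/.bashrc
```

The marker line makes the function idempotent, so running it twice does not stack duplicate PATH entries.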
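If you did install the driver bundled with CUDA 8.0, it will conflict with the previously installed NVIDIA driver when you reboot, and the boot screen will hang with an icon flashing.
Ctrl+Alt+F1 through F7 will not respond either, because the two driver versions do not match and conflict with each other.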
Solution:
Reboot -> press Esc -> Ubuntu recovery mode
After a while a menu appears. Choose root to get a root shell, then uninstall the NVIDIA driver with:
sudo apt-get purge 'nvidia-*'
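This removes all NVIDIA packages; reboot again and the system will boot normally.
If apt-get does not work when you run the command above, first use the fix option in the recovery menu to repair broken packages; once the repair finishes, go back into the root shell and uninstall the NVIDIA driver.
For details, see: "Ubuntu GNOME 16.04: the system won't boot after installing the NVIDIA driver. How can I fix it?"
After rebooting, CUDA programs run normally.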
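The recovery-mode cleanup described above, collected as a sketch. It assumes you are in the recovery root shell (hence no sudo); the remount step is included because recovery mode may mount / read-only. Both helper names are hypothetical:

```shell
#!/bin/sh
# Sketch of the recovery-mode cleanup, typed in the root shell (no sudo).

# Reads `dpkg -l` output on stdin and prints nvidia-* packages that are
# installed (ii) or removed but still configured (rc).
list_nvidia_pkgs() {
    awk '($1 == "ii" || $1 == "rc") && $2 ~ /^nvidia-/ { print $2 }'
}

cleanup_nvidia_driver() {
    # Recovery mode may mount / read-only; remount it writable first.
    mount -o remount,rw /
    dpkg --configure -a       # finish interrupted package configuration
    apt-get -f install        # repair broken dependencies
    # Quoted so the shell passes the pattern to apt-get unexpanded.
    apt-get purge 'nvidia-*'
}

# Usage (root shell): dpkg -l | list_nvidia_pkgs
#                     cleanup_nvidia_driver && reboot
```

Listing the packages first lets you see what the purge will remove before confirming it.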
Checking the CUDA 8.0 and samples installation:
After installation, verify that CUDA was installed successfully.
Go to /usr/local/cuda/samples and build the samples with:
sudo make all -j8
Once everything has compiled, go to ./bin/x86_64/linux/release and run deviceQuery:
./deviceQuery
If you see the following output,
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
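then run chmod 777 -R deviceQuery to relax the permissions on the deviceQuery file (chmod changes file permissions but does not grant sudo rights; running sudo ./deviceQuery is the direct way to run it as root). For a solution, see: CUDA deviceQuery returned 30 error After upgrade to nvidia 334.21-1
If you see the following instead, CUDA and the CUDA samples are installed correctly: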
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX TITAN X"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 12199 MBytes (12791185408 bytes)
  (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores
  GPU Max Clock rate:                            1076 MHz (1.08 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX TITAN X
Result = PASS
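To check the result from a script instead of by eye, the PASS line and the summary line can be parsed. A sketch with hypothetical helper names, assuming the output format shown above:

```shell
#!/bin/sh
# Helpers that read deviceQuery output on stdin (hypothetical names).

# Succeeds only if deviceQuery reported "Result = PASS".
device_query_passed() {
    grep -q '^Result = PASS'
}

# Prints "<driver> <runtime>" from the deviceQuery summary line.
cuda_versions() {
    sed -n 's/.*CUDA Driver Version = \([0-9.]*\), CUDA Runtime Version = \([0-9.]*\),.*/\1 \2/p'
}

# Usage, from the samples' bin/x86_64/linux/release directory:
#   ./deviceQuery > dq.txt
#   device_query_passed < dq.txt && cuda_versions < dq.txt
```

For the output above, cuda_versions prints 8.0 8.0, confirming that the driver API and the runtime agree.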
You can also check the installation with:
nvidia-smi
Output:
Thu May 25 20:33:01 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:01:00.0      On |                  N/A |
| 22%   40C    P8    17W / 250W |    500MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1275    G   /usr/lib/xorg/Xorg                             267MiB |
|    0      2416    G   compiz                                         106MiB |
|    0      2978    G   ...el-token=EFAC54A3CB4FC0DBAF418394276E4C3B   124MiB |
+-----------------------------------------------------------------------------+
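The header above shows driver 375.26, the version bundled with cuda_8.0.61_375.26_linux.run. A sketch that reads the driver version through nvidia-smi's query interface and compares it with that version, assuming 375.26 is the minimum you want to enforce and that GNU sort -V is available; version_ge and driver_ok are hypothetical names:

```shell
#!/bin/sh
# version_ge A B: succeeds if version A >= version B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum: the driver bundled with cuda_8.0.61_375.26_linux.run.
MIN_DRIVER=375.26

# Reads the first GPU's driver version and compares it with MIN_DRIVER.
driver_ok() {
    ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    version_ge "$ver" "$MIN_DRIVER"
}
# Usage: driver_ok && echo "driver is new enough" || echo "driver too old"
```

version_ge sorts the two versions and checks that the smaller one is the required minimum, so equal versions also pass.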
Original post: http://www.cnblogs.com/empty16/p/6906121.html