shang posted on 2018-11-1 17:42:35

Installing CUDA 9.2 and cuDNN on CentOS 7, with a summary of the problems encountered and how they were solved

1、Environment preparation
1.1、Check that a GPU is installed
# lspci | grep -i nvidia
06:00.0 3D controller: NVIDIA Corporation GP100GL (rev a1)
87:00.0 3D controller: NVIDIA Corporation GP100GL (rev a1)


1.2、Install the gcc compiler and kernel-devel
# yum install gcc
# yum install kernel-devel
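
The NVIDIA kernel module is compiled against the kernel-devel tree, so it helps to confirm that a tree exists for the kernel that is actually running. The following is a minimal sketch, not part of the original steps: kernel_devel_matches is a hypothetical helper name, and it assumes the CentOS 7 layout in which kernel-devel installs under /usr/src/kernels/<release>.

```shell
# Hypothetical helper (not from the original post): succeeds when a
# kernel-devel source tree exists for the given kernel release.
#   $1  kernel release to check (default: the running kernel, uname -r)
#   $2  base directory holding the trees (default: /usr/src/kernels)
kernel_devel_matches() {
    release="${1:-$(uname -r)}"
    base="${2:-/usr/src/kernels}"
    [ -d "$base/$release" ]
}
```

One possible use is: kernel_devel_matches || yum install "kernel-devel-$(uname -r)", which requests the kernel-devel build for the running kernel when no matching tree is found.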


2、Install CUDA
2.1、Download the installer
Download site:
http://developer.nvidia.com/cuda-downloads
Legacy Releases --> CUDA Toolkit 9.2 --> linux --> x86_64 --> CentOS --> 7 --> runfile --> Base Installer

2.2、Install the Driver, Toolkit and Samples
# sh cuda_9.2.148_396.37_linux.run

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]:

Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-9.2 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /root ]:


2.3、Configure the environment variables and apply them
# vi /etc/profile
export PATH=$PATH:/usr/local/cuda-9.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.2/lib64

# source /etc/profile
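
Because /etc/profile may be sourced more than once, plain appends can add the same directory repeatedly, and appending to an empty LD_LIBRARY_PATH leaves a leading colon. The sketch below shows one way to avoid both; path_append and CUDA_HOME are names introduced here for illustration, not something the installer defines.

```shell
# Hypothetical helper: print the path list $1 with directory $2 appended,
# unless $2 is already one of its entries. An empty $1 yields just $2.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

# CUDA_HOME is an assumed variable name; the value is the toolkit
# location accepted in step 2.2.
CUDA_HOME=/usr/local/cuda-9.2
PATH=$(path_append "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH
```

These lines could replace the two export lines in /etc/profile; sourcing the file again then leaves both variables unchanged.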


3、Install cuDNN
3.1、Download site
https://developer.nvidia.com/cudnn
Select version: Download cuDNN v7.3.1 (Sept 28, 2018), for CUDA 9.2 --> cuDNN v7.3.1 Library for Linux


3.2、Extract the archive and copy the files into the matching CUDA directories
# tar -zxf cudnn-9.2-linux-x64-v7.3.1.20.tgz

# cd cuda/
# ll
total 40
drwxr-xr-x 2 root root    21 Oct 17 16:09 include
drwxr-xr-x 2 root root    96 Oct 17 16:09 lib64
-r--r--r-- 1 root root 38963 Aug  3 03:13 NVIDIA_SLA_cuDNN_Support.txt

# cp include/cudnn.h /usr/local/cuda-9.2/include/
# cp lib64/libcudnn* /usr/local/cuda-9.2/lib64/

Make the files readable by all users
# chmod a+r /usr/local/cuda-9.2/include/cudnn.h /usr/local/cuda-9.2/lib64/libcudnn*
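
To confirm that the copied header is the expected release, the version macros in cudnn.h can be read back. This is a small sketch, assuming the cuDNN 7 header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate #define lines; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the cuDNN version as major.minor.patch,
# read from the #define lines of the header given as $1
# (default: the header copied in step 3.2).
cudnn_version() {
    header="${1:-/usr/local/cuda-9.2/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}
```

Running cudnn_version with no argument reads the installed header; the result can be compared with the version in the downloaded archive name.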

4、Check whether CUDA was installed successfully
# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Load the driver module
# modprobe nvidia
modprobe: FATAL: Module nvidia not found.
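
"Module nvidia not found" means modprobe found no nvidia module for the running kernel. A quick way to check this directly is to search the module tree, sketched below; has_nvidia_module is a hypothetical helper, and the /lib/modules/<release> layout is the usual one on CentOS 7.

```shell
# Hypothetical helper: succeeds when an nvidia kernel module file
# (nvidia.ko, possibly compressed) exists for the given kernel release.
#   $1  kernel release (default: the running kernel, uname -r)
#   $2  modules root (default: /lib/modules)
has_nvidia_module() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    [ -n "$(find "$root/$release" -name 'nvidia.ko*' 2>/dev/null | head -n 1)" ]
}
```

If has_nvidia_module fails for the running kernel, the driver module was never built or installed for it, which points at the kernel module build rather than at the CUDA toolkit.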


5、Solving the problem
5.1、Check the system dependencies
# yum info dkms
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : dkms
Arch        : noarch
Version     : 2.6.1
Release     : 1.el7
Size        : 219 k
Repo        : installed
From repo   : epel
Summary     : Dynamic Kernel Module Support Framework
URL         : http://linux.dell.com/dkms
License     : GPLv2+
Description : This package contains the framework for the Dynamic Kernel Module Support (DKMS)
            : method for installing module RPMS as originally developed by Dell.

# yum info libvdpau
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : libvdpau
Arch        : x86_64
Version     : 1.1.1
Release     : 3.el7
Size        : 73 k
Repo        : installed
From repo   : ol7_latest
Summary     : Wrapper library for the Video Decode and Presentation API
URL         : http://freedesktop.org/wiki/Software/VDPAU
License     : MIT
Description : VDPAU is the Video Decode and Presentation API for UNIX. It provides an
            : interface to video decode acceleration and presentation hardware present in
            : modern GPUs.

Available Packages
Name        : libvdpau
Arch        : i686
Version     : 1.1.1
Release     : 3.el7
Size        : 30 k
Repo        : ol7_latest/x86_64
Summary     : Wrapper library for the Video Decode and Presentation API
URL         : http://freedesktop.org/wiki/Software/VDPAU
License     : MIT
Description : VDPAU is the Video Decode and Presentation API for UNIX. It provides an
            : interface to video decode acceleration and presentation hardware present in
            : modern GPUs.


# yum info kernel-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : kernel-devel
Arch        : x86_64
Version     : 3.10.0
Release     : 862.6.3.el7
Size        : 37 M
Repo        : installed
From repo   : ol7_latest
Summary     : Development package for building kernel modules to match the kernel
URL         : http://www.kernel.org/
License     : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
            : against the kernel package.

Name        : kernel-devel
Arch        : x86_64
Version     : 3.10.0
Release     : 862.14.4.el7
Size        : 37 M
Repo        : installed
From repo   : ol7_latest
Summary     : Development package for building kernel modules to match the kernel
URL         : http://www.kernel.org/
License     : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
            : against the kernel package.


5.2、Install the nvidia module for the kernel
A DKMS module has to go through three stages (added, built, installed) before modinfo can detect it.


# dkms status
nvidia, 384.66: added
nvidia, 396.37: added

Clearly, the nvidia module was only added during installation; it was never built and installed.
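
The check above can be scripted by filtering the dkms status output for entries that have not reached the installed state. This is a minimal sketch that assumes the line format shown above ("module, version[, kernel, arch]: state"); dkms_not_installed is a hypothetical helper name.

```shell
# Hypothetical helper: read `dkms status` output on stdin and print
# module/version for every entry whose state is not "installed".
dkms_not_installed() {
    awk -F': ' 'NF && $NF !~ /^installed/ {
        split($1, f, ", ")      # f[1] = module name, f[2] = version
        print f[1] "/" f[2]
    }'
}
```

Typical use would be: dkms status | dkms_not_installed. With the output shown above it lists both nvidia versions, since neither has been built for the kernel.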

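The build fails. Note that the module name in the commands below is misspelled as nvdia, so DKMS looks for /usr/src/nvdia-<version>; the name registered in dkms status is nvidia.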
# dkms build -m nvdia -v 384.66
Error! Could not find module source directory.
Directory: /usr/src/nvdia-384.66 does not exist.


# dkms build -m nvdia -v 396.37
Error! Could not find module source directory.
Directory: /usr/src/nvdia-396.37 does not exist.



Download the matching driver from the NVIDIA website and install it
https://www.nvidia.cn/Download/index.aspx?lang=cn
Product Type: Tesla
Product Series: P-Series
Product: Tesla P100
Operating System: Linux 64-bit
CUDA Toolkit: 9.2



# sh NVIDIA-Linux-x86_64-396.44.run


# dkms status
nvidia, 384.66: added
nvidia, 396.44, 3.10.0-862.14.4.el7.x86_64, x86_64: installed



The nvidia module is now in the installed state.

Running modprobe no longer reports an error
# modprobe nvidia

Check whether the driver was installed successfully
# nvidia-smi
Wed Oct 17 18:17:34 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:06:00.0 Off |                    0 |
| N/A   45C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:87:00.0 Off |                    0 |
| N/A   44C    P0    29W / 250W |      0MiB / 16280MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
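
For a scripted check, the driver version can be pulled out of the nvidia-smi banner and compared with the version that dkms status reports as installed. The sketch below parses the "Driver Version:" field of the text output; smi_driver_version is a hypothetical helper name.

```shell
# Hypothetical helper: read nvidia-smi text output on stdin and print
# the first driver version found in the banner line.
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Typical use would be: nvidia-smi | smi_driver_version. An empty result means the banner was not printed, for example when nvidia-smi cannot communicate with the driver.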


CUDA installation and configuration reference:
https://www.jianshu.com/p/a201b91b3d96

Troubleshooting reference:
https://blog.csdn.net/yijuan_hw/article/details/53439408

