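Installing CUDA 9.2 and cuDNN on CentOS 7, with a summary of the problems encountered and how they were solved
1. Environment preparation
1.1 Check whether a GPU is installed
# lspci | grep -i nvidia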
06:00.0 3D controller: NVIDIA Corporation GP100GL (rev a1)
87:00.0 3D controller: NVIDIA Corporation GP100GL (rev a1)
1.2 Install the gcc compiler and kernel-devel
# yum install gcc
# yum install kernel-devel
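The NVIDIA kernel module is compiled against the headers of the running kernel, so it can help to confirm that one of the installed kernel-devel packages matches `uname -r` before running the installer. The sketch below is not part of the original procedure: `has_matching_kernel_devel` is a hypothetical helper name, and it assumes the installed versions arrive on standard input, one per line.

```shell
# Hypothetical helper: succeed if any kernel-devel version read from stdin
# is exactly equal to the kernel release passed as the first argument.
has_matching_kernel_devel() {
    # -F: fixed string, -x: whole-line match, -q: no output, status only
    grep -Fxq -- "$1"
}
```

One possible use is `rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel | has_matching_kernel_devel "$(uname -r)" || echo "no kernel-devel for the running kernel"`. If nothing matches, `yum install "kernel-devel-$(uname -r)"` is the usual way to request the matching package.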
2. Install CUDA
2.1 Download the installer
Download site:
http://developer.nvidia.com/cuda-downloads
Legacy Releases --> CUDA Toolkit 9.2 --> linux --> x86_64 --> CentOS --> 7 --> runfile --> Base Installer
2.2 Install the Driver, Toolkit, and Samples
# sh cuda_9.2.148_396.37_linux.run
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: y
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]:
Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.2 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /root ]:
2.3 Configure the environment variables and make them take effect
# vi /etc/profile
export PATH=$PATH:/usr/local/cuda-9.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.2/lib64
# source /etc/profile
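Appending to `$PATH` unconditionally adds a duplicate entry every time `/etc/profile` is sourced, and an empty `$LD_LIBRARY_PATH` leaves a leading colon in the result. A minimal sketch of a guard against both, assuming a hypothetical helper named `append_once`; the CUDA paths are the ones used in the export lines above.

```shell
# Hypothetical helper: print the colon-separated list $1 with entry $2 appended,
# unless $2 is already in the list. An empty list yields just the entry.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present: unchanged
        "::")     printf '%s\n' "$2" ;;      # empty list: no leading colon
        *)        printf '%s\n' "$1:$2" ;;   # otherwise append
    esac
}

PATH=$(append_once "$PATH" /usr/local/cuda-9.2/bin)
LD_LIBRARY_PATH=$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda-9.2/lib64)
export PATH LD_LIBRARY_PATH
```

These lines could stand in for the two export lines in `/etc/profile`. After sourcing the file, `nvcc --version` should report release 9.2 if the toolkit was installed to the default location.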
3. Install cuDNN
3.1 Download site
https://developer.nvidia.com/cudnn
Select the version: Download cuDNN v7.3.1 (Sept 28, 2018), for CUDA 9.2 --> cuDNN v7.3.1 Library for Linux
3.2 Extract the archive and copy the files into the corresponding CUDA directories
# tar -zxf cudnn-9.2-linux-x64-v7.3.1.20.tgz
# cd cuda/
# ll
total 40
drwxr-xr-x 2 root root 21 Oct 17 16:09 include
drwxr-xr-x 2 root root 96 Oct 17 16:09 lib64
-r--r--r-- 1 root root 38963 Aug  3 03:13 NVIDIA_SLA_cuDNN_Support.txt
# cp include/cudnn.h /usr/local/cuda-9.2/include/
# cp lib64/libcudnn* /usr/local/cuda-9.2/lib64/
Make the copied files readable by all users:
# chmod a+r /usr/local/cuda-9.2/include/cudnn.h /usr/local/cuda-9.2/lib64/libcudnn*
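To confirm which cuDNN header was copied, the version macros in `cudnn.h` can be read back. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`, as cuDNN 7.x headers do; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from the cuDNN header at $1.
# Prints nothing if the header has no CUDNN_MAJOR definition.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}
```

Calling `cudnn_version /usr/local/cuda-9.2/include/cudnn.h` after the copy should then print the version of the archive that was unpacked, which for this article would be expected to be 7.3.1.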
4. Check whether CUDA was installed successfully
# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Try to load the driver module:
# modprobe nvidia
modprobe: FATAL: Module nvidia not found.
5. Solving the problem
5.1 Check the system dependencies
# yum info dkms
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name : dkms
Arch : noarch
Version : 2.6.1
Release : 1.el7
Size : 219 k
Repo : installed
From repo : epel
Summary : Dynamic Kernel Module Support Framework
URL : http://linux.dell.com/dkms
License : GPLv2+
Description : This package contains the framework for the Dynamic Kernel Module Support (DKMS)
: method for installing module RPMS as originally developed by Dell.
# yum info libvdpau
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name : libvdpau
Arch : x86_64
Version : 1.1.1
Release : 3.el7
Size : 73 k
Repo : installed
From repo : ol7_latest
Summary : Wrapper library for the Video Decode and Presentation API
URL : http://freedesktop.org/wiki/Software/VDPAU
License : MIT
Description : VDPAU is the Video Decode and Presentation API for UNIX. It provides an
: interface to video decode acceleration and presentation hardware present in
: modern GPUs.
Available Packages
Name : libvdpau
Arch : i686
Version : 1.1.1
Release : 3.el7
Size : 30 k
Repo : ol7_latest/x86_64
Summary : Wrapper library for the Video Decode and Presentation API
URL : http://freedesktop.org/wiki/Software/VDPAU
License : MIT
Description : VDPAU is the Video Decode and Presentation API for UNIX. It provides an
: interface to video decode acceleration and presentation hardware present in
: modern GPUs.
# yum info kernel-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name : kernel-devel
Arch : x86_64
Version : 3.10.0
Release : 862.6.3.el7
Size : 37 M
Repo : installed
From repo : ol7_latest
Summary : Development package for building kernel modules to match the kernel
URL : http://www.kernel.org/
License : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
: against the kernel package.
Name : kernel-devel
Arch : x86_64
Version : 3.10.0
Release : 862.14.4.el7
Size : 37 M
Repo : installed
From repo : ol7_latest
Summary : Development package for building kernel modules to match the kernel
URL : http://www.kernel.org/
License : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
: against the kernel package.
5.2 Install the nvidia module for the kernel
A DKMS module has to go through three steps (add, build, install) before modinfo can detect it.
# dkms status
nvidia, 384.66: added
nvidia, 396.37: added
Clearly, the nvidia modules were only added during installation; no installed module was ever produced.
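This state check can be scripted. The following sketch assumes the two `dkms status` line formats that appear in this article and prints every module/version pair whose state is not `installed`; `dkms_not_installed` is a hypothetical helper name.

```shell
# Hypothetical helper: read `dkms status` output on stdin and print
# "module/version state" for every entry that is not yet installed.
dkms_not_installed() {
    awk -F': ' '
        NF >= 2 && $NF != "installed" {
            split($1, parts, ", ")   # parts[1] = module, parts[2] = version
            print parts[1] "/" parts[2] " " $NF
        }
    '
}
```

It would be used as `dkms status | dkms_not_installed`; empty output means every registered module has reached the installed state.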
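The build fails. Note that the module name in the commands below is misspelled as nvdia instead of nvidia, which is why dkms looks for /usr/src/nvdia-<version>; the transcript does not show whether the build would have succeeded with the correct name.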
# dkms build -m nvdia -v 384.66
Error! Could not find module source directory.
Directory: /usr/src/nvdia-384.66 does not exist.
# dkms build -m nvdia -v 396.37
Error! Could not find module source directory.
Directory: /usr/src/nvdia-396.37 does not exist.
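As the error messages above show, `dkms build` expects the module sources under `/usr/src/<module>-<version>`. A minimal sketch of a pre-check, assuming a hypothetical helper `dkms_source_dir` whose optional third argument overrides the `/usr/src` root:

```shell
# Hypothetical helper: print the source directory dkms would use for
# module $1, version $2, and fail if that directory does not exist.
# $3 optionally replaces the default /usr/src root.
dkms_source_dir() {
    dir="${3:-/usr/src}/$1-$2"
    printf '%s\n' "$dir"
    [ -d "$dir" ]
}
```

For example, `dkms_source_dir nvidia 396.37 >/dev/null && dkms build -m nvidia -v 396.37` only starts the build when the sources are present, and a misspelled module name shows up immediately in the printed path.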
Download the matching driver from the NVIDIA website and install it:
https://www.nvidia.cn/Download/index.aspx?lang=cn
Product Type: Tesla
Product Series: P-Series
Product Family: Tesla P100
Operating System: Linux 64-bit
CUDA Toolkit: 9.2
# sh NVIDIA-Linux-x86_64-396.44.run
# dkms status
nvidia, 384.66: added
nvidia, 396.44, 3.10.0-862.14.4.el7.x86_64, x86_64: installed
The nvidia module is now in the installed state.
Running modprobe no longer reports an error:
# modprobe nvidia
Check whether the driver was installed successfully:
# nvidia-smi
Wed Oct 17 18:17:34 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:06:00.0 Off |                    0 |
| N/A   45C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:87:00.0 Off |                    0 |
| N/A   44C    P0    29W / 250W |      0MiB / 16280MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
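For scripted checks, `nvidia-smi` can emit CSV instead of the table. The sketch below assumes the `--query-gpu=driver_version --format=csv,noheader` options and a hypothetical helper `all_drivers_match`, which succeeds only if every GPU line reports the expected driver version.

```shell
# Hypothetical helper: read one driver version per line from stdin and
# succeed only if there is at least one line and every line equals $1.
all_drivers_match() {
    awk -v want="$1" '
        # Concatenating "" forces a string comparison rather than a numeric one.
        { seen = 1; if (($0 "") != (want "")) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

On the machine in this article it could be called as `nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_drivers_match 396.44 && echo "driver OK"`.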
Reference for CUDA installation and configuration:
https://www.jianshu.com/p/a201b91b3d96
Reference for the troubleshooting steps:
https://blog.csdn.net/yijuan_hw/article/details/53439408