Building and installing TensorFlow + MKL from source on CentOS 6

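Prerequisites

Notes:
Neither the TensorFlow build directory nor the current user's HOME directory may be on an NFS file system. Installing the built package is not subject to this restriction.
When running with MKL, set these environment variables:
MKL_NUM_THREADS=<number of cores>
OMP_NUM_THREADS=<number of cores>
KMP_AFFINITY=granularity=fine,compact
Limit the number of threads or disable hyper-threading; otherwise performance actually drops.
This build procedure applies only to RedHat 6 / CentOS 6.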
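One way to apply these settings is a small shell snippet sourced before starting Python. This is a sketch that assumes "number of cores" means physical cores (hyper-threads excluded, in line with the advice to limit threads) and derives that number from the x86 "physical id" / "core id" fields of /proc/cpuinfo; the helper name physical_cores is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of physical cores, i.e. unique
# "physical id:core id" pairs in /proc/cpuinfo (x86 layout), so that
# hyper-threads are not counted twice.
physical_cores() {
    local n
    n=$(awk -F: '/^physical id/ {p = $2} /^core id/ {print p ":" $2}' /proc/cpuinfo 2>/dev/null | sort -u | wc -l)
    # Fall back to the online logical CPU count if those fields are missing
    if [ "${n:-0}" -eq 0 ]; then
        n=$(getconf _NPROCESSORS_ONLN)
    fi
    echo "$n"
}

cores=$(physical_cores)
export MKL_NUM_THREADS="$cores"
export OMP_NUM_THREADS="$cores"
export KMP_AFFINITY=granularity=fine,compact
echo "MKL/OpenMP threads: $cores"
```

Source it (". ./mkl_env.sh", file name arbitrary) in the same shell that later runs the TensorFlow program so the exported variables are inherited.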

Preparation

Download files

Download tensorflow-1.1.0

wget https://github.com/tensorflow/tensorflow/archive/v1.1.0.zip
# the archive unpacks into tensorflow-1.1.0/
unzip v1.1.0.zip
mv tensorflow-1.1.0 tensorflow
# or
git clone --recurse-submodules https://github.com/tensorflow/tensorflow.git -b v1.1.0
# Rename to tensorflow-1.1.0-mkl (optional, personal preference).
mv tensorflow tensorflow-1.1.0-mkl

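Download the mklml library and put it in TensorFlow's third_party/mkl folder. When the build is configured, TensorFlow checks this folder for the MKL library files. The file would normally be downloaded automatically, but network access from mainland China is unreliable, so fetch it by hand.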

wget https://github.com/01org/mkl-dnn/releases/download/v0.5/mklml_lnx_2017.0.2.20170209.tgz
cp mklml_lnx_2017.0.2.20170209.tgz tensorflow-1.1.0-mkl/third_party/mkl/mklml_lnx_2017.0.2.20170209.tgz
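Because a download over an unreliable link can leave a truncated file, it may be worth checking the archive before copying it. A minimal sketch with a hypothetical helper, stage_mklml, that tests the tarball with tar -tzf and only then copies it into third_party/mkl of the source tree:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stage_mklml TARBALL SRC_DIR
# Copies TARBALL into SRC_DIR/third_party/mkl after checking that it is a
# readable gzip-compressed tar archive (catches partial downloads).
stage_mklml() {
    local tarball=$1 src_dir=$2
    local dest="$src_dir/third_party/mkl"
    if [ ! -d "$dest" ]; then
        echo "missing $dest: is $src_dir the TensorFlow source root?" >&2
        return 1
    fi
    if ! tar -tzf "$tarball" >/dev/null 2>&1; then
        echo "$tarball is not a valid .tgz (incomplete download?)" >&2
        return 1
    fi
    cp "$tarball" "$dest/"
}
```

Usage: stage_mklml mklml_lnx_2017.0.2.20170209.tgz tensorflow-1.1.0-mkl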

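Modify the TensorFlow configuration files

tensorflow-1.1.0 does not enable MKL by default and has compatibility problems on RedHat 6 / CentOS 6, so a few settings need to be changed.
Edit tensorflow-1.1.0-mkl/configure.
Find the following (line 91):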

## Set up MKL related environment settings
if false; then # Disable building with MKL for now

Change it to:

## Set up MKL related environment settings
if true; then # Disable building with MKL for now
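The same edit can be scripted with sed instead of an editor. A sketch, assuming the line reads exactly as shown above (the line number may differ in other versions); the function name enable_mkl_in_configure is hypothetical, and it returns an error if the expected line is not present afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flip the MKL guard in ./configure from "if false" to "if true".
enable_mkl_in_configure() {
    local cfg=$1
    sed -i 's/^if false; then # Disable building with MKL for now/if true; then # Disable building with MKL for now/' "$cfg"
    # Fail if the guard was not found (e.g. a different TensorFlow version)
    grep -q '^if true; then # Disable building with MKL for now' "$cfg"
}
```

Usage: enable_mkl_in_configure tensorflow-1.1.0-mkl/configure. Running it twice is harmless, since the second run finds nothing to replace but still sees the "if true" line.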

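RedHat 6 / CentOS 6 is old: its glibc 2.12 provides clock_gettime in librt rather than in libc. For TensorFlow code to run, add librt.so to the link options. Otherwise the build succeeds, but running the installed package fails with _pywrap_tensorflow_internal.so: undefined symbol: clock_gettime or similar unresolved-symbol errors.
Edit tensorflow-1.1.0-mkl/tensorflow/tensorflow.bzl.
Find the following (line 787):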

def tf_extension_linkopts():
    return []  # No extension link opts

Change it to:

def tf_extension_linkopts():
    return ["-lrt"]  # link librt (clock_gettime lives there on glibc older than 2.17)
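This change can also be applied with sed. A sketch, assuming the original line matches the text shown above (any number of spaces before the comment is tolerated); the helper name add_lrt_linkopt is hypothetical, and it also replaces the "No extension link opts" comment, which would otherwise be misleading.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make tf_extension_linkopts() return ["-lrt"].
add_lrt_linkopt() {
    local bzl=$1
    sed -i 's/return \[] *# No extension link opts/return ["-lrt"]  # link librt (clock_gettime lives there on glibc older than 2.17)/' "$bzl"
    # Succeed only if the -lrt link option is now present
    grep -qF 'return ["-lrt"]' "$bzl"
}
```

Usage: add_lrt_linkopt tensorflow-1.1.0-mkl/tensorflow/tensorflow.bzl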

Build TensorFlow

cd tensorflow-1.1.0-mkl
# Switch to the gcc 4.9 compiler. This opens a new shell; run the remaining commands in it.
scl enable devtoolset-3 bash
# Configure TensorFlow
./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3.4
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
# Do not use jemalloc, otherwise the build fails later on (CentOS 7 / RedHat 7 are not affected)
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] 
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] 
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] 
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/lib/python3.4/site-packages
  /usr/lib64/python3.4/site-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3.4/site-packages]
Using python library path: /usr/lib/python3.4/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] 
No OpenCL support will be enabled for TensorFlow
# Only if you have an NVIDIA GPU with the CUDA Toolkit installed
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.2
Configuration finished
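The answers above can also be supplied up front, which helps when the configuration has to be repeated. This sketch assumes that the TensorFlow 1.x bash configure script skips a prompt when the matching environment variable (TF_NEED_MKL, TF_NEED_JEMALLOC and so on) is already set; check the variable names against your copy of ./configure before relying on them. It covers the CPU + MKL answers only.

```shell
#!/usr/bin/env bash
# ASSUMPTION: ./configure only prompts for a setting whose variable is unset.
export PYTHON_BIN_PATH=/usr/bin/python3.4
export PYTHON_LIB_PATH=/usr/lib/python3.4/site-packages
export TF_NEED_MKL=1
export CC_OPT_FLAGS="-march=native"
export TF_NEED_JEMALLOC=0   # jemalloc breaks the CentOS 6 build (see above)
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL=0
# CPU only here. A CUDA build would set TF_NEED_CUDA=1 plus TF_CUDA_VERSION,
# CUDA_TOOLKIT_PATH, TF_CUDNN_VERSION, CUDNN_INSTALL_PATH and
# TF_CUDA_COMPUTE_CAPABILITIES (5.2 in the session above).
export TF_NEED_CUDA=0
```

After sourcing these in the devtoolset-3 shell, run ./configure in that same shell.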

Build

# CPU only, without MKL
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
# CPU only, with MKL (Intel processors only)
bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package
# CPU only, with MKL, on Intel Xeon or Xeon Phi processors
bazel build --config=opt --config=mkl --copt="-DEIGEN_USE_VML" //tensorflow/tools/pip_package:build_pip_package
# Intel Xeon / Xeon Phi with MKL, plus CUDA
bazel build --config=opt --config=mkl --copt="-DEIGEN_USE_VML" --config=cuda //tensorflow/tools/pip_package:build_pip_package
# Other Intel CPU with MKL, plus CUDA
bazel build --config=opt --config=mkl --config=cuda //tensorflow/tools/pip_package:build_pip_package
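The five variants differ only in which flags are added to --config=opt. A sketch that assembles them, with two hypothetical helpers: tf_bazel_flags maps the names mkl, vml and cuda to the flags used above, and detect_variants guesses the variants from /proc/cpuinfo (GenuineIntel selects MKL, a "Xeon" model name adds VML) and from whether nvcc is on PATH. That detection is a heuristic assumption, not something TensorFlow provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tf_bazel_flags [mkl] [vml] [cuda]
tf_bazel_flags() {
    local flags="--config=opt" opt
    for opt in "$@"; do
        case $opt in
            mkl)  flags="$flags --config=mkl" ;;
            vml)  flags="$flags --copt=-DEIGEN_USE_VML" ;;
            cuda) flags="$flags --config=cuda" ;;
            *)    echo "unknown variant: $opt" >&2; return 1 ;;
        esac
    done
    echo "$flags"
}

# Hypothetical helper: Intel CPU -> mkl; Xeon / Xeon Phi -> also vml;
# nvcc on PATH -> cuda.
detect_variants() {
    local v=""
    if grep -q '^vendor_id.*GenuineIntel' /proc/cpuinfo 2>/dev/null; then
        v="mkl"
        if grep -q '^model name.*Xeon' /proc/cpuinfo; then
            v="$v vml"
        fi
    fi
    if command -v nvcc >/dev/null 2>&1; then
        v="$v cuda"
    fi
    echo $v   # unquoted on purpose: drops the leading space
}
```

Usage: bazel build $(tf_bazel_flags $(detect_variants)) //tensorflow/tools/pip_package:build_pip_package. On a Xeon host with nvcc installed this expands to the same flags as the fourth command above.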

Build the Python wheel (.whl) package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install

sudo pip3 install /tmp/tensorflow_pkg/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl
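The wheel's file name depends on the Python version (cp34 here), so instead of typing it, the single wheel written to /tmp/tensorflow_pkg can be located by a small helper. A sketch; find_tf_wheel is a hypothetical name, and it refuses to guess when the directory holds no wheel or several.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the only tensorflow-*.whl in DIR.
find_tf_wheel() {
    local dir=$1
    local wheels=( "$dir"/tensorflow-*.whl )
    # Without nullglob an unmatched pattern stays literal, so check it exists
    if [ "${#wheels[@]}" -eq 0 ] || [ ! -e "${wheels[0]}" ]; then
        echo "no tensorflow wheel found in $dir" >&2
        return 1
    fi
    if [ "${#wheels[@]}" -ne 1 ]; then
        echo "several wheels in $dir; remove the stale ones first" >&2
        return 1
    fi
    echo "${wheels[0]}"
}
```

Usage: sudo pip3 install "$(find_tf_wheel /tmp/tensorflow_pkg)". To check the install, run python3 -c 'import tensorflow as tf; print(tf.__version__)' from a directory outside the source tree, because importing from inside the TensorFlow source directory picks up the unbuilt sources.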

5 Responses to “Building and installing TensorFlow + MKL from source on CentOS 6”

  1. JONAS
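    May 31, 2017 at 5:36 pm

    Thank you for this detailed article! As it happens, I have spent the last few days trying to build TensorFlow from source on CentOS 6, but bazel build always fails with an error about missing glibc-2.14.
    Are you using the default glibc (2.12?)
    And which bazel version did you install? (Or could you explain how to install it on CentOS 6?)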

    • zhuolin
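      June 10, 2017 at 4:29 pm

      bazel has to be compiled and installed from source; just follow the official instructions at https://bazel.build/versions/master/docs/install-compile-source.html
      The version I used is bazel release 0.4.5.
      The server has only glibc-2.12 (the one that ships with CentOS 6 / RedHat 6); the only extra package I installed was yum install devtoolset-3-gcc-c++.
      For everything else, I ran yum whenever a missing tool turned up. Apart from bazel, I installed nothing from outside the redhat/centos/epel/rh repositories.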
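Following that reply, a bazel build from source might look like the sketch below. It assumes a JDK 8 is installed and on PATH (bazel's from-source instructions require one) and that the 0.4.5 release page offers a bazel-0.4.5-dist.zip source archive; the helper names bazel_dist_url and build_bazel are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: URL of the "dist" source archive for a bazel release.
bazel_dist_url() {
    echo "https://github.com/bazelbuild/bazel/releases/download/$1/bazel-$1-dist.zip"
}

# Hypothetical helper: download, unpack and compile bazel (default 0.4.5).
# ASSUMPTION: JDK 8 on PATH; the resulting binary is output/bazel.
build_bazel() {
    local version=${1:-0.4.5}
    local workdir="bazel-$version"
    wget "$(bazel_dist_url "$version")" &&
    unzip -q "bazel-$version-dist.zip" -d "$workdir" &&
    (cd "$workdir" && bash ./compile.sh) &&
    echo "built $workdir/output/bazel; copy it onto PATH, e.g. /usr/local/bin"
}
```

Building it locally links bazel against the system glibc 2.12, which avoids the glibc-2.14 requirement of a prebuilt binary.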

  2. elfin
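    August 12, 2017 at 7:06 pm

    Hello, thanks for sharing. I would like to ask you something: following https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture, I found that the MKL-based TensorFlow (mainly using the conv2d and conv2d_transpose functions) was actually slower. Have you run into this problem? Thanks.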

    • zhuolin
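      September 19, 2017 at 10:08 am

      Performance improves with hyper-threading turned off. I also found MKL was not as fast as expected; presumably TensorFlow itself has not yet optimized how it interacts with the library for particular tasks.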
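Whether hyper-threading is currently on can be read from /proc/cpuinfo before deciding to disable it. A sketch with a hypothetical helper, threads_per_core, which divides the x86 "siblings" field by "cpu cores"; a result of 2 means two hardware threads per core. When those fields are missing (non-x86 CPUs, some VMs) it assumes no SMT and prints 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: logical CPUs per physical core, from /proc/cpuinfo.
threads_per_core() {
    local siblings cores
    siblings=$(awk -F: '/^siblings/ {gsub(/ /, "", $2); print $2; exit}' /proc/cpuinfo 2>/dev/null)
    cores=$(awk -F: '/^cpu cores/ {gsub(/ /, "", $2); print $2; exit}' /proc/cpuinfo 2>/dev/null)
    if [ -n "$siblings" ] && [ -n "$cores" ] && [ "$cores" -gt 0 ]; then
        echo $(( siblings / cores ))
    else
        echo 1   # fields missing: assume no SMT
    fi
}
```

If threads_per_core prints 2, hyper-threading can be switched off in the BIOS, or MKL_NUM_THREADS / OMP_NUM_THREADS can be kept at the physical core count instead.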

  3. Yao
    June 26, 2019 at 1:37 am

    If my machine has several Xeon Phi processors, will the installed TensorFlow automatically schedule work across the multiple Phis concurrently at run time?
