Posted by xuanxi on 2019-01-30 10:54:06

Configuring IPython Notebook to Run Python Spark Programs


1.1 Install Anaconda
  Anaconda's official site is https://www.anaconda.com; download the version for your platform.

1.1.1 Download Anaconda

$ cd /opt/local/src/
$ wget -c https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
1.1.2 Install Anaconda

# -b runs the installer in batch (non-interactive) mode; -p sets the install directory
$ bash Anaconda3-5.2.0-Linux-x86_64.sh -p /opt/local/anaconda -b
1.1.3 Configure the Anaconda environment variables


[*]Set the environment variables

$ tail -n 8 ~/.bashrc
# Anaconda3
export ANACONDA_PATH=/opt/local/anaconda
export PATH=$ANACONDA_PATH/bin:$PATH
# PySpark
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python

[*]Apply the environment variables

$ source ~/.bashrc

[*]Verify

$ python --version
Python 3.6.5 :: Anaconda, Inc.
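Beyond checking the interpreter version, it can help to confirm what pyspark will actually use: it starts the driver with `PYSPARK_DRIVER_PYTHON` and the executors with `PYSPARK_PYTHON`. A minimal sketch of that sanity check (the fallback paths below assume the `/opt/local/anaconda` layout used in this guide):

```python
import os

# Resolve the interpreters pyspark will use, falling back to the paths
# configured in ~/.bashrc above (an assumption of this guide's layout).
anaconda = os.environ.get("ANACONDA_PATH", "/opt/local/anaconda")
driver = os.environ.get("PYSPARK_DRIVER_PYTHON", anaconda + "/bin/ipython")
worker = os.environ.get("PYSPARK_PYTHON", anaconda + "/bin/python")

# Both should point into the Anaconda install, otherwise the notebook
# driver and the executors may run different Python versions.
print("driver :", driver)
print("workers:", worker)
```

If the two interpreters come from different Python installations, Spark jobs typically fail with a version-mismatch error at task time, so this is worth checking before launching a cluster job.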
1.2 Using PySpark in IPython Notebook

1.2.1 Create a working directory

$ mkdir ~/ipynotebook
$ cd ~/ipynotebook
1.2.2 Run PySpark in IPython Notebook


[*]Start IPython Notebook

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
WARNING | You likely want to use `jupyter notebook` in the future
JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
Serving notebooks from local directory: /home/hadoop/ipynotebook
0 active kernels
The Jupyter Notebook is running at:
http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
Accepting one-time-token-authenticated connection from 127.0.0.1
  The http://localhost:8888 page opens automatically in the default browser.


[*]Write a program in IPython Notebook
http://i2.运维网.com/images/blog/201806/24/fe20f7f1c1c9c02718686ffc4c677ed4.png
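The screenshot above shows a first test run in the notebook. As a minimal smoke-test sketch (not the exact code from the screenshot), a word count is the usual first cell; in a pyspark-launched notebook the `sc` context already exists, and the version below also falls back to plain Python when no Spark runtime is available:

```python
lines = ["hello spark", "hello notebook"]
try:
    # In a notebook started via `pyspark`, `sc` is already defined;
    # getOrCreate() reuses it instead of building a second context.
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    counts = dict(
        sc.parallelize(lines)
          .flatMap(str.split)                # split each line into words
          .map(lambda w: (w, 1))             # pair every word with a count of 1
          .reduceByKey(lambda a, b: a + b)   # sum the counts per word
          .collect()
    )
except Exception:
    # Fallback when pyspark (or a usable Spark/Java runtime) is absent:
    # the same map/reduce logic in plain Python.
    from collections import Counter
    counts = dict(Counter(w for line in lines for w in line.split()))

print(counts)
```

Either path produces the same per-word counts, which makes this a convenient cell for verifying the notebook-to-Spark wiring.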

1.2.3 Run PySpark on Hadoop YARN from IPython Notebook


[*]Start IPython Notebook

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
WARNING | You likely want to use `jupyter notebook` in the future
JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
Serving notebooks from local directory: /home/hadoop/ipynotebook
0 active kernels
The Jupyter Notebook is running at:
http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45


[*]Write a program in IPython Notebook
http://i2.运维网.com/images/blog/201806/24/9f34014bd818efa0486d7e8932afdce1.png


[*]Check the job in YARN

$ yarn application -list
18/06/24 14:53:06 INFO client.RMProxy: Connecting to ResourceManager at node/192.168.20.10:8032
Total number of applications (application-types: [] and states: ):1
Application-Id      Application-Name      Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1529805293111_0001          PySparkShell                   SPARK      hadoop   default             RUNNING         UNDEFINED            10%                  http://node:4040
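When scripting around `yarn application -list`, the data rows can be split on whitespace, since none of the columns in this output contain spaces. A hypothetical sketch using the row shown above (note this breaks for application names that themselves contain spaces):

```python
# Sample data row copied from the `yarn application -list` output above.
row = ("application_1529805293111_0001          PySparkShell                   "
       "SPARK      hadoop   default             RUNNING         UNDEFINED"
       "            10%                  http://node:4040")

# Columns: id, name, type, user, queue, state, final-state, progress, tracking URL.
fields = row.split()
app_id, name, app_type, user, queue, state = fields[:6]
final_state, progress, tracking_url = fields[6:9]
print(app_id, state, tracking_url)
```

This is handy for, e.g., waiting until the PySparkShell application reaches the RUNNING state before submitting notebook cells.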
1.2.4 Run PySpark on Spark Standalone from IPython Notebook


[*]Start Spark Standalone

$ /opt/local/spark/sbin/start-master.sh
$ /opt/local/spark/sbin/start-slaves.sh
$ jps
13249 Jps
13027 Master
13188 Worker

[*]Start IPython Notebook

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
WARNING | You likely want to use `jupyter notebook` in the future
JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
Serving notebooks from local directory: /home/hadoop/ipynotebook
0 active kernels
The Jupyter Notebook is running at:
http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
Accepting one-time-token-authenticated connection from 127.0.0.1

[*]Write a program in IPython Notebook
http://i2.运维网.com/images/blog/201806/24/01a42c49970f954e34b98ce4ef9eee3a.png


[*]View the Spark Standalone Web UI
http://i2.运维网.com/images/blog/201806/24/ebbf90eb3f33895a8fd779eee96e3da7.png

1.3 Summary
  Before starting IPython Notebook, first change into its working directory, e.g. ~/ipynotebook; the exact path depends on your setup.

1.3.1 Start IPython Notebook in local mode

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
#### or
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local
1.3.2 Start IPython Notebook on Hadoop YARN

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
#### or
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
1.3.3 Start IPython Notebook on Spark Standalone

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
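The three launch variants above differ only in their environment variables and flags. As a small sketch, the command lines can be assembled programmatically; the paths and host names (`/opt/local/hadoop`, `node:7077`) are the ones used throughout this guide and should be adjusted to your cluster:

```python
# Common prefix shared by every launch mode in this guide.
BASE_ENV = 'PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook"'

def launch_command(mode):
    """Return the pyspark launch command for one of the modes covered above."""
    if mode == "local":
        return BASE_ENV + " pyspark --master local"
    if mode == "yarn":
        # HADOOP_CONF_DIR tells Spark where to find the YARN configuration.
        return (BASE_ENV + " HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop"
                " pyspark --master yarn --deploy-mode client")
    if mode == "standalone":
        # Resource flags as in section 1.2.4; tune them to the cluster.
        return (BASE_ENV + " MASTER=spark://node:7077 pyspark"
                " --num-executors 1 --total-executor-cores 1"
                " --executor-memory 512m")
    raise ValueError("unknown mode: " + mode)

print(launch_command("yarn"))
```

Wrapping the variants like this makes it easy to keep one shell helper per cluster rather than retyping the full command each time.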

