
python spark install

Posted: 2016-01-09 12:29:44

Source: http://jmdvinodjmd.blogspot.in/2015/08/installing-ipython-notebook-with-apache.html
1. First of all, download Apache Spark from here.
Select a pre-built version such as spark-1.4.1-bin-hadoop2.6.tgz; otherwise you may need to build from source.


2. Now extract Spark from the downloaded archive and place it at your desired location (I placed it in the C drive).

3. Create an environment variable named 'SPARK_HOME' whose value is the path to the folder containing Apache Spark. (I placed Spark inside the C drive, so I set SPARK_HOME to 'C:\spark-1.4.1-bin-hadoop2.6'.)
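Before continuing, it can help to sanity-check that SPARK_HOME points at a real Spark folder. The sketch below is my own addition, not part of the original tutorial; `spark_home_looks_valid` is a hypothetical helper name.

```python
import os


def spark_home_looks_valid(path):
    """Return True if `path` looks like an extracted Spark distribution.

    A pre-built Spark folder contains (among others) a `bin` directory
    with the launcher scripts and a `python` directory with PySpark.
    """
    if not path:
        return False
    return (os.path.isdir(os.path.join(path, 'bin'))
            and os.path.isdir(os.path.join(path, 'python')))


# Check the environment variable created in this step.
spark_home = os.environ.get('SPARK_HOME')
print('SPARK_HOME looks valid?', spark_home_looks_valid(spark_home))
```

If this prints False, re-check the variable's value before moving on to the later steps.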

4. Download Anaconda Python distribution from here.
You can download either the 2.x or 3.x version. I suggest downloading 2.x, since the higher versions may not be supported by Apache Spark. I downloaded the Windows 32-bit Python 2.7 graphical installer.


5. Install Anaconda. This installs Python 2.7, IPython, and the other necessary Python libraries. It should also set the environment variables required to run Python from the command prompt.
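To confirm which interpreter the command prompt now picks up, a quick check from Python itself can help. This snippet is my own addition (the tutorial recommends 2.x because higher versions may not be supported by this Spark release):

```python
import sys

# Report the version of the interpreter that `python` resolves to.
major, minor = sys.version_info[0], sys.version_info[1]
print('Running Python %d.%d' % (major, minor))

# For this tutorial's Spark 1.4.1 setup, a 2.x interpreter is expected.
is_python2 = (major == 2)
```

If `is_python2` is False, the later PySpark steps may fail with version-related errors.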

6. Open a command prompt and enter the command:

ipython profile create pyspark

This creates a pyspark profile in which we need to make some changes.

Specifically, we need to edit two configuration files: ipython_notebook_config.py and 00-pyspark-setup.py.

7. Locate the ipython_notebook_config.py file at C:\Users\your_currently_logged_in_user_name\.ipython\profile_pyspark\ipython_notebook_config.py and add the following lines of code to it:

c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = True
c.NotebookApp.port = 8880 # or whatever you want; be aware of conflicts with CDH

8. Now create a file named 00-pyspark-setup.py in C:\Users\Vinod\.ipython\profile_pyspark\startup.
(Note: this path contains 'Vinod', the author's user name; replace it with the name of the currently logged-in Windows user.)
Add the following contents to 00-pyspark-setup.py:

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

Note: all lines can stay the same except the second-to-last one, which contains the location of the py4j library. Inside the Spark folder, go to python\lib, check the version of your py4j archive, and adjust that line accordingly.
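Rather than hard-coding the py4j version, that line could locate the archive with a glob. This is a sketch of my own, not from the tutorial; `find_py4j` is a hypothetical helper name:

```python
import glob
import os


def find_py4j(spark_home):
    """Return the path of the py4j source zip bundled with Spark.

    Spark ships py4j as python/lib/py4j-<version>-src.zip; globbing for
    it avoids hard-coding the version in 00-pyspark-setup.py.
    """
    matches = glob.glob(os.path.join(spark_home, 'python', 'lib',
                                     'py4j-*-src.zip'))
    if not matches:
        raise ValueError('no py4j archive found under %r' % spark_home)
    return matches[0]
```

With this helper, the sys.path line becomes `sys.path.insert(0, find_py4j(spark_home))`, and the script keeps working if a later Spark release bundles a different py4j version.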

9. Now open a command prompt and run the following command to start the IPython notebook:

ipython notebook --profile=pyspark

This should launch the IPython notebook in a browser.
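If the browser does not open, you can check whether the notebook server is actually listening on the port chosen in the config above (8880). A minimal sketch of my own, assuming the server runs on the local machine; `port_is_open` is a hypothetical helper name:

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


# 8880 is the port set in ipython_notebook_config.py above.
print('notebook listening?', port_is_open('127.0.0.1', 8880))
```

If this prints False while the `ipython notebook` command is running, check the console output of that command for errors.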

10. Create a notebook, type the command 'sc' in one cell, and run it. You should get:

<pyspark.context.SparkContext at 0x4f4a490>
This indicates that your IPython notebook and Spark are successfully integrated. If instead running 'sc' produces empty output, the integration was unsuccessful.
NOTE: If we install IPython notebook separately instead of through Anaconda, we might get the following exception:
Java gateway process exited before sending the driver its port number.


Original: http://www.cnblogs.com/fuxiaotong/p/5115704.html
