
Running Spark 0.9.0 in standalone mode to compute over data on HDFS

Date: 2014-02-21 22:51:16

1. Read through http://spark.incubator.apache.org/docs/latest/spark-standalone.html

2. Install Spark under /opt/spark on every machine.

3. Start the Spark master on the first machine.

[root@jfp3-1 latest]# ./sbin/start-master.sh

Check the log in the logs directory:

[root@jfp3-1 latest]# tail -100f logs/spark-root-org.apache.spark.deploy.master.Master-1-jfp3-1.out
Spark Command: /usr/java/default/bin/java -cp :/opt/spark/spark-0.9.0-incubating-bin-hadoop2/conf:/opt/spark/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip jfp3-1 --port 7077 --webui-port 8080
========================================

log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 04:59:50 INFO Master: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 04:59:50 INFO Master: Starting Spark master at spark://jfp3-1:7077
14/02/21 04:59:51 INFO MasterWebUI: Started Master web UI at http://jfp3-1:8080
14/02/21 04:59:51 INFO Master: I have been elected leader! New state: ALIVE

Open http://jfp3-1:8080 in a browser to check the state of the cluster.

4. Start a Spark worker on machines 2, 3 and 4.

[root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.71:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 05:05:09 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 05:05:09 INFO Worker: Starting Spark worker jfp3-2:53344 with 32 cores, 61.9 GB RAM
14/02/21 05:05:09 INFO Worker: Spark home: /opt/spark/latest
14/02/21 05:05:09 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
14/02/21 05:05:09 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:05:30 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:05:50 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:06:10 ERROR Worker: All masters are unresponsive! Giving up.

At the same time, errors also show up in the master's log:

14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
]
14/02/21 05:06:23 INFO Master: akka.tcp://sparkWorker@jfp3-3:53721 got disassociated, removing it.
14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
]

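Connecting to the Spark master by IP address fails. The master advertises itself as spark://jfp3-1:7077 (see its startup log above), and the worker apparently has to use the URL in exactly that form, so use the hostname instead: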

[root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 05:08:41 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 05:08:41 INFO Worker: Starting Spark worker jfp3-2:60198 with 32 cores, 61.9 GB RAM
14/02/21 05:08:41 INFO Worker: Spark home: /opt/spark/latest
14/02/21 05:08:41 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
14/02/21 05:08:41 INFO Worker: Connecting to master spark://jfp3-1:7077...
14/02/21 05:08:41 INFO Worker: Successfully registered with master spark://jfp3-1:7077
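
To avoid retyping the master URL in a form the master does not recognise, the URL can be read straight from the master log. The following is only a sketch: the function name master_url_from_log is made up for illustration, and it assumes the log contains a "Starting Spark master at spark://..." line in the format shown in step 3.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print the master URL exactly as
# the master advertised it. Reads log text on stdin and prints the first match.
master_url_from_log() {
    # Keep only the spark://host:port token that follows the known log phrase.
    sed -n 's|.*Starting Spark master at \(spark://[^ ]*\).*|\1|p' | head -n 1
}
```

For example, on the master host, master_url_from_log < logs/spark-root-org.apache.spark.deploy.master.Master-1-jfp3-1.out should print the URL to pass to the worker.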

5. Check the cluster state in the Spark master web UI: three workers have now been added.

6. Start the HDFS cluster.

7. Enter the spark-shell:

[root@jfp3-1 latest]# MASTER=spark://jfp3-1:7077 ./bin/spark-shell

Count the lines of a file on HDFS that contain the string 2144:

scala> val textFile = sc.textFile("hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201")
14/02/21 10:16:18 INFO MemoryStore: ensureFreeSpace(146579) called with curMem=0, maxMem=308713881
14/02/21 10:16:18 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 143.1 KB, free 294.3 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

 

scala> val targetRows = textFile.filter(line => line.contains("2144"))
targetRows: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14

 

scala> targetRows.count()
14/02/21 10:18:27 INFO FileInputFormat: Total input paths to process : 1
14/02/21 10:18:27 INFO SparkContext: Starting job: count at <console>:17
14/02/21 10:18:27 INFO DAGScheduler: Got job 0 (count at <console>:17) with 11 output partitions (allowLocal=false)
14/02/21 10:18:27 INFO DAGScheduler: Final stage: Stage 0 (count at <console>:17)
14/02/21 10:18:27 INFO DAGScheduler: Parents of final stage: List()
14/02/21 10:18:27 INFO DAGScheduler: Missing parents: List()
14/02/21 10:18:27 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at <console>:14), which has no missing parents
14/02/21 10:18:27 INFO DAGScheduler: Submitting 11 missing tasks from Stage 0 (FilteredRDD[2] at filter at <console>:14)
14/02/21 10:18:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 11 tasks
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:0 as 1716 bytes in 5 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:1 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:2 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:3 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:4 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:5 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:6 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:7 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:8 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:9 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:10 as TID 10 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:10 as 1716 bytes in 1 ms
14/02/21 10:18:30 INFO TaskSetManager: Finished TID 10 in 2850 ms on jfp3-2 (progress: 0/11)
14/02/21 10:18:30 INFO DAGScheduler: Completed ResultTask(0, 10)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 5 in 3188 ms on jfp3-4 (progress: 1/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 5)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 8 in 3188 ms on jfp3-4 (progress: 2/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 8)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 1 in 3237 ms on jfp3-2 (progress: 3/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 1)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 7 in 3234 ms on jfp3-2 (progress: 4/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 7)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 2 in 3269 ms on jfp3-4 (progress: 5/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 2)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 9 in 3300 ms on jfp3-3 (progress: 6/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 9)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 4 in 3362 ms on jfp3-2 (progress: 7/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 4)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 3 in 3423 ms on jfp3-3 (progress: 8/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 3)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 6 in 3439 ms on jfp3-3 (progress: 9/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 6)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 0 in 3458 ms on jfp3-3 (progress: 10/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 0)
14/02/21 10:18:31 INFO TaskSchedulerImpl: Remove TaskSet 0.0 from pool
14/02/21 10:18:31 INFO DAGScheduler: Stage 0 (count at <console>:17) finished in 3.466 s
14/02/21 10:18:31 INFO SparkContext: Job finished: count at <console>:17, took 3.593541623 s
res0: Long = 12129 
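
As a sanity check, the same count can be reproduced without Spark by streaming the file through grep. This is a minimal sketch, assuming the hdfs client is on the PATH and the file is small enough to stream through one machine. The function name count_matching_lines is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the stdin lines that contain the literal
# substring $1, mirroring textFile.filter(line => line.contains(...)).count().
count_matching_lines() {
    # -F: treat the pattern as a fixed string; -c: print the number of matching lines.
    # grep exits with status 1 when nothing matches; treat that as success.
    grep -F -c -- "$1" || [ $? -eq 1 ]
}
```

For example, hdfs dfs -cat hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201 | count_matching_lines 2144 should agree with the value of res0 above, provided the file has not changed.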

Appendix:

Command collection:

Start the master:

/opt/spark/latest/sbin/start-master.sh

Start a worker:

/opt/spark/latest/bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
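
Because the worker command has to be repeated on each of jfp3-2, jfp3-3 and jfp3-4, it can be generated in a loop. The sketch below only prints the ssh invocations (a dry run). The function names are made up for illustration, and passwordless ssh as root between the machines is an assumption the original post does not state.

```shell
#!/bin/sh
# Values taken from the commands above.
SPARK_HOME=/opt/spark/latest
MASTER_URL=spark://jfp3-1:7077

# Hypothetical helper: build the worker start command from the appendix.
worker_command() {
    printf '%s/bin/spark-class org.apache.spark.deploy.worker.Worker %s\n' \
        "$SPARK_HOME" "$MASTER_URL"
}

# Hypothetical helper: print one ssh invocation per host given as an argument.
# Nothing is executed; review the output before running it.
print_worker_starts() {
    for host in "$@"; do
        printf 'ssh %s "%s"\n' "$host" "$(worker_command)"
    done
}
```

Calling print_worker_starts jfp3-2 jfp3-3 jfp3-4 prints three lines. The worker started this way runs in the foreground, so each ssh session stays attached until its worker exits. The standalone documentation cited in step 1 also describes launch scripts for starting a whole cluster.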


Original: http://www.cnblogs.com/littlesuccess/p/3559277.html
