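Study materials: wiki, official website
Apache Hadoop releases fall into two generations: the first is referred to as Hadoop 1.0 and the second as Hadoop 2.0. The first generation comprises three major release lines, 0.20.x, 0.21.x, and 0.22.x; of these, 0.20.x eventually evolved into 1.0.x and became the stable release. The second generation comprises two release lines, 0.23.x and 2.x. They differ completely from Hadoop 1.0 and use an entirely new architecture; both include HDFS Federation and YARN. Compared with 0.23.x, 2.x adds two major features: NameNode HA and wire compatibility.
Hadoop 2.7.2 pseudo-distributed installation
Environment
The firewall and SELinux need to be disabled while installing Hadoop; otherwise errors may occur.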
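Before disabling anything, it helps to see what is actually active. The following is a minimal sketch, assuming a POSIX shell; the function names selinux_mode and report_selinux are made up for this illustration. On systems without SELinux tooling (common on Ubuntu-style installs that use apt-get) the getenforce command is simply absent, and the sketch reports that instead of failing.

```shell
# Print the SELinux mode, or "not-installed" when the getenforce tool is absent.
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "not-installed"
  fi
}

# Print a one-line summary; suggests setenforce only when SELinux is enforcing.
report_selinux() {
  mode=$(selinux_mode)
  case "$mode" in
    Enforcing) echo "SELinux is enforcing; consider 'sudo setenforce 0' for this test setup" ;;
    *)         echo "SELinux mode: $mode" ;;
  esac
}
```

Running report_selinux prints a single status line. Note that setenforce 0 only lasts until the next reboot.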
Software requirements:
1. Java
2. ssh: sshd must be running so that the Hadoop scripts can manage the Hadoop nodes.
$ sudo apt-get install ssh
$ sudo apt-get install rsync
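A quick way to confirm the prerequisites are in place is to check which commands are on the PATH. This is a small sketch, assuming a POSIX shell; the helper name missing_tools is hypothetical.

```shell
# Print the name of every listed command that is not found on PATH, one per line.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools java ssh rsync` prints nothing when all three are installed. sshd often lives in a directory such as /usr/sbin that is not on a regular user's PATH, so its absence from this check does not by itself mean the daemon is missing.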
Hadoop 2.7.2 pseudo-distributed installation steps
1. Edit the file etc/hadoop/hadoop-env.sh:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
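The path /usr/java/latest is only a placeholder and may not exist on a given machine. One way to find a candidate value is to start from the full path of the java binary and strip the trailing components. The sketch below assumes the usual layout where the binary sits at <root>/bin/java or <root>/jre/bin/java; the function name java_home_from_bin is hypothetical.

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary by
# dropping the trailing /bin/java, and a trailing /jre component if present.
java_home_from_bin() {
  java_bin=$1
  candidate=${java_bin%/bin/java}
  candidate=${candidate%/jre}
  echo "$candidate"
}
```

On systems where `readlink -f` is available (an assumption; it is a GNU coreutils feature), the real binary path can be resolved first: `java_home_from_bin "$(readlink -f "$(command -v java)")"`. Check the printed directory before writing it into hadoop-env.sh.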
2. etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
3. etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
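Both files above contain a single property, so they can also be generated from the shell instead of being edited by hand. This is a minimal sketch, assuming a POSIX shell; write_single_property is a hypothetical helper, not part of Hadoop, and it overwrites the target file.

```shell
# Write a Hadoop-style XML configuration file containing exactly one property.
# The name and value are written verbatim, so they must not contain XML
# special characters such as & or <.
write_single_property() {
  target=$1
  prop_name=$2
  prop_value=$3
  cat > "$target" <<EOF
<configuration>
  <property>
    <name>$prop_name</name>
    <value>$prop_value</value>
  </property>
</configuration>
EOF
}
```

Run from the Hadoop installation directory, the two files would be produced with `write_single_property etc/hadoop/core-site.xml fs.defaultFS hdfs://localhost:9000` and `write_single_property etc/hadoop/hdfs-site.xml dfs.replication 1`.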
Set up passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
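Appending with >> adds a duplicate entry to authorized_keys every time the command is repeated. The sketch below shows one way to make that step idempotent, assuming a POSIX shell and a public key file that holds a single line; authorize_key is a hypothetical helper name.

```shell
# Append the public key in $1 to the authorized_keys file in $2,
# but only if that exact line is not already present.
authorize_key() {
  pub_file=$1
  auth_file=$2
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -qxF -- "$(cat "$pub_file")" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
  chmod 0600 "$auth_file"
}
```

With the key generated above, the call would be `authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys`. Recent OpenSSH releases no longer accept DSA keys by default, so on a newer system a different key type (for example `-t rsa` with a matching file name) may be needed; that substitution is not part of the original instructions.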
Execution
The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on a Single Node.
1. Format the filesystem:
$ bin/hdfs namenode -format
2. Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
3. Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
4. Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
5. Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
6. Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
7. Examine the output files. Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or view the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
8. When you're done, stop the daemons with:
$ sbin/stop-dfs.sh
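To know what to expect in the output directory, the grep example can be approximated locally: it extracts every string matching the regular expression from the input files and counts how often each one occurs, most frequent first. The sketch below reproduces that with standard tools, assuming a grep that supports -r and -o (GNU and BSD grep both do) and matches that contain no whitespace; local_grep_count is a hypothetical name, and the exact ordering of ties may differ from Hadoop's.

```shell
# Count occurrences of each string matching an extended regex in all files
# under a directory, printing "count<TAB>match" with the highest count first.
local_grep_count() {
  input_dir=$1
  pattern=$2
  # -r: recurse, -o: print only the matched text, -h: no file names, -E: extended regex
  grep -rohE -- "$pattern" "$input_dir" \
    | sort | uniq -c | sort -rn \
    | awk '{print $1 "\t" $2}'
}
```

Run from the Hadoop installation directory, `local_grep_count etc/hadoop 'dfs[a-z.]+'` gives a local view to compare with the result of `bin/hdfs dfs -cat output/*`.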
YARN on a Single Node
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and NodeManager daemon.
The following instructions assume that steps 1 to 4 of the Execution instructions above have already been executed.
1. Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
2. Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
3. Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/
4. Run a MapReduce job.
5. When you're done, stop the daemons with:
$ sbin/stop-yarn.sh
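One practical detail: Hadoop 2.x tarballs commonly ship only etc/hadoop/mapred-site.xml.template, so mapred-site.xml may need to be created from the template before it can be edited. The sketch below does that without overwriting an existing file. It assumes a POSIX shell and that the template sits in the same directory; ensure_mapred_site is a hypothetical helper name.

```shell
# Create mapred-site.xml from mapred-site.xml.template if it does not exist yet.
# An existing mapred-site.xml is left untouched.
# Returns success only if mapred-site.xml exists afterwards.
ensure_mapred_site() {
  conf_dir=$1
  if [ ! -f "$conf_dir/mapred-site.xml" ] && [ -f "$conf_dir/mapred-site.xml.template" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
  fi
  [ -f "$conf_dir/mapred-site.xml" ]
}
```

From the Hadoop installation directory this would be called as `ensure_mapred_site etc/hadoop`, after which the mapreduce.framework.name property can be added to the file.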
The pseudo-distributed environment is now set up successfully!
Original: http://www.cnblogs.com/flyingbee6/p/5204363.html