Using Hadoop 2.x on Ubuntu, Part 9: HDFS Cluster Topology Management

Date: 2014-03-12 00:32:54

What is Rack Awareness?

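Consider a large Hadoop cluster. To keep the redundant copies held by the datanodes reliable, the datanodes should be spread across different racks. Placing them on different racks, however, means that network traffic has to cross a router, which is slower than transfers between datanode servers in the same rack, so performance suffers. The commonly recommended practice (I have also seen it in the MongoDB documentation) is to put two datanode servers in the same rack and a third datanode server in another rack; if there are multiple data centers, that third server should go in another data center.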

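Hadoop should learn the topology of the datanode servers from its configuration and then balance performance against reliability. For reads, it should prefer a replica in the same rack of the same data center. For writes, it should try to arrange the three copies of a piece of data as follows: two copies go to datanode servers in the same rack of the same data center, and the third goes to a datanode server in some rack of another data center.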
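One way to picture "closeness" in such a topology is to count the hops from each node up to their closest common ancestor in the tree. The sketch below does this for paths of the form /datacenter/rack/node. The function names `topology_distance` and `count_parts` are hypothetical helpers written for illustration, not Hadoop commands, and the node names are made up.

```shell
#!/bin/bash
# Sketch only: distance between two nodes in a /datacenter/rack/node tree.
# topology_distance and count_parts are hypothetical helper names.

# Print the number of "/"-separated components in $1 (0 for an empty string).
count_parts() {
    [ -n "$1" ] || { echo 0; return; }
    # Subshell so the IFS and globbing changes do not leak out.
    ( IFS=/; set -f; set -- $1; echo $# )
}

# Print the hops from both nodes up to their closest common ancestor.
topology_distance() {
    local a b
    a=${1#/}
    b=${2#/}
    # Drop leading components for as long as both paths agree.
    while [ -n "$a" ] && [ -n "$b" ] && [ "${a%%/*}" = "${b%%/*}" ]; do
        case $a in */*) a=${a#*/} ;; *) a= ;; esac
        case $b in */*) b=${b#*/} ;; *) b= ;; esac
    done
    echo $(( $(count_parts "$a") + $(count_parts "$b") ))
}
```

Under these assumptions, `topology_distance /1/2/node-a /1/2/node-b` prints 2 (same rack), `topology_distance /1/2/node-a /1/3/node-b` prints 4 (same data center, different rack), and `topology_distance /1/2/node-a /2/2/node-b` prints 6 (different data centers). A smaller number means a cheaper read.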

How do you set the topology information?

Hadoop therefore needs to know the datanode topology, that is, the data center and rack id of each datanode server.

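First, prepare a script that accepts an IP address as input, splits it on ".", and extracts the second and third segments: the second segment serves as the data center id and the third as the rack id.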

#!/bin/bash
# Set rack id based on IP address.
# Assumes network administrator has complete control
# over IP addresses assigned to nodes and they are
# in the 10.x.y.z address space. Assumes that
# IP addresses are distributed hierarchically. e.g.,
# 10.1.y.z is one data center segment and 10.2.y.z is another;
# 10.1.1.z is one rack, 10.1.2.z is another rack in
# the same segment, etc.)
#
# This is invoked with an IP address as its only argument

# get IP address from the input
ipaddr=$1

# select "x.y" and convert it to "x/y"
segments=$(echo "$ipaddr" | cut -f 2,3 -d '.' --output-delimiter=/)
echo "/${segments}"

Running it gives the following output:

dean@dean-ubuntu:~$ ./rack-awareness.sh 192.168.1.10
/168/1
dean@dean-ubuntu:~$ ./rack-awareness.sh 192.167.1.10
/167/1

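This script comes from the first reference article below. It had a small bug, which I fixed by changing $0 to $1. Hadoop calls the script with an IP address as its argument, and the script returns a topology path made of the data center id and the rack id, a string such as "/167/1". Once you understand the cut command, the script is straightforward.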
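Hadoop may pass several addresses in a single invocation of the topology script and expects one topology path per argument, in the same order. A variant that loops over all its arguments could look like the sketch below. It assumes the same IP layout as the script above (second octet is the data center id, third octet is the rack id) and GNU cut; `resolve_rack` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: topology script that handles several addresses per call.
# Assumes dotted IPv4 input where the second octet is the data center id
# and the third octet is the rack id, as in the script above.

# resolve_rack is a hypothetical helper name, not part of Hadoop.
resolve_rack() {
    local segments
    # Take fields 2 and 3 of the address and join them with "/".
    segments=$(echo "$1" | cut -f 2,3 -d '.' --output-delimiter=/)
    echo "/${segments}"
}

# Print one topology path per argument, in argument order.
for addr in "$@"; do
    resolve_rack "$addr"
done
```

Saved as rack-awareness.sh, running `./rack-awareness.sh 10.1.2.30 10.2.5.7` prints /1/2 and then /2/5, one per line. Input that is not a dotted IP address (a hostname, for example) is not handled here and would need its own mapping.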

Here is a script with the same functionality as the bash script, which I wrote in newLISP:

#!/usr/bin/newlisp

(set 'ip (main-args 2))
(set 'ip-list (parse ip "."))
(set 'r (format "/%s/%s" (ip-list 1) (ip-list 2)))
(println r)
(exit)

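Hadoop has to be told to call this script.

This is configured in core-site.xml. Official reference: http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/core-default.xml
Note that if the IP addresses in your data centers do not follow the scheme described above, the script must be modified, so it does not fit every situation.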
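As a rough illustration of the core-site.xml entry, the sketch below prints a property element that points Hadoop at the topology script. It assumes the Hadoop 2.x property name net.topology.script.file.name from core-default.xml, so check it against the version you run. `print_topology_property` is a hypothetical helper that only prints XML, and the script path in the usage note is a made-up example.

```shell
#!/bin/bash
# Sketch only: print the core-site.xml property for the topology script.
# Assumes the Hadoop 2.x property name net.topology.script.file.name.
# print_topology_property is a hypothetical helper; it only prints XML.
print_topology_property() {
    local script_path=$1
    cat <<EOF
<property>
  <name>net.topology.script.file.name</name>
  <value>${script_path}</value>
</property>
EOF
}
```

For example, `print_topology_property /home/dean/rack-awareness.sh` prints a property element that can be pasted inside the configuration element of core-site.xml. The script must be executable by the user that runs the Hadoop daemons.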

References:

http://bigdataprocessing.wordpress.com/2013/07/30/hadoop-rack-awareness-and-configuration/

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Rack_Awareness

https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf

To be continued...


Original article: http://blog.csdn.net/csfreebird/article/details/20920071
