Hadoop blocks

Posted: 2015-04-21 07:13:26

In cases where the last record in a block is incomplete, the input split includes location information for the next block and the byte offset of the data needed to complete the record.

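Suppose we have a 128 MB text file and the HDFS block size is 64 MB (the default in Hadoop 1.x), so the file takes two blocks once it is uploaded to HDFS. If the first block's 64 MB boundary falls in the middle of a line, so that the block does not contain the end of that line, which mapper reads that record when the file is processed with TextInputFormat?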
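The block arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the byte-range calculation, not HDFS code; `block_ranges` is a hypothetical helper name introduced here for illustration.

```python
import math

MB = 1024 * 1024


def block_ranges(file_size: int, block_size: int) -> list[tuple[int, int]]:
    """Return one (start, end) byte range per block, with end exclusive."""
    count = math.ceil(file_size / block_size)
    return [
        (i * block_size, min((i + 1) * block_size, file_size))
        for i in range(count)
    ]


if __name__ == "__main__":
    # A 128 MB file with 64 MB blocks: two ranges, cut at byte 64 * MB.
    for start, end in block_ranges(128 * MB, 64 * MB):
        print(start, end)
```

The cut at byte `64 * MB` is chosen purely by size, which is why it can land in the middle of a line.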

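Read literally, this sentence says that the input split of the mapper holding the start of the line also carries information about the next block and the byte offset up to which data must be read to complete the line. So the mapper whose split contains the beginning of the line reads the whole record; the mapper handling the next split then has to skip that leading partial line, otherwise the record would be processed twice.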
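To make the rule concrete, here is a small Python simulation of how a line-oriented reader could assign records to splits. It is a sketch under assumptions, not Hadoop's code: `read_split` is a hypothetical function, the data lives in memory as a byte string, and the logic follows my understanding of how TextInputFormat's line reader behaves (every split except the first discards its first line, and a reader keeps going past the end of its split until the line it started is complete).

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines that the reader for split [start, start + length) owns."""
    end = start + length
    pos = start

    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline. That line belongs to the previous split's reader.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1

    records = []
    # Start a new line as long as we are not past the split end; the line
    # itself may run beyond `end` into the next block.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes
    # Two 13-byte splits: the boundary falls inside "charlie".
    print(read_split(data, 0, 13))
    print(read_split(data, 13, 13))
```

With the 13-byte boundary falling inside `charlie`, the first reader returns that line in full and the second reader starts at `delta`, so each line is processed exactly once.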

Original post: http://www.cnblogs.com/huaxiaoyao/p/4443266.html
