——To install Hadoop, refer to this blog:
http://www.cnblogs.com/lanxuezaipiao/p/3525554.html?__=1a36
——Pull the data out of the database. Since I couldn't find a way to make PigStorage split on multiple delimiters, I filter the data with SQL on the database side first.
mysql -u greenwave -p green4irvine
use gwr;
show tables;
select * from AccountStats into outfile '/tmp/test.csv';
select * from AccountStats where StatsType like "EON_SH.heatinghours" into outfile '/tmp/heatinghours.csv' fields terminated by ',';
select * from AccountStats where StatsType like "EON_SH.hotwaterhours" into outfile '/tmp/hotwaterhours.csv' fields terminated by ',';
——Commands for copying the generated CSV files to the local machine:
ssh -i ~/.ssh/eon-dev.rsa root@54.75.233.199
scp -i ~/.ssh/eon-dev.rsa root@54.75.233.199:/tmp/test.csv .
——Put the filtered CSV files into Hadoop's HDFS. (HDFS behaves much like an ordinary filesystem, except that it is a virtual, distributed one.)
HDFS file-manipulation commands are documented here:
http://blog.csdn.net/bigdatahappy/article/details/10068881
http://hadoop.apache.org/docs/r1.0.4/cn/hdfs_shell.html
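As a quick reference, the upload step for this workflow might look like the following — a sketch that assumes a running HDFS cluster; the target directory name is my own choice, not from these notes:

```shell
# Create a working directory in HDFS (path is an assumption)
hdfs dfs -mkdir -p /user/root/stats
# Upload the CSV files exported earlier
hdfs dfs -put /tmp/heatinghours.csv /tmp/hotwaterhours.csv /user/root/stats/
# Verify the upload
hdfs dfs -ls /user/root/stats
```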
Original post: http://www.cnblogs.com/dataclimber/p/4882913.html