By: pradeep patel

pradeep patel — Tue, 02 Aug 2016 17:27:15 +0000

Hi Satyam,

Thanks for the quick reply!

I tried changing the path to 'flume', but no luck. Please check the Pig command below.

load_tweets = LOAD 'flume' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

By: Satyam

Satyam — Tue, 02 Aug 2016 14:38:33 +0000

Hi Pradeep, when you checked the files in HDFS you ran hadoop fs -ls flume, so according to that command the path is just flume. But when loading the tweets in the Pig script, you gave the path as /user/cloudera/flume. The two paths are different, which is why you are not getting any output. Please change the input path in the Pig script to 'flume'.
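For reference, the HDFS client resolves a relative path against the user's working directory, which by default is the home directory /user/<username>. The sketch below models that default rule in plain shell so it can be checked without a cluster. resolve_hdfs_path and its optional second argument (the HDFS user name) are hypothetical names, and the sketch assumes the working directory has not been changed from the default.

```shell
#!/usr/bin/env bash
# Sketch: resolve an HDFS path the way the client does by default.
# Assumption: the working directory is the user's home, /user/<username>.
# resolve_hdfs_path is a hypothetical helper, not a Hadoop command.
resolve_hdfs_path() {
  local path=$1
  local user=${2:-$(whoami)}   # HDFS user name; defaults to the local user
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                   # full URI: used as-is
    /*)    printf '%s\n' "$path" ;;                   # absolute path: used as-is
    *)     printf '/user/%s/%s\n' "$user" "$path" ;;  # relative: under home dir
  esac
}

# The two spellings used in this thread, for user cloudera:
resolve_hdfs_path flume cloudera                 # prints /user/cloudera/flume
resolve_hdfs_path /user/cloudera/flume cloudera  # prints /user/cloudera/flume
```

Under this default, flume and /user/cloudera/flume name the same directory for the cloudera user, which is consistent with the Pig log reporting bytes read from /user/cloudera/flume.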

By: pradeep patel

pradeep patel — Tue, 02 Aug 2016 13:40:56 +0000

Thanks for sharing this nice post!
I followed the above steps and was able to fetch data from Twitter, but when I analyze the data with Pig as described in this post, the dump command returns no data. These are the steps I followed:

1) Fetched data from Twitter and stored it in HDFS:
[cloudera@localhost conf]$ hadoop fs -ls flume
Found 9 items
-rw-r--r-- 3 cloudera cloudera 179237 2016-08-02 05:36 flume/FlumeData.1470141370000
-rw-r--r-- 3 cloudera cloudera 66274 2016-08-02 05:36 flume/FlumeData.1470141370001
-rw-r--r-- 3 cloudera cloudera 66497 2016-08-02 05:36 flume/FlumeData.1470141370002
-rw-r--r-- 3 cloudera cloudera 83746 2016-08-02 05:36 flume/FlumeData.1470141370003
-rw-r--r-- 3 cloudera cloudera 65313 2016-08-02 05:36 flume/FlumeData.1470141370004
-rw-r--r-- 3 cloudera cloudera 84880 2016-08-02 05:36 flume/FlumeData.1470141370005
-rw-r--r-- 3 cloudera cloudera 71532 2016-08-02 05:36 flume/FlumeData.1470141370006
-rw-r--r-- 3 cloudera cloudera 68419 2016-08-02 05:36 flume/FlumeData.1470141370007
-rw-r--r-- 3 cloudera cloudera 64983 2016-08-02 05:36 flume/FlumeData.1470141370008

2) Ran the following commands in the Pig grunt shell:

Practice:

load_tweets = LOAD '/user/cloudera/flume' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

dump load_tweets;
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.0 0.11.0-cdh4.7.0 cloudera 2016-08-02 06:17:36 2016-08-02 06:18:23 UNKNOWN

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201608020416_0003 1 0 13 13 13 13 0 0 0 0 load_tweets MAP_ONLY hdfs://localhost.localdomain:8020/tmp/temp-128798977/tmp-1279927259,

Input(s):
Successfully read 0 records (752029 bytes) from: "/user/cloudera/flume"

Output(s):
Successfully stored 0 records in: "hdfs://localhost.localdomain:8020/tmp/temp-128798977/tmp-1279927259"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201608020416_0003

So the dump command executed successfully, but it does not show any data. I don't know what went wrong. Can you please help me out?
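The log above reports reading 752029 bytes but 0 records, which suggests the loader could reach the files but could not parse any of their contents as JSON. elephant-bird's JsonLoader expects text input with one JSON object per line. One possible cause (an assumption, not confirmed in this thread) is that the Flume HDFS sink wrote SequenceFiles, which is its default hdfs.fileType unless it is set to DataStream. A quick check is to copy one file locally with hadoop fs -get and inspect its first bytes. The sketch below does that; classify_file is a hypothetical helper name, and it only recognises the SequenceFile magic ("SEQ"), the Avro container magic ("Obj" followed by 0x01), and text that begins with "{".

```shell
#!/usr/bin/env bash
# Sketch: guess the container format of a file copied out of HDFS, e.g. via
#   hadoop fs -get flume/FlumeData.1470141370000 .
# classify_file is a hypothetical helper, not part of Hadoop, Flume, or Pig.
classify_file() {
  local magic
  # Read the first three bytes of the file.
  magic=$(head -c 3 "$1")
  case "$magic" in
    SEQ)  echo "sequencefile" ;;  # Hadoop SequenceFile header starts with "SEQ"
    Obj)  echo "avro" ;;          # Avro container files start with "Obj" + 0x01
    '{'*) echo "json-text" ;;     # plain JSON text, as JsonLoader expects
    *)    echo "unknown" ;;
  esac
}
```

For example, classify_file FlumeData.1470141370000 prints sequencefile for a SequenceFile. In that case the JSON is wrapped in a binary container, and pointing JsonLoader at a different path will not help; the files need to be written (or converted) as plain text first.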

Comments on: Determining Popular Hashtags in Twitter Using Pig
