Comments on: Determining Popular Hashtags in Twitter Using Pig
https://acadgild.com/blog/determining-popular-hashtags-in-twitter-using-pig/

By: pradeep patel (Tue, 02 Aug 2016 17:27:15 +0000)
Hi Satyam,

Thanks for the quick reply!

I tried changing the path to 'flume', but no luck. Please check the Pig command below.

load_tweets = LOAD 'flume' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

By: Satyam (Tue, 02 Aug 2016 14:38:33 +0000)
Hi Pradeep,

When you checked the files in HDFS, you ran hadoop fs -ls flume, so the path you listed is the relative path flume. In the Pig script, however, you load the absolute path /user/cloudera/flume. If these two paths do not point to the same directory, the load has nothing to read, which would explain the empty output. So please change the input path in the Pig script to 'flume'.
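For reference, here is a minimal sketch of how a relative HDFS path is usually resolved. It assumes the default convention that a user's HDFS home directory is /user/<username> and that Pig's working directory starts there. Under those assumptions, flume and /user/cloudera/flume name the same directory for the cloudera user. The function name resolve_hdfs_path is hypothetical, and full hdfs:// URIs are not handled.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Hypothetical helper: resolve a path the way HDFS shells commonly do,
    assuming relative paths are taken against /user/<user>."""
    if path.startswith("/"):
        return posixpath.normpath(path)  # absolute path: used as-is
    home = f"/user/{user}"  # assumption: default HDFS home directory layout
    return posixpath.normpath(posixpath.join(home, path))


# The two spellings used in this thread, for the 'cloudera' user.
relative = resolve_hdfs_path("flume", "cloudera")
absolute = resolve_hdfs_path("/user/cloudera/flume", "cloudera")
print(relative)              # /user/cloudera/flume
print(relative == absolute)  # True
```

If the cluster uses a non-default home directory, the two paths can differ. Running hadoop fs -ls /user/cloudera/flume is a direct way to check.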

By: pradeep patel (Tue, 02 Aug 2016 13:40:56 +0000)
Thanks for sharing this nice post!
I followed the steps above and was able to fetch data from Twitter. However, when I analyze the data with Pig as described in this post, the dump command returns no data. These are the steps I followed:

1) Fetched data from Twitter and stored it in HDFS:
[cloudera@localhost conf]$ hadoop fs -ls flume
Found 9 items
-rw-r--r-- 3 cloudera cloudera 179237 2016-08-02 05:36 flume/FlumeData.1470141370000
-rw-r--r-- 3 cloudera cloudera 66274 2016-08-02 05:36 flume/FlumeData.1470141370001
-rw-r--r-- 3 cloudera cloudera 66497 2016-08-02 05:36 flume/FlumeData.1470141370002
-rw-r--r-- 3 cloudera cloudera 83746 2016-08-02 05:36 flume/FlumeData.1470141370003
-rw-r--r-- 3 cloudera cloudera 65313 2016-08-02 05:36 flume/FlumeData.1470141370004
-rw-r--r-- 3 cloudera cloudera 84880 2016-08-02 05:36 flume/FlumeData.1470141370005
-rw-r--r-- 3 cloudera cloudera 71532 2016-08-02 05:36 flume/FlumeData.1470141370006
-rw-r--r-- 3 cloudera cloudera 68419 2016-08-02 05:36 flume/FlumeData.1470141370007
-rw-r--r-- 3 cloudera cloudera 64983 2016-08-02 05:36 flume/FlumeData.1470141370008
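As a quick cross-check against the byte count Pig reports when it loads the directory, you can add up the size column of this listing. The minimal Python sketch below does that. LISTING simply re-types the nine lines above with plain ASCII dashes, and total_bytes is a hypothetical helper name.

```python
# The nine lines from the `hadoop fs -ls flume` output above.
LISTING = """\
Found 9 items
-rw-r--r-- 3 cloudera cloudera 179237 2016-08-02 05:36 flume/FlumeData.1470141370000
-rw-r--r-- 3 cloudera cloudera 66274 2016-08-02 05:36 flume/FlumeData.1470141370001
-rw-r--r-- 3 cloudera cloudera 66497 2016-08-02 05:36 flume/FlumeData.1470141370002
-rw-r--r-- 3 cloudera cloudera 83746 2016-08-02 05:36 flume/FlumeData.1470141370003
-rw-r--r-- 3 cloudera cloudera 65313 2016-08-02 05:36 flume/FlumeData.1470141370004
-rw-r--r-- 3 cloudera cloudera 84880 2016-08-02 05:36 flume/FlumeData.1470141370005
-rw-r--r-- 3 cloudera cloudera 71532 2016-08-02 05:36 flume/FlumeData.1470141370006
-rw-r--r-- 3 cloudera cloudera 68419 2016-08-02 05:36 flume/FlumeData.1470141370007
-rw-r--r-- 3 cloudera cloudera 64983 2016-08-02 05:36 flume/FlumeData.1470141370008
"""


def total_bytes(listing: str) -> int:
    """Sum the size column of `hadoop fs -ls` output lines."""
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip the "Found N items" header and blank lines
        total += int(fields[4])  # 5th column is the file size in bytes
    return total


print(total_bytes(LISTING))  # 750881
```

The nine files add up to 750,881 bytes. That is close to the 752,029 bytes Pig reports reading from /user/cloudera/flume. This suggests that Pig found and read the files, and that the 0-record result comes from parsing rather than from the path.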

2) Executed the following commands in the Pig Grunt shell:

Practice:

register /home/cloudera/Desktop/Jars/elephant-bird-hadoop-compat-4.1.jar;

register /home/cloudera/Desktop/Jars/elephant-bird-pig-4.1.jar;

register /home/cloudera/Desktop/Jars/json-simple-1.1.1.jar;

load_tweets = LOAD '/user/cloudera/flume' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

dump load_tweets;
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.0 0.11.0-cdh4.7.0 cloudera 2016-08-02 06:17:36 2016-08-02 06:18:23 UNKNOWN

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201608020416_0003 1 0 13 13 13 13 0 0 0 0 load_tweets MAP_ONLY hdfs://localhost.localdomain:8020/tmp/temp-128798977/tmp-1279927259,

Input(s):
Successfully read 0 records (752029 bytes) from: "/user/cloudera/flume"

Output(s):
Successfully stored 0 records in: "hdfs://localhost.localdomain:8020/tmp/temp-128798977/tmp-1279927259"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201608020416_0003

So the dump command executes successfully, but it doesn't show any data. I don't know what went wrong. Can you please help me out?
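Pig reports reading about 750 KB but producing 0 records, so the files are being found but apparently not parsed. One possibility worth ruling out, offered as a hypothesis and not a confirmed diagnosis, is that Flume's HDFS sink writes SequenceFiles unless hdfs.fileType is set to DataStream. JsonLoader, by contrast, expects plain text with one JSON object per line. My understanding, also an assumption, is that elephant-bird's JsonLoader skips lines it cannot parse instead of failing the job. The sketch below checks raw bytes for the SequenceFile header and counts lines that parse as JSON objects. The function name diagnose and both samples are made up for illustration.

```python
import json

# Hadoop SequenceFiles begin with the ASCII bytes "SEQ" (then a version byte);
# a plain-text, one-JSON-object-per-line file would not.
SEQUENCE_FILE_MAGIC = b"SEQ"


def diagnose(data: bytes) -> dict:
    """Hypothetical check: does this look like a SequenceFile, and how many
    newline-separated lines parse as standalone JSON objects?"""
    json_lines = 0
    bad_lines = 0
    for raw in data.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        try:
            obj = json.loads(line.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            bad_lines += 1  # binary or malformed: a line-based JSON loader can't use it
            continue
        if isinstance(obj, dict):
            json_lines += 1  # one complete JSON object on this line
        else:
            bad_lines += 1  # valid JSON but not an object (e.g. a bare number)
    return {
        "sequence_file": data.startswith(SEQUENCE_FILE_MAGIC),
        "json_lines": json_lines,
        "bad_lines": bad_lines,
    }


# Made-up samples, not real Flume output.
text_sample = b'{"id": 1, "text": "#pig"}\n{"id": 2, "text": "#hadoop"}\n'
seq_like_sample = b"SEQ\x06 binary header and records would follow\n"

print(diagnose(text_sample))      # {'sequence_file': False, 'json_lines': 2, 'bad_lines': 0}
print(diagnose(seq_like_sample))  # {'sequence_file': True, 'json_lines': 0, 'bad_lines': 1}
```

To try this on real data, first copy one file locally (for example, hadoop fs -get flume/FlumeData.1470141370000 .), then call diagnose(open("FlumeData.1470141370000", "rb").read()). If sequence_file comes back True, the HDFS sink's hdfs.fileType setting in the Flume agent configuration is one place to look.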
