{"id":1397,"date":"2016-07-28T05:01:26","date_gmt":"2016-07-28T05:01:26","guid":{"rendered":"http:\/\/codethataint.com\/blog\/?p=1397"},"modified":"2016-08-06T07:11:09","modified_gmt":"2016-08-06T07:11:09","slug":"pig-question-and-answers","status":"publish","type":"post","link":"https:\/\/codethataint.com\/blog\/pig-question-and-answers\/","title":{"rendered":"Pig Question and Answers"},"content":{"rendered":"<p><strong>Loading CSV File using Pig Script<\/strong><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nA = LOAD '...' USING PigStorage(',') AS (...);\r\n<\/pre>\n<p><strong>Filtering NULL values in chararray<\/strong><br \/>\n<strong>Example1: null as chararray<\/strong><br \/>\ninput.txt<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n1,2014-04-08 12:09:23.0\r\n2,2014-04-08 12:09:23.0\r\n3,null\r\n4,null\r\n<\/pre>\n<p><strong>Pig:<\/strong><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nA = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);\r\nB = FILTER A BY f2!='null';\r\nDUMP B;\r\n<\/pre>\n<p><strong>Example2: Real null value<\/strong><br \/>\ninput.txt <\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n1,2014-04-08 12:09:23.0\r\n2,2014-04-08 12:09:23.0\r\n3,\r\n4,\r\n<\/pre>\n<p><strong>Pig:<\/strong><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nA = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);\r\nB = FILTER A BY f2 is not null;\r\nDUMP B;\r\n<\/pre>\n<p><strong>Output:<\/strong><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n(1,2014-04-08 12:09:23.0)\r\n(2,2014-04-08 12:09:23.0)\r\n<\/pre>\n<p><strong>Finding Max from CSV File<\/strong><\/p>\n<p><strong>Test.csv<\/strong><\/p>\n<pre>\r\nMaruthi,10\r\nMaruthi,55\r\nSuziki,50\r\nHonda,4\r\nMaruthi,40\r\nSuziki,60\r\nHonda,14\r\nBMW,140\r\nBenz,5\r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\na1 = LOAD 'Test.csv' USING PigStorage(',') AS (Car:chararray, No:int);\r\nDESCRIBE a1;\r\n<\/pre>\n<p><em>Output<\/em><\/p>\n<pre>\r\n a1: {Car: chararray,No: int}\r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nb1 = GROUP a1 BY Car;\r\nDESCRIBE b1;\r\n<\/pre>\n<pre>\r\n b1: {group: chararray,a1: {(Car: chararray,No: int)}}\r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nDUMP b1;\r\n<\/pre>\n<p><em>Output<\/em><\/p>\n<pre>\r\n(BMW,{(BMW,140)})\r\n(Benz,{(Benz,5)})\r\n(Honda,{(Honda,4),(Honda,14)})\r\n(Suziki,{(Suziki,50),(Suziki,60)})\r\n(Maruthi,{(Maruthi,10),(Maruthi,55),(Maruthi,40)})\r\n(,{(,)})\r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n c1 = FOREACH b1 GENERATE group, MAX(a1.No);\r\n<\/pre>\n<p><em>Output<\/em><\/p>\n<pre>\r\n(BMW,140)\r\n(Benz,5)\r\n(Honda,14)\r\n(Suziki,60)\r\n(Maruthi,55)\r\n<\/pre>\n<p><strong>Filtering Empty Records<\/strong><br \/>\n<em>Corrpted Record tsv content<\/em><\/p>\n<pre>\r\nHouseHold,Soap,2\r\nKitchen,Oil,2\r\nHouseHold,Sweeper,2\r\nPoojaItems,Sandal\r\nKitchen,Rice,30\r\nHouseHold,,1\r\nKitchen,Sugar,5\r\nHouseHold,Shampoo,2\r\nPoojaItems,Champor,10\r\nHouseHold,Soap,2\r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nfiltered_records = FILTER records BY Item is null OR No is null;\r\n<\/pre>\n<p><strong>Getting Count of Corrupted Records<\/strong><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nrecords = LOAD 'Test.csv' USING PigStorage(',') AS (Type:chararray, Item:chararray, No:int);\r\nfiltered_records = FILTER records BY Item is null OR No is null;\r\ngrouped_records = GROUP filtered_records BY Type;\r\nDESCRIBE grouped_records;\r\n<\/pre>\n<pre>\r\n grouped_records: {group: chararray,filtered_records: {(Type: chararray,Item: chararray,No: int)}} \r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\ncorrupt_records = FOREACH grouped_records GENERATE group , COUNT(filtered_records);\r\n<\/pre>\n<pre>\r\n(HouseHold,1)\r\n(PoojaItems,1)\r\n<\/pre>\n<p><strong>Writing macros to find Maximum Item Sold<\/strong><\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nDEFINE max_item_sold(Records, Type, No) RETURNS c \r\n{ \r\n b = GROUP $Records BY $Type;                        \r\n $c = FOREACH b GENERATE group, MAX($Records.$No);  \r\n}; \r\n<\/pre>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n max_type_sold = max_item_sold(records, Type, No);\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Loading CSV File using Pig Script A = LOAD &#8216;&#8230;&#8217; USING PigStorage(&#8216;,&#8217;) AS (&#8230;); Filtering NULL values in chararray Example1: null as chararray input.txt 1,2014-04-08 12:09:23.0 2,2014-04-08 12:09:23.0 3,null 4,null Pig: A = LOAD &#8216;input.txt&#8217; USING PigStorage(&#8216;,&#8217;) AS (f1:int,f2:chararray); B = FILTER A BY f2!=&#8217;null&#8217;; DUMP B; Example2: Real null value input.txt 1,2014-04-08 12:09:23.0 2,2014-04-08&hellip; <a href=\"https:\/\/codethataint.com\/blog\/pig-question-and-answers\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[183],"tags":[],"class_list":["post-1397","post","type-post","status-publish","format-standard","hentry","category-pig"],"_links":{"self":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts\/1397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/comments?post=1397"}],"version-history":[{"count":9,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts\/1397\/revisions"}],"predecessor-version":[{"id":1427,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts\/1397\/revisions\/1427"}],"wp:attachment":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/media?parent=1397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/categories?post=1397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/tags?post=1397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}