{"id":1478,"date":"2016-08-29T12:52:47","date_gmt":"2016-08-29T12:52:47","guid":{"rendered":"http:\/\/codethataint.com\/blog\/?p=1478"},"modified":"2016-08-29T16:01:21","modified_gmt":"2016-08-29T16:01:21","slug":"hadoop-listing-files-in-directory","status":"publish","type":"post","link":"https:\/\/codethataint.com\/blog\/hadoop-listing-files-in-directory\/","title":{"rendered":"Hadoop Listing Files in Directory"},"content":{"rendered":"<p><strong>Listing Files in Directory<\/strong><\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\nimport org.apache.hadoop.conf.Configuration;\r\nimport org.apache.hadoop.conf.Configured;\r\nimport org.apache.hadoop.fs.FileStatus;\r\nimport org.apache.hadoop.fs.FileSystem;\r\nimport org.apache.hadoop.fs.FileUtil;\r\nimport org.apache.hadoop.fs.Path;\r\nimport org.apache.hadoop.util.Tool;\r\nimport org.apache.hadoop.util.ToolRunner;\r\n\r\npublic class MapReduceDriver extends Configured implements Tool\r\n{\r\n   public static void main(String&#x5B;] args) throws Exception\r\n   {\r\n\tMapReduceDriver objMapReduceDriver = new MapReduceDriver();\r\n\t\t\r\n\tConfiguration conf = new Configuration();\r\n\t\t\r\n\t\/\/ File system named by fs.defaultFS (HDFS or local)\r\n\tFileSystem fs = FileSystem.get(conf);\r\n\tPath path = new Path(args&#x5B;0]);\r\n\t\t\r\n\t\/\/ List the entries directly under the directory and convert them to Paths\r\n\tFileStatus&#x5B;] status = fs.listStatus(path);\r\n\tPath&#x5B;] paths = FileUtil.stat2Paths(status);\r\n\t\t\r\n\tfor (Path path2 : paths)\r\n\t{\r\n\t  System.out.println(path2.toString());\r\n\t}\r\n\t\t\r\n\tint res = ToolRunner.run(objMapReduceDriver, args);\r\n\tSystem.exit(res);\r\n   }\r\n\r\n   @Override\r\n   public int run(String&#x5B;] args) throws Exception\r\n   {\r\n\t\/\/ Configure and submit the job here; getConf() holds the parsed -D options\r\n\treturn 0;\r\n   }\r\n}\r\n<\/pre>\n<p>Passing the listed files to a job as its input paths:<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\nPath path = new Path(args&#x5B;0]);\r\nFileStatus&#x5B;] status = fs.listStatus(path);\r\nPath&#x5B;] paths = FileUtil.stat2Paths(status);\r\n\t\t\r\n\/\/ Build one comma-separated string (assigning inside the loop would keep only the last path)\r\nStringBuilder csvPaths = new StringBuilder();\r\nfor (Path path2 : paths)\r\n{\r\n  if (csvPaths.length() &gt; 0)\r\n    csvPaths.append(&quot;,&quot;);\r\n  csvPaths.append(path2.toString());\r\n}\r\n\r\nFileInputFormat.setInputPaths(objJob, csvPaths.toString());\r\n<\/pre>\n<p><strong>Merging Files in a Folder<\/strong><br \/>\nFileUtil.copyMerge concatenates every file in a source directory into a single destination file (it is not available in Hadoop 3). Parameters:<\/p>\n<ol>\n<li>Source FileSystem Object<\/li>\n<li>Input (Source Directory) Path<\/li>\n<li>Destination FileSystem Object<\/li>\n<li>Output (Destination File) Path<\/li>\n<li>Delete Original 
Files (boolean)<\/li>\n<li>Configuration Object<\/li>\n<li>String appended after each file (null for none)<\/li>\n<\/ol>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\nConfiguration conf = new Configuration();\r\nFileSystem fs = FileSystem.get(conf);\r\n\r\nPath inputPath = new Path(args&#x5B;0]);\r\nPath outPath = new Path(args&#x5B;2]);\r\n\t\t\r\n\/\/ Merge every file under inputPath into outPath, keep the source files, add no separator\r\nboolean merged = FileUtil.copyMerge(fs, inputPath, fs, outPath, false, conf, null);\r\n\t\t\r\nif (merged)\r\n  System.out.println(&quot;Merge Successful&quot;);\r\n<\/pre>\n<p><strong>globStatus Takes Glob Patterns<\/strong><\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\n\/\/ Matches every entry in args&#x5B;0]\/Inputs whose name starts with &quot;Input&quot;\r\nPath path = new Path(args&#x5B;0] + &quot;\/Inputs\/Input*&quot;);\r\nFileStatus&#x5B;] status = fs.globStatus(path);\r\n<\/pre>\n<p><strong>Adding Multiple Input Paths<\/strong><\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\nimport org.apache.commons.lang.StringUtils;\r\n\r\nString csvPaths = StringUtils.join(paths, &quot;,&quot;);\r\nString&#x5B;] arrcsvPaths = csvPaths.split(&quot;,&quot;);\r\n\r\n\/\/ setInputPaths() replaces the input list on every call, so append each path instead\r\nfor (int i = 0; i &lt; arrcsvPaths.length; i++)\r\n  FileInputFormat.addInputPath(objJob, new Path(arrcsvPaths&#x5B;i]));\r\n<\/pre>\n<p><strong>Passing Arguments on the Command Line and Fetching Them from the Context<\/strong><\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\n\/\/ Value given as -DWord.Name=... on the command line (null if it was not set)\r\nString filterWords = context.getConfiguration().get(&quot;Word.Name&quot;);\r\n\t\t\t\t\r\nfor (int i = 0; i &lt; arrString.length; i++)\r\n{\r\n  String word = arrString&#x5B;i].toString();\r\n  \/\/ Emit only the words that match the filter word\r\n  if (word.equals(filterWords))\r\n    context.write(new Text(word), new IntWritable(1));\r\n}\r\n<\/pre>\n<p><strong>Input<\/strong><\/p>\n<pre>\r\n -DWord.Name=Tests \/home\/turbo\/workspace\/MapReduce5\/src\/Inputs\/Inputs[1-2] \/home\/turbo\/workspace\/MapReduce5\/src\/Outputs\/\r\n<\/pre>\n<p>Word.Name is the parameter passed on the command line. Generic options such as -D must always come first, before the input and output paths.<br \/>\n<em><br \/>\nToolRunner.run() parses these generic options with GenericOptionsParser, stores them in the Configuration and passes only the remaining arguments to run(). 
So args.length is 3 in main() but only 2 in run().<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Listing Files in Directory import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.FileUtil; import org.apache.hadoop.fs.Path; public class MapReduceDriver extends Configured implements Tool { public static void main(String&#x5B;] args) throws Exception { MapReduceDriver objMapReduceDriver = new MapReduceDriver(); Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(conf); Path path = new Path(args&#x5B;0]); FileStatus&#x5B;] status = fs.listStatus(path); Path&#x5B;] paths = FileUtil.stat2Paths(status);&hellip; <a href=\"https:\/\/codethataint.com\/blog\/hadoop-listing-files-in-directory\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[150],"tags":[],"class_list":["post-1478","post","type-post","status-publish","format-standard","hentry","category-map-reduce"],"_links":{"self":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts\/1478","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/comments?post=1478"}],"version-history":[{"count":8,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts\/1478\/revisions"}],"predecessor-version":[{"id":1486,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/posts\/1478\/revisions\/1486"}],"wp:attachment":[{"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/media?parent=1
478"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/categories?post=1478"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codethataint.com\/blog\/wp-json\/wp\/v2\/tags?post=1478"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}