I'm trying out Spring Data Hadoop for executing MR code on a remote cluster from my local machine's IDE.
My bean configuration file, viz. applicationContext.xml, is as follows:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.2.xsd">

    <context:property-placeholder location="resources/hadoop.properties" />

    <!-- Hadoop configuration pointing at the remote cluster -->
    <hdp:configuration>
        fs.default.name=${hd.fs}
    </hdp:configuration>

    <!-- The word-count job, submitted as user "bigdata" -->
    <hdp:job id="wc-job" mapper="com.hadoop.basics.WordCounter.WCMapper"
        reducer="com.hadoop.basics.WordCounter.WCReducer"
        input-path="${wordcount.input.path}"
        output-path="${wordcount.output.path}"
        user="bigdata" />

    <hdp:job-runner id="myjobs-runner" job-ref="wc-job" run-at-startup="true" />

    <!-- Pattern matching against HDFS with the rights of user "bigdata" -->
    <hdp:resource-loader id="resourceLoader" uri="${hd.fs}" user="bigdata" />
</beans>
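A side note on the user attribute: as far as I can tell, it makes SHDP submit the job as a Hadoop proxy user on top of my real (local) login user, which is what the "as:bigdata via 298790" line in the output below reflects. A rough plain-Hadoop sketch of my understanding (the class name and job wiring here are mine, not from the SHDP docs):

Code:
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSubmitSketch {
    public static void main(String[] args) throws Exception {
        final Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://cloudx-843-770:9000");

        // "bigdata" acts on top of the real (local) login user.
        UserGroupInformation proxy = UserGroupInformation.createProxyUser(
                "bigdata", UserGroupInformation.getLoginUser());

        proxy.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Job job = new Job(conf, "wc-job");
                job.setJarByClass(WordCounter.class);
                job.setMapperClass(WordCounter.WCMapper.class);
                job.setReducerClass(WordCounter.WCReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                FileInputFormat.addInputPath(job,
                        new Path("/scratchpad/input/Childhood_days.txt"));
                FileOutputFormat.setOutputPath(job,
                        new Path("/scratchpad/output"));
                job.waitForCompletion(true);
                return null;
            }
        });
    }
}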
hadoop.properties:

Code:
hd.fs=hdfs://cloudx-843-770:9000
wordcount.input.path=/scratchpad/input/Childhood_days.txt
wordcount.output.path=/scratchpad/output
The Java class which I'm running ('Run as ...' in the IDE):

Code:
package com.hadoop.basics;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class WordCounter {

    private static final IntWritable one = new IntWritable(1);

    // Must be static so Hadoop can instantiate it reflectively; the input key
    // is the line's byte offset under the default TextInputFormat.
    public static class WCMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit <token, 1> for every whitespace-separated token in the line.
            StringTokenizer strTokenizer = new StringTokenizer(value.toString());
            Text token = new Text();
            while (strTokenizer.hasMoreTokens()) {
                token.set(strTokenizer.nextToken());
                context.write(token, one);
            }
        }
    }

    public static class WCReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the counts emitted for each token.
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) {
        AbstractApplicationContext context = new ClassPathXmlApplicationContext(
                "applicationContext.xml", WordCounter.class);
        System.out.println("Word Count Application Running");
        context.registerShutdownHook();
    }
}
The (truncated) output is:

Code:
Aug 21, 2013 6:36:13 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@1815338: startup date [Wed Aug 21 18:36:13 IST 2013]; root of context hierarchy
Aug 21, 2013 6:36:13 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from class path resource [com/hadoop/basics/applicationContext.xml]
Aug 21, 2013 6:36:13 PM org.springframework.core.io.support.PropertiesLoaderSupport loadProperties
INFO: Loading properties file from class path resource [resources/hadoop.properties]
Aug 21, 2013 6:36:13 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7c197e: defining beans [org.springframework.context.support.PropertySourcesPlaceholderConfigurer#0,hadoopConfiguration,wc-job,myjobs-runner,resourceLoader]; root of factory hierarchy
Aug 21, 2013 6:36:14 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
INFO: Starting job [wc-job]
Aug 21, 2013 6:36:14 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Aug 21, 2013 6:36:14 PM org.apache.hadoop.security.UserGroupInformation doAs
SEVERE: PriviledgedActionException as:bigdata via 298790 cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-298790\mapred\staging\bigdata364464136\.staging to 0700
Aug 21, 2013 6:36:14 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
WARNING: Cannot start job [wc-job]
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-298790\mapred\staging\bigdata364464136\.staging to 0700
As is obvious, the user 298790 (my local Windows machine user) is not recognized on the cluster - that's why, in the config file:

1. I specified user="bigdata" in the job's configuration, as mentioned in the doc.
2. The doc also mentions:

Quote:
SHDP obeys the HDFS permissions, using the identity of the current user (by default) for interacting with the file system. In particular, the HdfsResourceLoader considers when doing pattern matching, only the files that it's supposed to see and does not perform any privileged action. It is possible however to specify a different user, meaning the ResourceLoader interacts with HDFS using that user's rights - however this obeys the user impersonation rules.

As per the API, I decided to use the HdfsResourceLoader but couldn't find any example or even configuration in the documentation - can anyone provide any pointers?
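For reference, this is the kind of usage example I was hoping to find in the docs - my best guess, assuming the resourceLoader bean declared above can be pulled out of the context and used as a Spring ResourcePatternResolver (the pattern is just my input directory):

Code:
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.core.io.Resource;
import org.springframework.data.hadoop.fs.HdfsResourceLoader;

public class ResourceLoaderSketch {
    public static void main(String[] args) throws Exception {
        ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext(
                "applicationContext.xml", WordCounter.class);
        try {
            // The <hdp:resource-loader id="resourceLoader" user="bigdata"/> bean;
            // pattern matching should run with user "bigdata"'s rights.
            HdfsResourceLoader loader = ctx.getBean("resourceLoader",
                    HdfsResourceLoader.class);
            for (Resource r : loader.getResources("/scratchpad/input/*.txt")) {
                System.out.println(r.getURI());
            }
        } finally {
            ctx.close();
        }
    }
}

If that's not how <hdp:resource-loader> is meant to be consumed, that's exactly the kind of pointer I'm after.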