
Spring data - hadoop job submission

I'm trying out Spring Data - Hadoop to execute my MR code on a remote cluster from my local machine's IDE.
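(In case the setup matters: I'm assuming a Maven build here; the coordinates below are the standard Spring Data Hadoop ones, and the version shown is just an example.)

Code:

<!-- Spring Data Hadoop dependency; version is just an example -->
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>1.0.1.RELEASE</version>
</dependency>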

My bean configuration file, viz. applicationContext.xml, is as follows:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
    http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.2.xsd">

    <context:property-placeholder location="resources/hadoop.properties" />

    <hdp:configuration>
        fs.default.name=${hd.fs}
    </hdp:configuration>

    <hdp:job id="wc-job" mapper="com.hadoop.basics.WordCounter.WCMapper"
        reducer="com.hadoop.basics.WordCounter.WCReducer"
        input-path="${wordcount.input.path}"
        output-path="${wordcount.output.path}"
        user="bigdata"> 
    </hdp:job>

    <hdp:job-runner id="myjobs-runner" job-ref="wc-job" run-at-startup="true"/>
    <hdp:resource-loader id="resourceLoader" uri="${hd.fs}" user="bigdata" />

</beans>
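If it helps, my understanding is that <hdp:job> exposes a plain org.apache.hadoop.mapreduce.Job bean, so instead of run-at-startup="true" on the runner I could presumably fetch the bean and submit it myself. A rough, untested sketch of that idea (the class name ManualJobSubmit is mine, not from any doc):

Code:

import org.apache.hadoop.mapreduce.Job;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// Untested sketch: submit the <hdp:job id="wc-job"> bean manually
// instead of relying on run-at-startup="true" on the job-runner.
public class ManualJobSubmit {
    public static void main(String[] args) throws Exception {
        ApplicationContext ctx =
                new ClassPathXmlApplicationContext("applicationContext.xml");
        Job wcJob = ctx.getBean("wc-job", Job.class);
        wcJob.waitForCompletion(true); // blocks until the MR job finishes
    }
}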

hadoop.properties

Code:

hd.fs=hdfs://cloudx-843-770:9000
wordcount.input.path=/scratchpad/input/Childhood_days.txt
wordcount.output.path=/scratchpad/output

The Java class which I'm running ('Run As...' in the IDE):

Code:

package com.hadoop.basics;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class WordCounter {

    // Reused writable holding the constant count of 1 for every token.
    private static final IntWritable one = new IntWritable(1);

    // Hadoop instantiates the mapper via reflection, so a nested class
    // must be static; with the default TextInputFormat the input key is
    // the line's byte offset (a LongWritable), not Text.
    public static class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token in the line.
            StringTokenizer strTokenizer = new StringTokenizer(value.toString());
            Text token = new Text();

            while (strTokenizer.hasMoreTokens()) {
                token.set(strTokenizer.nextToken());
                context.write(token, one);
            }
        }
    }

    // The reducer must likewise be static; it sums the 1s emitted for
    // each distinct token.
    public static class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;

            for (IntWritable value : values) {
                sum += value.get();
            }

            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) {
        // The job-runner bean has run-at-startup="true", so the job is
        // submitted as soon as the context below finishes refreshing.
        AbstractApplicationContext context = new ClassPathXmlApplicationContext(
                "applicationContext.xml", WordCounter.class);
        System.out.println("Word Count Application Running");
        // Close the context (and Hadoop resources) cleanly when the JVM exits.
        context.registerShutdownHook();
    }
}

The (truncated) output is:

Code:

Aug 21, 2013 6:36:13 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@1815338: startup date [Wed Aug 21 18:36:13 IST 2013]; root of context hierarchy
Aug 21, 2013 6:36:13 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from class path resource [com/hadoop/basics/applicationContext.xml]
Aug 21, 2013 6:36:13 PM org.springframework.core.io.support.PropertiesLoaderSupport loadProperties
INFO: Loading properties file from class path resource [resources/hadoop.properties]
Aug 21, 2013 6:36:13 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7c197e: defining beans [org.springframework.context.support.PropertySourcesPlaceholderConfigurer#0,hadoopConfiguration,wc-job,myjobs-runner,resourceLoader]; root of factory hierarchy
Aug 21, 2013 6:36:14 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
INFO: Starting job [wc-job]
Aug 21, 2013 6:36:14 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Aug 21, 2013 6:36:14 PM org.apache.hadoop.security.UserGroupInformation doAs
SEVERE: PriviledgedActionException as:bigdata via 298790 cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-298790\mapred\staging\bigdata364464136\.staging to 0700
Aug 21, 2013 6:36:14 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
WARNING: Cannot start job [wc-job]
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-298790\mapred\staging\bigdata364464136\.staging to 0700

As is obvious, the user 298790 (my local Windows machine's user) is not recognized on the cluster - that's why, in the config file:

  1. I specified user="bigdata" in the job's configuration, as mentioned in the doc (see my sketch below the quote for what I think this does under the hood).

  2. The doc also mentions:


Quote:

SHDP obeys the HDFS permissions, using the identity of the current user (by default) for interacting with the file system. In particular, the HdfsResourceLoader considers, when doing pattern matching, only the files that it's supposed to see and does not perform any privileged action. It is possible however to specify a different user, meaning the ResourceLoader interacts with HDFS using that user's rights - however this obeys the user impersonation rules.
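From the "as:bigdata via 298790" bit of the stack trace, my guess (just my mental model, not SHDP's actual source) is that user="bigdata" wraps the submission in a proxy-user doAs, roughly like this:

Code:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Guesswork, not SHDP's actual code: a proxy UGI for "bigdata" is created
// on top of the real local login user (298790 in my case), hence the
// "as:bigdata via 298790" in the log. The staging directory is still
// created on the *local* filesystem first, which is where my job dies.
UserGroupInformation proxy = UserGroupInformation.createProxyUser(
        "bigdata", UserGroupInformation.getLoginUser());
proxy.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        job.submit(); // 'job' = the configured wc-job instance
        return null;
    }
});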
As per the API, I decided to use HdfsResourceLoader, but I couldn't find any example, or even configuration, in the documentation - can anyone provide any pointers?
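To make the question concrete, here is my guess at the programmatic usage, pieced together from the HdfsResourceLoader javadoc (the three-argument constructor taking a user name) - is this the intended way to use it?

Code:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.data.hadoop.fs.HdfsResourceLoader;

// Untested guess, mirroring my <hdp:resource-loader user="bigdata"> above:
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://cloudx-843-770:9000");

HdfsResourceLoader loader = new HdfsResourceLoader(
        conf, URI.create("hdfs://cloudx-843-770:9000"), "bigdata");

// Pattern matching should only see what "bigdata" is allowed to see.
for (Resource r : loader.getResources("/scratchpad/input/*.txt")) {
    System.out.println(r.getURI());
}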
