I'm trying out Spring Data Hadoop for executing MR code on a remote cluster from my local machine's IDE.
My bean configuration file, viz. applicationContext.xml, is as follows:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.2.xsd">

    <context:property-placeholder location="resources/hadoop.properties" />

    <!-- Hadoop configuration pointing at the remote cluster -->
    <hdp:configuration>
        fs.default.name=${hd.fs}
    </hdp:configuration>

    <!-- The word-count job, submitted as user "bigdata" -->
    <hdp:job id="wc-job" mapper="com.hadoop.basics.WordCounter.WCMapper"
        reducer="com.hadoop.basics.WordCounter.WCReducer"
        input-path="${wordcount.input.path}"
        output-path="${wordcount.output.path}"
        user="bigdata" />

    <hdp:job-runner id="myjobs-runner" job-ref="wc-job" run-at-startup="true" />

    <!-- Pattern matching against HDFS with the rights of user "bigdata" -->
    <hdp:resource-loader id="resourceLoader" uri="${hd.fs}" user="bigdata" />
</beans>
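A side note on the user attribute: as far as I can tell, it makes SHDP submit the job as a Hadoop proxy user on top of my real (local) login user, which is what the "as:bigdata via 298790" line in the output below reflects. A rough plain-Hadoop sketch of my understanding (the class name and job wiring here are mine, not from the SHDP docs):

Code:
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSubmitSketch {
    public static void main(String[] args) throws Exception {
        final Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://cloudx-843-770:9000");

        // "bigdata" acts on top of the real (local) login user.
        UserGroupInformation proxy = UserGroupInformation.createProxyUser(
                "bigdata", UserGroupInformation.getLoginUser());

        proxy.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Job job = new Job(conf, "wc-job");
                job.setJarByClass(WordCounter.class);
                job.setMapperClass(WordCounter.WCMapper.class);
                job.setReducerClass(WordCounter.WCReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                FileInputFormat.addInputPath(job,
                        new Path("/scratchpad/input/Childhood_days.txt"));
                FileOutputFormat.setOutputPath(job,
                        new Path("/scratchpad/output"));
                job.waitForCompletion(true);
                return null;
            }
        });
    }
}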
hadoop.properties:

Code:
hd.fs=hdfs://cloudx-843-770:9000
wordcount.input.path=/scratchpad/input/Childhood_days.txt
wordcount.output.path=/scratchpad/output
The Java class which I'm running ('Run as ...' in the IDE):

Code:
package com.hadoop.basics;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class WordCounter {

    private static final IntWritable one = new IntWritable(1);

    // Must be static so Hadoop can instantiate it reflectively; the input key
    // is the line's byte offset under the default TextInputFormat.
    public static class WCMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit <token, 1> for every whitespace-separated token in the line.
            StringTokenizer strTokenizer = new StringTokenizer(value.toString());
            Text token = new Text();
            while (strTokenizer.hasMoreTokens()) {
                token.set(strTokenizer.nextToken());
                context.write(token, one);
            }
        }
    }

    public static class WCReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the counts emitted for each token.
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) {
        AbstractApplicationContext context = new ClassPathXmlApplicationContext(
                "applicationContext.xml", WordCounter.class);
        System.out.println("Word Count Application Running");
        context.registerShutdownHook();
    }
}
The (truncated) output is:

Code:
Aug 21, 2013 6:36:13 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@1815338: startup date [Wed Aug 21 18:36:13 IST 2013]; root of context hierarchy
Aug 21, 2013 6:36:13 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from class path resource [com/hadoop/basics/applicationContext.xml]
Aug 21, 2013 6:36:13 PM org.springframework.core.io.support.PropertiesLoaderSupport loadProperties
INFO: Loading properties file from class path resource [resources/hadoop.properties]
Aug 21, 2013 6:36:13 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@7c197e: defining beans [org.springframework.context.support.PropertySourcesPlaceholderConfigurer#0,hadoopConfiguration,wc-job,myjobs-runner,resourceLoader]; root of factory hierarchy
Aug 21, 2013 6:36:14 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
INFO: Starting job [wc-job]
Aug 21, 2013 6:36:14 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Aug 21, 2013 6:36:14 PM org.apache.hadoop.security.UserGroupInformation doAs
SEVERE: PriviledgedActionException as:bigdata via 298790 cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-298790\mapred\staging\bigdata364464136\.staging to 0700
Aug 21, 2013 6:36:14 PM org.springframework.data.hadoop.mapreduce.JobExecutor$2 run
WARNING: Cannot start job [wc-job]
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-298790\mapred\staging\bigdata364464136\.staging to 0700
As is obvious, the user 298790 (my local Windows machine user) is not recognized on the cluster - that's why, in the config file:

1. I specified user="bigdata" in the job's configuration, as mentioned in the doc.
2. The doc also mentions:

Quote:
SHDP obeys the HDFS permissions, using the identity of the current user (by default) for interacting with the file system. In particular, the HdfsResourceLoader considers when doing pattern matching, only the files that it's supposed to see and does not perform any privileged action. It is possible however to specify a different user, meaning the ResourceLoader interacts with HDFS using that user's rights - however this obeys the user impersonation rules.

As per the API, I decided to use the HdfsResourceLoader but couldn't find any example or even configuration in the documentation - can anyone provide any pointers?
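For reference, this is the kind of usage example I was hoping to find in the docs - my best guess, assuming the resourceLoader bean declared above can be pulled out of the context and used as a Spring ResourcePatternResolver (the pattern is just my input directory):

Code:
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.core.io.Resource;
import org.springframework.data.hadoop.fs.HdfsResourceLoader;

public class ResourceLoaderSketch {
    public static void main(String[] args) throws Exception {
        ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext(
                "applicationContext.xml", WordCounter.class);
        try {
            // The <hdp:resource-loader id="resourceLoader" user="bigdata"/> bean;
            // pattern matching should run with user "bigdata"'s rights.
            HdfsResourceLoader loader = ctx.getBean("resourceLoader",
                    HdfsResourceLoader.class);
            for (Resource r : loader.getResources("/scratchpad/input/*.txt")) {
                System.out.println(r.getURI());
            }
        } finally {
            ctx.close();
        }
    }
}

If that's not how <hdp:resource-loader> is meant to be consumed, that's exactly the kind of pointer I'm after.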