Hi all,
I am trying to set up a job that takes Avro files as input and produces an Avro file.
I managed to read Avro records in my Mapper (extending org.apache.hadoop.mapreduce.Mapper, so the new API).
My problem is that I have no clue how to write to the context and then read in org.apache.hadoop.mapreduce.Reducer (and write out again).
I also have no clue how to bootstrap the <hadoop:job/> configuration for Avro.
Is it possible to have a small sample of how to build a simple Avro job in spring-hadoop?
This would be the first sample I have found of an Avro job using the new mapreduce (not mapred) API, with or without Spring...
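For context, here is roughly what my Mapper attempt looks like (just a sketch; getId() stands in for whatever accessor my generated Avro class actually exposes):

```java
import java.io.IOException;

import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;

import com.c4_soft.hadoop.avro.SerializableCustomer;

public class CustomerIdAvroMapper
        extends Mapper<AvroKey<SerializableCustomer>, NullWritable, LongWritable, AvroValue<SerializableCustomer>> {

    @Override
    protected void map(AvroKey<SerializableCustomer> key, NullWritable value, Context context)
            throws IOException, InterruptedException {
        final SerializableCustomer customer = key.datum();
        // getId() is assumed here; use whatever field the generated class provides
        context.write(new LongWritable(customer.getId()), new AvroValue<SerializableCustomer>(customer));
    }
}
```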
EDIT: I should have provided the Spring config from the start (at least, I wouldn't have lost the part that makes the Mapper read from the Avro file):
When I run this, I get the exception below (the avro.schema.input.key and avro.schema.output.key properties are not interpreted; 'c' is the first letter of the package name):
Code:
<!-- com.c4_soft.hadoop.CustomerIdAvroMapper extends Mapper<AvroKey<SerializableCustomer>, NullWritable, LongWritable, AvroValue<SerializableCustomer>> -->
<!-- com.c4_soft.hadoop.CustomerIdAvroReducer extends Reducer<LongWritable, AvroValue<SerializableCustomer>, AvroKey<SerializableFullBill>, NullWritable> -->
<hadoop:job
    id="lab5"
    jar-by-class="com.c4_soft.hadoop.CustomerIdAvroMapper"
    mapper="com.c4_soft.hadoop.CustomerIdAvroMapper"
    map-key="org.apache.hadoop.io.LongWritable"
    map-value="org.apache.avro.mapred.AvroValue"
    reducer="com.c4_soft.hadoop.CustomerIdAvroReducer"
    input-path="${path.customers.input}"
    input-format="org.apache.avro.mapreduce.AvroKeyInputFormat"
    output-path="${path.customers.output}"
    output-format="org.apache.avro.mapreduce.AvroKeyOutputFormat">
  avro.schema.input.key=com.c4_soft.hadoop.avro.SerializableCustomer.SCHEMA$
  avro.schema.output.key=com.c4_soft.hadoop.avro.SerializableFullBill.SCHEMA$
</hadoop:job>
Code:
org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@1e47c8ac; line: 1, column: 2]
at org.apache.avro.Schema$Parser.parse(Schema.java:929)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.mapreduce.AvroJob.getInputKeySchema(AvroJob.java:142)
at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:489)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@1e47c8ac; line: 1, column: 2]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:630)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:364)
at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2377)
at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1234)
at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1209)
at org.apache.avro.Schema$Parser.parse(Schema.java:927)
... 8 more
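Looking at the trace, Schema.Parser seems to receive the literal string "com.c4_soft.hadoop.avro.SerializableCustomer.SCHEMA$" and tries to parse it as JSON, which would explain the complaint about the character 'c'. In plain Java (without Spring), I believe the schemas are registered through Avro's new-API AvroJob helper, which writes the actual schema JSON into the job configuration rather than a class reference string. A sketch of what I mean:

```java
import java.io.IOException;

import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

import com.c4_soft.hadoop.avro.SerializableCustomer;
import com.c4_soft.hadoop.avro.SerializableFullBill;

public class AvroJobSetup {

    public static Job buildJob(Configuration conf) throws IOException {
        Job job = new Job(conf);
        // These helpers serialize the schema JSON into the configuration,
        // so AvroKeyInputFormat gets valid JSON instead of a class reference.
        AvroJob.setInputKeySchema(job, SerializableCustomer.SCHEMA$);
        AvroJob.setMapOutputValueSchema(job, SerializableCustomer.SCHEMA$);
        AvroJob.setOutputKeySchema(job, SerializableFullBill.SCHEMA$);
        return job;
    }
}
```

So my question is probably: how do I make the spring-hadoop `<hadoop:job/>` element do the equivalent of these AvroJob calls?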