Channel: Spring Community Forums - Hadoop

Avro job

Hi all,

I am trying to set up a job that takes Avro files as input and produces an Avro file.
I managed to read Avro records in my Mapper (it extends org.apache.hadoop.mapreduce.Mapper, so the new API).

My problem is that I have no idea how to write Avro records to the context in the Mapper and then read them back in an org.apache.hadoop.mapreduce.Reducer (and write them out again from there).
I also have no idea how to bootstrap the <hadoop:job/> configuration for Avro.

Would it be possible to get a small sample showing how to build a simple Avro job in spring-hadoop?
It would be the first sample of an Avro job using the new mapreduce (not mapred) API that I have found, with or without Spring...

EDIT: I should have provided my Spring config (at the very least, I wouldn't have lost the version that let the Mapper read from the Avro file):
Code:

        <!-- com.c4_soft.hadoop.CustomerIdAvroMapper extends Mapper<AvroKey<SerializableCustomer>, NullWritable, LongWritable, AvroValue<SerializableCustomer>> -->
        <!-- com.c4_soft.hadoop.CustomerIdAvroReducer extends Reducer<LongWritable, AvroValue<SerializableCustomer>, AvroKey<SerializableFullBill>, NullWritable> -->
        <hadoop:job
                id="lab5"
                jar-by-class="com.c4_soft.hadoop.CustomerIdAvroMapper"
                mapper="com.c4_soft.hadoop.CustomerIdAvroMapper"
                map-key="org.apache.hadoop.io.LongWritable"
                map-value="org.apache.avro.mapred.AvroValue"
                reducer="com.c4_soft.hadoop.CustomerIdAvroReducer"
                input-path="${path.customers.input}"
                input-format="org.apache.avro.mapreduce.AvroKeyInputFormat"
                output-path="${path.customers.output}"
                output-format="org.apache.avro.mapreduce.AvroKeyOutputFormat">
                avro.schema.input.key=com.c4_soft.hadoop.avro.SerializableCustomer.SCHEMA$
                avro.schema.output.key=com.c4_soft.hadoop.avro.SerializableFullBill.SCHEMA$
        </hadoop:job>

When I run that, I get the exception below (the avro.schema.input.key and avro.schema.output.key values are not interpreted; the 'c' in the message is the first letter of the package name):
Code:

org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.StringReader@1e47c8ac; line: 1, column: 2]
        at org.apache.avro.Schema$Parser.parse(Schema.java:929)
        at org.apache.avro.Schema$Parser.parse(Schema.java:917)
        at org.apache.avro.Schema.parse(Schema.java:966)
        at org.apache.avro.mapreduce.AvroJob.getInputKeySchema(AvroJob.java:142)
        at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:489)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.StringReader@1e47c8ac; line: 1, column: 2]
        at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
        at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:630)
        at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:364)
        at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
        at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2377)
        at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1234)
        at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1209)
        at org.apache.avro.Schema$Parser.parse(Schema.java:927)
        ... 8 more
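
To illustrate what the trace seems to be saying, here is a minimal stdlib-only sketch. It assumes the body of <hadoop:job> is read as plain java.util.Properties text, so the value reaches Avro as the literal string "com.c4_soft...SCHEMA$" rather than as the schema's JSON. The class name SchemaPropertyCheck and the helper looksLikeSchemaJson are hypothetical names made up for this example; they are not part of Avro or spring-hadoop.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class SchemaPropertyCheck {

    // The two property lines from the <hadoop:job> body in the post,
    // assumed (not verified) to be parsed as java.util.Properties text.
    public static final String JOB_BODY =
          "avro.schema.input.key=com.c4_soft.hadoop.avro.SerializableCustomer.SCHEMA$\n"
        + "avro.schema.output.key=com.c4_soft.hadoop.avro.SerializableFullBill.SCHEMA$\n";

    // Parse the element body exactly as a .properties file would be parsed.
    public static Properties load(String body) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(body));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Rough check only: an Avro schema in JSON form is a JSON object,
    // a JSON array (union) or a JSON string (type name), so its first
    // non-blank character is '{', '[' or '"'.
    public static boolean looksLikeSchemaJson(String value) {
        String v = value.trim();
        if (v.isEmpty()) {
            return false;
        }
        char first = v.charAt(0);
        return first == '{' || first == '[' || first == '"';
    }

    public static void main(String[] args) {
        Properties props = load(JOB_BODY);
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key);
            // The value is the class reference as written, not schema JSON.
            System.out.println(key + " -> " + value
                + " (looks like schema JSON: " + looksLikeSchemaJson(value) + ")");
        }
    }
}
```

If that assumption holds, the property would need to contain the schema JSON itself. In a plain Java driver, AvroJob.setInputKeySchema(job, schema) and AvroJob.setOutputKeySchema(job, schema) from org.apache.avro.mapreduce store the schema's JSON text in the job configuration; how best to get the same effect from the Spring XML is still the open question here.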

