
Limit on MAP_INPUT_BYTES when using CascadingTasklet

Hi,

I am currently working on a Spring Batch job that runs a Cascading job as one of the steps. In order to run the Cascading job I have made use of the CascadingTasklet class.

The problem I am facing is as follows: while my Cascading job completes successfully, the step in my Spring Batch job that calls it fails with an IllegalArgumentException.

The message displayed is:

Code:

4148563648 cannot be cast to int without changing its value.

I took a look at the source code for CascadingTasklet on GitHub, and saw that if the value of the MAP_INPUT_BYTES counter exceeds Integer.MAX_VALUE, the above exception is thrown.

Now, Integer.MAX_VALUE is 2^31 - 1 = 2147483647, which corresponds to roughly 2 GB when the counter is measured in bytes. My input data is currently around 3 GB, with the potential to increase. So while the Cascading job completes, my Spring Batch job fails.
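
To make the limit concrete, here is my reconstruction of the failing check, based purely on the error message (this is not the literal spring-data-hadoop source):

Code:

// Hadoop reports MAP_INPUT_BYTES as a long; the check below narrows it
// to an int and throws when the value does not fit, which is exactly
// what happens once the input passes Integer.MAX_VALUE bytes (~2 GB).
public class MapInputBytesCheck {
    public static void main(String[] args) {
        long mapInputBytes = 4148563648L; // the value from the error message above

        int narrowed = (int) mapInputBytes; // wraps around to -146403648
        if (narrowed != mapInputBytes) {
            throw new IllegalArgumentException(
                    mapInputBytes + " cannot be cast to int without changing its value.");
        }
    }
}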

Currently, I am using a workaround in which I extend CascadingTasklet. My ExtendedCascadingTasklet class has exactly the same code as CascadingTasklet, except that instead of throwing an IllegalArgumentException it just logs a warning when MAP_INPUT_BYTES exceeds the upper limit. A sketch of the change is below.
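
Sketched out, the changed part looks like this; the helper name, the logger, and the decision to clamp to the int range are my own choices, and everything else is copied verbatim from CascadingTasklet:

Code:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ExtendedCascadingTasklet { // rest of the class copied from CascadingTasklet

    private static final Log log = LogFactory.getLog(ExtendedCascadingTasklet.class);

    // Replacement for the throwing long-to-int conversion: log a warning
    // and clamp to the int range so the step does not fail. The name
    // safeLongToInt is my own; the original code throws here instead.
    private static int safeLongToInt(long l) {
        if (l < Integer.MIN_VALUE || l > Integer.MAX_VALUE) {
            log.warn(l + " does not fit in an int; clamping instead of failing the step");
            return l > Integer.MAX_VALUE ? Integer.MAX_VALUE : Integer.MIN_VALUE;
        }
        return (int) l;
    }
}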

I was wondering whether there is a better, more elegant workaround I could use. I am currently using spring-data-hadoop 1.0.0.RELEASE.

Thanks!
