Fix NPE when iterating over an input split in CompositeRecordReader.java #436
venugit wants to merge 1 commit
Can we safely advance here? The interface suggests that setKeyValue will always be called before nextKeyValue, so calling nextKeyValue here could cause us to skip a record?
You are right, the interface does make it possible to skip over a record. Re-reading the code, it might work to have setKeyValue simply set this.key and this.value when currentRecordReader is null; the next call to nextKeyValue will then re-initialize currentRecordReader and invoke currentRecordReader.setKeyValue on the "cached" key/value.
Thoughts?
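A minimal sketch of the caching idea above, using a toy model rather than the Elephant Bird classes. `SplitReader`, `CompositeReader`, and `readAll` are hypothetical names, `StringBuilder` stands in for reusable Writable holders, and the assumption that the delegate is null whenever no split is open is mine, not taken from the real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only; none of these classes are the Elephant Bird implementations.
final class CachedKeyValueSketch {

    /** Hypothetical delegate reader over one split; fills caller-supplied holders. */
    static class SplitReader {
        private final Iterator<String> records;
        private StringBuilder key;
        private StringBuilder value;
        private int index = 0;

        SplitReader(List<String> records) { this.records = records.iterator(); }

        void setKeyValue(StringBuilder key, StringBuilder value) {
            this.key = key;
            this.value = value;
        }

        boolean nextKeyValue() {
            if (!records.hasNext()) return false;
            key.setLength(0);
            key.append(index++);
            value.setLength(0);
            value.append(records.next());
            return true;
        }
    }

    /** Hypothetical composite reader that caches the holders between splits. */
    static class CompositeReader {
        private final Iterator<List<String>> splits;
        private SplitReader current; // assumption: null whenever no split is open
        private StringBuilder key;
        private StringBuilder value;

        CompositeReader(List<List<String>> splits) { this.splits = splits.iterator(); }

        void setKeyValue(StringBuilder key, StringBuilder value) {
            // Always remember the holders; never advance from here.
            this.key = key;
            this.value = value;
            if (current != null) current.setKeyValue(key, value);
        }

        boolean nextKeyValue() {
            while (true) {
                if (current == null) {
                    if (!splits.hasNext()) return false;
                    current = new SplitReader(splits.next());
                    current.setKeyValue(key, value); // hand over the cached holders
                }
                if (current.nextKeyValue()) return true;
                current = null; // end of this split
            }
        }
    }

    /** Drives the reader the way the thread describes: setKeyValue, then nextKeyValue. */
    static List<String> readAll(List<List<String>> splits) {
        CompositeReader reader = new CompositeReader(splits);
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        List<String> out = new ArrayList<>();
        while (true) {
            reader.setKeyValue(key, value);
            if (!reader.nextKeyValue()) break;
            out.add(value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(readAll(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"))));
    }
}
```

Because setKeyValue never advances the reader in this model, no record is consumed outside nextKeyValue. The sketch does not cover state shared between separate composite reader instances.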
I think from my reading that should work, this stuff is unfortunately hard to parse :/
Ping on this, got time to update as we discussed?
Hi, sorry for not getting back to you earlier. I tried out the update, and it fixes the issue through one set of splits. However, there needs to be a good way to persist the last-set key/value pair between instances of CompositeRecordReader, which I have not revisited. Here is the stack trace I see when I try the solution I outlined:

Caused by: java.io.IOException: The RecordReader returned a key and value that do not match the key and value sent to it. This means the RecordReader did not properly implement com.twitter.elephantbird.mapred.input.MapredInputFormatCompatible. Current reader class : class com.twitter.elephantbird.mapreduce.input.combine.CompositeRecordReader

I'm going to spend some time this week on this.
Venugopal Gummuluru seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
When iterating over input splits, DeprecatedInputFormatWrapper.java always calls mifcReader.setKeyValue(key, value) before nextValue is invoked, which can call through to setKeyValue in CompositeRecordReader.java. setKeyValue requires that the currentRecordReader instance be non-null; however, currentRecordReader is set to null at line 113 at the end of every input split, leading to an NPE on the next call to setKeyValue after the end of an input split.
This patch addresses the situation by having setKeyValue do a null check on currentRecordReader and, when it is null, invoke nextKeyValue to see whether there are any more elements to be found.
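To make the failure mode and the null check concrete, here is a rough sketch in a toy model. Everything in it (`SplitReader`, `CompositeReader`, the `guarded` flag, `readAll`) is hypothetical and only loosely mirrors the description above. It assumes the delegate is null whenever no split is open and that the caller always invokes setKeyValue before nextKeyValue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only; none of these classes are the Elephant Bird implementations.
final class NullCheckSketch {

    /** Hypothetical delegate reader over one split. */
    static class SplitReader {
        private final Iterator<String> records;
        private StringBuilder key;
        private StringBuilder value;
        private int index = 0;

        SplitReader(List<String> records) { this.records = records.iterator(); }

        void setKeyValue(StringBuilder key, StringBuilder value) {
            this.key = key;
            this.value = value;
        }

        boolean nextKeyValue() {
            if (!records.hasNext()) return false;
            key.setLength(0);
            key.append(index++);
            value.setLength(0);
            value.append(records.next());
            return true;
        }
    }

    /** Hypothetical composite reader; 'guarded' switches the null check on. */
    static class CompositeReader {
        private final Iterator<List<String>> splits;
        private final boolean guarded;
        private SplitReader current; // assumption: null whenever no split is open
        private StringBuilder key;
        private StringBuilder value;

        CompositeReader(List<List<String>> splits, boolean guarded) {
            this.splits = splits.iterator();
            this.guarded = guarded;
        }

        void setKeyValue(StringBuilder key, StringBuilder value) {
            this.key = key;
            this.value = value;
            // Null check: advance to find a live delegate, or give up if none is left.
            if (guarded && current == null && !nextKeyValue()) return;
            // Unguarded, this line throws NullPointerException when current is null.
            current.setKeyValue(key, value);
        }

        boolean nextKeyValue() {
            while (true) {
                if (current == null) {
                    if (!splits.hasNext()) return false;
                    current = new SplitReader(splits.next());
                    current.setKeyValue(key, value);
                }
                if (current.nextKeyValue()) return true;
                current = null; // end of this split
            }
        }
    }

    /** Caller loop: setKeyValue first, then nextKeyValue, as described above. */
    static List<String> readAll(List<List<String>> splits, boolean guarded) {
        CompositeReader reader = new CompositeReader(splits, guarded);
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        List<String> out = new ArrayList<>();
        while (true) {
            reader.setKeyValue(key, value);
            if (!reader.nextKeyValue()) break;
            out.add(value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        // prints [b, c]: in this toy model the advance inside setKeyValue consumes "a"
        System.out.println(readAll(splits, true));
    }
}
```

In this model the guard removes the NPE, but the advance inside setKeyValue consumes a record that the caller never sees. Whether the real classes behave the same way depends on details not shown here.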