Fix protobuf serde errors #425

Open
angushe wants to merge 2 commits into twitter:master from angushe:master

Conversation

@angushe

@angushe angushe commented Dec 4, 2014

Contributor

Hi,

This is a pull request trying to fix the same problem described in pull request #400, and the fix has been tested successfully on Hive 0.12/0.13 and Protobuf 2.4.1/2.5.0.

Any comments?

Thanks
Angus

@miltonwulei

I used this patch in cdh5.1.2 with Hive 0.12.0-cdh5.1.0 and confirmed that the bug in Issue #400 was resolved! This patch looks good to me.

@cooper6581

I used this with Hive 0.13.1-cdh5.3.1 and Protobuf 2.5.0 in order to resolve Issue #400. Any chance of this getting merged soon? Thanks for this patch angushe!

@harelglik

Solves #400 for me on Hive 0.13.1 and Protobuf 2.5.0 on AWS AMI 3.3.1.
Great fix, will it get merged soon?

@alastrange

Used to resolve issue #400 with Protobuf 2.5.0, Hive 0.14.0 & HDP 2.2. Thanks

Contributor

Should this mix in structFields.hashCode() too? This is still correct, but it will have more hash collisions when comparing things with the same descriptor but different structFields (idk if that's common or not).
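The trade-off in this review comment can be sketched as follows. This is a minimal illustration, not the code from the PR: `InspectorKey`, `descriptorName`, and `descriptorOnlyHash` are hypothetical stand-ins (a string stands in for the protobuf descriptor, a list of strings for `structFields`), and it assumes that `equals` compares both the descriptor and the struct fields.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the inspector class under review; names are illustrative only.
final class InspectorKey {
    private final String descriptorName;      // stand-in for the protobuf descriptor
    private final List<String> structFields;  // stand-in for the struct field list

    InspectorKey(String descriptorName, List<String> structFields) {
        this.descriptorName = descriptorName;
        this.structFields = structFields;
    }

    // Assumption: equality looks at both the descriptor and the fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof InspectorKey)) return false;
        InspectorKey other = (InspectorKey) o;
        return descriptorName.equals(other.descriptorName)
            && structFields.equals(other.structFields);
    }

    // Descriptor-only hash: consistent with equals (equal objects hash equally),
    // but every key sharing a descriptor collides.
    int descriptorOnlyHash() {
        return descriptorName.hashCode();
    }

    // Mixing in the field list keeps the contract and spreads out keys
    // that differ only in their fields.
    @Override
    public int hashCode() {
        return Objects.hash(descriptorName, structFields);
    }
}
```

Either variant satisfies the `hashCode` contract; the second only matters if objects with the same descriptor but different field lists are actually stored in the same hash-based collection.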

@isnotinvain

Contributor

@rangadi I don't know too much about protobuf dynamic messages, would you mind giving this a look too?

There's a lot of casting + instanceof going on in here where there previously wasn't. Is that part of the direct fix for the issue, or is it just the only way to use DynamicMessage?

@rangadi

rangadi commented May 20, 2015

Contributor

We haven't used the Hive serdes actively. I will take a look anyway.

Contributor

Since this fix uses only the builder, do you ever expect a Message?

@rangadi

rangadi commented May 20, 2015

Contributor

The fix looks good. I am not sure about Alex's comment on hashCode(). I just have one question: do we ever expect a Message object?

@joshk0

joshk0 commented Feb 14, 2017

Let's assume this patch will never be merged. In this case, I would like to optimize this pull request's SEO.

I was seeing issues like this when running Hive queries on Protobuf external tables requiring a MapReduce job. These issues would not present on queries like:

SELECT * FROM protobuf_external_table LIMIT 1;

But when running a query like this:

SELECT DISTINCT(field.subfield) FROM protobuf_external_table;

I would get a traceback:

17/02/14 10:09:37 [LocalJobRunner Map Task Executor #0]: ERROR mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable <LOTS OF BYTES>
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: FieldDescriptor does not match message type.
	at com.google.protobuf.GeneratedMessage$FieldAccessorTable.getField(GeneratedMessage.java:1536)
	at com.google.protobuf.GeneratedMessage$FieldAccessorTable.access$100(GeneratedMessage.java:1449)
	at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:366)
	at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:228)
	at io.arbor.elephantbird.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:148)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:407)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:129)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
	... 10 more

Applying both revisions of this PR fixed the issue conclusively.

@sugix

sugix commented Mar 3, 2017

Thanks a lot for this, and please merge it ASAP. I took the patch and it works like a charm now.

@isnotinvain

Contributor

Looks like way back when we had some questions on this PR that didn't get answered. Anyone interested in taking a look? I think we can merge this if someone wants to verify it's still working and address the review feedback.

@agammishra

I am using elephant-bird-hive-4.15.jar but I am still getting the same issue. Why?

@CLAassistant

CLAassistant commented Jul 18, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
