UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 2270: invalid continuation byte #41

@songyuc

Hi all,
I am trying to use the scripts in this repo to preprocess the im2latex dataset, but I ran into this error:

2020-08-26 17:16:23,199 root INFO Script being executed: scripts/preprocessing/preprocess_formulas.py
Traceback (most recent call last):
  File "scripts/preprocessing/preprocess_formulas.py", line 87, in <module>
    main(sys.argv[1:])
  File "scripts/preprocessing/preprocess_formulas.py", line 65, in main
    for line in fin:
  File "/home/songyuc/software/python/anaconda/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 2270: invalid continuation byte

So, how can I solve this?
Any answer or idea will be appreciated!
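
For reference, the traceback suggests that the formulas file is read in text mode as UTF-8 while containing at least one byte sequence that is not valid UTF-8 (0xe7 is, for example, "ç" in Latin-1). A minimal sketch of two possible workarounds follows. It assumes the file is opened with Python's built-in `open`; the helper name `iter_lines` and the demo file are hypothetical and not part of this repo, and whether Latin-1 is the file's real encoding is an assumption.

```python
import os
import tempfile


def iter_lines(path, encoding="utf-8", errors="replace"):
    """Yield lines from `path` without trailing newlines.

    With errors="replace", bytes that cannot be decoded become U+FFFD
    instead of raising UnicodeDecodeError. (Hypothetical helper.)
    """
    with open(path, encoding=encoding, errors=errors) as fin:
        for line in fin:
            yield line.rstrip("\n")


# Build a small demo file containing a stray 0xe7 byte (not valid UTF-8 here).
with tempfile.NamedTemporaryFile("wb", suffix=".lst", delete=False) as tmp:
    tmp.write(b"x^2 + y^2\nfa\xe7ade \\alpha\n")
    demo_path = tmp.name

# Strict UTF-8 decoding fails in the same way as in the traceback.
strict_failed = False
try:
    with open(demo_path, encoding="utf-8") as fin:
        for _ in fin:
            pass
except UnicodeDecodeError:
    strict_failed = True

# Option 1: keep UTF-8 but replace undecodable bytes with U+FFFD.
replaced = list(iter_lines(demo_path))

# Option 2: decode as Latin-1, which maps every byte to a character
# (assumption: the stray bytes really are Latin-1 text).
latin1 = list(iter_lines(demo_path, encoding="latin-1", errors="strict"))

os.remove(demo_path)
```

Option 1 loses the original character but never guesses the encoding; option 2 keeps every byte, but it produces wrong characters if the file is actually in some other encoding.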
