Parser crash on an <a> tag without href #62

Description

@victorforerocabarcas

Version: htmldocx 0.0.6
An <a> tag without an href attribute causes the following error:

  File "/home/victor/Proyectos/html2docx/./prueba.py", line 36, in <module>
    new_parser.add_html_to_document(txt2, document)
  File "/home/victor/Proyectos/html2docx/venv/lib/python3.10/site-packages/htmldocx/h2d.py", line 591, in add_html_to_document
    self.run_process(html)
  File "/home/victor/Proyectos/html2docx/venv/lib/python3.10/site-packages/htmldocx/h2d.py", line 583, in run_process
    self.feed(html)
  File "/usr/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/usr/lib/python3.10/html/parser.py", line 162, in goahead
    self.handle_data(unescape(rawdata[i:j]))
  File "/home/victor/Proyectos/html2docx/venv/lib/python3.10/site-packages/htmldocx/h2d.py", line 514, in handle_data
    self.handle_link(link['href'], data)
KeyError: 'href'
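
The cause can be illustrated without htmldocx. The sketch below uses only the standard-library html.parser.HTMLParser; the AttrCollector class is a hypothetical stand-in, not htmldocx code. It shows that an anchor with no href yields an attribute dict with no 'href' key, so an unguarded link['href'] lookup raises KeyError.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical stand-in: records the attributes of each <a> tag."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self.anchors.append(dict(attrs))


collector = AttrCollector()
collector.feed('<a name="top">anchor only</a> <a href="https://example.com">link</a>')

for link in collector.anchors:
    try:
        print("href:", link["href"])
    except KeyError as exc:
        # Reached for the first anchor, which has a name but no href
        print("KeyError:", exc)
```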

My suggestion is to add an `if 'href' in link:` check at line 514.

The patch, starting at line 512, could be:

    link = self.tags.get('a')
    if link:
        if 'href' in link:
            self.handle_link(link['href'], data)
    else:
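
A minimal, self-contained sketch of the guard follows. GuardedLinkParser and its output list are hypothetical stand-ins built on the standard-library HTMLParser, not the htmldocx implementation. The sketch also assumes that the else branch is where plain text gets written, which the patch above does not show. It merges both conditions into a single test so that an anchor without href falls through to the plain-text branch; with the nested form, the text inside such an anchor would be skipped unless the inner if gets its own fallback.

```python
from html.parser import HTMLParser


class GuardedLinkParser(HTMLParser):
    """Hypothetical stand-in for the htmldocx parser, reduced to link handling."""

    def __init__(self):
        super().__init__()
        self.tags = {}     # currently open tags mapped to their attribute dicts
        self.output = []   # ("link", href, text) or ("text", text) records

    def handle_starttag(self, tag, attrs):
        self.tags[tag] = dict(attrs)

    def handle_endtag(self, tag):
        self.tags.pop(tag, None)

    def handle_data(self, data):
        link = self.tags.get("a")
        # Treat the data as a hyperlink only when href is actually present.
        if link and "href" in link:
            self.output.append(("link", link["href"], data))
        else:
            # Anchors without href fall back to plain text instead of crashing.
            self.output.append(("text", data))


parser = GuardedLinkParser()
parser.feed('<a name="top">Top</a><a href="https://example.com">Example</a>')
parser.close()
print(parser.output)
```

Whether text inside an href-less anchor should be kept as plain text or dropped is a design choice for the maintainers; the sketch keeps it.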
