
Implement support for new JMdictDB <example> element #47

Open
machinamentum wants to merge 5 commits into neocl:main from machinamentum:jmdict-examples


@machinamentum


Sometime in 2021, the JMdictDB project reintroduced example sentences in a new format.
DTD reference: https://gitlab.com/yamagoya/jmdictdb/-/blob/master/jmdictdb/data/dtd-jmdict.xml

There are ~30k example sentence pairs (described as "priority", whatever that means) imported from Tatoeba.

Using these sentences requires rebuilding the jamdict database with JMdict_e_examp.gz in place of JMdict_e.gz. The file can be found at https://web.archive.org/web/20250401012724/https://www.edrdg.org/wiki/index.php/Main_Page. As of this writing, the copy of JMdict_e_examp.gz on archive.org has a creation date of 28 March 2025.
It also looks relatively straightforward to generate the latest JMdict_e and JMdict_e_examp XML files using the tooling from the JMdictDB project; see https://gitlab.com/yamagoya/jmdictdb/-/blob/master/doc/OPERATION.txt

This change adds:

  • a new SQL element type, SenseExample, with entries for the "jpn" and "eng" sentence pair
  • a new Python class, SenseExample
  • parsing support for the <example> element
  • printing of examples under their associated sense, via lookup.py
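
For readers who want a feel for what the parsing and storage steps involve, here is a minimal standalone sketch. It is not the code in this PR. It assumes each <example> holds <ex_sent> children tagged with xml:lang ("jpn" or "eng"), which should be checked against the DTD linked above, and it assumes SenseExample maps to a table with sid, jpn and eng columns. The sample XML is hand-written, and parse_examples and store_examples are hypothetical helper names.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class SenseExample:
    """One Japanese/English sentence pair (illustrative, not the PR's class)."""
    jpn: str
    eng: str


def parse_examples(sense):
    """Collect sentence pairs from the <example> children of a <sense> element.

    Assumption: sentences live in <ex_sent> elements distinguished by xml:lang.
    """
    examples = []
    for ex in sense.findall("example"):
        sents = {s.get(XML_LANG, "eng"): (s.text or "") for s in ex.findall("ex_sent")}
        if "jpn" in sents and "eng" in sents:
            examples.append(SenseExample(sents["jpn"], sents["eng"]))
    return examples


def store_examples(conn, sense_id, examples):
    """Insert pairs into a hypothetical SenseExample table keyed by sense id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SenseExample (sid INTEGER, jpn TEXT, eng TEXT)"
    )
    conn.executemany(
        "INSERT INTO SenseExample (sid, jpn, eng) VALUES (?, ?, ?)",
        [(sense_id, e.jpn, e.eng) for e in examples],
    )
    conn.commit()


# Hand-written sample; the source id "0" is a placeholder, not real data.
SAMPLE = """<sense>
<gloss>to eat</gloss>
<example>
<ex_srce exsrc_type="tat">0</ex_srce>
<ex_text>食べる</ex_text>
<ex_sent xml:lang="jpn">もっと果物を食べるべきです。</ex_sent>
<ex_sent xml:lang="eng">You should eat more fruit.</ex_sent>
</example>
</sense>"""

examples = parse_examples(ET.fromstring(SAMPLE))
conn = sqlite3.connect(":memory:")
store_examples(conn, 1, examples)
rows = conn.execute("SELECT jpn, eng FROM SenseExample WHERE sid = ?", (1,)).fetchall()
```

Examples that lack either language are skipped here, so a stored row always has both sentences; whether the real implementation does the same is not stated in this description.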

Example output from lookup.py:

(penv) machinamentum@MacBook-Air jamdict % python3 jamdict/tools.py lookup 食べる
========================================
Found entries
========================================
Entry: 1358280 | Kj:  食べる, 喰べる | Kn: たべる
--------------------
1. to eat ((Ichidan verb|transitive verb))
	1. もっと果物を食べるべきです。
	   You should eat more fruit.
2. to live on (e.g. a salary)/to live off/to subsist on ((Ichidan verb|transitive verb))
	1. 僕は脚本家で食べていく決心をした。
	   I am determined to make a living as a playwright.

========================================
Found characters
========================================
Char: 食 | Strokes: 9
--------------------
Readings: shi2, si4, sig, sa, 식, 사, Thực, Tự, ショク, ジキ, く.う, く.らう, た.べる, は.む
Meanings: eat, food

Char: 喰 | Strokes: 12
--------------------
Readings: shi2, si4, sig, 식, Thặc, Thực, Tự, く.う, く.らう
Meanings: eat, drink, receive (a blow), (kokuji)


No name was found.
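
The indentation of the examples in the output above (a tab, a running number, and the English line aligned under the Japanese one) can be reproduced with a few lines of Python. This is only a sketch of the layout, not the actual lookup.py code, and format_sense is a hypothetical helper name.

```python
def format_sense(index, gloss, examples):
    """Render one sense followed by its numbered example pairs.

    `examples` is a list of (jpn, eng) tuples.
    """
    lines = [f"{index}. {gloss}"]
    for n, (jpn, eng) in enumerate(examples, start=1):
        label = f"{n}. "
        lines.append(f"\t{label}{jpn}")
        # Pad with spaces so the translation lines up under the Japanese text.
        lines.append("\t" + " " * len(label) + eng)
    return "\n".join(lines)


text = format_sense(
    1,
    "to eat ((Ichidan verb|transitive verb))",
    [("もっと果物を食べるべきです。", "You should eat more fruit.")],
)
print(text)
```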

@machinamentum

Author

This may close #18, even though this change does not parse and store the examples directly from Tatoeba.
