Clean up Chinese word segmentation review feedback by cary-rowen · Pull Request #20205 · nvaccess/nvda

cary-rowen · 2026-05-24T17:26:30Z

Summary

Addresses preliminary review comments on #20183:

- Move the Document Navigation word segmentation setting below paragraph style.
- Move the matching user guide heading below the paragraph style section.
- Remove redundant outer exception handling around `wordSeg.initialize()`.
- Update the changelog wording and move the `useUniscribe` deprecation entry to 2026.3.
- Move the new word segmentation tests into `tests/unit/test_wordSeg.py`.
- Move `WordSegmenter` into `textUtils.wordSeg.wordSegmenter`.
- Narrow `WordSegmenter.getSegmentForOffset` exception handling and cover the fallback behavior.
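
The last item, narrowing exception handling while keeping a fallback, can be illustrated with a minimal sketch. It is not the code from this PR: the function signature, the `RECOVERABLE_ERRORS` tuple, the `fallbackSegment` helper, and the single-character fallback are all assumptions made for illustration. The idea is that only errors known to be recoverable trigger the fallback, while anything else propagates so real bugs stay visible.

```python
from typing import Callable, Sequence

# Assumption: which exception types count as recoverable is hypothetical here.
RECOVERABLE_ERRORS = (ValueError, IndexError, UnicodeError)


def fallbackSegment(text: str, offset: int) -> tuple[int, int]:
    """Hypothetical fallback: treat the single character at offset as the segment."""
    if not text:
        return (0, 0)
    offset = max(0, min(offset, len(text) - 1))
    return (offset, offset + 1)


def getSegmentForOffset(
    text: str,
    offset: int,
    segment: Callable[[str], Sequence[int]],
) -> tuple[int, int]:
    """Return (start, end) of the word containing offset.

    segment is a stand-in for the real segmenter; it is assumed to return
    sorted word-end offsets for text.
    """
    try:
        ends = segment(text)
    except RECOVERABLE_ERRORS:
        # Narrow handling: only known recoverable failures use the fallback.
        return fallbackSegment(text, offset)
    start = 0
    for end in ends:
        if offset < end:
            return (start, end)
        start = end
    # The segmenter produced no boundary past offset; fall back as well.
    return fallbackSegment(text, offset)
```

A unit test for the fallback can then pass a segmenter that raises one of the recoverable types and assert on the returned range, and a second test can confirm that an unrelated exception such as `RuntimeError` is not swallowed.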

cary-rowen · 2026-05-24T17:36:23Z

cc @CrazySteve0605
Just wanted to let you know that I've done this.

Copilot

Pull request overview

This PR cleans up and incorporates review feedback for the Chinese word segmentation work by reorganizing UI/help text ordering, tightening initialization/exception handling, and restructuring the WordSegmenter implementation and its unit tests.

Changes:

- Reordered the Document Navigation “Word Segmentation Standard” setting (and corresponding User Guide anchor) to appear after Paragraph Style.
- Removed redundant outer exception handling around `wordSeg.initialize()` at startup and after saving settings.
- Moved `WordSegmenter` into `textUtils.wordSeg.wordSegmenter`, updated call sites/imports, and consolidated/expanded unit tests into `tests/unit/test_wordSeg.py`.
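
To show why an outer `try`/`except` around an initializer can be redundant, here is a minimal sketch. It assumes the initializer already catches and logs its own recoverable failures, which the review summary implies but does not show. The names `_loadBackend` and `onStartup`, the `dictPath` parameter, and the boolean return value are hypothetical and are not taken from NVDA.

```python
import logging

log = logging.getLogger("wordSegSketch")


def _loadBackend(dictPath: str) -> None:
    """Hypothetical loader: raises OSError when no dictionary path is given."""
    if not dictPath:
        raise OSError("dictionary path not configured")


def initialize(dictPath: str) -> bool:
    """Load the backend, handling its own recoverable failure in one place."""
    try:
        _loadBackend(dictPath)
    except OSError:
        log.exception("Word segmentation backend failed to load; continuing without it")
        return False
    return True


def onStartup(dictPath: str) -> bool:
    # No outer try/except here: initialize() already logs and reports failure,
    # so wrapping the call again would only duplicate that handling.
    return initialize(dictPath)
```

With the handling kept inside `initialize`, every call site (startup and after saving settings, in the review's terms) behaves the same way without repeating the wrapper.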

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `user_docs/en/userGuide.md` | Moves the Word Segmentation Standard help anchor below Paragraph Style to match the settings panel order. |
| `user_docs/en/changes.md` | Refines release note wording and relocates the `useUniscribe` deprecation note into 2026.3. |
| `tests/unit/test_wordSeg.py` | Adds consolidated unit tests for word segmentation initialization, segmenter behavior, and offset conversion. |
| `tests/unit/test_textUtils.py` | Removes word segmentation-related tests now covered by `test_wordSeg.py`. |
| `source/textUtils/wordSeg/wordSegUtils.py` | Updates imports to use the relocated `WordSegmenter` module. |
| `source/textUtils/wordSeg/wordSegmenter.py` | Introduces the new `WordSegmenter` module with narrowed recoverable exception handling. |
| `source/textUtils/__init__.py` | Removes the previous inline `WordSegmenter` implementation from the package root. |
| `source/textInfos/offsets.py` | Switches word offset calculation to the relocated `WordSegmenter` import. |
| `source/gui/settingsDialogs.py` | Reorders controls in Document Navigation panel and removes redundant `wordSeg.initialize()` exception wrapping. |
| `source/core.py` | Removes redundant startup exception wrapping around `wordSeg.initialize()`. |

 You may toggle through the available paragraph styles from anywhere by assigning a key in the [Input Gestures dialog](#InputGestures).

+##### Word Segmentation Standard {#WordSegmentationStandard}
+


seanbudd · 2026-05-25T02:06:02Z

@@ -0,0 +1,23 @@
+# A part of NonVisual Desktop Access (NVDA)


why not textUtils/braille?

Squash commit of:

- #18548
- #18735
- #18865
- #19324
- #19747
- #20041
- #20055
- #20106
- #20162
- #20178
- #20185
- #20205
- #20233
- #20242
- #20227
- #20279
- #20278
- #20288

Previous try branch commit history: #19166

---

This pull request introduces Chinese word segmentation support in NVDA through the integration of the `cppjieba` library. It adds the `cppjieba` submodule, builds and links the library into the NVDA build process, exposes new C++ and Python APIs for word segmentation, and updates braille and text handling to take advantage of improved segmentation for Chinese text. Configuration options and documentation are also updated to reflect the new dependency and features.

**Integration of Chinese Word Segmentation (cppjieba):**

* Added `cppjieba` as a submodule (`include/cppjieba`), included its license, and documented its usage and commit in the project documentation and `copying.txt`.
* Implemented a SCons build script and build integration for `cppjieba`, ensuring the library and its dictionaries are built and installed as part of the NVDA build process.
* Added workflow improvements to ensure recursive submodule checkouts, so all dependencies (including nested ones) are fetched.

**C++ and Python API Additions:**

* Developed a thread-safe singleton wrapper and C API for `cppjieba` in `nvdaHelper/cppjieba`, exposing functions for initialization, segmentation, user word management, and memory management.
* Exposed new DLL and dictionary path properties in `NVDAState.py` for use by Python code.

**Braille and Text Handling Enhancements:**

* Updated braille output logic to use word segmentation for Chinese tables, applying a new `WordSegWithSeparatorOffsetConverter` to improve cursor and offset mapping.
* Modified edit text handling to enforce the use of Uniscribe for character and word segmentation, ensuring consistent behavior with the new segmentation logic.
* Updated copyright and contributor attributions.

**Configuration and Documentation Updates:**

* Added configuration options for eager initialization and selection of word segmentation standards in `configSpec.py`.
* Updated documentation to reflect the new dependency and its usage.

**Summary of Most Important Changes:**

**1. Integration of cppjieba for Chinese Word Segmentation**

- Added `cppjieba` as a submodule, updated `.gitmodules`, and documented its license and commit.
- Implemented SCons build scripts and build integration for `cppjieba`, including dictionary installation.
- Improved workflow to fetch all submodules recursively.

**2. C++/Python API and Library Exposure**

- Created thread-safe singleton and C API for `cppjieba`, exposing segmentation and user word management functions.
- Exposed DLL and dictionary paths in `NVDAState.py` for use by Python code.

**3. Braille and Text Handling Improvements**

- Enhanced braille output to use word segmentation for Chinese, improving offset and cursor mapping.
- Updated edit text handling to enforce Uniscribe for segmentation.

**4. Configuration and Documentation**

- Added configuration options for word segmentation initialization and standards.
- Updated documentation and copyright attributions.

This lays the groundwork for robust Chinese word navigation and segmentation in NVDA, improving accessibility for Chinese users.
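
The offset and cursor mapping mentioned for braille output can be pictured with a small sketch. This is not NVDA's `WordSegWithSeparatorOffsetConverter`: the class name `SeparatorOffsetMap`, its method names, and the choice of a space as separator are hypothetical. It assumes the segmenter supplies sorted, unique word-end offsets, inserts a separator at each interior boundary, and converts offsets in both directions so a cursor position in the original text can be found in the separated text and back.

```python
from bisect import bisect_right


class SeparatorOffsetMap:
    """Hypothetical stand-in for an offset converter over separator-joined words."""

    def __init__(self, text: str, wordEnds: list[int], sep: str = " "):
        self.sep = sep
        # Only boundaries strictly inside the text receive a separator.
        self.boundaries = [e for e in wordEnds if 0 < e < len(text)]
        pieces = []
        start = 0
        for end in self.boundaries:
            pieces.append(text[start:end])
            start = end
        pieces.append(text[start:])
        self.encoded = sep.join(pieces)

    def toEncoded(self, offset: int) -> int:
        """Map an offset in the original text to the separated text."""
        # Every boundary at or before offset shifts it by one separator.
        return offset + bisect_right(self.boundaries, offset) * len(self.sep)

    def toOriginal(self, encodedOffset: int) -> int:
        """Map an offset in the separated text back to the original text."""
        sepLen = len(self.sep)
        for i, boundary in enumerate(self.boundaries):
            sepStart = boundary + i * sepLen
            if encodedOffset < sepStart:
                return encodedOffset - i * sepLen
            if encodedOffset < sepStart + sepLen:
                # Inside a separator: snap to the start of the following word.
                return boundary
        return encodedOffset - len(self.boundaries) * sepLen


converter = SeparatorOffsetMap("我爱北京", [1, 2, 4])
print(converter.encoded)       # 我 爱 北京
print(converter.toEncoded(2))  # 4
print(converter.toOriginal(5)) # 3
```

Snapping offsets that land on a separator to the following word is one possible design choice; a real converter may prefer the preceding word or another rule.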

cary-rowen added 2 commits May 22, 2026 19:58

Fix direct Sean review items

5c8f369

Move WordSegmenter into word segmentation package

242b894

cary-rowen requested review from a team as code owners May 24, 2026 17:26

cary-rowen requested review from Qchristensen and SaschaCowley and removed request for a team May 24, 2026 17:26

seanbudd reviewed May 25, 2026

Comment thread tests/unit/test_wordSeg.py Outdated

Comment thread tests/unit/test_wordSeg.py Outdated

seanbudd requested a review from Copilot May 25, 2026 00:11

Copilot started reviewing on behalf of seanbudd May 25, 2026 00:11

Copilot AI reviewed May 25, 2026

Comment thread user_docs/en/userGuide.md

cary-rowen added 2 commits May 25, 2026 09:51

Move word segmentation test helpers out of nested scopes

232b53f

Move braille offset converter helper out of braille module

5153235

seanbudd reviewed May 25, 2026

Move braille offset helper under textUtils

775c26d

seanbudd merged commit cf886e6 into nvaccess:try-chineseWordSegmentation-staging-2 May 25, 2026
30 of 33 checks passed

github-actions Bot added this to the 2026.3 milestone May 25, 2026

seanbudd mentioned this pull request May 25, 2026

Add Chinese Word Segmentation #20183

Merged

seanbudd merged 5 commits into nvaccess:try-chineseWordSegmentation-staging-2 from cary-rowen:try-chineseWordSegmentation-sean-review-direct-fixes

3 participants
