This repository provides utilities for downloading, extracting, configuring, and training dependency parsers on both Universal Dependencies (UD) and Surface-syntactic Universal Dependencies (SUD) treebanks using the BertforDeprel architecture.
- treebank_utilities.py: Download, extract, and configure UD/SUD treebanks.
- train_utilities.py: Prepare data and train models for each treebank.
- statistic_utilities.py: Collect and summarize model statistics.
- utils.py: Shared utilities and environment variable management.
- start_train.sh: Shell script to start model training.
- update_treebanks.sh: Shell script to update/download treebanks.
- update_stats.sh: Shell script to update model statistics.
- Install dependencies
  Make sure the packages listed in requirements.txt are installed.
- Configure environment
  Create a .env file with the following variables:

      PATH_TREEBANKS=/path/to/treebanks
      PATH_MODELS=/path/to/models
      PATH_BERTFORDEPREL_VENV=/path/to/venv/bin/python
      PATH_BERTFORDEPREL_SCRIPT=/path/to/bertfordeprel_script.py
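
Since every script depends on these four variables, it can help to check the .env file before starting a long run. The following is a minimal sketch, not the code in utils.py: `parse_env` and `missing_vars` are hypothetical helpers, and the repository may well rely on a library such as python-dotenv instead.

```python
from __future__ import annotations

from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "PATH_TREEBANKS",
    "PATH_MODELS",
    "PATH_BERTFORDEPREL_VENV",
    "PATH_BERTFORDEPREL_SCRIPT",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")  # assumed to sit in the repository root
    values = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_vars(values)
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Run from the directory that holds the .env file, the script lists any variable that still needs to be set.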
Run the following command to download and extract UD or SUD treebanks and generate configuration files:
./update_treebanks.sh <TREEBANK_TYPE> <VERSION> <TREEBANKS_DOWNLOAD_URL>
- <TREEBANK_TYPE>: Either UD or SUD
- <VERSION>: The treebank version (e.g., 2.17)
- <TREEBANKS_DOWNLOAD_URL>: URL of the treebanks archive to download, or a local file path
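
The third argument accepts either a URL or a local path, so a script has to tell the two apart. One plausible way to do that, together with basic argument validation, is sketched below. The functions `classify_source` and `check_args` are hypothetical and are not taken from treebank_utilities.py; the actual script may decide differently.

```python
from pathlib import Path
from urllib.parse import urlparse

VALID_TYPES = ("UD", "SUD")


def classify_source(source: str) -> str:
    """Return 'url' for http(s) sources and 'file' for anything else."""
    scheme = urlparse(source).scheme.lower()
    return "url" if scheme in ("http", "https") else "file"


def check_args(treebank_type: str, version: str, source: str) -> str:
    """Validate the three arguments and return the kind of source."""
    if treebank_type not in VALID_TYPES:
        raise ValueError(f"TREEBANK_TYPE must be UD or SUD, got {treebank_type!r}")
    if not version:
        raise ValueError("VERSION must not be empty")
    kind = classify_source(source)
    # A local archive must already exist; a URL is only checked at download time.
    if kind == "file" and not Path(source).is_file():
        raise FileNotFoundError(f"Local treebank archive not found: {source}")
    return kind
```

Treating every non-HTTP value as a file path keeps the rule simple; an archive served over another protocol would need an extra branch.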
Examples:
# Download UD treebanks
./update_treebanks.sh UD 2.17 "https://lindat.mff.cuni.cz/repository/server/api/core/items/55b06337-e49c-4631-9328-b1a38322b1d4/allzip?handleId=11234/1-5901"
# Use local treebank file for SUD treebanks
./update_treebanks.sh SUD 2.17 /path/to/sud-treebanks-v2.17.tgz

After preparing the treebanks, train models using one of the following commands:
# Train all UD treebanks (default)
./start_train.sh <VERSION>
./start_train.sh <VERSION> UD
# Train all SUD treebanks
./start_train.sh <VERSION> SUD
# Train a specific treebank
./start_train.sh <VERSION> <TREEBANK_NAME>

Examples:
# Train all UD treebanks for version 2.17
./start_train.sh 2.17 UD
# Train all SUD treebanks for version 2.17
./start_train.sh 2.17 SUD
# Train a specific UD treebank
./start_train.sh 2.17 UD_English-Atis
# Train a specific SUD treebank
./start_train.sh 2.17 SUD_French-GSD

After training, update the statistics summary:
./update_stats.sh
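
For reference, the second argument of start_train.sh shown in the training commands above can be read as a selector: UD (the default) or SUD picks every treebank of that type, and any other value names a single treebank. The sketch below illustrates that reading only. `select_treebanks` is a hypothetical function, and the assumption that treebank names carry a `UD_` or `SUD_` prefix is inferred from the examples, not from train_utilities.py.

```python
from __future__ import annotations


def select_treebanks(available: list[str], target: str = "UD") -> list[str]:
    """Return the treebank names that a given target argument would select."""
    if target in ("UD", "SUD"):
        # Assumed naming scheme: UD_<Language>-<Name> or SUD_<Language>-<Name>.
        prefix = target + "_"
        return sorted(name for name in available if name.startswith(prefix))
    if target in available:
        return [target]
    raise ValueError(f"Unknown treebank: {target!r}")


if __name__ == "__main__":
    names = ["UD_English-Atis", "UD_French-GSD", "SUD_French-GSD"]
    print(select_treebanks(names))                     # all UD treebanks
    print(select_treebanks(names, "SUD"))              # all SUD treebanks
    print(select_treebanks(names, "UD_English-Atis"))  # a single treebank
```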