Skip to content

peter-grajcar/clause-extraction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Clause Extraction

An implementation of algorithm for clause extraction described in paper L. Zhang, H. Zhu, S. Brahma, and Y. Li, “Small but Mighty: New Benchmarks for Split and Rephrase,” 2020.

Usage

./split.py [-h] [--udpipe_model UDPIPE_MODEL] [--udpipe_server UDPIPE_SERVER] [--srl_model SRL_MODEL] [--sep SEP] --input INPUT --output OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  --udpipe_model UDPIPE_MODEL
                        Path to the UDPipe 1 model (used for tokenisation).
  --udpipe_server UDPIPE_SERVER
                        URL of the UDPipe 2 server (used for parsing).
  --srl_model SRL_MODEL
                        Path to the AllenNLP semantic role labeling model.
  --sep SEP             Clause separator used in the output file.
  --input INPUT         Input file.
  --output OUTPUT       Output file.

Example Outputs

Original sentence:

Twilight performs black metal music which when part of a musical fusion is called death metal.

Extracted clauses:

  • Twilight performs black metal music .
  • black metal music when part of a musical fusion is called death metal

Original sentence:

William Anders who is originally from Hong Kong worked for NASA and became a crew member for Apollo 8 along with Buzz Aldrin and Frank Borman.

Extracted clauses:

  • William Anders is originally from Hong Kong
  • William Anders worked for NASA .
  • William Anders who is originally from Hong Kong became a crew member for Apollo 8 along with Buzz Aldrin and Frank Borman

The Algorithm

The description of the algorithm from the paper (Appendix A):

Given a complex sentence, the model runs the following processes once each.

Wh Handling

Using semantic role labeling, the model looks for a Relational Argument (R-ARG), and the Subject Argument (asserted to be the ARG preceding the R-ARG). Then, a split is made with the Relational Argument replaced by the Subject Argument.

Conjunction Handling

The model looks for the word “and”. Using semantic role labeling, if the word following “and” is an argument (ARG), assert that “and” is followed by a sentence, and a split is made. Or, if the word following “and” is a verb (V), the model asserts the Subject Argument to be the ARG preceding the V; a split is made with “and” replaced by the Subject Argument.

Insertion Handling

Using dependency parsing, the model looks for a node with type participle modifier, relative clause modifier, prepositional modifier, adjective modifier, or appositional modifier. The clause with the node as the root is extracted, prepended with the subject, and split as a new simple sentence. The rest of the original complex sentence is split as another new simple sentence.

About

Utility for clause extraction from complex sentences

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors