Codebase for A* CCG Parsing with a Supertag and Dependency Factored Model
- Python (either 2 or 3)
- Chainer (a recent version)
- Cython
- A C++ compiler supporting the C++11 standard
- OpenMP (optional)
- CMake
If you have not installed Chainer or Cython, run pip install chainer cython. Then:
mkdir build
cd build
cmake ..
# In pyenv environment, you may need to pass the path to libpython.so explicitly.
# cmake -DPYTHON_LIBRARY=$HOME/.pyenv/versions/3.6.1/lib/libpython3.so ..
make
Pretrained models are available:
Having successfully built the sources, you will find depccg.so in the build/src directory.
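If you run Python from outside that directory, you may need to make the extension importable first. A minimal sketch; the checkout path below is an assumption:

import sys
# depccg.so is a compiled extension module; point sys.path at build/src.
sys.path.append("/path/to/depccg/build/src")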
In Python:
from depccg import PyAStarParser
model = "/path/to/model/directory"
parser = PyAStarParser(model)
res = parser.parse("this is a test sentence .")
# print(res.deriv)
#  this        is           a     test  sentence  .
#   NP   ((S\NP)/NP)    (NP/N)  (N/N)      N      .
#                               ---------------->
#                                      N
#                      ------------------------->
#                                 NP
#       ----------------------------------------->
#                       (S\NP)
# ------------------------------------------------<
#                        S
# -------------------------------------------------<rp>
#                        S
# parser.parse_doc performs A* search over multiple threads (using OpenMP),
# which is highly efficient.
sents = [u"this is a test sentence ."]  # list of sentences (Python 2: unicode, Python 3: str)
res = parser.parse_doc(sents)
for tree in res:
    print(tree.deriv)
For Japanese CCG parsing, use depccg.PyJaAStarParser, which has exactly the same interface.
Note that the Japanese parser accepts pre-tokenized sentences as input.
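A minimal sketch of Japanese parsing; the model path and the pre-tokenized example sentence are illustrative:

from depccg import PyJaAStarParser

ja_parser = PyJaAStarParser("/path/to/japanese/model/directory")
# Tokens must already be separated by spaces; the parser does no tokenization itself.
res = ja_parser.parse(u"これ は テスト の 文 です 。")
print(res.deriv)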
src/run.py implements example code; please refer to it for detailed usage of the parser.
TODO
$ python -m py.lstm_parser_bi create
usage: CCG parser's LSTM supertag tagger create [-h]
[--cat-freq-cut CAT_FREQ_CUT]
[--word-freq-cut WORD_FREQ_CUT]
[--afix-freq-cut AFIX_FREQ_CUT]
[--subset {train,test,dev,all}]
[--mode {train,test}]
path out
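For example, a hypothetical invocation of the create step (assuming path points at your CCGBank directory and out at an output directory):

$ python -m py.lstm_parser_bi create --subset train --mode train /path/to/ccgbank /path/to/output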
$ python -m py.lstm_parser_bi train
usage: CCG parser's LSTM supertag tagger train [-h] [--gpu GPU]
[--tritrain TRITRAIN]
[--tri-weight TRI_WEIGHT]
[--batchsize BATCHSIZE]
[--epoch EPOCH]
[--word-emb-size WORD_EMB_SIZE]
[--afix-emb-size AFIX_EMB_SIZE]
[--nlayers NLAYERS]
[--hidden-dim HIDDEN_DIM]
[--dep-dim DEP_DIM]
[--dropout-ratio DROPOUT_RATIO]
[--initmodel INITMODEL]
[--pretrained PRETRAINED]
model train val
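A hypothetical invocation of the train step, following the positional arguments model, train, and val above (paths and hyperparameter values are placeholders):

$ python -m py.lstm_parser_bi train --gpu 0 --batchsize 16 --epoch 20 /path/to/model/directory /path/to/train/data /path/to/val/data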
We make our tri-training dataset publicly available: English Tri-training Dataset (309M)
You can evaluate the performance of a supertagger with src/py/eval_tagger.py:
$ python eval_tagger.py
usage: evaluate lstm tagger [-h] [--save SAVE] model defs_dir test_data
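A hypothetical invocation (all paths are placeholders):

$ python eval_tagger.py --save results.txt /path/to/model /path/to/defs_dir /path/to/test_data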
For evaluation of CCG-based dependencies, please use the evaluation scripts in EasyCCG and C&C.
If you make use of this software, please cite the following:
@inproceedings{yoshikawa:2017acl,
    author={Yoshikawa, Masashi and Noji, Hiroshi and Matsumoto, Yuji},
    title={A* CCG Parsing with a Supertag and Dependency Factored Model},
    booktitle={Proc. ACL},
    year={2017},
}
MIT Licence
For questions and usage issues, please contact [email protected] .
In creating the parser, I owe very much to: