Core Installation

pip install --upgrade --user hg+https://bitbucket.org/Josu/pykaf#egg=pykaf
pip install --upgrade --user networkx

(optional, only needed if logging is wanted)

pip install --upgrade --user pyYAML

Core Usage

For coreference resolution, run the following from the core/ directory:

cat input.kaf | python -m corefgraph.process.file --language (de|en|es|fr|it|nl)

for singleton clusters (automatic markables in annotation jargon):

cat input.kaf | python -m corefgraph.process.file --language (de|en|es|fr|it|nl) --singleton --sieves NO

for drone.io testing:

cat input.kaf | python -m corefgraph.process.file --language (de|en|es|fr|it|nl) --time_stamp now

if unsure whether the constituent parsing KAF layer is well-formatted:

cat input.kaf | python -m corefgraph.process.file --language (de|en|es|fr|it|nl) --unsafe_tree

for help:

python -m corefgraph.process.file --help

LONG VERSION

How to install

The Easy Ride: pip installation

pip can install the module and all of its dependencies with one command. The module can be installed globally or for a single user.

Globally (requires sudo privileges on Linux systems):

  sudo pip install hg+https://bitbucket.org/Josu/corefgraph#egg=corefgraph

The module can be updated to the newest version with:

  sudo pip install -U hg+https://bitbucket.org/Josu/corefgraph#egg=corefgraph

Per user (installed in the user's own disk space and available only to that user):

  pip install --user hg+https://bitbucket.org/Josu/corefgraph#egg=corefgraph

The module can be updated to the newest version with:

  pip install --user -U hg+https://bitbucket.org/Josu/corefgraph#egg=corefgraph

Long way: repository installation

For more control the module can be directly downloaded and copied into the file system.

hg clone https://bitbucket.org/Josu/corefgraph
cp -r corefgraph/corefgraph /usr/local/lib/python2.7/dist-packages/

To update the module use:

hg -R corefgraph pull -u
cp -r corefgraph/corefgraph /usr/local/lib/python2.7/dist-packages/

Install dependencies

In order to use corefgraph, some dependencies are needed:

pyKAF:

  hg clone https://bitbucket.org/Josu/pyKAF
  cp -r pyKAF/pyKAF /usr/local/lib/python2.7/dist-packages/

To update the dependency use:

  hg -R pyKAF pull -u
  cp -r pyKAF/pyKAF /usr/local/lib/python2.7/dist-packages/

pyCorpus:

  hg clone https://bitbucket.org/Josu/pyCorpus
  cp -r pyCorpus/pyCorpus /usr/local/lib/python2.7/dist-packages/

To update the dependency use:

  hg -R pyCorpus pull -u
  cp -r pyCorpus/pyCorpus /usr/local/lib/python2.7/dist-packages/

networkx: We recommend installing networkx with pip:
```
  pip install networkx
```
or
```
  pip install --user networkx
```
To update the dependency use:
```
  pip install -U networkx
```
or
```
  pip install -U --user networkx
```
For more installation instructions, please visit its home page.

pyYAML:

Installing pyYAML, which is used for logging, is recommended but not compulsory.

We recommend installing pyYAML with pip.
```
  pip install --user pyYAML
```
or
```
  sudo pip install pyYAML
```
To update the dependency use:
```
  pip install -U pyYAML
```
or
```
  pip install -U --user pyYAML
```
For more installation instructions, please visit its home page.

Usage

This module may be used to process single files or directories (corpus). CorefGraph takes KAF or NAF documents as input. The input KAF/NAF documents must contain:

Tokenized text
Part of Speech tags and Lemmas
Named Entities
Constituent Parsing with headwords for each constituent marked.
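
Before running the module, it can help to confirm that a document carries the layers listed above. The following is a minimal sketch, not part of corefgraph: the layer element names (`text`, `terms`, `entities`, `constituency`) are assumptions and should be checked against the KAF/NAF specification, and `missing_layers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed layer element names; verify them against the KAF/NAF specification.
REQUIRED_LAYERS = ("text", "terms", "entities", "constituency")


def missing_layers(xml_string):
    """Return the required layers that are absent from the document root."""
    root = ET.fromstring(xml_string)
    present = {child.tag for child in root}
    return [layer for layer in REQUIRED_LAYERS if layer not in present]


# A toy document that lacks the constituency layer.
sample = "<KAF><text/><terms/><entities/></KAF>"
print(missing_layers(sample))
```

The check only looks at the direct children of the root element; it does not verify that headwords are marked inside the constituency layer.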

The KAF specification is available here.

The NAF specification can be found here

Single file

The simplest way to use this module is:

python -m corefgraph.process.file --file your_file.KAF --language (de|en|es|fr|it|nl)

This command outputs a KAF file containing all the original file info plus the coreference clusters.

The module is usable as a pipe:

cat your_file.KAF | python -m corefgraph.process.file --language es > output.KAF
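
The same pipe can be driven from a Python script. This is a minimal sketch that only uses flags documented in this README; `build_command` and `resolve` are hypothetical helper names, and the sketch assumes that `python` on the PATH is the interpreter where corefgraph is installed.

```python
import subprocess


def build_command(language, singleton=False):
    """Build the argument list for corefgraph.process.file reading from stdin."""
    command = ["python", "-m", "corefgraph.process.file", "--language", language]
    if singleton:
        # Mirrors the quick-start call for singleton clusters.
        command += ["--singleton", "--sieves", "NO"]
    return command


def resolve(input_path, output_path, language):
    """Pipe input_path through corefgraph and write the result to output_path."""
    with open(input_path, "rb") as source, open(output_path, "wb") as target:
        # Returns the exit code of the corefgraph process.
        return subprocess.call(build_command(language), stdin=source, stdout=target)
```

Calling `resolve("your_file.KAF", "output.KAF", "es")` would then be equivalent to the shell pipe, with the file names being placeholders.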

Options

The system comes with many options. They are grouped and described below. Use the --help parameter for the default and possible values.

Input file related

--file, -f The name of the file to process.

--treebank (Optional) A file with the treebanks of the file. 
           If provided, the syntactic info in the KAF is ignored.
--speakers (Optional) A file containing the speakers of the text. 
           One line per word, with sentences separated by blank lines; a word
           with no speaker is marked with '-'.
--reader   (Optional) Switch between input formats. For the moment only
           NAF and KAF.
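
To illustrate the speakers file layout described for --speakers, here is a minimal reading sketch. It assumes each non-blank line holds only the speaker label for one word; `read_speakers` is a hypothetical helper and not the parser corefgraph itself uses.

```python
def read_speakers(lines):
    """Group per-word speaker labels into sentences; '-' becomes None."""
    sentences, current = [], []
    for line in lines:
        label = line.strip()
        if not label:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(None if label == "-" else label)
    if current:
        sentences.append(current)
    return sentences


# Two sentences: two words spoken by "Anna", then three words with no speaker.
example = ["Anna", "Anna", "", "-", "-", "-"]
print(read_speakers(example))
```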

Algorithm related:

--language          Select the language resources to use; 'es' and 'en' are 
                    included so far.
--sieves            (Optional) The plain names of the sieves that must be used
                    by the module.
--sieves_options    (Optional) The options passed (as keywords) to 
                    the sieves.
--extractor_options (Optional) The options used during the mention extraction.

--singleton         (Optional) Do not filter singleton mentions out of the results.

Output:

General:

  --encoding    (Optional) Set the encoding of the output file; defaults to utf-8.

CoNLL related:

  --conll       (Optional) Output the result in CoNLL format instead of KAF.
  
  --document_id (Optional) The document ID used in CoNLL format.
  
  --part_id     (Optional) The part ID used in multiple part CoNLL files.

KAF/NAF related:

  --linguisticParserName    (Optional) The parser name printed in KAF 
                            metadata.
  --linguisticParserVersion (Optional) The parser version printed in KAF 
                            metadata.
  --linguisticParserLayer   (Optional) The parser layer printed in KAF 
                            metadata.
  --time_stamp              (Optional) The TIMESTAMP printed in KAF 
                            metadata.

Other:

--verbose (Optional) Print more output while processing.

--help (Optional) Show corefgraph help

Multiple files

Multiple files mode, or corpus mode, can process multiple files concurrently. This mode is only usable in Linux environments.

To use this mode first write a file that sets the parameters to process the files, and then use a command like this:

python -m corefgraph.process.corpus --directories /home/KAF_dir --config configfile

Parameters

The multi-file processor needs two basic parameters: a list of files and/or a list of input directories, plus a list of configuration files. Each list should contain at least one element; otherwise the processing ends with empty results.

Input files

--files       (Optional) List of files to process (if a directory is specified,
              its files are added to the list).
--directories (Optional) Directories whose files are collected recursively.

--extension   (Optional) File extension (without the dot) that must be
              processed. This option only works with --directories. Use '*'
              to process every extension. This option defaults to 'txt'.
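
As an illustration of how --directories and --extension combine, the sketch below walks each directory recursively and keeps the files with the requested extension. It is an assumption about the behaviour described above, not corefgraph's own code; `collect_files` is a hypothetical helper, and treating '*' as "every file, with or without an extension" is also an assumption.

```python
import fnmatch
import os


def collect_files(directories, extension="txt"):
    """Walk each directory recursively and keep files with the given extension."""
    pattern = "*" if extension == "*" else "*." + extension
    found = []
    for directory in directories:
        for root, _dirs, names in os.walk(directory):
            for name in sorted(fnmatch.filter(names, pattern)):
                found.append(os.path.join(root, name))
    return found
```

For example, `collect_files(["/home/KAF_dir"], "kaf")` would list every `.kaf` file below that directory.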

Configuration

--config The config file name. May be multiple files separated by ':' .

--extra  (Optional) A common config for all config files. May be multiple 
         files separated by ':'.

Evaluation

--evaluate (Optional) Activates the evaluation.

--report   (Optional) Activates report system.

Config file

The config file uses the same parameters as in the single file usage mode plus the following:

Additional parameters

Output file related

--result             (Optional) The extension of the result file. The 
                     file is stored next to the original file with
                     the same base name. 
--speaker_extension  (Optional) If set, the module searches for a file with 
                     the same base name plus the extension and uses
                     it as speaker file. 
                     This option is switched off by default.  
--treebank_extension (Optional) If set, the module searches for a file with 
                     the same base name plus the extension and uses it
                     as treebank file. This is used when the input KAF does 
                     not contain the constituent parsing layer. 
                     This option is switched off by default.

Evaluation parameters

--metrics           (Optional) When the evaluation parameter is on, 
                    it is possible to specify the evaluation metric used.
--output_eval_name  (Optional) When the evaluation parameter is on, 
                    it completes the results file name.
--evaluation_script (Optional) When the evaluation parameter is on, 
                    it determines the script used to evaluate.
--gold              (Optional) When the evaluation parameter is on, 
                    it determines where to find the gold standard corpus.

The following parameters are not available in the configuration file:

Forbidden parameters

--file

--treebank

--speakers

When using the --conll parameter, the CoNLL document name and part must be provided in the file name, using this pattern: document_id#document_part.kaf

So these parameters are disabled:

--document_id

--part_id
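
The file name pattern above can be read as in the following sketch. It shows one way to split `document_id#document_part.kaf` into its two parts; `split_conll_name` is a hypothetical helper, the sample file name is invented for illustration, and how corefgraph parses the name internally is not documented here.

```python
import os


def split_conll_name(path):
    """Split 'document_id#document_part.kaf' into (document_id, part)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    document_id, separator, part = stem.partition("#")
    if not separator:
        raise ValueError("expected '#' between document id and part: %r" % path)
    return document_id, part


print(split_conll_name("/data/bn_0001#000.kaf"))
```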

Troubleshooting

Make sure you have Python 2.7.1 or higher.
```
  python --version
```
If you have problems using the --user option, consider updating pip.
```
  sudo pip install --upgrade pip
```
The Python dist-packages directory might be in a different location than:
```
  /usr/local/lib/python2.7/dist-packages/
```
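
To find where the current interpreter actually keeps its packages, the standard library can be queried directly. This is a minimal sketch using `sysconfig` and `site`; `package_directories` is a hypothetical helper, and on Debian-based systems the reported global path may still differ from the `dist-packages` directory used by the system pip.

```python
import site
import sysconfig


def package_directories():
    """Return the global and per-user package directories of this interpreter."""
    return {
        "global": sysconfig.get_paths()["purelib"],
        "user": site.getusersitepackages(),
    }


for scope, path in sorted(package_directories().items()):
    print("%s: %s" % (scope, path))
```

Run it with the same `python` executable that will run corefgraph, since each interpreter reports its own directories.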
