πŸ’« Port master changes over to develop #2979

Merged: 136 commits merged into develop from master on Nov 29, 2018.

Commits (136):
db2c2b2
Create aryaprabhudesai.md (#2681)
aryaprabhudesai Aug 20, 2018
ca747f5
Update _install.jade (#2688)
Aug 22, 2018
559f413
Add FAC to spacy.explain (resolves #2706)
ines Aug 26, 2018
e9022f7
Remove docstrings for deprecated arguments (see #2703)
ines Aug 26, 2018
2684987
When calling getoption() in conftest.py, pass a default option (#2709)
njsmith Sep 3, 2018
4530ddc
update bengali token rules for hyphen and digits (#2731)
aniruddha-adhikary Sep 5, 2018
bdb2165
Less norm computations in token similarity (#2730)
Sep 5, 2018
cebe50b
Remove ')' for clarity (#2737)
mbkupfer Sep 10, 2018
97e2874
added contributor agreement for mbkupfer (#2738)
mbkupfer Sep 10, 2018
77139bc
Basic support for Telugu language (#2751)
sainathadapa Sep 10, 2018
476472d
Lex _attrs for polish language (#2750)
tyburam Sep 10, 2018
aeba99a
Introduces a bulk merge function, in order to solve issue #653 (#2696)
grivaz Sep 10, 2018
885691a
Describe converters more explicitly (see #2643)
ines Sep 12, 2018
907df53
Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
ines Sep 12, 2018
0729d1e
Fix formatting
ines Sep 12, 2018
4e89cfa
Fix dependency scheme docs (closes #2705) [ci skip]
ines Sep 12, 2018
5001d31
Don't set stop word in example (closes #2657) [ci skip]
ines Sep 12, 2018
fe51508
Add words to portuguese language _num_words (#2759)
filipecaixeta Sep 14, 2018
81564cc
Update Indonesian model (#2752)
aongko Sep 14, 2018
2d15859
Fixed spaCy+Keras example (#2763)
free-variation Sep 15, 2018
68b3c54
Adding French hyphenated first name (#2786)
mauryaland Sep 21, 2018
3c4e3ad
Fix typo (closes #2784)
ines Sep 21, 2018
9fd27d7
Fix typo (#2795) [ci skip]
pmj642 Sep 25, 2018
9a016d1
Adding basic support for Sinhala language. (#2788)
keshan Sep 25, 2018
70f4e8a
Also include lowercase norm exceptions
ines Sep 25, 2018
5e0dfb3
Merge branch 'master' of https://github.com/explosion/spaCy
ines Sep 26, 2018
8227566
Fix error (#2802)
darindf Sep 26, 2018
94ad3c5
Add charlax's contributor agreement (#2805)
charlax Sep 27, 2018
966b583
agreement of contributor, may I introduce a tiny pl languge contribut…
phojnacki Sep 27, 2018
014dd47
Add jupyter=True to displacy.render in documentation (#2806)
charlax Sep 27, 2018
71cdbea
Revert "Also include lowercase norm exceptions"
ines Sep 27, 2018
bae6b3e
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Sep 27, 2018
8809dc4
Remove deprecated encoding argument to msgpack
honnibal Sep 27, 2018
bbdc645
Set up dependency tree pattern matching skeleton (#2732)
skrcode Sep 27, 2018
96fe314
Fix bug when too many entity types. Fixes #2800
honnibal Sep 27, 2018
7277837
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Sep 27, 2018
2ac69fa
Fix Python 2 test failure
honnibal Sep 27, 2018
276aa83
Require older msgpack-numpy
honnibal Sep 27, 2018
6430b1f
Restore encoding arg on msgpack-numpy
honnibal Sep 27, 2018
05b6103
Try to fix version pin for msgpack-numpy
honnibal Sep 28, 2018
6c498f9
Update Portuguese Language (#2790)
filipecaixeta Sep 29, 2018
405a826
Correct error in spacy universe docs concerning spacy-lookup (#2814)
giannisdaras Oct 1, 2018
9faea3f
Update Keras Example for (Parikh et al, 2016) implementation (#2803)
free-variation Oct 1, 2018
7806dec
Fix typo (closes #2815) [ci skip]
ines Oct 1, 2018
9937ff9
Update regex version dependency
honnibal Oct 2, 2018
40f228c
Set version to 2.0.13.dev3
honnibal Oct 2, 2018
9e4079d
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Oct 2, 2018
6afc6ff
Skip seemingly problematic test
honnibal Oct 2, 2018
bdebbef
Remove problematic test
honnibal Oct 2, 2018
e4fd2cc
Try previous version of regex
honnibal Oct 2, 2018
4cf5ce2
Revert "Remove problematic test"
honnibal Oct 2, 2018
67ddce6
Unskip test
honnibal Oct 2, 2018
f784e42
Try older version of regex
honnibal Oct 2, 2018
4cd9ec0
πŸ’« Update training examples and use minibatching (#2830)
ines Oct 9, 2018
42c4237
Visual C++ link updated (#2842) (closes #2841) [ci skip]
jacopofar Oct 12, 2018
b76fe08
Correcting lang/ru/examples.py (#2845)
Cinnamy Oct 13, 2018
c3ddf98
Set version to 2.0.13.dev4
honnibal Oct 13, 2018
74a30d8
Add Persian(Farsi) language support (#2797)
JKhakpour Oct 13, 2018
cb57b35
Also include lowercase norm exceptions
ines Oct 13, 2018
fa23be0
Remove in favour of https://github.com/explosion/spaCy/graphs/contrib…
ines Oct 13, 2018
de46286
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Oct 13, 2018
36514b5
Rule-based French Lemmatizer (#2818)
mauryaland Oct 13, 2018
6a6ae5b
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Oct 13, 2018
9cfab59
Set version to 2.0.13
honnibal Oct 13, 2018
f0e7da6
Fix formatting and consistency
ines Oct 13, 2018
23d5b4f
Update docs for new version [ci skip]
ines Oct 13, 2018
30aa7f8
Increment version [ci skip]
ines Oct 13, 2018
ac4cadd
Add info on wheels [ci skip]
ines Oct 13, 2018
cb075c8
Adding "This is a sentence" example to Sinhala (#2846)
keshan Oct 13, 2018
8f393b1
Add wheels badge
ines Oct 13, 2018
3decf44
Update badge [ci skip]
ines Oct 13, 2018
76c4338
Update README.rst [ci skip]
ines Oct 13, 2018
7de0dcb
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Oct 14, 2018
2e675d9
Update murmurhash pin
ines Oct 14, 2018
295da0f
Increment version to 2.0.14.dev0
ines Oct 14, 2018
5a4c5b7
Update GPU docs for v2.0.14
ines Oct 14, 2018
9ebe607
Add wheel to setup_requires
ines Oct 14, 2018
62c70b3
Import prefer_gpu and require_gpu functions from Thinc
honnibal Oct 14, 2018
91593b7
Add tests for prefer_gpu() and require_gpu()
honnibal Oct 14, 2018
6e6f6be
Update requirements and setup.py
honnibal Oct 14, 2018
38aa835
Workaround bug in thinc require_gpu
honnibal Oct 14, 2018
41adf35
Set version to v2.0.14
honnibal Oct 14, 2018
2ad3a4e
Update push-tag script
honnibal Oct 14, 2018
8ccfa52
Unhack prefer_gpu
honnibal Oct 14, 2018
b305b24
Require thinc 6.10.6
honnibal Oct 14, 2018
f02bb08
Update prefer_gpu and require_gpu docs [ci skip]
ines Oct 14, 2018
7202abd
Fix specifiers for GPU
honnibal Oct 14, 2018
d6e9cf8
Set version to 2.0.14.dev1
honnibal Oct 14, 2018
8612b75
Set version to 2.0.14
honnibal Oct 14, 2018
051a6b7
Update Thinc version pin
ines Oct 14, 2018
7bc7fa8
Increment version
ines Oct 14, 2018
fd750ec
Fix msgpack-numpy version pin
ines Oct 15, 2018
a0f6647
Increment version
ines Oct 15, 2018
48b1bc4
Update version to 2.0.16
ines Oct 15, 2018
c6a320c
Update version [ci skip]
ines Oct 15, 2018
5766d09
Redundant ')' in the Stop words' example (#2856)
digest0r Oct 18, 2018
0717894
Documentation improvement regarding joblib and SO (#2867)
Oct 24, 2018
57f274b
raise error when setting overlapping entities as doc.ents (#2880)
grivaz Oct 26, 2018
ad068f5
Fix out-of-bounds access in NER training
honnibal Oct 26, 2018
9447739
Merge branch 'master' of https://github.com/explosion/spaCy
honnibal Oct 26, 2018
2d2765f
Change PyThaiNLP Url (#2876)
wannaphong Oct 27, 2018
b2e2bba
Fix missing comma
honnibal Oct 27, 2018
5a4aeb9
Add example showing a fix-up rule for space entities
honnibal Oct 28, 2018
d4fa9af
Set version to 2.0.17.dev0
honnibal Oct 28, 2018
62358dd
Update regex version
honnibal Oct 28, 2018
a2745d3
Revert "Update regex version"
honnibal Oct 28, 2018
e2ae25d
Try setting older regex version, to align with conda
honnibal Oct 29, 2018
db08b16
Set version to 2.0.17
honnibal Oct 29, 2018
c235ddf
Add spacy-js to universe [ci-skip]
ines Nov 6, 2018
a9fda63
Add spacy-raspberry to universe (closes #2889)
ines Nov 6, 2018
11db4d2
Add script to validate universe json [ci skip]
ines Nov 6, 2018
75e7d50
Removed space in docs + added contributor indo (#2909)
mikelibg Nov 8, 2018
d3d419e
Allow input text of length up to max_length, inclusive (#2922)
danielhers Nov 13, 2018
be99f1c
Include universe spec for spacy-wordnet component (#2919)
frascuchon Nov 13, 2018
1aa91e9
Minor formatting changes [ci skip]
ines Nov 13, 2018
dfcc8f0
Fix image [ci skip]
ines Nov 14, 2018
87ce435
Check if the word is in one of the regular lists specific to each POS…
mauryaland Nov 14, 2018
02fc73c
πŸ’« Create random IDs for SVGs to prevent ID clashes (#2927)
ines Nov 15, 2018
696acb0
Fix typo [ci skip]
ines Nov 24, 2018
7601ae0
fixes symbolic link on py3 and windows (#2949)
cicorias Nov 24, 2018
048416f
Fix formatting
ines Nov 26, 2018
1844bc2
Update universe [ci skip]
ines Nov 26, 2018
98fe1ab
Catalan Language Support (#2940)
mpuig Nov 26, 2018
c80c20e
Sort languages alphabetically [ci skip]
ines Nov 26, 2018
968aff2
Update tests for pytest 4.x (#2965)
ines Nov 26, 2018
9e2ff2f
Fix regex pin to harmonize with conda (#2964)
honnibal Nov 26, 2018
58757c5
Update README.rst
ines Nov 26, 2018
0056694
Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
ALSchwalm Nov 28, 2018
0a872ef
Merge branch 'master' into develop
ines Nov 29, 2018
7c547e8
Fix typo
ines Nov 29, 2018
89005a1
Fix typo
ines Nov 29, 2018
7c0725a
Remove duplicate file
ines Nov 29, 2018
d260626
Require thinc 7.0.0.dev2
ines Nov 29, 2018
b468bdb
Add missing import
ines Nov 29, 2018
925f28d
Fix error IDs
ines Nov 29, 2018
2aee46a
Fix tests
ines Nov 29, 2018
Changes from 1 commit:
Set up dependency tree pattern matching skeleton (#2732)
skrcode authored and honnibal committed Sep 27, 2018
commit bbdc6456c61c079dc4a6f0f7da8379890262c5d8
244 changes: 239 additions & 5 deletions spacy/matcher.pyx
@@ -48,6 +48,7 @@ from .attrs import FLAG37 as L8_ENT
from .attrs import FLAG36 as L9_ENT
from .attrs import FLAG35 as L10_ENT

DELIMITER = '||'
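# DELIMITER joins a match key with pattern and node indices to build a unique
# per-node key for the internal token Matcher (see DependencyTreeMatcher.add).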

cpdef enum quantifier_t:
_META
@@ -66,7 +67,7 @@ cdef enum action_t:
ACCEPT_PREV
PANIC

# A "match expression" conists of one or more token patterns
# A "match expression" consists of one or more token patterns
# Each token pattern consists of a quantifier and 0+ (attr, value) pairs.
# A state is an (int, pattern pointer) pair, where the int is the start
# position, and the pattern pointer shows where we're up to
@@ -76,16 +77,16 @@ cdef struct AttrValueC:
attr_id_t attr
attr_t value


cdef struct TokenPatternC:
AttrValueC* attrs
int32_t nr_attr
quantifier_t quantifier


ctypedef TokenPatternC* TokenPatternC_ptr
ctypedef pair[int, TokenPatternC_ptr] StateC

DEF PADDING = 5


cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id,
object token_specs) except NULL:
@@ -105,7 +106,6 @@ cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id,
pattern[i].nr_attr = 0
return pattern


cdef attr_t get_pattern_key(const TokenPatternC* pattern) except 0:
while pattern.nr_attr != 0:
pattern += 1
@@ -262,7 +262,7 @@ cdef class Matcher:

key (unicode): The match ID.
on_match (callable): Callback executed on match.
*patterns (list): List of token descritions.
*patterns (list): List of token descriptions.
"""
for pattern in patterns:
if len(pattern) == 0:
@@ -526,6 +526,7 @@ cdef class PhraseMatcher:
self.phrase_ids.set(phrase_hash, <void*>ent_id)

def __call__(self, Doc doc):

"""Find all sequences matching the supplied patterns on the `Doc`.

doc (Doc): The document to match over.
@@ -573,3 +574,236 @@ cdef class PhraseMatcher:
return None
else:
return ent_id

cdef class DependencyTreeMatcher:
"""Match dependency parse tree based on pattern rules."""
cdef Pool mem
cdef readonly Vocab vocab
cdef readonly Matcher token_matcher
cdef public object _patterns
cdef public object _keys_to_token
cdef public object _root
cdef public object _entities
cdef public object _callbacks
cdef public object _nodes
cdef public object _tree
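# Per match key, the attributes above hold parallel lists with one entry per
# added pattern: _patterns (raw token patterns), _keys_to_token (token-Matcher
# key -> node index), _nodes (node name -> node index), _root (index of the
# pattern's root node) and _tree (head index -> list of child indices).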

def __init__(self, vocab):
"""Create the DependencyTreeMatcher.

vocab (Vocab): The vocabulary object, which must be shared with the
documents the matcher will operate on.
RETURNS (DependencyTreeMatcher): The newly constructed object.
"""
size = 20
self.token_matcher = Matcher(vocab)
self._keys_to_token = {}
self._patterns = {}
self._root = {}
self._nodes = {}
self._tree = {}
self._entities = {}
self._callbacks = {}
self.vocab = vocab
self.mem = Pool()

def __reduce__(self):
data = (self.vocab, self._patterns, self._tree, self._callbacks)
return (unpickle_matcher, data, None, None)

def __len__(self):
"""Get the number of rules, which are edges ,added to the dependency tree matcher.

RETURNS (int): The number of rules.
"""
return len(self._patterns)

def __contains__(self, key):
"""Check whether the matcher contains rules for a match ID.

key (unicode): The match ID.
RETURNS (bool): Whether the matcher contains rules for this match ID.
"""
return self._normalize_key(key) in self._patterns


def add(self, key, on_match, *patterns):

# TODO : validations
# 1. check if input pattern is connected
# 2. check if pattern format is correct
# 3. check if at least one root node is present
# 4. check if node names are not repeated
# 5. check if each node has only one head

for pattern in patterns:
if len(pattern) == 0:
raise ValueError(Errors.E012.format(key=key))

key = self._normalize_key(key)

_patterns = []
for pattern in patterns:
token_patterns = []
for i in range(len(pattern)):
token_pattern = [pattern[i]['PATTERN']]
token_patterns.append(token_pattern)
# self.patterns.append(token_patterns)
_patterns.append(token_patterns)

self._patterns.setdefault(key, [])
self._callbacks[key] = on_match
self._patterns[key].extend(_patterns)

# Add each node pattern of all the input patterns individually to the matcher.
# This enables only a single instance of Matcher to be used.
# Multiple adds are required to track each node pattern.
_keys_to_token_list = []
for i in range(len(_patterns)):
_keys_to_token = {}
# TODO : Better ways to hash edges in pattern?
for j in range(len(_patterns[i])):
k = self._normalize_key(unicode(key)+DELIMITER+unicode(i)+DELIMITER+unicode(j))
self.token_matcher.add(k,None,_patterns[i][j])
_keys_to_token[k] = j
_keys_to_token_list.append(_keys_to_token)

self._keys_to_token.setdefault(key, [])
self._keys_to_token[key].extend(_keys_to_token_list)

_nodes_list = []
for pattern in patterns:
nodes = {}
for i in range(len(pattern)):
nodes[pattern[i]['SPEC']['NODE_NAME']]=i
_nodes_list.append(nodes)

self._nodes.setdefault(key, [])
self._nodes[key].extend(_nodes_list)

# Create an object tree to traverse later on. This data structure
# enables easy tree pattern matching. A Doc/Token-based tree cannot be
# reused, since it is memory-heavy and tightly coupled with the doc.
self.retrieve_tree(patterns,_nodes_list,key)

def retrieve_tree(self,patterns,_nodes_list,key):

_heads_list = []
_root_list = []
for i in range(len(patterns)):
heads = {}
root = -1
for j in range(len(patterns[i])):
token_pattern = patterns[i][j]
if('NBOR_RELOP' not in token_pattern['SPEC']):
heads[j] = j
root = j
else:
# TODO: Add semgrex rules
# 1. >
if(token_pattern['SPEC']['NBOR_RELOP'] == '>'):
heads[j] = _nodes_list[i][token_pattern['SPEC']['NBOR_NAME']]
# 2. <
if(token_pattern['SPEC']['NBOR_RELOP'] == '<'):
heads[_nodes_list[i][token_pattern['SPEC']['NBOR_NAME']]] = j
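# Example: in a two-node pattern where node 1 declares
# {'NBOR_RELOP': '>', 'NBOR_NAME': 'fox'} and 'fox' is node 0, '>' sets
# heads[1] = 0 (node 0 is node 1's head); '<' would instead set heads[0] = 1.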

_heads_list.append(heads)
_root_list.append(root)

_tree_list = []
for i in range(len(patterns)):
tree = {}
for j in range(len(patterns[i])):
if(j == _heads_list[i][j]):
continue
head = _heads_list[i][j]
if(head not in tree):
tree[head] = []
tree[head].append(j)
_tree_list.append(tree)

self._tree.setdefault(key, [])
self._tree[key].extend(_tree_list)

self._root.setdefault(key, [])
self._root[key].extend(_root_list)
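# Example: heads {0: 0, 1: 0, 2: 0} (node 0 is its own head, i.e. the root)
# yields tree {0: [1, 2]} and root 0 for that pattern.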

def has_key(self, key):
"""Check whether the matcher has a rule with a given key.

key (string or int): The key to check.
RETURNS (bool): Whether the matcher has the rule.
"""
key = self._normalize_key(key)
return key in self._patterns

def get(self, key, default=None):
"""Retrieve the pattern stored for a key.

key (unicode or int): The key to retrieve.
RETURNS (tuple): The rule, as an (on_match, patterns) tuple.
"""
key = self._normalize_key(key)
if key not in self._patterns:
return default
return (self._callbacks[key], self._patterns[key])

def __call__(self, Doc doc):
matched_trees = []

matches = self.token_matcher(doc)
for key in list(self._patterns.keys()):
_patterns_list = self._patterns[key]
_keys_to_token_list = self._keys_to_token[key]
_root_list = self._root[key]
_tree_list = self._tree[key]
_nodes_list = self._nodes[key]
length = len(_patterns_list)
for i in range(length):
_keys_to_token = _keys_to_token_list[i]
_root = _root_list[i]
_tree = _tree_list[i]
_nodes = _nodes_list[i]

id_to_position = {}

# This could be moved outside the loop to improve running time.
for match_id, start, end in matches:
if match_id in _keys_to_token:
if _keys_to_token[match_id] not in id_to_position:
id_to_position[_keys_to_token[match_id]] = []
id_to_position[_keys_to_token[match_id]].append(start)

length = len(_nodes)
if _root in id_to_position:
candidates = id_to_position[_root]
for candidate in candidates:
isVisited = {}
self.dfs(candidate,_root,_tree,id_to_position,doc,isVisited)
# Check whether the subtree pattern was completely matched.
if(len(isVisited) == length):
matched_trees.append((key,list(isVisited)))

for i, (ent_id, nodes) in enumerate(matched_trees):
on_match = self._callbacks.get(ent_id)
if on_match is not None:
on_match(self, doc, i, matched_trees)

return matched_trees

def dfs(self,candidate,root,tree,id_to_position,doc,isVisited):
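# Walk the Doc subtree rooted at `candidate` in lockstep with the pattern
# tree rooted at `root`: if `candidate` matched `root`'s node pattern, mark
# it visited, then try each of its children in the doc against each pattern
# child of `root`.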
if(root in id_to_position and candidate in id_to_position[root]):
# color the node since it is valid
isVisited[candidate] = True
candidate_children = doc[candidate].children
for candidate_child in candidate_children:
if root in tree:
for root_child in tree[root]:
self.dfs(candidate_child.i,root_child,tree,id_to_position,doc,isVisited)


def _normalize_key(self, key):
if isinstance(key, basestring):
return self.vocab.strings.add(key)
else:
return key
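
For orientation, here is a minimal usage sketch of the API this commit adds, assuming a loaded English model with a dependency parser; the 'FOUNDED' key, the node names and the example sentence are illustrative, not part of the diff:

import spacy
from spacy.matcher import DependencyTreeMatcher

nlp = spacy.load('en')  # any model with a dependency parser
matcher = DependencyTreeMatcher(nlp.vocab)
# Each node pairs a SPEC (node name, plus an optional relation to an
# already-declared node) with a token PATTERN for the regular Matcher.
pattern = [
    {'SPEC': {'NODE_NAME': 'founded'}, 'PATTERN': {'ORTH': 'founded'}},
    {'SPEC': {'NODE_NAME': 'subject', 'NBOR_RELOP': '>', 'NBOR_NAME': 'founded'},
     'PATTERN': {'ORTH': 'Smith'}},
]
matcher.add('FOUNDED', None, pattern)
doc = nlp(u'Smith founded a healthcare company.')
matches = matcher(doc)  # list of (match_id, [matched token positions]) tuples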
51 changes: 48 additions & 3 deletions spacy/tests/test_matcher.py
@@ -1,12 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals

from ..matcher import Matcher, PhraseMatcher
from numpy import sort

from ..matcher import Matcher, PhraseMatcher, DependencyTreeMatcher
from .util import get_doc
from ..tokens import Doc

import pytest

import re

@pytest.fixture
def matcher(en_vocab):
@@ -20,7 +22,6 @@ def matcher(en_vocab):
matcher.add(key, None, *patterns)
return matcher


def test_matcher_from_api_docs(en_vocab):
matcher = Matcher(en_vocab)
pattern = [{'ORTH': 'test'}]
@@ -258,3 +259,47 @@ def test_matcher_end_zero_plus(matcher):
assert len(matcher(nlp(u'a b c'))) == 1
assert len(matcher(nlp(u'a b b c'))) == 1
assert len(matcher(nlp(u'a b b'))) == 1


@pytest.fixture
def text():
return u"The quick brown fox jumped over the lazy fox"

@pytest.fixture
def heads():
return [3,2,1,1,0,-1,2,1,-3]
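# (The heads above are offsets relative to each token, as expected by the
# get_doc helper imported from .util.)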

@pytest.fixture
def deps():
return ['det', 'amod', 'amod', 'nsubj', 'ROOT', 'prep', 'det', 'amod', 'pobj']

@pytest.fixture
def dependency_tree_matcher(en_vocab):
is_brown_yellow = lambda text: bool(re.compile(r'brown|yellow|over').match(text))
IS_BROWN_YELLOW = en_vocab.add_flag(is_brown_yellow)
pattern1 = [
{'SPEC': {'NODE_NAME': 'fox'}, 'PATTERN': {'ORTH': 'fox'}},
{'SPEC': {'NODE_NAME': 'q', 'NBOR_RELOP': '>', 'NBOR_NAME': 'fox'},'PATTERN': {'LOWER': u'quick'}},
{'SPEC': {'NODE_NAME': 'r', 'NBOR_RELOP': '>', 'NBOR_NAME': 'fox'}, 'PATTERN': {IS_BROWN_YELLOW: True}}
]

pattern2 = [
{'SPEC': {'NODE_NAME': 'jumped'}, 'PATTERN': {'ORTH': 'jumped'}},
{'SPEC': {'NODE_NAME': 'fox', 'NBOR_RELOP': '>', 'NBOR_NAME': 'jumped'},'PATTERN': {'LOWER': u'fox'}},
{'SPEC': {'NODE_NAME': 'over', 'NBOR_RELOP': '>', 'NBOR_NAME': 'fox'}, 'PATTERN': {IS_BROWN_YELLOW: True}}
]
matcher = DependencyTreeMatcher(en_vocab)
matcher.add('pattern1', None, pattern1)
matcher.add('pattern2', None, pattern2)
return matcher



def test_dependency_tree_matcher_compile(dependency_tree_matcher):
assert len(dependency_tree_matcher) == 2

def test_dependency_tree_matcher(dependency_tree_matcher,text,heads,deps):
doc = get_doc(dependency_tree_matcher.vocab,text.split(),heads=heads,deps=deps)
matches = dependency_tree_matcher(doc)
assert len(matches) == 2
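# (Each match is a (match_id, [token positions]) tuple: pattern1 roots at
# the first 'fox' and pattern2 at 'jumped'.)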