forked from google/gumbo-parser
tags: Fix issues in the perfect hash implementation
The result of the perfect hash for tag names was not being properly compared against the candidate string, because the comparison did not take string lengths into account. As a result, looking up `t` would match `textarea`: both strings happened to hash to the same value, and only the first byte of the strings was being compared for a full match.

The issue has been fixed by adding a table of tag lengths, which also speeds up the rejection of invalid strings.

To simplify the generation of the automated Tag tables, and to remove the dependency on `sed` from the Makefile, a simple `gentags.py` has been added to the codebase. Running `make gentags` will generate all the tables **and** the perfect hash function, assuming that Python and the correct version of mph are in the path.

An updated version of mph has been pushed to the following repository:

https://github.com/vmg/mph

It contains all the changes required to generate case-insensitive hashes just like the ones used in the library, with no further modification to the hash output.

Conflicts: src/tag.c
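To make the failure mode concrete, here is a minimal Python sketch of a hashed tag lookup with and without a length check. Everything in it is a hypothetical stand-in: `toy_hash`, the two-entry table, and both lookup helpers are invented for illustration, and the library's real lookup is C code driven by the generated perfect hash, which is not reproduced here.

```python
# Hypothetical stand-ins: the real tables and hash function are generated.
TAG_STRINGS = ["a", "textarea"]
TAG_SIZES = [len(t) for t in TAG_STRINGS]  # plays the role of the length table


def toy_hash(name):
    """Toy hash (an assumption): names starting with 't' map to slot 1, all others to slot 0."""
    return 1 if name[:1].lower() == "t" else 0


def lookup_buggy(name):
    """Compare only the first len(name) characters, with no length check."""
    idx = toy_hash(name)
    candidate = TAG_STRINGS[idx]
    if candidate[:len(name)].lower() == name.lower():
        return idx
    return None


def lookup_fixed(name):
    """Reject on length mismatch first, then do a full case-insensitive compare."""
    idx = toy_hash(name)
    if TAG_SIZES[idx] != len(name):
        return None
    if TAG_STRINGS[idx].lower() == name.lower():
        return idx
    return None


print(lookup_buggy("t"))         # 1: "t" wrongly resolves to "textarea"
print(lookup_fixed("t"))         # None
print(lookup_fixed("TextArea"))  # 1
```

The length comparison is a single integer check, so in this sketch it also rejects most invalid names before any string comparison runs.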
Showing 5 changed files with 52 additions and 15 deletions.
@@ -0,0 +1,27 @@
import sys

# Generated outputs: three C tables plus the Python list of tag names.
tag_strings = open("src/tag_strings.h", "w")
tag_enum = open("src/tag_enum.h", "w")
tag_sizes = open("src/tag_sizes.h", "w")

tag_py = open("python/gumbo/gumboc_tags.py", "w")
tag_py.write('TagNames = [\n')

# The input file (one tag name per line) is passed as the first argument.
tagfile = open(sys.argv[1])

for tag in tagfile:
    tag = tag.strip()
    # Enum identifiers cannot contain '-', so replace it with '_'.
    tag_upper = tag.upper().replace('-', '_')
    tag_strings.write('"%s",\n' % tag)
    tag_enum.write('GUMBO_TAG_%s,\n' % tag_upper)
    tag_sizes.write('%d, ' % len(tag))
    tag_py.write('\t"%s",\n' % tag_upper)

tagfile.close()

tag_strings.close()
tag_enum.close()
tag_sizes.close()

tag_py.write(']\n')
tag_py.close()
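As a rough illustration of what this script emits, the sketch below applies the same per-tag formatting in memory instead of writing files. The helper `render_tables` and the two sample tag names are assumptions chosen for the example; they are not taken from the actual tag list.

```python
def render_tables(tags):
    """Return the text of the four generated outputs for the given tag names."""
    strings, enum, sizes, py = [], [], [], ['TagNames = [\n']
    for tag in tags:
        tag = tag.strip()
        tag_upper = tag.upper().replace('-', '_')
        strings.append('"%s",\n' % tag)
        enum.append('GUMBO_TAG_%s,\n' % tag_upper)
        sizes.append('%d, ' % len(tag))
        py.append('\t"%s",\n' % tag_upper)
    py.append(']\n')
    return ''.join(strings), ''.join(enum), ''.join(sizes), ''.join(py)


# Sample input lines (hypothetical), as they would be read from the tag file.
strings, enum, sizes, py = render_tables(["html\n", "annotation-xml\n"])
print(enum)   # GUMBO_TAG_HTML, and GUMBO_TAG_ANNOTATION_XML, one per line
print(sizes)  # 4, 14,
```

Each output uses the same ordering, so entry `i` of the strings, enum, and sizes tables all describe the same tag.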
@@ -0,0 +1 @@
4, 4, 5, 4, 4, 4, 5, 6, 8, 8, 4, 7, 7, 3, 5, 2, 2, 2, 2, 2, 2, 6, 6, 6, 7, 1, 2, 3, 10, 2, 2, 2, 2, 2, 2, 6, 10, 4, 3, 1, 2, 6, 5, 1, 4, 1, 3, 4, 4, 4, 4, 3, 4, 3, 3, 3, 1, 1, 1, 4, 4, 2, 2, 3, 3, 4, 2, 3, 3, 3, 5, 3, 6, 5, 6, 5, 5, 5, 6, 5, 6, 3, 4, 4, 2, 2, 2, 2, 5, 6, 10, 14, 3, 13, 4, 5, 7, 8, 3, 5, 5, 5, 2, 2, 2, 4, 8, 6, 5, 5, 6, 6, 8, 8, 6, 8, 6, 6, 8, 5, 7, 7, 4, 8, 6, 7, 7, 3, 5, 8, 8, 7, 7, 3, 6, 7, 9, 2, 6, 8, 3, 5, 6, 4, 7, 8, 4, 6, 2, 3,
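Since this table is generated, one inexpensive sanity check is to parse it back and compare it with the tag list it was built from. The sketch below is only an illustration: `parse_sizes` and `sizes_match` are hypothetical helpers that are not part of this commit, and the sample tag names are assumptions.

```python
def parse_sizes(text):
    """Parse a comma-separated size table, ignoring the trailing separator."""
    return [int(field) for field in text.split(',') if field.strip()]


def sizes_match(sizes_text, tags):
    """Check that every table entry equals the length of the corresponding tag."""
    return parse_sizes(sizes_text) == [len(tag.strip()) for tag in tags]


print(parse_sizes("4, 4, 5, "))                            # [4, 4, 5]
print(sizes_match("4, 14, ", ["html", "annotation-xml"]))  # True
print(sizes_match("4, 14, ", ["html", "t"]))               # False
```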