Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Fix for correctly parsing closing tags #38

Closed
wants to merge 4 commits into from

Conversation

LDSpits
Copy link
Contributor

@LDSpits LDSpits commented Apr 17, 2022

Fixes #37

This pull request fixes the issue where void closing tags would not be ignored. Causing the closing tag to close the parent instead of the voided tag.

To implement this, I had to also fix the tag_raw_abrupt_stop test. Because this would not treat the eof as a valid end of a identifier. Causing the identifier parsing of the closing tag to fail.

The previous implementation of reading closing tags did not ignore voided tags.
causing the parent tag to be closed prematurely.
@LDSpits LDSpits changed the title Fix/voided closing tags Fix for correctly parsing closing tags Apr 17, 2022
@y21
Copy link
Owner

y21 commented Apr 18, 2022

Okay, I did some testing/benchmarking to see if it regresses perf and I got some very strange results.
The PR itself (with a few changes that I mentioned in my review) isn't too bad (630 µs -> 662 µs and 496 MiB/s -> 471 MiB/s) and it's probably possible to get that back but I realized that I was using an older nightly build for benchmarking (2022-01-14).

After updating to latest nightly, the benchmark performed consistently significantly worse for some reason (~12-18% slower)

rustc 1.60.0-nightly (ad46af247 2022-01-14):

tl-wikipedia.html/tl-wikipedia.html
                        time:   [629.70 us 630.39 us 631.15 us]
                        thrpt:  [495.57 MiB/s 496.17 MiB/s 496.71 MiB/s]

This PR:

tl-wikipedia.html/tl-wikipedia.html
                        time:   [662.13 us 662.90 us 663.85 us]
                        thrpt:  [471.16 MiB/s 471.83 MiB/s 472.39 MiB/s]

rustc 1.62.0-nightly (878c7833f 2022-04-16):

tl-wikipedia.html/tl-wikipedia.html
                        time:   [711.75 us 712.97 us 714.22 us]
                        thrpt:  [437.93 MiB/s 438.70 MiB/s 439.45 MiB/s]

This PR:

tl-wikipedia.html/tl-wikipedia.html
                        time:   [749.81 us 750.85 us 751.92 us]
                        thrpt:  [415.98 MiB/s 416.57 MiB/s 417.14 MiB/s]

Though this isn't really related to this PR and more an issue affecting the library as a whole.

src/parser/base.rs Outdated Show resolved Hide resolved
src/parser/base.rs Outdated Show resolved Hide resolved
…l spec.

According to the W3C html spec, having whitespace after/before the '/' for the closing tag is not valid.
Since we assume we are parsing well-formed html docs we skip any seeking to the tag name.
@LDSpits
Copy link
Contributor Author

LDSpits commented Apr 18, 2022

Okay, I did some testing/benchmarking to see if it regresses perf and I got some very strange results.
I'm not sure what would cause such a huge regression. Maybe just unlucky nightly build?

I guess the only way to test this would be to compare the version in this pull request with the version on crates.io. Then we can narrow down if this pull request is the culprit. I would not expect such a huge hit as well.

@y21
Copy link
Owner

y21 commented Apr 18, 2022

Then we can narrow down if this pull request is the culprit.

Nah, I think this PR is fine. I think my previous comment was a bit unclear and it's not related to this PR at all. I'd understand if this PR makes it a little slower because it has to do more work, but what confused me was that the same tl version that is on crates.io consistently runs around 15% slower on newer nightly builds compared to a nightly build from january. I'll try to track it down some day to see in which nightly build this started happening. Probably deserves its own tracking issue too.

I actually managed to get the perf of this fix "roughly" on par with the version on crates.io by changing one thing (will add a review for this), which surprised me, considering it checks the name of every tag in the average case, but I guess you really can't predict perf

src/parser/base.rs Outdated Show resolved Hide resolved
src/parser/base.rs Outdated Show resolved Hide resolved
Copy link
Owner

@y21 y21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for reporting the bug and the fix!
bors r+

bors bot pushed a commit that referenced this pull request Apr 18, 2022
Fixes #37 

This pull request fixes the issue where void closing tags would not be ignored. Causing the closing tag to close the parent instead of the voided tag.

To implement this, I had to also fix the ``tag_raw_abrupt_stop`` test. Because this would not treat the ``eof`` as a valid end of a identifier. Causing the identifier parsing of the closing tag to fail.
@LDSpits
Copy link
Contributor Author

LDSpits commented Apr 18, 2022

No problem! helping fixing bugs let's me learn as well.

@bors
Copy link
Contributor

bors bot commented Apr 18, 2022

Pull request successfully merged into master.

Build succeeded:

@bors bors bot changed the title Fix for correctly parsing closing tags [Merged by Bors] - Fix for correctly parsing closing tags Apr 18, 2022
@bors bors bot closed this Apr 18, 2022
@y21
Copy link
Owner

y21 commented Apr 18, 2022

published a new release (0.7.5) to crates.io with this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tl does not properly handle "void" closing tags.
2 participants