Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tl does not properly handle "void" closing tags. #37

Closed
LDSpits opened this issue Apr 17, 2022 · 2 comments
Closed

tl does not properly handle "void" closing tags. #37

LDSpits opened this issue Apr 17, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@LDSpits
Copy link
Contributor

LDSpits commented Apr 17, 2022

When I was attempting to parse a html page. I received a puzzling result from the parser. where the head tag would terminate early on a child tag. And the closing html tag would be omitted.

I have found that this is due to how "void" tags are ignored. And would like to get your opinion on my deduction of the problem.

let's take the following sample html:

<html>
    <head>
        <base href='single_quoted_item'></base>
        <link rel="stylesheet" type="text/css" href="non-exising"/>
    </head>
</html>

the output of the parser currently is

<html>
    <head>
        <base href="single_quoted_item"></base></head>
        <link rel="stylesheet" href="non-exising" type="text/css"></link>
    </html>

as you can see, the head closing tag is abnormally placed after the base closing tag.

after some debugging with tests i found the issue in the parser code, the tag parser properly ignores start tags. but if you hit a "void" closing tag it just treats the previously found tag as the closed tag. causing it to terminate the parent tag early.

I have written a fix for this by parsing the closing tag, and then not pushing the result if the closing tag is one of the "void" tags. Would you accept a pull request for this bug?

@y21
Copy link
Owner

y21 commented Apr 17, 2022

Ah right, good catch. I suspect this might affect performance quite a lot if we always have to check if the closing tag is a void tag every time (although I could be wrong).
I wonder if we can simply check that the last tag name in the stack matches with the current one being parsed (one string comparison as opposed to comparing it to all 15 strings that we would have to do). In your example, when the parser sees the closing tag </base> , it would notice that the last tag in the stack is <head> and not <base>, and just not pop it.
I have some more ideas for making things in the parser faster so it's not the end of the world if we lose a bit of performance here because of this.

@LDSpits
Copy link
Contributor Author

LDSpits commented Apr 17, 2022

the only case that wouldn't work is nested "voided" tags. however, the tags are never pushed to the parser stack. So there is no chance of them becoming nested as far as I'm aware. This does make me question how they become visible in the output. Are they treated as raw tags?

I'll adjust my fix to use the method you just mentioned and I'll open a pull request.

@y21 y21 added the bug Something isn't working label Apr 18, 2022
bors bot pushed a commit that referenced this issue Apr 18, 2022
Fixes #37 

This pull request fixes the issue where void closing tags would not be ignored. Causing the closing tag to close the parent instead of the voided tag.

To implement this, I had to also fix the ``tag_raw_abrupt_stop`` test. Because this would not treat the ``eof`` as a valid end of a identifier. Causing the identifier parsing of the closing tag to fail.
@y21 y21 closed this as completed Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants