Help to handle brew livecheck from a non-UTF8 web page #11498
Comments
@samford or @nandahkrishna may be able to help here. We'd accept a PR for this.

I would be interested to hear whether this issue still occurs now that #10834 is merged. This change is part of the most recent version of […]

@samford Really good news with this merge.
Thank you very much.

I should clarify that […] Would you be willing to link me to the page you're using (so I can use it for testing)? If the answer's no (completely understandable), I can probably come up with something but it would save me the trouble.

Oh a single replacement character, I understand.

This issue is basically what I expected: the page content provided by […] Usually we could use the […] What makes this tricky is that […] Looking to the future, I think we'll have to do two things: […]

It would be good to address this issue at some point but I'm currently busy with other livecheck work and this isn't a pressing issue from the standpoint of homebrew/core and homebrew/cask. We tend to only use ASCII characters in regexes, so the current […]

If you're invested in this and want to create a PR, the first item on the list above (automatic re-encoding) is feasible right now but the second item can't be implemented until after I create a PR for my work to allow configuration options in […]

Otherwise, if the current setup is generally fine for your use case, you can technically match words that use accented characters by replacing the accented character with a dot, which matches anything (e.g., […]).
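To illustrate the dot workaround, here is a minimal Ruby sketch. It assumes the invalid byte has already been turned into a single replacement character (U+FFFD), as discussed earlier in the thread. `String#scrub` only stands in for whatever livecheck does internally, and the sample text and regex are made up for the example.

```ruby
# Hypothetical page text: "Dernière version : 1.2.3" served as ISO-8859-1,
# where "è" is the single byte 0xE8.
raw = "Derni\xE8re version : 1.2.3".dup.force_encoding(Encoding::UTF_8)

# Assumption: the invalid byte is replaced with one U+FFFD character.
# String#scrub is used here only to simulate that behaviour.
content = raw.scrub

# ASCII-only regex: the accented letter is written as ".", which matches
# any single character, including the replacement character.
regex = /Derni.re version : (\d+(?:\.\d+)+)/
version = content[regex, 1]

puts version
```

The regex stays pure ASCII, so it works no matter which character ends up in place of the accented letter.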
Hello @samford what a nice investigation!
Good catch.
Interesting play with the character encoding 😎
Yes. Just to try it, I already tested this with success. But as said above, I could use an ASCII-only regex, so your recent merge solved my main problem.

This seems like the preferable option to me 👍🏻

This seems like a pretty niche issue, a misconfigured server and something that's better suited to a PR for any further discussion. Sorry!
Provide a detailed description of the proposed feature
I'm not really proposing a feature or an evolution of brew livecheck: I'm rather looking for a workaround.
It's about processing (with regex) a web page not encoded in UTF-8 but latin1.
What is the motivation for the feature?
I'm trying to write a livecheck block for my formula, using the :page_match strategy (which I use without problems in my other formulae).
Unfortunately, for this case, the upstream web page is very old (2005 😉) and the HTML is not encoded in UTF-8.
The HTML code clearly declares iso-8859-1 (latin1) as its encoding in the first line of the HTML.

Then, brew livecheck fails with the following error message:

invalid byte sequence in UTF-8

which is quite true, since these are accented French letters in the encoding declared by the source file: iso-8859-1.
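The failure can be reproduced outside of livecheck. This is only a sketch in plain Ruby: the sample text and regex are invented for the example and do not come from the real upstream page. It shows the same bytes failing when treated as UTF-8 and matching once they are relabeled as ISO-8859-1 and transcoded.

```ruby
# Bytes for "Dernière version : 1.2.3" in ISO-8859-1 ("è" is the byte 0xE8).
latin1_bytes = "Derni\xE8re version : 1.2.3".b

# Treat the bytes as UTF-8, which is what happens when the declared
# charset is ignored. 0xE8 followed by "r" is not valid UTF-8.
mislabeled = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding? # false

begin
  mislabeled[/version : (\d+(?:\.\d+)+)/, 1]
rescue ArgumentError => e
  # Ruby refuses to run a regex over a string with a broken encoding.
  puts e.message
end

# Relabel the bytes with their real encoding, then transcode to UTF-8.
reencoded = latin1_bytes.dup
                        .force_encoding(Encoding::ISO_8859_1)
                        .encode(Encoding::UTF_8)
version = reencoded[/version : (\d+(?:\.\d+)+)/, 1]
puts version
```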
- […] :page_match strategy to process the content all the same? (my regex targets only simple ASCII characters)
- […] brew livecheck to follow the defined encoding?

Suggestions are welcome! (other than asking the upstream to rewrite their web page 😉)
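For the second question, following the declared encoding could look roughly like the sketch below. This is plain Ruby, not Homebrew code: `to_utf8` is a hypothetical helper, the sample HTML is invented, and a real implementation might rather read the charset from the HTTP headers.

```ruby
# Hypothetical helper (not part of Homebrew): convert raw page bytes to
# UTF-8 using the charset declared in the HTML, if one can be found.
def to_utf8(raw_bytes)
  # The charset name is plain ASCII, so it can be searched in binary mode.
  declared = raw_bytes.b[/charset=["']?([A-Za-z0-9_-]+)/i, 1]

  encoding = begin
    declared ? Encoding.find(declared) : Encoding::UTF_8
  rescue ArgumentError
    Encoding::UTF_8 # unknown charset name, fall back to UTF-8
  end

  text = raw_bytes.dup.force_encoding(encoding)
  # Already UTF-8: only replace invalid bytes with U+FFFD.
  return text.scrub if encoding == Encoding::UTF_8

  text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Invented sample page declaring iso-8859-1 ("è" is the byte 0xE8).
html = %(<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">) +
       "<p>Derni\xE8re version : 1.2.3</p>"

page = to_utf8(html)
version = page[/version : (\d+(?:\.\d+)+)/, 1]
puts version
```

After the conversion the page is valid UTF-8, so a regex can contain accented characters as well as ASCII ones.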
How will the feature be relevant to at least 90% of Homebrew users?
Not sure that it is useful for 90% of Homebrew users.
That's why I'm rather looking for a workaround.
Regards.
What alternatives to the feature have been considered?