Problem with prefixes created for URIs containing %20 #801

Closed
Aleksander-Drozd opened this issue Dec 19, 2017 · 6 comments · Fixed by #1044
Labels: discussion, enhancement, good first issue, help wanted, low priority, serialization

Comments

@Aleksander-Drozd

The result of running this code

from rdflib import Namespace, Graph, BNode, Literal

graph = Graph()
namespace = Namespace('http://example.org/')
graph.bind('', namespace)
node = BNode()

graph.add((node, namespace['first%20name'], Literal('John')))
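# serialize() returns bytes in rdflib < 6.0 and str in rdflib >= 6.0
output = graph.serialize(format='turtle')
print(output.decode() if isinstance(output, bytes) else output)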

is

@prefix : <http://example.org/> .
@prefix ns1: <http://example.org/first%20> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] ns1:name "John" .

I would expect the triple to be serialized as

[] :first%20name "John" .
@joernhees
Member

This is not a bug but a feature... (I agree that it's confusing, but changing it would break a lot of things in XML.)

The reason for the behavior you see is that QNames and CURIEs restrict the characters allowed in the so-called LocalPart much more than one would think, because they are intended to be valid XML attribute names. In other words, <foo :first%20name="something" /> wouldn't work, because % is not allowed in that position.

Here's the spec: https://www.w3.org/TR/REC-xml-names/#ns-qualnames
Clicking through you'll arrive at https://www.w3.org/TR/REC-xml/#NT-Name ...

'%' (hex(ord('%')) == '0x25') is simply not in the character ranges allowed by NameChar.
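The claim about '%' can be checked directly. Below is a minimal Python sketch that transcribes the NameStartChar and NameChar ranges from the XML 1.0 (Fifth Edition) spec linked above and tests '%' against them. The helpers `is_name_start_char`, `is_name_char` and `is_ncname` are hypothetical, written only for this illustration (they are not rdflib functions); NCName, the QName LocalPart, is taken to be a Name without any colon, as in Namespaces in XML.

```python
# Code-point ranges transcribed from the XML 1.0 (Fifth Edition) NameStartChar production.
NAME_START_RANGES = [
    (ord(":"), ord(":")), (ord("A"), ord("Z")), (ord("_"), ord("_")), (ord("a"), ord("z")),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra ranges that NameChar allows after the first character.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_ncname(s):
    """Hypothetical check: an XML Name that contains no ':'."""
    return (
        bool(s)
        and ":" not in s
        and is_name_start_char(s[0])
        and all(is_name_char(c) for c in s[1:])
    )


print(hex(ord("%")), is_name_char("%"))  # 0x25 False
print(is_ncname("first%20name"))         # False
print(is_ncname("first_name"))           # True
```

Because '%' fails `is_name_char`, any local name containing it cannot appear as an XML attribute name, which is why a QName-oriented serializer has to split the URI somewhere else.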

I'll close this for now, feel free to reopen.

joernhees reopened this Dec 20, 2017
joernhees added the enhancement, good first issue, help wanted, low priority and serialization labels Dec 20, 2017
joernhees added this to the rdflib 5.0.0 milestone Dec 20, 2017
@joernhees
Member

On second thought, I'll reopen this: Turtle as of RDF 1.1 does allow such characters in a PrefixedName via PN_LOCAL.
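For comparison, here is a sketch of the Turtle 1.1 PN_LOCAL production written as a Python regular expression, transcribed from the grammar in the W3C RDF 1.1 Turtle recommendation. It shows that '%' is accepted in a local name only as part of a PERCENT escape ('%' followed by two hex digits) or as the backslash escape '\%'. `PN_LOCAL_RE` and `is_pn_local` are hypothetical names used only for this illustration.

```python
import re

# Character classes transcribed from the Turtle 1.1 grammar.
PN_CHARS_BASE = (
    "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF"
    "\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
PN_CHARS_U = PN_CHARS_BASE + "_"
PN_CHARS = PN_CHARS_U + "\\-0-9\u00B7\u0300-\u036F\u203F-\u2040"

# PLX ::= PERCENT | PN_LOCAL_ESC  (a %XX escape or a backslash-escaped reserved char)
PLX = r"(?:%[0-9A-Fa-f]{2}|\\[_~.\-!$&'()*+,;=/?#@%])"

# PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX)
#              ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX))?
PN_LOCAL_RE = re.compile(
    f"(?:[{PN_CHARS_U}:0-9]|{PLX})"
    f"(?:(?:[{PN_CHARS}.:]|{PLX})*(?:[{PN_CHARS}:]|{PLX}))?"
)


def is_pn_local(s: str) -> bool:
    """Return True if s is a valid Turtle 1.1 local name (the part after 'prefix:')."""
    return PN_LOCAL_RE.fullmatch(s) is not None


print(is_pn_local("first%20name"))  # True
print(is_pn_local("first name"))    # False
print(is_pn_local("first%2"))       # False: incomplete percent escape
```

So a Turtle serializer could in principle emit `:first%20name` directly, whereas the XML NameChar rules rule it out.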

I can only guess that this (like so many other issues) has a historical reason or is due to the interplay of N3, TTL, NT...

As the serialization is not actually wrong (just more cryptic than it needs to be), I'll give this low priority... If anyone else wants to investigate the origin and how to solve this (in a backwards-compatible way), feel free.
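One plausible origin of the `ns1:` prefix is the step that splits a URI into a namespace and a local name before a prefix is chosen. The sketch below is a simplified, hypothetical right-to-left splitting heuristic in the spirit of what rdflib's namespace.py does; it is not rdflib's actual code, and `split_uri_sketch`, `allowed_extra` and the category lists are assumptions made for illustration. It reproduces the `first%20` / `name` split from the report and shows how treating '%' as an allowed name character moves the split back to `http://example.org/`.

```python
from unicodedata import category

# Unicode general categories treated as name characters (an assumption of this sketch).
NAME_START_CATEGORIES = ("Ll", "Lu", "Lo", "Lt", "Nl")
NAME_CATEGORIES = NAME_START_CATEGORIES + ("Mc", "Me", "Mn", "Lm", "Nd")


def split_uri_sketch(uri, allowed_extra=("-", ".", "_")):
    """Hypothetical helper: split uri into (namespace, local_name).

    Walks the URI right to left until it meets a character that cannot be part
    of a local name, then walks forward to the first character that may *start*
    a local name (a letter or '_', not a digit) and splits there.
    """
    for i in range(len(uri) - 1, -1, -1):
        c = uri[i]
        if category(c) in NAME_CATEGORIES or c in allowed_extra:
            continue
        # c cannot appear in the local name; find the first valid start char after it
        for j in range(i + 1, len(uri)):
            if category(uri[j]) in NAME_START_CATEGORIES or uri[j] == "_":
                return uri[:j], uri[j:]
        break
    raise ValueError(f"Can't split {uri!r}")


uri = "http://example.org/first%20name"
print(split_uri_sketch(uri))
# ('http://example.org/first%20', 'name')  '%' stops the scan, and digits can't start a name
print(split_uri_sketch(uri, allowed_extra=("-", ".", "_", "%")))
# ('http://example.org/', 'first%20name')
```

Allowing '%' in the local part restores the split the reporter expected, but it also produces local names that are not valid XML names, so a real fix would need to keep XML-based serializers (such as RDF/XML) working.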

@Aleksander-Drozd
Author

Aleksander-Drozd commented Dec 20, 2017

I'll just add that the test files for applications implementing the W3C CSV-to-RDF conversion recommendation contain examples with '%', e.g. https://github.com/w3c/csvw/blob/gh-pages/tests/test009.ttl

@aayush17002
Contributor

I would like to work on resolving this issue.

@white-gecko
Member

@aayush17002 you are welcome to. The best way is to create a pull request and refer to this issue in its description. We can then discuss whether to include it in 5.0.0 or a later release.

white-gecko modified the milestones: rdflib 5.0.0, rdflib 5.1.0 Apr 6, 2020
white-gecko modified the milestones: rdflib 5.1.0, rdflib 6.0.0 May 1, 2020
aayush17002 added a commit to aayush17002/rdflib that referenced this issue May 10, 2020
Added acceptance clause for "%" in Allowed name chars.
aayush17002 added a commit to aayush17002/rdflib that referenced this issue May 10, 2020
Test file to increase the scope of namespaces.
@aayush17002
Contributor

I have submitted a fix for the problem above. Please review it and get back to me.

aayush17002 added a commit to aayush17002/rdflib that referenced this issue May 25, 2020
Updated test file to test issue 801
aayush17002 added a commit to aayush17002/rdflib that referenced this issue May 25, 2020
Added assertions for testing issue RDFLib#801
aayush17002 added a commit to aayush17002/rdflib that referenced this issue May 25, 2020
Removed print statement
Reformatting assert statement
aayush17002 added a commit to aayush17002/rdflib that referenced this issue May 26, 2020
nicholascar added a commit that referenced this issue Jul 30, 2020
Updating namespace.py to solve issue #801

4 participants