Signing named graphs #682

Open
LibrEars opened this issue Jan 12, 2017 · 8 comments
Labels
bug Something isn't working id-as-cntxt tracking related issues in-resolution

Comments

@LibrEars

LibrEars commented Jan 12, 2017

Hi all,

I would like to store my experiment data with RDFLib. Moreover, I want to sign the data of every single experiment so that it can be shared in the future. My approach is to use one named graph for each experiment, hash this graph, and sign the hash.

It seems that I don't really understand the concept of isomorphic graphs. Why is the hash of the isomorphic named graph the same as the hash of its isomorphic conjunctive graph?

Should I sign _TripleCanonicalizer(gmary).to_hash() instead, or would I run into problems with blank nodes with this approach?

Here is some code to clarify what I want to do:

from rdflib import Namespace, Literal, URIRef, BNode
from rdflib.graph import Graph, ConjunctiveGraph
from rdflib.plugins.memory import IOMemory

from rdflib.compare import to_isomorphic, _TripleCanonicalizer


ns = Namespace("http://love.com#")

mary = BNode()
john = URIRef("http://love.com/lovers/john#")

cmary=URIRef("http://love.com/lovers/mary#")
cjohn=URIRef("http://love.com/lovers/john#")

store = IOMemory()

g = ConjunctiveGraph(store=store)
g.bind("love",ns)

gmary = Graph(store=store, identifier=cmary)

gmary.add((mary, ns['hasName'], Literal("Mary")))
gmary.add((mary, ns['loves'], john))

gjohn = Graph(store=store, identifier=cjohn)
gjohn.add((john, ns['hasName'], Literal("John")))

print("The internal hash of a named graph is the same as the internal hash of the conjunctive graph: " +
      str(to_isomorphic(g).internal_hash() == to_isomorphic(gmary).internal_hash()) + "\n")

# Prints 'True'

print("The _TripleCanonicalizer hash of a named graph is the same as that of the conjunctive graph: " +
      str(_TripleCanonicalizer(g).to_hash() == _TripleCanonicalizer(gmary).to_hash()))

# Prints 'False'


# Example of how I plan to verify the signature of signed graphs
# (gpg, wot, RDFS and mary_public_keys are not defined in this snippet):
for h in g.objects(mary_public_keys, wot.signed):
    # First verify the signature
    if gpg.verify(str(h)):
        # Second compare hash
        sigHash = gpg.decrypt(h).data.decode("utf-8").strip() # gpg.decrypt(h): bytes --> string
        
        identifier = str(g.value(h, RDFS.label))
        signedG = g.get_context(identifier)
        realHash = str(to_isomorphic(signedG).internal_hash())  # Gives the wrong hash?
                                                                # (What happens with two existing equal identifiers/contexts?)
        print(sigHash)
        print(realHash)
        
        if sigHash == realHash:
            print("Graph verified")
        
        else:
            print("Signature verified but graph has changed")
    
    else:
        print("Signature verification failed")

Cheers,
LibrEars

@nicholascar
Member

I'm interested in this too. In the past (early rdflib days) I implemented my own graph hasher and wrote my own code to serialise the graph with deterministic blank node names. I'll be happy to see an answer here too!

@joernhees
Member

i briefly looked into this before, but maybe @jimmccusker could have a look...

seems to be a bug in to_isomorphic:

In [5]: list(g)
Out[5]:
[(rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#loves'),
  rdflib.term.URIRef(u'http://love.com/lovers/john#')),
 (rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'Mary')),
 (rdflib.term.URIRef(u'http://love.com/lovers/john#'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'John'))]

In [6]: list(gmary)
Out[6]:
[(rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#loves'),
  rdflib.term.URIRef(u'http://love.com/lovers/john#')),
 (rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'Mary'))]

In [7]: list(gjohn)
Out[7]:
[(rdflib.term.URIRef(u'http://love.com/lovers/john#'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'John'))]

In [8]: list(to_isomorphic(g))
Out[8]:
[(rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#loves'),
  rdflib.term.URIRef(u'http://love.com/lovers/john#')),
 (rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'Mary')),
 (rdflib.term.URIRef(u'http://love.com/lovers/john#'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'John'))]

In [9]: list(to_isomorphic(gmary))
Out[9]:
[(rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#loves'),
  rdflib.term.URIRef(u'http://love.com/lovers/john#')),
 (rdflib.term.BNode('N387ffec6cfcf427499ff7c3a00db24dc'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'Mary')),
 (rdflib.term.URIRef(u'http://love.com/lovers/john#'),
  rdflib.term.URIRef(u'http://love.com#hasName'),
  rdflib.term.Literal(u'John'))]

is it maybe getting confused by the store being re-used?

@joernhees joernhees added the bug Something isn't working label Jan 30, 2017
@joernhees joernhees added this to the rdflib 5.0.0 milestone Jan 30, 2017
@joernhees
Member

apart from that, as sha256 is used as a checksum, i think this would be a good approach to sign graphs, yes

@jpmccu
Contributor

jpmccu commented Jan 30, 2017

To answer the original question, the isomorphic graph has a special method called graph_digest() that will output a graph-level hash using the Sayers and Karp algorithm. It looks like the blank node has the same context (surrounding triples, grounding out at literals or URIs) in all the graphs, so they are getting the same BNode IDs. So the canonicalized graphs should have the same ID for the blank node version of Mary. I'll look into why they are the same as the non-canonicalized BNodes though. It could have something to do with the reuse of the Mary BNode across graphs.

@joernhees
Member

notice how list(gmary) contains fewer triples than list(to_isomorphic(gmary))

@jpmccu
Contributor

jpmccu commented Jan 30, 2017

Ah, yes, sorry, still early here. Will investigate ASAP.

@LibrEars
Author

If you are interested in my approach for experiment-data management you can have a look at Linked-data-for-scientists-with-python.
There is also a quick way for graph visualization included.

@jpmccu
Contributor

jpmccu commented Feb 16, 2017

The simple fix is for to_isomorphic() to not use the same store (which is what I do above). I tried using the same store but with an identifier, and the problem persists. The downside is that the triples are duplicated before actually being canonicalized.

joernhees added a commit that referenced this issue Feb 20, 2017
Added test for Issue #682 and fixed.
@white-gecko white-gecko modified the milestones: rdflib 5.0.0, rdflib 5.1.0 Apr 6, 2020
@white-gecko white-gecko modified the milestones: rdflib 5.1.0, rdflib 6.0.0 May 1, 2020
@ghost ghost added the id-as-cntxt tracking related issues label Dec 24, 2021