OntologyReasoner
What it is
- Utility class for de-duplicating RDF resources in a Turtle (
ttl) string. - Merges subjects that share the same
rdf:typeandrdfs:label, preferring a UUID-based IRI as the canonical subject.
Public API
Class: OntologyReasoner
is_iri_uuid(iri: rdflib.term.URIRef) -> bool
- Checks whether the last path/fragment segment of an IRI matches a UUID pattern (
8-4-4-4-12lowercase hex).
dedup_subject(class_label: tuple, graph: rdflib.Graph) -> rdflib.Graph
- De-duplicates subjects in
graphfor a given(class_iri, label)pair. - Behavior:
- Runs a SPARQL query to find subjects
?swith:?s a <class_iri>?s rdfs:label "label"- Excludes nodes that are superclasses of something (
FILTER NOT EXISTS { ?other rdfs:subClassOf ?s }) - Excludes nodes typed as
owl:Classorrdfs:Class
- Picks the first matching subject whose IRI ends with a UUID; otherwise creates a new subject IRI:
http://ontology.naas.ai/abi/{uuid4} - For each non-canonical duplicate subject found:
- Copies its non-
rdf:typeoutgoing triples to the canonical subject. - Rewrites triples that reference the duplicate as an object to instead reference the canonical subject.
- Copies its non-
- Rebuilds and returns a new graph excluding triples that mention removed duplicate IRIs.
- Runs a SPARQL query to find subjects
dedup_ttl(ttl: str) -> tuple[str, rdflib.Graph]
- Parses a Turtle string into an
rdflib.Graph, identifies duplicates by(rdf:type, rdfs:label), and appliesdedup_subjectwhen multiple subjects share the same pair. - Returns:
- Serialized Turtle string (with original namespace bindings re-bound)
- The resulting
rdflib.Graph
Configuration/Dependencies
- Python dependencies:
rdflib
- Standard library:
re,uuid,typing
Usage
from naas_abi_core.utils.OntologyReasoner import OntologyReasoner
ttl = """
@prefix ex: <http://example.com/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:a a ex:Person ; rdfs:label "Alice" ; ex:age 30 .
ex:b a ex:Person ; rdfs:label "Alice" ; ex:age 31 .
"""
reasoner = OntologyReasoner()
deduped_ttl, deduped_graph = reasoner.dedup_ttl(ttl)
print(deduped_ttl)
Caveats
dedup_ttlasserts that all triple components(s, p, o)areURIRef; Turtle containing literals (e.g.,rdfs:label "Alice") will violate this assertion for the literal object and raiseAssertionError.- The SPARQL query in
dedup_subjectreferencesowl:Classbut does not declare theowl:prefix in the query text; this may fail depending on the graph/query environment. - UUID detection only matches lowercase hex UUIDs (uppercase UUID IRIs will not be treated as UUID-based).