OntologyReasoner
What it is
- A small utility for deduplicating RDF resources in a Turtle (TTL) ontology graph using rdflib.
- Deduplication is based on subjects sharing the same rdf:type and rdfs:label; their triples are merged into a single “main” subject.
Public API
Class: OntologyReasoner
is_iri_uuid(iri: URIRef) -> bool
- Checks whether the last path/fragment segment of an IRI matches a UUID pattern (8-4-4-4-12 lowercase hex).
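The check described above can be sketched as follows. This is a minimal illustration, not the library's code: the name looks_like_uuid_iri and the exact regular expression are assumptions, and the function takes a plain str (rdflib's URIRef is a str subclass, so it should be accepted as well).

```python
import re

# Assumed pattern: 8-4-4-4-12 groups of lowercase hexadecimal digits.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def looks_like_uuid_iri(iri: str) -> bool:
    """Hypothetical stand-in for is_iri_uuid, operating on a plain string."""
    # The last segment is whatever follows the final '/' or '#'.
    last_segment = re.split(r"[/#]", iri)[-1]
    # fullmatch requires the whole segment to be a UUID, not just a part of it.
    return UUID_PATTERN.fullmatch(last_segment) is not None
```

Because the pattern is lowercase-only, an IRI ending in an uppercase UUID is reported as not matching.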
dedup_subject(class_label: tuple, graph: Graph) -> Graph
- Deduplicates subjects in graph for a given (class_iri, label) pair.
- Behavior:
  - Finds subjects ?s such that:
    - ?s a <class_iri>
    - ?s rdfs:label "label"
    - and excludes subjects that:
      - are a superclass of another resource (FILTER NOT EXISTS { ?other rdfs:subClassOf ?s })
      - are explicitly typed as owl:Class or rdfs:Class
  - Picks the first matching subject whose IRI ends with a UUID as the main node; otherwise creates a new node: http://ontology.naas.ai/abi/<uuid4>
  - Merges data:
    - Copies non-rdf:type triples from duplicate subjects onto the main node
    - Rewrites object references pointing to duplicate subjects to point to the main node
    - Removes excluded duplicate subjects from the output graph
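The merge steps above can be modeled on a plain set of (subject, predicate, object) string tuples. This is a dependency-free sketch rather than the rdflib-based implementation: merge_duplicates, ends_with_uuid, and the class_iri parameter are hypothetical names, and re-adding the rdf:type triple on the main node is an assumption made so that a newly minted node still carries its type.

```python
import re
import uuid

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def ends_with_uuid(iri):
    # The last segment after '/' or '#' must be a lowercase UUID.
    return UUID_RE.fullmatch(re.split(r"[/#]", iri)[-1]) is not None


def merge_duplicates(triples, duplicates, class_iri):
    """Merge `duplicates` (an ordered list of subject IRIs) into one main node.

    `triples` is a set of string tuples standing in for an rdflib.Graph.
    """
    # Main node: the first duplicate with a UUID-style IRI, else a new IRI.
    main = next((s for s in duplicates if ends_with_uuid(s)), None)
    if main is None:
        main = "http://ontology.naas.ai/abi/" + str(uuid.uuid4())
    others = set(duplicates) - {main}

    merged = set()
    for s, p, o in triples:
        if s in others:
            if p == RDF_TYPE:
                continue  # rdf:type triples of duplicates are not copied
            s = main  # move the duplicate's data onto the main node
        if o in others:
            o = main  # rewrite references that pointed at a duplicate
        merged.add((s, p, o))

    # Assumption: the main node is (re)typed with the class being deduplicated.
    merged.add((main, RDF_TYPE, class_iri))
    return main, merged
```

After the loop no triple mentions a duplicate subject any more, which corresponds to removing the duplicates from the output graph.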
dedup_ttl(ttl: str) -> Tuple[str, Graph]
- Parses a Turtle string into a graph, finds duplicate subjects with the same (rdf:type, rdfs:label), and deduplicates them.
- Returns:
  - A serialized Turtle string of the deduplicated graph
  - The deduplicated rdflib.Graph object
- Preserves existing namespace bindings from the parsed graph.
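The duplicate-detection step can be illustrated by grouping subjects on their (rdf:type, rdfs:label) pair. The sketch below is an assumption about how such grouping might look, not the library's code: it works on plain string tuples instead of an rdflib.Graph, and find_duplicate_keys is a hypothetical name.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def find_duplicate_keys(triples):
    """Return {(class_iri, label): [subjects]} for keys shared by 2+ subjects."""
    types = defaultdict(set)
    labels = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        elif p == RDFS_LABEL:
            labels[s].add(o)

    # A subject contributes one key per (type, label) combination it carries.
    groups = defaultdict(set)
    for s in types.keys() & labels.keys():
        for class_iri in types[s]:
            for label in labels[s]:
                groups[(class_iri, label)].add(s)

    # Only keys with more than one subject are deduplication candidates.
    return {key: sorted(subs) for key, subs in groups.items() if len(subs) > 1}
```

Each returned key corresponds to one (class_iri, label) pair of the kind dedup_subject takes as its class_label argument.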
Configuration/Dependencies
- Python dependencies: rdflib
- Standard library: re, uuid, typing
- Input format: dedup_ttl expects Turtle (format="turtle").
Usage
from naas_abi_core.utils.OntologyReasoner import OntologyReasoner
ttl = """
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:a a ex:Thing ; rdfs:label "X" ; ex:p "v1" .
ex:b a ex:Thing ; rdfs:label "X" ; ex:p "v2" .
"""
reasoner = OntologyReasoner()
deduped_ttl, deduped_graph = reasoner.dedup_ttl(ttl)
print(deduped_ttl)
Caveats
- dedup_ttl asserts that all triple components are URIRef; graphs containing literals (including rdfs:label values) may trigger an AssertionError during iteration.
- SPARQL in dedup_subject references owl:Class but does not declare the owl: prefix in the query; whether this works depends on the graph/query namespace handling in rdflib.
- UUID detection in is_iri_uuid matches lowercase hex only.
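One way to remove the dependency on namespace handling noted in the owl: caveat is to declare every prefix inside the query text itself. The following is a sketch of what such a self-contained selection query could look like; build_select_query is a hypothetical helper, and the exact query shape is an assumption derived from the behavior described for dedup_subject, not a copy of the library's query.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def build_select_query(class_iri: str, label: str) -> str:
    """Build a SPARQL SELECT that declares the prefixes it uses (hypothetical)."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX rdfs: <{RDFS}>
PREFIX owl: <{OWL}>
SELECT DISTINCT ?s WHERE {{
  ?s a <{class_iri}> ;
     rdfs:label "{escaped}" .
  FILTER NOT EXISTS {{ ?other rdfs:subClassOf ?s }}
  FILTER NOT EXISTS {{ ?s a owl:Class }}
  FILTER NOT EXISTS {{ ?s a rdfs:Class }}
}}"""
```

A plain "label" literal in SPARQL matches only string literals without a language tag, so labels such as "X"@en would need a different pattern. rdflib's Graph.query also accepts an initNs mapping, which is an alternative to in-query PREFIX declarations.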