dedup_subject de-duplicates the subjects in a graph that share a given (class_iri, label) pair.
Behavior:
Runs a SPARQL query to find subjects ?s with:
?s a <{class_iri}>
?s rdfs:label "{label}" (the label argument as a literal)
Excludes subjects that are the superclass of another node (FILTER NOT EXISTS { ?other rdfs:subClassOf ?s })
Excludes subjects typed as owl:Class or rdfs:Class
Chooses as the canonical subject the first match whose IRI ends with a UUID; if no match does, mints a new canonical IRI of the form http://ontology.naas.ai/abi/{uuid4}
For each non-canonical duplicate subject found:
Copies its non-rdf:type outgoing triples to the canonical subject.
Rewrites triples that reference the duplicate as an object to instead reference the canonical subject.
Rebuilds and returns a new graph excluding triples that mention removed duplicate IRIs.
dedup_ttl(ttl: str) -> tuple[str, rdflib.Graph]
Parses a Turtle string into an rdflib.Graph, identifies duplicates by (rdf:type, rdfs:label), and applies dedup_subject when multiple subjects share the same pair.
Returns:
Serialized Turtle string (with the original namespace bindings re-bound)
The de-duplicated rdflib.Graph
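Caveats:
dedup_ttl asserts that every triple component (s, p, o) is a URIRef, so Turtle containing literals (e.g., rdfs:label "Alice") fails the assertion on the literal object and raises AssertionError. Because duplicates are keyed on rdfs:label, whose values are normally literals, this affects typical input.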
The SPARQL query in dedup_subject uses owl:Class without declaring the owl: prefix in the query text, so it works only if that prefix is bound elsewhere (for example, on the graph's namespace manager).
UUID detection matches only lowercase hex UUIDs, so IRIs ending in an uppercase UUID are not treated as UUID-based.