DocumentOntology
What it is
- A small RDF/OWL modeling layer using Pydantic models to represent ontology entities and serialize/deserialize them as RDF.
- Defines a base
RDFEntityplus three domain entities:File,Document, andProcessor. - Supports:
- Auto-generated URIs (UUID-based) with configurable namespace
- RDF graph generation via
rdflib - Loading instances from an IRI via SPARQL query results (requires a query executor)
Public API
class RDFEntity(pydantic.BaseModel)
Base class for RDF-backed entities.
set_namespace(namespace: str) -> None- Sets the namespace used to generate new instance URIs.
set_query_executor(query_executor: Callable[[str], Iterable[object]] | None) -> None- Sets a callable used by
from_iri()to execute SPARQL queries.
- Sets a callable used by
from_iri(iri: str, query_executor: Callable[[str], Iterable[object]] | None = None, graph_name: str | None = None) -> Self- Loads an instance by querying triples
<iri> ?p ?o(excludingrdf:type). - Maps predicate URIs to model fields via the class’
_property_uris. - Coerces literals to Python values; object properties become string IRIs.
- If
labelis a model field and missing, it is derived from the IRI.
- Loads an instance by querying triples
rdf(subject_uri: str | None = None, visited: set[str] | None = None) -> rdflib.Graph- Serializes the instance to an RDFLib
Graph. - Adds:
rdf:typeof the class (_class_uri) when presentrdf:type owl:NamedIndividualrdfs:labeliflabelattribute exists
- Recursively serializes related
RDFEntityobjects; usesvisitedfor cycle detection.
- Serializes the instance to an RDFLib
class File(RDFEntity)
Ontology class: http://ontology.naas.ai/abi/document/File
- Data fields (Pydantic):
file_path: Optional[str] = "unknown"file_name: Optional[str] = "unknown"mime_type: Optional[str] = "unknown"file_size_bytes: Optional[int]created_time: Optional[datetime.datetime]modified_time: Optional[datetime.datetime]accessed_time: Optional[datetime.datetime]permissions: Optional[str] = "unknown"encoding: Optional[str] = "unknown"sha256: Optional[str] = "unknown"label: str(required)created: Optional[datetime.datetime] = datetime.datetime.now()creator: Optional[Any] = os.environ.get("USER")
- Object properties (treated as IRIs in RDF):
derivedFrom: Optional[List[Union[File, rdflib.URIRef, str]]] = ["http://ontology.naas.ai/abi/unknown"]embodies: Optional[List[Union[Document, rdflib.URIRef, str]]] = ["http://ontology.naas.ai/abi/unknown"]processedBy: Optional[List[Union[Processor, rdflib.URIRef, str]]] = ["http://ontology.naas.ai/abi/unknown"]
class Document(RDFEntity)
Ontology class: http://ontology.naas.ai/abi/document/Document
- Data fields:
sha256: Optional[str] = "unknown"label: str(required)created: Optional[datetime.datetime] = datetime.datetime.now()creator: Optional[Any] = os.environ.get("USER")
- Object properties:
isEmbodiedIn: Optional[List[Union[File, rdflib.URIRef, str]]] = ["http://ontology.naas.ai/abi/unknown"]
class Processor(RDFEntity)
Ontology class: http://ontology.naas.ai/abi/document/Processor
- Data fields:
label: str(required)created: Optional[datetime.datetime] = datetime.datetime.now()creator: Optional[Any] = os.environ.get("USER")
Configuration/Dependencies
- Dependencies:
pydantic.BaseModel,pydantic.Field,pydantic.ValidationErrorrdflib.Graph,rdflib.Literal,rdflib.Namespace,rdflib.URIRefrdflib.namespaceconstants:OWL,RDF,RDFS,XSD
- Namespace configuration:
- Default URI namespace for new instances:
RDFEntity._namespace = "http://ontology.naas.ai/abi/" - Override with
RDFEntity.set_namespace("...")
- Default URI namespace for new instances:
- SPARQL loading:
RDFEntity.from_iri()requires aquery_executorcallable (set per-call or viaset_query_executor).- Optional
graph_namescopes the query toGRAPH <graph_name> { ... }.
Usage
Create entities and serialize to RDF
from naas_abi_marketplace.domains.document.ontologies.modules.DocumentOntology import (
File, Document, Processor
)
doc = Document(label="Invoice #123")
proc = Processor(label="OCR Pipeline")
f = File(label="invoice.pdf", embodies=[doc], processedBy=[proc])
g = f.rdf()
print(g.serialize(format="turtle"))
Load an entity from an IRI (requires SPARQL executor)
from naas_abi_marketplace.domains.document.ontologies.modules.DocumentOntology import File
def exec_sparql(query: str):
# Return an iterable of rows where row["p"] and row["o"] (or attributes p/o) exist.
# This is a placeholder; integrate with your SPARQL client.
return []
File.set_query_executor(exec_sparql)
loaded = File.from_iri("http://example.org/resource/file-1")
print(loaded._uri, loaded.label)
Caveats
labelis required onFile,Document, andProcessor;from_iri()will synthesize it from the IRI only if missing in query results.from_iri()rejects IRIs (andgraph_name) containing angle brackets (<,>).- Object properties in
from_iri()are coerced to string IRIs (not automatically instantiated asRDFEntityobjects). - Some fields default to values at import/instantiation time:
createddefaults todatetime.datetime.now()at model creation time (as written).creatordefaults toos.environ.get("USER").