AddPowerPointPresentationPipeline
What it is
A Pipeline that reads a PowerPoint presentation (from local src... path or object storage), extracts core document properties plus all slides/shapes, builds an RDF graph using the ABI/PPT namespaces, and inserts the triples into a configured triple store (unless the presentation already exists).
Public API
-
AddPowerPointPresentationPipelineConfiguration(dataclass)- Purpose: Configuration container for the pipeline.
- Fields:
powerpoint_configuration: PowerPointIntegrationConfigurationtriple_store: ITripleStoreService
-
AddPowerPointPresentationPipelineParameters(PipelineParameters)- Purpose: Validated inputs for
run(). - Fields:
presentation_name: str— Name of the presentation.storage_path: str— Storage path for the presentation.download_url: Optional[str]— Optional download URL.template_uri: Optional[str]— Optional template URI (must matchURI_REGEX).
- Purpose: Validated inputs for
-
AddPowerPointPresentationPipeline(Pipeline)run(parameters) -> rdflib.Graph- Purpose: Build and insert RDF triples for a presentation, its slides, and shapes.
- Behavior:
- Loads the presentation:
- If
storage_pathstarts with"src": usesPowerPointIntegration.create_presentation(storage_path). - Else: fetches bytes via
StorageUtils.get_powerpoint_presentation(dir, filename)and loads withpptx.Presentation(...).
- If
- Extracts core properties (
author,created,modified,last_modified_by) and combines them withpresentation_nameto form a signature string. - Hashes the signature (
create_hash_from_string) and checks for existing entity viaSPARQLUtils.get_identifier(...).- If found: returns
SPARQLUtils.get_subject_graph(existing_uri, depth=2)and does not insert.
- If found: returns
- Otherwise: creates new URIs and adds triples for:
- Presentation metadata (label, unique_id, storage_path, optional download_url, optional template links).
- Slides (
slide_number) and shapes (id/type/text/alt-text/position/size/rotation), plus relationships.
- Inserts the graph into the triple store under graph name
http://ontology.naas.ai/graph/defaultwhen non-empty. - Returns the built
rdflib.Graph(or an emptyGraph()if core property extraction fails).
- Loads the presentation:
as_tools() -> list[BaseTool]- Purpose: Exposes a LangChain
StructuredToolnamedadd_powerpoint_presentationthat callsrun()withAddPowerPointPresentationPipelineParameters.
- Purpose: Exposes a LangChain
as_api(...) -> None- Purpose: Present but not implemented; always returns
Noneand does not register routes.
- Purpose: Present but not implemented; always returns
Configuration/Dependencies
- Requires:
PowerPointIntegrationConfiguration(passed into pipeline configuration).ITripleStoreService(used forinsert()).
- Uses runtime services from
ABIModule.get_instance().engine.services:- Triple store (wrapped via
SPARQLUtils). - Object storage (wrapped via
StorageUtils).
- Triple store (wrapped via
- External libraries:
python-pptx(pptx.Presentation)rdflibfastapi(only for typing inas_api)langchain_coretools
Usage
from naas_abi_marketplace.applications.powerpoint.pipelines.AddPowerPointPresentationPipeline import (
AddPowerPointPresentationPipeline,
AddPowerPointPresentationPipelineConfiguration,
AddPowerPointPresentationPipelineParameters,
)
from naas_abi_marketplace.applications.powerpoint.integrations.PowerPointIntegration import (
PowerPointIntegrationConfiguration,
)
# Provide concrete instances appropriate for your environment:
ppt_cfg = PowerPointIntegrationConfiguration(...) # integration-specific
triple_store = ... # must implement ITripleStoreService
pipeline = AddPowerPointPresentationPipeline(
AddPowerPointPresentationPipelineConfiguration(
powerpoint_configuration=ppt_cfg,
triple_store=triple_store,
)
)
g = pipeline.run(
AddPowerPointPresentationPipelineParameters(
presentation_name="Quarterly Review",
storage_path="src/presentations/review.pptx", # or a non-src path to load from object storage
download_url="https://example.com/review.pptx",
template_uri="http://ontology.naas.ai/abi/some-template-uri",
)
)
print(len(g)) # number of triples produced/returned
Caveats
- If extracting
presentation.core_propertiesfails for any reason, the pipeline logs an error and returns an empty RDFGraphwithout inserting anything. as_api()is a no-op; it does not expose a FastAPI endpoint.- Deduplication is based on a hash of
presentation_nameplus available core properties; different files with identical properties may be treated as the same presentation.