ConvertToMarkdownBasePipeline
What it is
- An abstract Pipeline that finds files of a given MIME type in an RDF graph, converts each to Markdown (via a subclass implementation), uploads the resulting .md file, and returns RDF statements describing the new files.
Public API
Classes
- ConvertToMarkdownBasePipelineConfiguration(PipelineConfiguration)
  - mime_type: str
    - MIME type used to select which files to convert.
- ConvertToMarkdownBasePipelineParameters(PipelineParameters)
  - graph_name: str = "http://ontology.naas.ai/graph/document"
    - Graph name to read from / ingest into.
  - processor_iri: str
    - IRI of the processor used to mark files as processed.
- ConvertToMarkdownBasePipeline(Pipeline)
  - __init__(configuration: ConvertToMarkdownBasePipelineConfiguration)
    - Stores configuration and resolves the ABIModule singleton.
  - convert_to_markdown(file: File) -> str
    - Abstract; must be implemented by subclasses to produce Markdown content for a File.
  - run(parameters: PipelineParameters) -> rdflib.Graph
    - Pipeline execution:
      - Selects files via get_files_to_process(graph_name, mime_type, processor_iri).
      - For each file IRI:
        - Loads the File from the triple store.
        - Calls convert_to_markdown.
        - Uploads and creates a new .md File via File.UploadAndCreateFile(...).
        - Adds derivedFrom and processedBy links.
      - Returns an RDF Graph containing the new file triples.
  - as_tools() -> list[langchain_core.tools.BaseTool]
    - Exposes the pipeline as a LangChain StructuredTool named "ConvertToMarkdownBase".
  - as_api(...) -> None
    - No-op (returns None).
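The run() steps listed above can be sketched without the real dependencies. This is a minimal illustration, not the actual implementation: FakeFile, run_sketch, the tuple-based "graph", the ".md" naming scheme, and the predicate names fileName and content are all hypothetical stand-ins for the File ontology class, rdflib.Graph, and File.UploadAndCreateFile(...). Which node carries the derivedFrom and processedBy links is also an assumption here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for RDF triples: (subject, predicate, object).
Triple = tuple[str, str, str]


@dataclass
class FakeFile:
    """Hypothetical stand-in for the File ontology class."""
    iri: str
    file_name: str
    content: str


def run_sketch(
    file_iris: list[str],
    load_file: Callable[[str], FakeFile],
    convert_to_markdown: Callable[[FakeFile], str],
    processor_iri: str,
) -> set[Triple]:
    """Mirror the documented steps: load, convert, upload, link."""
    graph: set[Triple] = set()
    for iri in file_iris:
        source = load_file(iri)                 # load the File for this IRI
        markdown = convert_to_markdown(source)  # subclass-provided conversion
        # Stand-in for File.UploadAndCreateFile(...): describe a new .md file.
        new_iri = f"{iri}.md"
        graph.add((new_iri, "fileName", f"{source.file_name}.md"))
        graph.add((new_iri, "content", markdown))
        # Assumed placement of the links: the new file points back to its
        # source, and the source is marked as handled by this processor.
        graph.add((new_iri, "derivedFrom", iri))
        graph.add((iri, "processedBy", processor_iri))
    return graph


files = {
    "http://example.org/files/1": FakeFile(
        "http://example.org/files/1", "report.pdf", "hello"
    )
}
result = run_sketch(
    list(files),
    files.__getitem__,
    lambda f: f"# {f.file_name}\n\n{f.content}",
    "http://example.org/processors/convert-to-md",
)
```

In this sketch the returned set plays the role of the rdflib.Graph that run() returns: it only contains statements about the newly created files and their links.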
Configuration/Dependencies
- Depends on:
  - naas_abi_core.pipeline (Pipeline, PipelineConfiguration, PipelineParameters)
  - rdflib.Graph
  - naas_abi_marketplace.domains.document.ABIModule (triple store access)
  - File ontology class:
    - File.from_iri(...)
    - File.UploadAndCreateFile(...)
    - new_file.rdf()
  - get_files_to_process(...) (selects file IRIs to process)
  - LangChain tooling (StructuredTool)
- mime_type drives which files are considered for conversion.
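To make the role of mime_type and processor_iri in file selection concrete, here is a rough sketch of the kind of filtering get_files_to_process(...) might perform. The real function's query is not shown in this document, so everything here is an assumption: select_files, the in-memory triple set, the mimeType predicate name, and in particular the idea that files already linked to the processor via processedBy are skipped.

```python
# Hypothetical in-memory stand-in for the triple store: (subject, predicate, object).
Triple = tuple[str, str, str]


def select_files(
    triples: set[Triple], mime_type: str, processor_iri: str
) -> list[str]:
    """Return IRIs whose MIME type matches and that this processor has not handled."""
    matching = {s for s, p, o in triples if p == "mimeType" and o == mime_type}
    # Assumption: a processedBy link to this processor means "already converted".
    done = {s for s, p, o in triples if p == "processedBy" and o == processor_iri}
    return sorted(matching - done)


store = {
    ("ex:a.pdf", "mimeType", "application/pdf"),
    ("ex:b.pdf", "mimeType", "application/pdf"),
    ("ex:b.pdf", "processedBy", "ex:convert-to-md"),
    ("ex:c.txt", "mimeType", "text/plain"),
}
pending = select_files(store, "application/pdf", "ex:convert-to-md")
print(pending)  # ['ex:a.pdf']
```

Under these assumptions, rerunning the pipeline with the same processor_iri would not reconvert ex:b.pdf, while a different processor_iri would select it again.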
Usage
from naas_abi_marketplace.domains.document.pipelines.ConvertToMarkdownBasePipeline import (
    ConvertToMarkdownBasePipeline,
    ConvertToMarkdownBasePipelineConfiguration,
    ConvertToMarkdownBasePipelineParameters,
)


class MyMarkdownPipeline(ConvertToMarkdownBasePipeline):
    def convert_to_markdown(self, file):
        # Implement actual conversion logic here
        return f"# {file.file_name}\n\nConverted content."


# Select PDF files for conversion
pipeline = MyMarkdownPipeline(
    ConvertToMarkdownBasePipelineConfiguration(mime_type="application/pdf")
)

# graph_name is omitted, so its default value is used
result_graph = pipeline.run(
    ConvertToMarkdownBasePipelineParameters(
        processor_iri="http://example.org/processors/convert-to-md"
    )
)
Caveats
- convert_to_markdown is not implemented and will raise NotImplementedError unless overridden.
- run() asserts parameters are ConvertToMarkdownBasePipelineParameters; passing other parameter types will fail.
- as_api() does not register any routes (no API exposure by default).
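The first two caveats can be reproduced with small stand-in classes. This is a sketch under assumptions, not the library's code: Parameters, MarkdownParameters, BasePipeline, and TitleOnlyPipeline are hypothetical names that imitate PipelineParameters, ConvertToMarkdownBasePipelineParameters, the base pipeline, and a subclass, and the type check is assumed to be a plain assert statement.

```python
class Parameters:
    """Stand-in for PipelineParameters."""


class MarkdownParameters(Parameters):
    """Stand-in for ConvertToMarkdownBasePipelineParameters."""

    def __init__(self, processor_iri: str) -> None:
        self.processor_iri = processor_iri


class BasePipeline:
    """Stand-in for ConvertToMarkdownBasePipeline."""

    def convert_to_markdown(self, file: str) -> str:
        raise NotImplementedError("Subclasses must implement convert_to_markdown")

    def run(self, parameters: Parameters) -> list[str]:
        # A failed assert on the parameter type surfaces as AssertionError.
        assert isinstance(parameters, MarkdownParameters)
        return [self.convert_to_markdown("example.pdf")]


class TitleOnlyPipeline(BasePipeline):
    """Overrides the abstract hook, so run() can complete."""

    def convert_to_markdown(self, file: str) -> str:
        return f"# {file}"


def outcome(pipeline: BasePipeline, parameters: Parameters) -> str:
    """Run the pipeline and report either the result or the error type."""
    try:
        return repr(pipeline.run(parameters))
    except (NotImplementedError, AssertionError) as exc:
        return type(exc).__name__


good = MarkdownParameters("http://example.org/processors/convert-to-md")
print(outcome(BasePipeline(), good))               # NotImplementedError
print(outcome(TitleOnlyPipeline(), Parameters()))  # AssertionError
print(outcome(TitleOnlyPipeline(), good))          # ['# example.pdf']
```

If the check really is a bare assert, note that Python skips assert statements when run with the -O flag, in which case a wrong parameter type would fail later and less clearly.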