AIAgentOntologyGenerationPipeline

What it is

A pipeline that:

  • Loads the latest Artificial Analysis LLM JSON dataset from a datastore folder
  • Groups models into AI-agent “modules” (e.g., chatgpt, claude, llama) using name/slug/provider heuristics
  • Generates one Turtle (TTL) ontology file per agent; the BFO-structured content is assembled as plain strings
  • Writes outputs into a timestamped datastore directory and also deploys a “current” TTL into each agent module's ontologies folder
  • Inserts a small summary rdflib.Graph (file count + timestamp) into a configured triple store
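
The grouping step can be pictured with a small sketch. This is not the pipeline's actual heuristic: the keyword table, the function names, and the assumption that each model record exposes "name", "slug", and "provider" keys are all hypothetical, chosen only to illustrate matching on those three fields.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical keyword table: agent module -> substrings to look for.
# The real pipeline's rules are not documented here and may differ.
AGENT_KEYWORDS = {
    "chatgpt": ("gpt", "openai"),
    "claude": ("claude", "anthropic"),
    "llama": ("llama", "meta"),
}


def guess_agent(model: dict) -> Optional[str]:
    """Return the first agent whose keywords appear in the model's
    name, slug, or provider (assumed keys); None when nothing matches."""
    haystack = " ".join(
        str(model.get(key, "")) for key in ("name", "slug", "provider")
    ).lower()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(keyword in haystack for keyword in keywords):
            return agent
    return None


def group_by_agent(models: List[dict]) -> Dict[str, List[dict]]:
    """Bucket model records by guessed agent, dropping unmatched models."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for model in models:
        agent = guess_agent(model)
        if agent is not None:
            groups[agent].append(model)
    return dict(groups)
```

Substring matching of this kind is simple but order-sensitive: a model that mentions two providers lands in whichever agent appears first in the table.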

Public API

Classes

  • AIAgentOntologyGenerationConfiguration(PipelineConfiguration)

    • Configuration for the pipeline.
    • Fields:
      • triple_store: ITripleStoreService (required) - target triple store; the pipeline calls its insert(graph) method to store the summary graph
      • datastore_path: str - output root for generated ontologies (timestamped subfolders)
      • source_datastore_path: str - input folder containing *_llms_data.json files
      • max_models_per_agent: int - maximum number of models included per agent, to keep generation fast
  • AIAgentOntologyGenerationParameters(PipelineParameters)

    • Execution parameters.
    • Fields:
      • force_regenerate: bool - defined but not used in current implementation
      • agent_filter: Optional[List[str]] - restrict generation to specific agent modules (keys like ["chatgpt","claude"])
  • AIAgentOntologyGenerationPipeline(Pipeline)

    • Main pipeline implementation.
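
To make the interplay of agent_filter and max_models_per_agent concrete, here is a minimal sketch of how the two settings might be applied to already grouped models. The function name select_models is hypothetical, and simple truncation to the first N models is an assumption; the pipeline may rank or sample models differently before applying the cap.

```python
from typing import Dict, List, Optional


def select_models(
    groups: Dict[str, List[dict]],
    agent_filter: Optional[List[str]] = None,
    max_models_per_agent: int = 50,
) -> Dict[str, List[dict]]:
    """Keep only the requested agents and cap each agent's model list."""
    selected: Dict[str, List[dict]] = {}
    for agent, models in groups.items():
        # agent_filter=None means "no restriction": every agent is kept.
        if agent_filter is not None and agent not in agent_filter:
            continue
        # Assumption: the cap keeps the first N models in their existing order.
        selected[agent] = models[:max_models_per_agent]
    return selected
```

With agent_filter=["chatgpt", "claude"], any other agent key is skipped entirely, so no TTL file would be produced for it in this sketch.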

Methods (intended for external use)

  • AIAgentOntologyGenerationPipeline.run(parameters: PipelineParameters) -> rdflib.Graph

    • Runs the pipeline end-to-end.
    • Validates parameter type (AIAgentOntologyGenerationParameters).
    • Loads the latest Artificial Analysis (AA) dataset, generates and deploys the TTL files, writes a JSON summary file, inserts a summary graph into the triple store, and returns that graph.
  • AIAgentOntologyGenerationPipeline.as_tools() -> list[BaseTool]

    • Exposes the pipeline as a LangChain StructuredTool named ai_agent_ontology_generation.
    • Tool calls run(AIAgentOntologyGenerationParameters(**kwargs)).
  • AIAgentOntologyGenerationPipeline.as_api(...) -> None

    • Present but currently does nothing (returns None).
  • AIAgentOntologyGenerationPipeline.get_configuration() -> AIAgentOntologyGenerationConfiguration

    • Returns the pipeline configuration instance.

Configuration/Dependencies

Required dependencies

  • rdflib (Graph, Literal, Namespace) - used for the summary graph that is returned and inserted; the TTL files themselves are generated as plain text.
  • naas_abi_core.pipeline - base Pipeline, PipelineConfiguration, PipelineParameters.
  • naas_abi_core.services.triple_store.TripleStorePorts.ITripleStoreService
    • Must provide an insert(graph: Graph) method.
  • langchain_core.tools - for as_tools() (BaseTool, StructuredTool).

Filesystem inputs/outputs

  • Input: latest file matching *_llms_data.json in source_datastore_path.
  • Output: under datastore_path/<UTC_TIMESTAMP>/
    • <AgentTitle>Ontology.ttl (current)
    • <UTC_TIMESTAMP>_<AgentTitle>Ontology.ttl (audit copy)
    • generation_summary_<UTC_TIMESTAMP>.json
  • Deployment output: also writes <AgentTitle>Ontology.ttl into:
    • Path(__file__).parent.parent.parent / <agent_module> / "ontologies" / <AgentTitle>Ontology.ttl
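
The input lookup and output layout above can be sketched with pathlib. Several details are assumptions, not confirmed behavior: "latest" is taken to mean the lexicographically greatest filename (which only works if filenames begin with a sortable timestamp), the timestamp format is illustrative, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def find_latest_dataset(source_dir: str) -> Path:
    """Pick the newest *_llms_data.json file, assuming sortable filenames."""
    candidates = sorted(Path(source_dir).glob("*_llms_data.json"))
    if not candidates:
        # Mirrors the documented behavior: a missing file or folder is a ValueError.
        raise ValueError(f"No *_llms_data.json file found in {source_dir}")
    return candidates[-1]


def utc_timestamp() -> str:
    """Illustrative UTC timestamp format; the pipeline's format may differ."""
    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")


def planned_outputs(datastore_path: str, agent_title: str, timestamp: str) -> Dict[str, Path]:
    """Return the three per-run paths described in the layout above."""
    run_dir = Path(datastore_path) / timestamp
    return {
        "current": run_dir / f"{agent_title}Ontology.ttl",
        "audit": run_dir / f"{timestamp}_{agent_title}Ontology.ttl",
        "summary": run_dir / f"generation_summary_{timestamp}.json",
    }
```

Calling planned_outputs before a run is a cheap way to check where files would land without writing anything.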

Usage

from naas_abi.pipelines.AIAgentOntologyGenerationPipeline import (
    AIAgentOntologyGenerationPipeline,
    AIAgentOntologyGenerationConfiguration,
    AIAgentOntologyGenerationParameters,
)

# Minimal triple store stub for demonstration
class TripleStoreStub:
    def insert(self, graph):
        pass

pipeline = AIAgentOntologyGenerationPipeline(
    AIAgentOntologyGenerationConfiguration(
        triple_store=TripleStoreStub(),
        source_datastore_path="storage/datastore/core/modules/abi/ArtificialAnalysisWorkflow",
        datastore_path="storage/datastore/core/modules/abi/AIAgentOntologyGenerationPipeline",
        max_models_per_agent=50,
    )
)

graph = pipeline.run(
    AIAgentOntologyGenerationParameters(agent_filter=["chatgpt", "claude"])
)

print(len(graph))  # summary triples count

Caveats

  • force_regenerate parameter is currently unused.
  • The LangChain tool description states “datastore only, no module deployment”, but run() does deploy TTL files into module folders.
  • Module deployment path is derived from __file__ by going up 3 directories; this assumes a specific repo/layout and may write files into unexpected locations depending on installation.
  • Ontology TTL content is written as text and is not parsed/validated before writing.
  • If no *_llms_data.json file exists, or the source directory is missing, run() raises ValueError.
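
Because the deployment path depends on where the pipeline module is installed, it can help to compute the target before running anything. The sketch below reproduces the documented Path(__file__).parent.parent.parent expression for an arbitrary file path; the helper name deployment_target and the example path in the comment are hypothetical.

```python
from pathlib import Path


def deployment_target(pipeline_file: str, agent_module: str, agent_title: str) -> Path:
    """Compute where a deployed TTL would land for a given pipeline file path.

    Follows the documented rule: three directories up from the pipeline file,
    then <agent_module>/ontologies/<AgentTitle>Ontology.ttl.
    """
    base = Path(pipeline_file).parent.parent.parent
    return base / agent_module / "ontologies" / f"{agent_title}Ontology.ttl"


# Example with a made-up layout; pass the pipeline module's real __file__
# to preview the location in an actual installation.
# deployment_target("/repo/src/naas_abi/pipelines/x/Pipeline.py", "claude", "Claude")
```

If the result points into a site-packages directory or another read-only location, the deployment step is likely to fail or to write somewhere unintended.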