CreateClassEmbeddingsWorkflow
What it is
A workflow that:
- Queries a triple store for all entities of a given RDF class (plus their datatype properties).
- Creates OpenAI embeddings from each entity’s rdfs:label.
- Stores new embeddings (skipping already-indexed entities) into a vector store collection.
- Optionally builds a LangChain StructuredTool for similarity search over a collection.
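The steps above can be sketched with in-memory stand-ins. This is a minimal illustration, not the workflow's code: the dict-based store, the `index_new_entities` helper, and the `toy_embed` function are all hypothetical and replace the real vector store and OpenAI embeddings. It shows only the "embed the label, skip what is already indexed" behaviour.

```python
from typing import Callable, Dict, List


def index_new_entities(
    entities: List[Dict[str, str]],
    store: Dict[str, Dict],
    embed: Callable[[str], List[float]],
) -> int:
    """Embed and store entities not yet in `store`; return how many were added."""
    added = 0
    for entity in entities:
        # Document ID: last path segment of the URI, as described on this page.
        document_id = entity["uri"].split("/")[-1]
        if document_id in store:
            continue  # already indexed, so no new embedding is computed
        store[document_id] = {
            "vector": embed(entity["label"]),  # only the label is embedded
            "metadata": {"uri": entity["uri"], "label": entity["label"]},
        }
        added += 1
    return added


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embeddings model (not a real embedding)."""
    return [float(len(text)), float(text.count(" ")), float(sum(map(ord, text)) % 97)]


store: Dict[str, Dict] = {}
entities = [
    {"uri": "http://example.org/person/ada", "label": "Ada Lovelace"},
    {"uri": "http://example.org/person/alan", "label": "Alan Turing"},
]
first_run = index_new_entities(entities, store, toy_embed)   # both entities are new
second_run = index_new_entities(entities, store, toy_embed)  # nothing left to add
```

Running the function twice over the same entities adds documents only on the first pass, which is the property that makes the workflow safe to re-run.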
Public API
Classes
- CreateClassEmbeddingsWorkflowConfiguration(WorkflowConfiguration)
  - Holds required services and embedding settings.
  - Fields:
    - triple_store: ITripleStoreService (required)
    - vector_store: VectorStoreService (required)
    - embeddings_model_name: str = "text-embedding-3-large"
    - embeddings_dimension: int = 3072
- CreateClassEmbeddingsWorkflowParameters(WorkflowParameters)
  - Input parameters for embedding creation.
  - Fields:
    - class_uri: str - RDF class URI inserted into the SPARQL pattern a {class_uri}.
    - collection_name: str - vector store collection name.
    - entity_variable_name: str - SPARQL variable name for the entity (e.g. "person").
    - entity_type_label: str - label used in logs (e.g. "person").
- CreateClassEmbeddingsWorkflow(Workflow)
  - Main workflow implementation.
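To make the role of class_uri and entity_variable_name concrete, here is a rough sketch of how such parameters could be interpolated into a SPARQL query. The `build_class_query` helper and the exact query shape are assumptions for illustration; the workflow's actual query text is not shown on this page. Note that a prefixed name such as "cco:ont00001262" only resolves if its prefix is declared, which this sketch does not do, so the example passes a full IRI in angle brackets.

```python
def build_class_query(class_uri: str, entity_variable_name: str) -> str:
    """Assemble a SELECT for entities of one class (illustrative shape only)."""
    var = f"?{entity_variable_name}"
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT {var} ?label ?property ?value WHERE {{\n"
        f"  {var} a {class_uri} ;\n"
        "      rdfs:label ?label .\n"
        # Datatype property values are optional, so entities without any still match.
        "  OPTIONAL {\n"
        f"    {var} ?property ?value .\n"
        "    ?property a owl:DatatypeProperty .\n"
        "  }\n"
        "}"
    )


# Hypothetical class IRI used only for this example.
query = build_class_query("<http://example.org/Person>", "person")
print(query)
```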
Methods (CreateClassEmbeddingsWorkflow)
- create_class_embeddings(parameters: CreateClassEmbeddingsWorkflowParameters) -> Dict[str, Any]
  - Ensures the target vector collection exists (cosine distance; configured dimension).
  - Queries the triple store for:
    - entity URI, rdfs:label, and all owl:DatatypeProperty values (optional).
  - Creates embeddings for new entities only (based on document_id = uri.split("/")[-1]).
  - Stores vectors and metadata in the vector store.
  - Returns a status dict including entities_processed and, when embeddings were added, collection_name and entity_type.
- create_search_tool(collection_name: str, search_param_name: str, tool_name: str, tool_description: str, entity_type_label: str) -> StructuredTool
  - Builds a LangChain StructuredTool that:
    - Accepts a dynamically named query parameter (e.g. "person_name") and k (default 5, range 1–20).
    - Embeds the query text and runs vector similarity search.
    - Returns a list of {uri, label, score} from stored metadata.
- as_tools() -> list[BaseTool]
  - Exposes the workflow as a LangChain tool:
    - Tool name: "create_class_embeddings"
    - Args schema: CreateClassEmbeddingsWorkflowParameters
- as_api(...) -> None
  - Declared but not implemented (pass).
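The search behaviour described for create_search_tool can be approximated in plain Python. This is a sketch under assumptions: `rank_by_cosine` is a hypothetical helper, the vectors are hand-written rather than produced by an embeddings model, and rejecting an out-of-range k with ValueError is a guess (the real tool may validate k differently). It shows cosine-similarity ranking and the {uri, label, score} result shape.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query_vector: List[float],
    documents: List[Tuple[List[float], Dict[str, str]]],
    k: int = 5,
) -> List[Dict[str, object]]:
    """Return the k most similar documents as {uri, label, score} dicts."""
    if not 1 <= k <= 20:
        raise ValueError("k must be between 1 and 20")  # assumed validation
    hits = [
        {
            "uri": metadata["uri"],
            "label": metadata["label"],
            "score": cosine_similarity(query_vector, vector),
        }
        for vector, metadata in documents
    ]
    hits.sort(key=lambda hit: hit["score"], reverse=True)  # best match first
    return hits[:k]


# Toy 2-dimensional vectors; real embeddings would have the configured dimension.
documents = [
    ([1.0, 0.0], {"uri": "http://example.org/person/ada", "label": "Ada Lovelace"}),
    ([0.0, 1.0], {"uri": "http://example.org/person/alan", "label": "Alan Turing"}),
    ([1.0, 1.0], {"uri": "http://example.org/person/grace", "label": "Grace Hopper"}),
]
top_two = rank_by_cosine([1.0, 0.1], documents, k=2)
```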
Configuration/Dependencies
- Requires:
  - ITripleStoreService for SPARQL querying (query()).
  - VectorStoreService for collection management and vector operations:
    - ensure_collection(...)
    - get_document(...)
    - add_documents(...)
    - search_similar(...)
- Uses OpenAI embeddings via langchain_openai.OpenAIEmbeddings with embeddings_model_name.
- Uses numpy to convert embedding lists to arrays before storage.
- Uses SPARQLUtils(...).results_to_list(...) to normalize triple store query results.
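As an illustration of what normalizing query results can mean, the sketch below flattens results in the standard SPARQL 1.1 JSON results format into one plain dict per row. The `bindings_to_rows` function is a hypothetical stand-in; it does not reproduce SPARQLUtils.results_to_list, whose input type and output shape are not documented here.

```python
from typing import Any, Dict, List


def bindings_to_rows(results: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one plain dict per solution."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound (OPTIONAL) variables are simply absent from a binding.
        rows.append(
            {var: binding[var]["value"] for var in variables if var in binding}
        )
    return rows


# Hand-written sample in the SPARQL JSON results layout.
sample = {
    "head": {"vars": ["person", "label", "value"]},
    "results": {
        "bindings": [
            {
                "person": {"type": "uri", "value": "http://example.org/person/ada"},
                "label": {"type": "literal", "value": "Ada Lovelace"},
                "value": {"type": "literal", "value": "1815"},
            },
            {
                "person": {"type": "uri", "value": "http://example.org/person/alan"},
                "label": {"type": "literal", "value": "Alan Turing"},
            },
        ]
    },
}
rows = bindings_to_rows(sample)
```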
Usage
Create embeddings for a class
from naas_abi_core.engine.Engine import Engine
from naas_abi_marketplace.applications.linkedin import ABIModule
from naas_abi_marketplace.applications.linkedin.workflows.CreateClassEmbeddingsWorkflow import (
    CreateClassEmbeddingsWorkflow,
    CreateClassEmbeddingsWorkflowConfiguration,
    CreateClassEmbeddingsWorkflowParameters,
)

# Load the module that provides the workflow and its services.
engine = Engine()
engine.load(module_names=["naas_abi_marketplace.applications.linkedin"])
module: ABIModule = ABIModule.get_instance()

# Wire the engine's triple store and vector store into the workflow.
cfg = CreateClassEmbeddingsWorkflowConfiguration(
    triple_store=module.engine.services.triple_store,
    vector_store=module.engine.services.vector_store,
)
wf = CreateClassEmbeddingsWorkflow(cfg)

# Embed every entity of the class into the target collection.
result = wf.create_class_embeddings(
    CreateClassEmbeddingsWorkflowParameters(
        class_uri="cco:ont00001262",
        collection_name="linkedin_persons",
        entity_variable_name="person",
        entity_type_label="person",
    )
)
print(result)
Create a similarity search tool
tool = wf.create_search_tool(
    collection_name="linkedin_persons",
    search_param_name="person_name",
    tool_name="search_person",
    tool_description="Search persons by name using embeddings.",
    entity_type_label="person",
)

# Call tool function directly (LangChain integration may vary)
print(tool.func(person_name="Ada Lovelace", k=5))
Caveats
- Document IDs are derived from uri.split("/")[-1]; URIs that contain no / or that need different ID logic may collide or behave unexpectedly.
- Embeddings are computed only from rdfs:label; datatype properties are stored as metadata but not embedded.
- as_api() is not implemented.
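The document ID caveat is easy to reproduce in isolation. The snippet below applies the same uri.split("/")[-1] expression to a few made-up URIs (all under example.org, chosen only for illustration) to show a collision, a fragment-style URI, a URI without any slash, and a trailing slash.

```python
def document_id(uri: str) -> str:
    """Same expression the workflow uses to derive a document ID."""
    return uri.split("/")[-1]


person_id = document_id("http://example.org/person/ada")
org_id = document_id("http://example.org/org/ada")         # different entity, same ID
hash_id = document_id("http://example.org/ontology#ada")   # fragment stays attached
urn_id = document_id("urn:uuid:1234")                      # no "/", whole URI is the ID
trailing_id = document_id("http://example.org/person/ada/")  # trailing "/" gives ""

for name, value in [
    ("person", person_id),
    ("org", org_id),
    ("hash", hash_id),
    ("urn", urn_id),
    ("trailing", trailing_id),
]:
    print(f"{name}: {value!r}")
```

Because existing IDs are skipped, a collision like the first two lines would cause the second entity to be treated as already indexed.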