SearchLinkedInOrganizationPageWorkflow
What it is
A workflow that uses Google Programmable Search Engine results to find and extract LinkedIn organization page URLs (company/school/showcase), then stores each matched page as JSON in object storage.
Public API
- SearchLinkedInOrganizationPageWorkflowConfiguration
  - Workflow configuration:
    - integration_config: GoogleProgrammableSearchEngineIntegrationConfiguration used to query Google.
    - pattern: regex used to match LinkedIn organization URLs (default: https://.+\.linkedin\.com/(company|school|showcase)/[^?]+).
    - datastore_path: base storage path (defaults under ABIModule.get_instance().configuration.datastore_path/linkedin_organization_pages).
- SearchLinkedInOrganizationPageWorkflowParameters
  - Input parameters:
    - organization_name: str: organization name to search for.
- SearchLinkedInOrganizationPageWorkflow
  - __init__(configuration)
    - Instantiates Google search integration and storage utility.
  - search_linkedin_organization_page(parameters) -> Any
    - Builds query: "{organization_name}+site:linkedin.com" (spaces replaced by +).
    - Filters results by configuration.pattern.
    - Detects LinkedIn org type by URL path: /company/, /school/, /showcase/.
    - Extracts organization_id from the URL segment after the type.
    - Persists a JSON document per match via StorageUtils.save_json(...).
    - Returns a list of dicts with keys: title, link, description, cse_image.
  - as_tools() -> list[BaseTool]
    - Exposes a LangChain StructuredTool named googlesearch_search_linkedin_organization_page.
  - as_api(...) -> None
    - Present but does not register routes (returns None).
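
The query building, regex filtering, and type/ID extraction steps of search_linkedin_organization_page can be sketched in isolation. This is a minimal illustration and not the workflow's actual code: the helper names build_query and parse_organization_url are hypothetical, and the use of re.match and the handling of trailing slashes and query strings are assumptions.

```python
import re
from typing import Optional, Tuple

# Default pattern as documented for the workflow configuration.
DEFAULT_PATTERN = r"https://.+\.linkedin\.com/(company|school|showcase)/[^?]+"
ORG_TYPES = ("company", "school", "showcase")


def build_query(organization_name: str) -> str:
    """Hypothetical helper: build the search query, replacing spaces with '+'."""
    return f"{organization_name}+site:linkedin.com".replace(" ", "+")


def parse_organization_url(
    link: str, pattern: str = DEFAULT_PATTERN
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (org_type, organization_id) or None.

    Assumes matching is anchored at the start of the URL (re.match);
    the real workflow may use a different matching function.
    """
    if re.match(pattern, link) is None:
        return None
    for org_type in ORG_TYPES:
        marker = f"/{org_type}/"
        if marker in link:
            # Take the path segment right after the type, dropping any
            # trailing path or query string (an assumption of this sketch).
            tail = link.split(marker, 1)[1]
            organization_id = tail.split("/")[0].split("?")[0]
            return org_type, organization_id
    return None


print(build_query("Open AI"))
print(parse_organization_url("https://www.linkedin.com/company/openai/"))
print(parse_organization_url("https://www.linkedin.com/in/someone"))
```

Note that the default pattern requires at least one character plus a dot before linkedin.com, so a URL on the bare domain (no www or country subdomain) does not match it.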
Configuration/Dependencies
- Depends on:
  - GoogleProgrammableSearchEngineIntegration (requires integration_config).
  - ABIModule.get_instance() for datastore path and object storage service.
  - StorageUtils for persisting JSON outputs.
- Storage layout:
  - Saves under datastore_path with "organization" replaced by the detected type (company, school, or showcase), then /<organization_id>/<organization_id>.json.
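
The storage layout can be illustrated with a small path-building sketch. The function build_storage_location and the base path "datastore" are hypothetical, and the sketch assumes the substitution is a plain string replacement of "organization" in datastore_path; the workflow's actual call to StorageUtils.save_json(...) is not reproduced here.

```python
from pathlib import PurePosixPath
from typing import Tuple


def build_storage_location(
    datastore_path: str, org_type: str, organization_id: str
) -> Tuple[str, str]:
    """Hypothetical helper: return (directory, file_name) for one matched page.

    Assumes "organization" in the base path is swapped for the detected
    type via str.replace, which replaces every occurrence.
    """
    base = datastore_path.replace("organization", org_type)
    directory = str(PurePosixPath(base) / organization_id)
    return directory, f"{organization_id}.json"


# "datastore" is a placeholder prefix, not the module's real configured path.
directory, file_name = build_storage_location(
    "datastore/linkedin_organization_pages", "company", "openai"
)
print(directory)  # datastore/linkedin_company_pages/openai
print(file_name)  # openai.json
```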
Usage
from naas_abi_marketplace.applications.google_search.workflows.SearchLinkedInOrganizationPageWorkflow import (
    SearchLinkedInOrganizationPageWorkflow,
    SearchLinkedInOrganizationPageWorkflowConfiguration,
    SearchLinkedInOrganizationPageWorkflowParameters,
)
from naas_abi_marketplace.applications.google_search.integrations.GoogleProgrammableSearchEngineIntegration import (
    GoogleProgrammableSearchEngineIntegrationConfiguration,
)

config = SearchLinkedInOrganizationPageWorkflowConfiguration(
    integration_config=GoogleProgrammableSearchEngineIntegrationConfiguration(
        # provide required integration fields here
    )
)

wf = SearchLinkedInOrganizationPageWorkflow(config)
results = wf.search_linkedin_organization_page(
    SearchLinkedInOrganizationPageWorkflowParameters(organization_name="OpenAI")
)
print(results)
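
Since the method returns a list of dicts with the keys title, link, description, and cse_image, the results can be post-processed with ordinary Python. The sketch below groups links by organization type; group_links_by_type is a hypothetical helper, and the sample entries are made-up placeholders rather than real search output.

```python
from collections import defaultdict
from typing import Any, Dict, List

ORG_TYPES = ("company", "school", "showcase")


def group_links_by_type(results: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Hypothetical helper: group result links by LinkedIn organization type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in results:
        link = item.get("link", "")
        for org_type in ORG_TYPES:
            if f"/{org_type}/" in link:
                grouped[org_type].append(link)
                break
    return dict(grouped)


# Placeholder data shaped like the documented return value.
sample_results = [
    {
        "title": "Example Org",
        "link": "https://www.linkedin.com/company/example-org",
        "description": "Placeholder description",
        "cse_image": None,
    },
    {
        "title": "Example School",
        "link": "https://www.linkedin.com/school/example-school",
        "description": "Placeholder description",
        "cse_image": None,
    },
]
print(group_links_by_type(sample_results))
```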
Caveats
- Only URLs matching the configured regex and containing one of /company/, /school/, /showcase/ are returned/saved.
- as_api() does not expose any HTTP endpoints.