SearchLinkedInProfilePageWorkflow
What it is
A workflow that queries Google Programmable Search Engine for LinkedIn profile page URLs, extracts matching /in/ profile links via regex, and persists each found profile page record as JSON in object storage.
Public API
- SearchLinkedInProfilePageWorkflowConfiguration (dataclass)
  - Holds workflow configuration:
    - integration_config: GoogleProgrammableSearchEngineIntegrationConfiguration
    - pattern: regex used to validate/extract LinkedIn profile URLs (default: r"https://.+\.linkedin\.[^/]+/in/[^?]+")
    - datastore_path: storage path for saved profile JSON files (defaults under .../linkedin_profile_pages)
- SearchLinkedInProfilePageWorkflowParameters (pydantic)
  - Execution parameters:
    - profile_name (str, required): profile name to search
    - organization_name (str, optional): organization name to include in query
- SearchLinkedInProfilePageWorkflow (class)
  - __init__(configuration): builds the Google search integration and storage utility.
  - search_linkedin_profile_page(parameters) -> list[dict]:
    - Builds a search query from profile_name (+ optional organization_name) with "LinkedIn profile site:linkedin.com"
    - Calls the integration query
    - Filters results matching configuration.pattern
    - Extracts profile_id from the /in/{profile_id} URL segment
    - Persists each result as {profile_id}.json under datastore_path/{profile_id}/
    - Returns a list of page records with keys: title, link, description, cse_image
  - as_tools() -> list[BaseTool]:
    - Exposes the workflow as a LangChain StructuredTool named googlesearch_search_linkedin_profile_page
  - as_api(...) -> None:
    - Present but does not register any routes (no-op)
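The query-building, filtering, and profile_id extraction steps of search_linkedin_profile_page can be sketched without the integration. This is a minimal illustration, not the workflow's actual code: the helper names build_query and extract_profile_id are hypothetical, the word order of the query is an assumption, and the use of re.search (rather than another matching call) is also an assumption. Only the default pattern and the query suffix come from the description above.

```python
import re

# Default pattern as documented for SearchLinkedInProfilePageWorkflowConfiguration.
DEFAULT_PATTERN = r"https://.+\.linkedin\.[^/]+/in/[^?]+"


def build_query(profile_name, organization_name=None):
    """Hypothetical helper: assemble the search query (word order is assumed)."""
    parts = [profile_name]
    if organization_name:
        parts.append(organization_name)
    parts.append("LinkedIn profile site:linkedin.com")
    return " ".join(parts)


def extract_profile_id(link, pattern=DEFAULT_PATTERN):
    """Hypothetical helper: return the /in/{profile_id} segment, or None if no match."""
    match = re.search(pattern, link)
    if match is None:
        return None
    # Keep only the first path segment after /in/, dropping any trailing slash.
    return match.group(0).split("/in/", 1)[1].split("/", 1)[0]


query = build_query("Ada Lovelace", "Example Corp")
profile_id = extract_profile_id("https://www.linkedin.com/in/ada-lovelace/?trk=x")
```

With the default pattern, a link without a subdomain (for example https://linkedin.com/in/ada) does not match, because .+ requires at least one character before ".linkedin."; company pages are rejected because they lack the /in/ segment.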
Configuration/Dependencies
- Integration
  - GoogleProgrammableSearchEngineIntegration configured via GoogleProgrammableSearchEngineIntegrationConfiguration
- Storage
  - Uses StorageUtils backed by ABIModule.get_instance().engine.services.object_storage
  - Default datastore_path derives from ABIModule.get_instance().configuration.datastore_path
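The storage layout described in Public API ({profile_id}.json under datastore_path/{profile_id}/) can be illustrated as follows. This sketch does not call StorageUtils, whose method signatures are not shown here; the helper names, the example datastore path, and the sample record are hypothetical.

```python
import json
import posixpath


def profile_page_location(datastore_path, profile_id):
    """Hypothetical helper: return (directory, file name) for a profile record."""
    directory = posixpath.join(datastore_path, profile_id)
    return directory, f"{profile_id}.json"


def serialize_record(record):
    """Hypothetical helper: encode a page record as the JSON text to be stored."""
    return json.dumps(record, indent=2, ensure_ascii=False)


# Example values are made up for illustration.
directory, file_name = profile_page_location(
    "datastore/linkedin_profile_pages", "ada-lovelace"
)
payload = serialize_record(
    {
        "title": "Ada Lovelace",
        "link": "https://www.linkedin.com/in/ada-lovelace",
        "description": "",
        "cse_image": None,
    }
)
```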
Usage
from naas_abi_marketplace.applications.google_search.workflows.SearchLinkedInProfilePageWorkflow import (
    SearchLinkedInProfilePageWorkflow,
    SearchLinkedInProfilePageWorkflowConfiguration,
    SearchLinkedInProfilePageWorkflowParameters,
)
from naas_abi_marketplace.applications.google_search.integrations.GoogleProgrammableSearchEngineIntegration import (
    GoogleProgrammableSearchEngineIntegrationConfiguration,
)

integration_config = GoogleProgrammableSearchEngineIntegrationConfiguration(
    # fill with required integration settings
)

wf = SearchLinkedInProfilePageWorkflow(
    SearchLinkedInProfilePageWorkflowConfiguration(integration_config=integration_config)
)

results = wf.search_linkedin_profile_page(
    SearchLinkedInProfilePageWorkflowParameters(
        profile_name="Ada Lovelace",
        organization_name="Example Corp",
    )
)
print(results)
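Each returned record is a dict with the keys title, link, description, and cse_image, so the results can be post-processed with ordinary dict access. The sketch below is one possible way to do that; the summarize helper and the sample record are hypothetical, and the use of .get() with a fallback assumes that some values may be empty, which the description above does not confirm.

```python
def summarize(records):
    """Hypothetical helper: one 'title -> link' line per page record."""
    lines = []
    for record in records:
        title = record.get("title") or "(no title)"
        link = record.get("link", "")
        lines.append(f"{title} -> {link}")
    return lines


# Sample record made up for illustration; real records come from the workflow.
sample_results = [
    {
        "title": "Ada Lovelace",
        "link": "https://www.linkedin.com/in/ada-lovelace",
        "description": "",
        "cse_image": None,
    }
]
for line in summarize(sample_results):
    print(line)
```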
Caveats
- Only URLs matching the configured regex pattern are returned and saved.
- as_api(...) is a no-op; this workflow does not expose HTTP endpoints via FastAPI in its current implementation.
- Saving depends on ABIModule being initialized with a working object storage service and datastore path.