LinkedInIntegration
What it is
- A Python integration client for LinkedIn’s private “voyager” API endpoints.
- Authenticates using LinkedIn cookies (li_at, JSESSIONID) and fetches:
  - Organization info
  - Profile “top card”, skills, experience, education
  - Profile posts feed and post engagement (stats, reactions, comments, reposts)
  - People search and mutual connections exports (JSON + Excel)
- Persists raw and derived artifacts (JSON, optional images, Excel) under a configured datastore path.
- Provides as_tools() to expose selected actions as LangChain StructuredTools.
Public API
LinkedInIntegrationConfiguration (dataclass)
Configuration for LinkedInIntegration.
- li_at: str - LinkedIn li_at cookie.
- JSESSIONID: str - LinkedIn JSESSIONID cookie (quotes are stripped in __init__).
- linkedin_url: str - a LinkedIn profile URL used to initialize the integration and derive profile_public_id.
- naas_integration_config: NaasIntegrationConfiguration | None - optional; if provided, enables uploading exported Excel as a Naas asset.
- base_url: str = "https://www.linkedin.com/voyager/api" - API base.
- datastore_path: str - storage root (defaults from ABIModule configuration).
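The note about JSESSIONID quotes can be illustrated with a small sketch. CookieConfig and normalize_jsessionid are hypothetical stand-ins, not the integration's actual code, and the sketch assumes that "stripping quotes" means removing double quotes that wrap the cookie value.

```python
from dataclasses import dataclass


def normalize_jsessionid(value: str) -> str:
    """Remove double quotes wrapping a cookie value (assumed behavior)."""
    return value.strip('"')


@dataclass
class CookieConfig:
    """Hypothetical stand-in for the cookie fields of the real configuration."""

    li_at: str
    JSESSIONID: str

    def __post_init__(self) -> None:
        # Normalize once so quoted and unquoted inputs behave the same.
        self.JSESSIONID = normalize_jsessionid(self.JSESSIONID)


quoted = CookieConfig(li_at="YOUR_LI_AT", JSESSIONID='"YOUR_JSESSIONID"')
plain = CookieConfig(li_at="YOUR_LI_AT", JSESSIONID="YOUR_JSESSIONID")
```

Under that assumption, both ways of passing the cookie produce the same configuration.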
LinkedInIntegration
Main integration client.
Constructor
LinkedInIntegration(configuration: LinkedInIntegrationConfiguration)
- Validates access by calling get_profile_public_id(configuration.linkedin_url).
- Raises Exception if the public ID cannot be resolved.
Data retrieval methods
- get_organization_id_from_url(url: str) -> Dict[str, str]
  - Parses organization “public id” from /company/, /school/, or /showcase/ URLs.
- get_organization_id(url: str) -> Dict[str, str]
  - Fetches organization info then extracts the numeric org id from *elements.
- get_organization_info(url: str, return_cleaned_json: bool = False) -> Dict
  - Calls /organization/companies?... and saves response; optionally returns cleaned/flattened JSON.
- get_profile_id_from_url(url: str) -> Dict[str, str]
  - Extracts vanity id from /in/<id>/ profile URLs.
- get_profile_public_id(url: str) -> Dict[str, str]
  - Reads publicIdentifier from top-card response; falls back to parsing a “share URL” from overflow actions.
- get_profile_id(url: str) -> Dict[str, str]
  - Extracts internal profile id (URN suffix) from top-card response.
- get_profile_top_card(url: str, return_cleaned_json: bool = False) -> Dict
  - GraphQL request for top-card; rejects vanity ids starting with AcoAA (treated as invalid URL input).
- get_profile_data(url: str, profile_type: str = "skills", locale: str = "en_US", return_cleaned_json: bool = False) -> Dict
  - GraphQL request for a profile section (skills, experience, education, etc.).
- get_profile_skills(url: str, return_cleaned_json: bool = False) -> Dict
- get_profile_experience(url: str, return_cleaned_json: bool = False) -> Dict
- get_profile_education(url: str, return_cleaned_json: bool = False) -> Dict
- get_profile_posts_feed(url: str, start: int = 0, count: int = 1, pagination_token: str | None = None, return_cleaned_json: bool = False) -> Dict
  - GraphQL request for profile posts feed.
  - Extracts activity_id and paginationToken; derives a publish date from token.
  - Saves response under a date/activity-based prefix.
- get_activity_id_from_url(url: str) -> Dict
  - Extracts activity id from URLs containing -activity- or :activity:.
- get_post_stats(url: str, return_cleaned_json: bool = False) -> Dict
  - Calls /feed/updates/urn:li:activity:{id}.
- get_post_reactions(url: str, start: int = 0, count: int = 100, limit: int = -1, return_cleaned_json: bool = False) -> Dict
  - GraphQL pagination loop; merges included and data.*elements; saves aggregated JSON.
- get_post_comments(url: str, start: int = 0, count: int = 100, limit: int = -1, return_cleaned_json: bool = False) -> Dict
  - GraphQL pagination loop; merges comment elements; saves aggregated JSON.
- get_post_reposts(url: str, start: int = 0, count: int = 100, limit: int = -1, return_cleaned_json: bool = False) -> Dict
  - GraphQL pagination loop; merges reshare elements; saves aggregated JSON.
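The start/count/limit parameters shared by the reactions, comments, and reposts methods suggest an offset-based pagination loop. The sketch below shows one way such a loop can work; paginate and fetch_page are hypothetical names, and treating limit=-1 as "no cap" is an assumption based on the default value, not something the source states.

```python
from typing import Any, Callable, Dict, List

Page = List[Dict[str, Any]]


def paginate(
    fetch_page: Callable[[int, int], Page],
    start: int = 0,
    count: int = 100,
    limit: int = -1,
) -> Page:
    """Collect pages until a short or empty page, or until limit is reached."""
    items: Page = []
    while True:
        page = fetch_page(start, count)
        if not page:
            break  # nothing left to fetch
        items.extend(page)
        if limit != -1 and len(items) >= limit:
            return items[:limit]  # trim any overshoot from the last page
        if len(page) < count:
            break  # a short page means this was the last one
        start += count
    return items


# Fake data source standing in for the remote API (illustration only).
FAKE_ELEMENTS = [{"id": i} for i in range(250)]


def fake_fetch(start: int, count: int) -> Page:
    return FAKE_ELEMENTS[start : start + count]


all_items = paginate(fake_fetch, count=100)
capped_items = paginate(fake_fetch, count=100, limit=120)
```

With the fake source, the uncapped call walks three pages and the capped call stops after the second.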
Search/export methods
- get_mutual_connexions(profile_url: str, connection_distance: str = "F", organization_url: str | None = None, start: int = 0, count: int = 50, limit: int = 1000, query_id: str = "...") -> Dict
  - GraphQL pagination loop to fetch people results connected to profile_url (optionally filtered by organization).
  - Builds a simplified list of people (id, public_id, name, headline, location, profile_url, etc.).
  - Saves final JSON and exports Excel; if Naas is configured, uploads and returns an excel_url.
- search_people(connection_distance: str = "F", organization_url: str | None = None, location: str | None = None, start: int = 0, count: int = 50, limit: int = 1000, query_id: str = "...") -> Dict
  - GraphQL pagination loop for generic PEOPLE search.
  - Optional location mapping currently contains only "France" -> "105015875".
  - Saves final JSON and exports Excel (same upload behavior as above).
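The location mapping in search_people can be pictured as a plain dictionary lookup. In this sketch, LOCATION_IDS and resolve_location are hypothetical names; only the "France" -> "105015875" entry comes from the description above, and the warning text is made up for illustration.

```python
from typing import Dict, Optional

# Only this entry is documented; any other location has no known id.
LOCATION_IDS: Dict[str, str] = {"France": "105015875"}


def resolve_location(location: Optional[str]) -> Optional[str]:
    """Return the id for a known location, or None when it cannot be mapped."""
    if location is None:
        return None
    location_id = LOCATION_IDS.get(location)
    if location_id is None:
        # Unmapped locations are skipped rather than treated as an error.
        print(f"Location {location!r} is not mapped; the filter is ignored.")
    return location_id


france_id = resolve_location("France")
unknown_id = resolve_location("Germany")
```

A caller would add the location filter to the query only when the lookup returns an id.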
Cleaning utilities
clean_json(prefix: str, filename: str, data: dict) -> Dict[str, Any]
- If a cleaned file exists, returns it.
- Otherwise:
  - Removes keys starting with * or containing urn (case-insensitive) recursively.
  - Parses included entities into a $type-keyed structure, optionally replacing image-like fields with highest-quality URL.
  - Flattens nested dict keys with _.
  - Saves *_cleaned.json under datastore_path/prefix/.
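The key-removal and flattening steps of clean_json can be sketched as follows. drop_noisy_keys and flatten are hypothetical helpers written for illustration. They skip the included-entity parsing and the file caching, and they are not the integration's actual implementation.

```python
from typing import Any, Dict


def drop_noisy_keys(value: Any) -> Any:
    """Recursively drop dict keys that start with '*' or contain 'urn'."""
    if isinstance(value, dict):
        return {
            key: drop_noisy_keys(item)
            for key, item in value.items()
            if not (key.startswith("*") or "urn" in key.lower())
        }
    if isinstance(value, list):
        return [drop_noisy_keys(item) for item in value]
    return value


def flatten(data: Dict[str, Any], parent: str = "") -> Dict[str, Any]:
    """Flatten nested dicts, joining key paths with '_'."""
    flat: Dict[str, Any] = {}
    for key, item in data.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(item, dict):
            flat.update(flatten(item, name))
        else:
            flat[name] = item
    return flat


raw = {
    "entityUrn": "urn:li:example:1",
    "*elements": [1, 2],
    "name": "Ada",
    "geo": {"country": "FR", "trackingUrn": "x"},
    "returnValue": 1,  # contains "urn" inside "return", so it is dropped too
}
cleaned = flatten(drop_noisy_keys(raw))
```

The returnValue key shows the side effect of substring matching: any key containing the letters "urn" is removed, not only URN fields.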
as_tools(configuration: LinkedInIntegrationConfiguration) -> list
- Returns a list of LangChain StructuredTool wrappers around:
  - linkedin_get_organization_info
  - linkedin_get_profile_top_card
  - linkedin_get_profile_skills
  - linkedin_get_profile_experience
  - linkedin_get_profile_education
  - linkedin_get_profile_posts_feed
  - linkedin_get_post_comments
  - linkedin_get_post_reactions
  - linkedin_get_post_reposts
  - linkedin_get_mutual_connexions
  - linkedin_search_people
- Each tool validates inputs via Pydantic schemas (URL patterns, connection distance pattern ^[FSO]$, etc.).
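To show what this kind of pattern validation accepts and rejects, the sketch below uses only the standard re module, not Pydantic. The ^[FSO]$ pattern is taken from the description above; ORG_URL_PATTERN, is_valid_distance, and org_public_id are hypothetical, and the URL regex is an assumption about what an organization URL pattern could look like.

```python
import re
from typing import Optional

# Pattern quoted in the tool schema description.
CONNECTION_DISTANCE_PATTERN = re.compile(r"^[FSO]$")

# Hypothetical pattern for /company/, /school/, and /showcase/ URLs.
ORG_URL_PATTERN = re.compile(
    r"^https://www\.linkedin\.com/(?:company|school|showcase)/([^/?#]+)"
)


def is_valid_distance(value: str) -> bool:
    """Accept exactly one of the letters F, S, or O."""
    return CONNECTION_DISTANCE_PATTERN.fullmatch(value) is not None


def org_public_id(url: str) -> Optional[str]:
    """Return the organization public id from a matching URL, else None."""
    match = ORG_URL_PATTERN.match(url)
    return match.group(1) if match else None


company_id = org_public_id("https://www.linkedin.com/company/naas-ai/")
profile_id = org_public_id("https://www.linkedin.com/in/someone/")
```

In a Pydantic schema the same patterns would be attached to fields, so invalid values are rejected before a tool runs.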
Configuration/Dependencies
- Required runtime dependencies:
  - requests
  - pydash
  - naas_abi_core (Integration base, cache, storage utils)
  - naas_abi_marketplace (ABIModule)
- Optional dependencies (only used by specific paths):
  - pandas (Excel export)
  - langchain_core, pydantic (for as_tools)
- Authentication:
  - Must provide valid LinkedIn cookies: li_at and JSESSIONID.
- Storage:
  - Uses StorageUtils bound to ABIModule.get_instance().engine.services.object_storage.
  - Writes under LinkedInIntegrationConfiguration.datastore_path.
Usage
Minimal client usage
from naas_abi_marketplace.applications.linkedin.integrations.LinkedInIntegration import (
    LinkedInIntegration,
    LinkedInIntegrationConfiguration,
)

cfg = LinkedInIntegrationConfiguration(
    li_at="YOUR_LI_AT",
    JSESSIONID='"YOUR_JSESSIONID"',  # quotes are stripped automatically
    linkedin_url="https://www.linkedin.com/in/someone/",
)
li = LinkedInIntegration(cfg)
org = li.get_organization_info("https://www.linkedin.com/company/naas-ai/", return_cleaned_json=True)
profile = li.get_profile_top_card("https://www.linkedin.com/in/someone/", return_cleaned_json=True)
print(org.keys(), profile.keys())
LangChain tools
from naas_abi_marketplace.applications.linkedin.integrations.LinkedInIntegration import (
    LinkedInIntegrationConfiguration,
    as_tools,
)

cfg = LinkedInIntegrationConfiguration(
    li_at="YOUR_LI_AT",
    JSESSIONID="YOUR_JSESSIONID",
    linkedin_url="https://www.linkedin.com/in/someone/",
)
tools = as_tools(cfg)
# tools is a list[StructuredTool]
Caveats
- Uses LinkedIn private endpoints (voyager + hard-coded GraphQL queryIds); these may break if LinkedIn changes/rotates query IDs.
- LinkedInIntegration.__init__ performs a live call to resolve profile_public_id and will raise if it fails.
- _make_request is cached for 1 day keyed by method + cookies + endpoint (+ params); responses may be stale within TTL.
- get_profile_posts_feed contains assert statements expecting specific response structure; it can raise AssertionError on unexpected API responses.
- get_post_reactions, get_post_comments, and get_post_reposts call _make_request with unsupported keyword arguments (prefix, filename) in this file; as written, this will raise TypeError at runtime if those code paths are executed.
- clean_json removes keys containing "urn" anywhere in the key name; this may drop fields you care about.
- Location filtering in search_people only maps "France"; other locations are ignored (with a print).
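The caching caveat is easier to reason about with a concrete picture of a key built from method, cookies, endpoint, and params. The cache_key function below is a hypothetical sketch of that idea. It is not the cache used by naas_abi_core, whose key format is not described here.

```python
import hashlib
import json
from typing import Dict, Optional


def cache_key(
    method: str,
    endpoint: str,
    cookies: Dict[str, str],
    params: Optional[Dict[str, str]] = None,
) -> str:
    """Build a deterministic key from the parts that identify a request."""
    payload = json.dumps(
        {
            "method": method.upper(),
            "endpoint": endpoint,
            "cookies": cookies,
            "params": params or {},
        },
        sort_keys=True,  # stable ordering so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


cookies = {"li_at": "YOUR_LI_AT", "JSESSIONID": "YOUR_JSESSIONID"}
key_a = cache_key("GET", "/feed/updates/urn:li:activity:1", cookies)
key_b = cache_key("get", "/feed/updates/urn:li:activity:1", cookies)
key_c = cache_key("GET", "/feed/updates/urn:li:activity:1", cookies, {"start": "100"})
```

With a key like this, repeating an identical request within the TTL returns the stored response, while changing the cookies or params produces a new key and a fresh call.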