ArXivIntegration
What it is
- An integration wrapper around the arxiv Python client for:
  - Searching ArXiv papers
  - Fetching metadata for a specific ArXiv paper
- Can also expose these capabilities as LangChain StructuredTool tools.
Public API
ArXivIntegrationConfiguration
- Dataclass extending IntegrationConfiguration.
- Fields:
  - max_results: int = 10 (default maximum number of search results, used when no per-call value is provided)
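The fallback described for max_results can be sketched as follows. This is a minimal illustration rather than the integration's actual code: SketchConfiguration and resolve_max_results are hypothetical names, and the sketch assumes that a per-call value of None means "not provided".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfiguration:
    # Hypothetical stand-in for ArXivIntegrationConfiguration.
    max_results: int = 10


def resolve_max_results(cfg: SketchConfiguration, max_results: Optional[int] = None) -> int:
    # Assumption: None means the caller did not pass a per-call limit,
    # so the configured default applies.
    return cfg.max_results if max_results is None else max_results


cfg = SketchConfiguration(max_results=5)
print(resolve_max_results(cfg))     # falls back to the configured default
print(resolve_max_results(cfg, 2))  # the per-call value takes precedence
```

Comparing against None (rather than relying on truthiness) keeps an explicit per-call value distinct from "not provided".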
ArXivIntegration
Integration class (extends naas_abi_core.integration.Integration).
__init__(configuration: ArXivIntegrationConfiguration)
- Creates an arxiv.Client() and stores the configuration.
search_papers(query: str, max_results: Optional[int] = None) -> List[dict]
- Searches ArXiv with arxiv.Search(query=..., max_results=...).
- Returns a list of paper metadata dictionaries with keys: id, title, authors, summary, published, updated, categories, links, pdf_url
get_paper(paper_id: str) -> dict
- Fetches metadata for one paper using arxiv.Search(id_list=[paper_id]).
- Returns a metadata dictionary with the same keys as search_papers.
as_tools(configuration: ArXivIntegrationConfiguration) -> List[StructuredTool] (staticmethod)
- Builds and returns two LangChain StructuredTool instances:
  - search_arxiv_papers → calls search_papers
  - get_arxiv_paper → calls get_paper
- Uses Pydantic argument schemas defined inside the method.
Configuration/Dependencies
- Depends on:
  - arxiv (Python package)
  - langchain_core.tools.StructuredTool
  - pydantic (BaseModel, Field)
  - naas_abi_core.integration (Integration, IntegrationConfiguration)
- Configuration:
  - ArXivIntegrationConfiguration.max_results controls the default search result count.
Usage
Direct integration usage
from naas_abi_marketplace.applications.arxiv.integrations.ArXivIntegration import (
    ArXivIntegration,
    ArXivIntegrationConfiguration,
)
cfg = ArXivIntegrationConfiguration(max_results=5)
client = ArXivIntegration(cfg)
papers = client.search_papers("cat:cs.CL")
print(papers[0]["id"], papers[0]["title"])
paper = client.get_paper(papers[0]["id"])
print(paper["pdf_url"])
As LangChain tools
from naas_abi_marketplace.applications.arxiv.integrations.ArXivIntegration import (
    ArXivIntegration,
    ArXivIntegrationConfiguration,
)
tools = ArXivIntegration.as_tools(ArXivIntegrationConfiguration(max_results=3))
result = tools[0].func(query="quantum computing", max_results=2)
print(len(result))
Caveats
- get_paper uses next(self.__client.results(search)); if no results are returned for the given paper_id, it will raise StopIteration.
- id is derived from paper.entry_id.split("/")[-1] (the last URL segment of the entry id).
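Both caveats can be illustrated without a network call. The sketch below is not part of the integration: first_or_none and last_segment are hypothetical helpers, and the entry id URLs are illustrative values written in the shape ArXiv entry ids are assumed to take.

```python
from typing import Iterator, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterator[T]) -> Optional[T]:
    # next() with a default returns None on an exhausted iterator
    # instead of raising StopIteration.
    return next(results, None)


def last_segment(entry_id: str) -> str:
    # The same expression the caveat quotes for deriving id.
    return entry_id.split("/")[-1]


# An empty iterator stands in for a search that produced no results.
print(first_or_none(iter([])))  # None

# Assumed new-style entry id: the version suffix stays in the id.
print(last_segment("http://arxiv.org/abs/2101.00001v1"))  # 2101.00001v1

# Assumed old-style entry id: the archive prefix (hep-th/) is dropped.
print(last_segment("http://arxiv.org/abs/hep-th/9901001v1"))  # 9901001v1
```

A caller who wants a missing paper reported as None rather than as an exception could also wrap get_paper in try/except StopIteration.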