PubMedPipeline
What it is
- A pipeline that queries PubMed within a date range and returns the results as an RDF `Graph`.
- Can filter results to include only papers that have a `downloadUrl` (i.e., papers downloadable from PubMed Central).
Public API
Classes
- `PubMedPipelineConfiguration(PipelineConfiguration)`: pipeline configuration type (currently has no additional fields).
- `PubMedPipelineParameters(PipelineParameters)`: input parameters for running the pipeline.
  - Fields:
    - `query: str`: PubMed search query.
    - `start_date: str`: start date (in the string format expected by the integration; the CLI suggests `YYYY-MM-DD` or `YYYY/MM/DD`).
    - `end_date: Optional[str] = None`: end date; if `None`, the search runs up to the present.
    - `sort: Optional[Literal["pub_date", "Author", "JournalName", "relevance"]] = "pub_date"`: sort order.
    - `downloadable_only: Optional[bool] = False`: include only results that have a `downloadUrl`.
    - `max_results: Optional[int] = 100`: maximum number of results (1 to 10,000).
- `PubMedPipeline(Pipeline)`: main pipeline class.
  - Methods:
    - `__init__(configuration: PubMedPipelineConfiguration)`: initializes the pipeline and an internal `PubMedIntegration(PubMedAPIConfiguration())`.
    - `run(parameters: PipelineParameters) -> Graph`: executes the PubMed query and returns an RDF `Graph` aggregated from each result's `rdf()` output. Raises `ValueError` if `parameters` is not a `PubMedPipelineParameters` instance.
    - `as_api(...) -> None`: declared but not implemented (the body is `pass`).
    - `as_tools() -> List[BaseTool]`: returns a list containing a LangChain `StructuredTool` named `search_downloadable_pubmed_papers`, which runs the pipeline and returns the resulting graph serialized as Turtle.
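The `run()` behavior described above (type check, optional `downloadUrl` filter, aggregation of each result's `rdf()` output) can be sketched without the real dependencies. This is a minimal sketch under stated assumptions, not the pipeline's implementation: `Params`, `FakePaper`, and `run_sketch` are hypothetical stand-ins, the `search` callable stands in for the `PubMedIntegration` query, and an RDF graph is modeled as a Python `set` of `(subject, predicate, object)` string triples instead of an RDF `Graph`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical model: a "graph" is just a set of string triples here.
Triple = Tuple[str, str, str]


@dataclass
class Params:
    """Hypothetical stand-in for PubMedPipelineParameters (subset of fields)."""
    query: str
    start_date: str
    downloadable_only: bool = False
    max_results: int = 100


@dataclass
class FakePaper:
    """Hypothetical stand-in for a search result exposing downloadUrl and rdf()."""
    pmid: str
    downloadUrl: Optional[str] = None

    def rdf(self) -> Set[Triple]:
        # Placeholder predicates; the real rdf() output is not modeled here.
        triples = {(self.pmid, "rdf:type", "Paper")}
        if self.downloadUrl is not None:
            triples.add((self.pmid, "downloadUrl", self.downloadUrl))
        return triples


def run_sketch(
    parameters: object, search: Callable[[Params], List[FakePaper]]
) -> Set[Triple]:
    # Mirror the documented guard: only the pipeline's own parameter type is accepted.
    if not isinstance(parameters, Params):
        raise ValueError("parameters must be a Params instance")
    graph: Set[Triple] = set()
    for result in search(parameters):
        # With downloadable_only, skip results that have no downloadUrl.
        if parameters.downloadable_only and result.downloadUrl is None:
            continue
        # Aggregate each result's triples into a single graph (set union).
        graph |= result.rdf()
    return graph
```

With one downloadable and one non-downloadable result, the sketch returns three triples without the filter and two with `downloadable_only=True`. In the real pipeline the aggregate is an RDF `Graph`, and `as_tools()` returns its Turtle serialization.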
Configuration/Dependencies
- Depends on:
  - `naas_abi_core.pipeline`: `Pipeline`, `PipelineConfiguration`, `PipelineParameters`, `Graph`
  - `naas_abi_marketplace.applications.pubmed.integrations.PubMedAPI`: `PubMedIntegration`, `PubMedAPIConfiguration`, `PubMedPaperSummary`
- LangChain: `langchain_core.tools.StructuredTool`
- FastAPI: `fastapi.APIRouter` (only referenced by `as_api`, which is not implemented)
- CLI dependencies when run as a script: `click`, `rich`
Usage
Run from Python
```python
from naas_abi_marketplace.applications.pubmed.pipelines.PubMedPipeline import (
    PubMedPipeline,
    PubMedPipelineConfiguration,
    PubMedPipelineParameters,
)

pipeline = PubMedPipeline(PubMedPipelineConfiguration())

# Search papers between the two dates, keeping only results that
# have a downloadUrl (PubMed Central downloadable).
graph = pipeline.run(
    PubMedPipelineParameters(
        query="cancer biomarkers",
        start_date="2024-01-01",
        end_date="2024-03-01",
        downloadable_only=True,
        max_results=50,
    )
)

# Serialize the RDF graph to Turtle and preview the first 500 characters.
ttl = graph.serialize(format="turtle")
print(ttl[:500])
```
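The documented field constraints (the date formats suggested by the CLI, the allowed `sort` values, and the 1 to 10,000 range for `max_results`) can be checked before calling `run()`. The sketch below is a hypothetical pre-flight helper, not part of the pipeline: `parse_date` and `check_parameters` are illustrative names, and the real `PubMedPipelineParameters` may validate these fields differently or not at all.

```python
from datetime import date, datetime
from typing import Optional

# Sort modes listed for the `sort` field.
ALLOWED_SORTS = {"pub_date", "Author", "JournalName", "relevance"}
# Date formats the CLI suggests (assumption: no other formats are needed).
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")


def parse_date(value: str) -> date:
    """Hypothetical helper: parse a YYYY-MM-DD or YYYY/MM/DD string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unsupported date format: {value!r}")


def check_parameters(
    query: str,
    start_date: str,
    end_date: Optional[str] = None,
    sort: Optional[str] = "pub_date",
    max_results: Optional[int] = 100,
) -> None:
    """Hypothetical check mirroring the documented field constraints."""
    parse_date(start_date)
    if end_date is not None:  # None means "search up to the present"
        parse_date(end_date)
    if sort is not None and sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if max_results is not None and not 1 <= max_results <= 10_000:
        raise ValueError("max_results must be between 1 and 10,000")
```

For example, `check_parameters("cancer biomarkers", "2024-01-01", "2024-03-01", max_results=50)` passes silently, while `max_results=0` raises `ValueError`.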
Use as a LangChain tool
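```python
from naas_abi_marketplace.applications.pubmed.pipelines.PubMedPipeline import (
    PubMedPipeline,
    PubMedPipelineConfiguration,
)

pipeline = PubMedPipeline(PubMedPipelineConfiguration())

# as_tools() returns a list holding the search_downloadable_pubmed_papers tool.
tool = pipeline.as_tools()[0]

# The tool runs the pipeline and returns the graph serialized as Turtle.
turtle = tool.run({
    "query": "machine learning radiology",
    "start_date": "2024-01-01",
    "end_date": None,
    "downloadable_only": False,
    "max_results": 10,
})
print(turtle[:300])
```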
Run as a script (writes pubmed_output.ttl)
```bash
python PubMedPipeline.py --query "diabetes" --start-date 2024-01-01 --end-date 2024-02-01
```
Caveats
- `PubMedPipeline.run()` only accepts `PubMedPipelineParameters`; passing any other `PipelineParameters` subtype raises `ValueError`.
- `as_api()` is not implemented.
- `downloadable_only=True` filters results by checking `result.downloadUrl is not None` before adding `result.rdf()` to the output graph.