AirgapChatOpenAI (Gemma3 Airgap Model)
What it is
- A minimal wrapper around `langchain_openai.ChatOpenAI` that targets a local Docker Model Runner endpoint for Gemma3 (`ai/gemma3`).
- Provides:
  - Prompt formatting (System/User/Assistant style) before sending requests.
  - Basic streaming via direct HTTP requests to the runner.
  - Simplified tool-call handling (forces empty `tool_calls` to avoid routing issues).
Public API
Class: `AirgapChatOpenAI(ChatOpenAI)`
- `__init__(**kwargs)` - Initializes the underlying `ChatOpenAI` and a private `_tools` list (unused).
- `bind_tools(tools, **kwargs) -> AirgapChatOpenAI` - No-op: returns `self` without storing tools.
- `bind(**kwargs) -> AirgapChatOpenAI` - No-op: returns `self`.
- Property: `_llm_type -> str` - Returns `"airgap_chat_openai"`.
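The no-op binding contract above can be illustrated with a small stub. This is a stdlib-only stand-in (the real class subclasses `ChatOpenAI`), kept to just the two methods so the behavior is easy to see:

```python
class AirgapChatOpenAIStub:
    """Stub illustrating the no-op binding contract of AirgapChatOpenAI."""

    def bind_tools(self, tools, **kwargs):
        # Tools are intentionally ignored: nothing is stored or applied.
        return self

    def bind(self, **kwargs):
        # Likewise a no-op: the same instance is returned unchanged.
        return self
```

Returning `self` unchanged lets agent frameworks that call `bind_tools()` keep working, while guaranteeing the local runner never receives tool schemas it cannot route.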
- `_generate(messages, stop=None, run_manager=None, **kwargs) -> ChatResult`
  - Extracts any system prompt and the last human message, then builds a single user prompt: `System: ... User: ... Assistant:`
  - Calls the parent `_generate` with filtered kwargs (`temperature`, `max_tokens`, `stop`).
  - Forces the returned first generation message to an `AIMessage` with `tool_calls = []`.
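The prompt-flattening step can be sketched as a plain function. `build_prompt` is a hypothetical helper, not the wrapper's actual code, and the parts are joined here with newlines (the exact separator the wrapper uses may differ):

```python
def build_prompt(messages: list[tuple[str, str]]) -> str:
    """Flatten chat messages into a single System/User/Assistant prompt.

    `messages` is a list of (role, content) pairs standing in for
    LangChain message objects.
    """
    # First system message, if any.
    system = next((c for r, c in messages if r == "system"), None)
    # Last human/user message; fall back to "Hello" as the wrapper does.
    user = next((c for r, c in reversed(messages) if r == "user"), "Hello")
    parts = []
    if system:
        parts.append(f"System: {system}")
    parts.append(f"User: {user}")
    parts.append("Assistant:")
    return "\n".join(parts)
```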
- `_stream(messages, stop=None, run_manager=None, **kwargs) -> Iterator[ChatGenerationChunk]`
  - Sends a streaming POST request to `{openai_api_base}/chat/completions` using `requests`.
  - Parses `data: {json}` lines and yields `ChatGenerationChunk` with `AIMessageChunk(content=...)`.
  - On timeout/connection/error, yields a final chunk containing an error message.
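The `data: {json}` parsing loop looks roughly like the sketch below. `parse_sse_lines` is a hypothetical stand-in: `lines` represents the decoded lines of a `requests.post(..., stream=True)` response, and the content deltas are yielded as plain strings rather than wrapped in `AIMessageChunk`:

```python
import json


def parse_sse_lines(lines):
    """Yield content deltas from SSE-style `data: {json}` lines.

    Stops at the `[DONE]` sentinel, mirroring the behavior described above.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # OpenAI-style streaming puts the token delta under choices[0].delta.
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        content = delta.get("content")
        if content:
            yield content
```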
Module-level model registration
Constants:
- `MODEL_ID = "ai/gemma3"`
- `NAME = "gemma3-airgap"`
- `DESCRIPTION`, `IMAGE`
- `CONTEXT_WINDOW = 8192`
- `PROVIDER = "google"`
Object:
- `model: ChatModel` - `model.model` is an `AirgapChatOpenAI` configured with:
  - `model="ai/gemma3"`
  - `temperature=0.2`
  - `max_tokens=512`
  - `openai_api_base="http://localhost:12434/engines/v1"`
  - `api_key="ignored"`
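The registered configuration corresponds to a constructor call along these lines (a sketch: the kwargs come from the list above, but the import path for the class is an assumption based on the usage examples):

```python
# Assumed import path; the class may live elsewhere in the package.
from naas_abi.models.airgap_gemma import AirgapChatOpenAI

llm = AirgapChatOpenAI(
    model="ai/gemma3",
    temperature=0.2,
    max_tokens=512,
    openai_api_base="http://localhost:12434/engines/v1",
    api_key="ignored",  # the local runner does not check the key
)
```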
Configuration/Dependencies
- Python deps: `langchain_openai`, `langchain_core`, `requests`, `naas_abi_core` (for `logger` and `ChatModel`)
- Runtime dependency: a local Docker Model Runner exposing an OpenAI-compatible endpoint at `http://localhost:12434/engines/v1/chat/completions`
Usage
Non-streaming generation
from langchain_core.messages import HumanMessage
from naas_abi.models.airgap_gemma import model
llm = model.model # AirgapChatOpenAI instance
res = llm.invoke([HumanMessage(content="Write a one-line haiku about latency.")])
print(res.content)
Streaming tokens
from langchain_core.messages import HumanMessage
from naas_abi.models.airgap_gemma import model
llm = model.model
for chunk in llm.stream([HumanMessage(content="Count from 1 to 5.")]):
print(chunk.content, end="")
print()
Caveats
- Tool support is intentionally minimal: `bind_tools()` does not store or apply tools, and `_generate()` forces `tool_calls = []` on the returned message.
- Streaming is implemented via `requests.post(..., stream=True)` and expects SSE-like `data:` lines terminated by a `[DONE]` sentinel.
- If no valid user message is found, the wrapper substitutes `"Hello"`.
- Error handling during streaming yields an error chunk as model output (rather than raising).