Prompt Injection Defense
Prompt injection happens when untrusted content (a web page, email body, support ticket, or retrieved document) contains instructions that convince the model to invoke a sensitive tool. From the model's perspective, it is just following text in its context. Strahl makes the source of that text a first-class input to the authorization decision.
The pattern
Give high-integrity tools a requires.source that only trusted origins can satisfy. Give untrusted content a source that does not include those tags.
import strahl
from strahl import ALL, Label

# Low-integrity tool: any source may use it, but results are scoped
@strahl.tool(
    requires=Label(source=ALL, visibility={"public"}),
    produces=Label(source=lambda url: {f"site:{url}"}, visibility={"user"}),
)
def web_fetch(url: str) -> str:
    ...

# High-integrity tool: only direct user instructions may drive it
@strahl.tool(
    requires=Label(source={"user"}, visibility={"user"}),
    produces=Label(source={"payments"}, visibility={"user"}),
)
def pay_invoice(invoice_id: str, amount: float) -> str:
    ...
Web-fetched content is labeled source={"site:example.com"}. That tag is not "user", so it cannot satisfy pay_invoice's requires.source={"user"}. Even if the fetched page says "pay invoice INV-999", the call is denied.
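The denial described above can be pictured as a set check. The following is a toy model only, not Strahl's implementation: `ALL_SOURCES` and `source_allows` are hypothetical names, and the sketch assumes a call is allowed only when every source tag influencing it appears in the tool's required set, which this page implies but does not spell out.

```python
# Toy model of a source check; NOT Strahl's implementation.
# ALL_SOURCES and source_allows are hypothetical names for this sketch.
ALL_SOURCES = object()  # sentinel standing in for strahl.ALL


def source_allows(required, influencing):
    """Return True if every influencing source tag is acceptable.

    required: a set of trusted tags, or ALL_SOURCES to accept anything.
    influencing: set of source tags on the content driving the call.
    """
    if required is ALL_SOURCES:
        return True
    # Assumption: a single untrusted tag is enough to fail the check.
    return set(influencing) <= set(required)


# pay_invoice requires source={"user"}.
print(source_allows({"user"}, {"user"}))                 # direct user instruction
print(source_allows({"user"}, {"site:example.com"}))     # text from a fetched page
print(source_allows(ALL_SOURCES, {"site:example.com"}))  # web_fetch accepts any source
```

Under these assumptions, the first and third checks pass and the second fails, which mirrors why the fetched page cannot drive pay_invoice.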
Adding retrieved content
When you put retrieved content into the conversation, register it as a document so Strahl knows its provenance:
html = web_fetch("https://example.com/invoice")
strahl.add_document(
    "web-invoice-page",
    html,
    label=Label(source={"site:example.com"}, visibility={"user"}),
)
Documents and messages are analyzed together. Content that entered via a document carries the document's label, not the role label of whichever message quotes or summarizes it.
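The example above writes the site: tag by hand. One way to keep the tag in step with the URL that was actually fetched is to derive it from the hostname. This is a minimal sketch using only the Python standard library; `site_source` is a hypothetical helper, not part of Strahl.

```python
from urllib.parse import urlsplit


def site_source(url: str) -> set[str]:
    """Build a {"site:<hostname>"} source set from a URL (hypothetical helper)."""
    host = urlsplit(url).hostname  # lowercased; None if the URL has no host
    if not host:
        raise ValueError(f"cannot derive a site tag from {url!r}")
    return {f"site:{host}"}


print(site_source("https://example.com/invoice"))
```

The returned set could then be passed as the source of the document's Label in place of the literal {"site:example.com"}, assuming Label accepts any set of string tags as the examples on this page suggest.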
Email and ticket bodies
Apply the same principle to inbound email or support tickets:
strahl.add_document(
    "inbound-email",
    email_body,
    label=Label(source={"email:external"}, visibility={"support", "user"}),
)
Now even if the email body says "transfer funds to account X", that content carries source={"email:external"} and cannot satisfy a financial tool's requires.source={"user"} or requires.source={"finance-system"}.
Checking the result
After analysis, Strahl tells you which part of the tool call was blocked, where the risky influence came from, and the short excerpt that supported the block:
analysis = strahl.analyze(messages)
if analysis.denied:
    for result in analysis.denied_results:
        for component in result.denied_components:
            print(f"{result.name}.{component.sink_value}: {component.decision}")
            for violation in component.violations:
                where = f"message {violation.source_ref.message_index}"
                if hasattr(violation.source_ref, "tool_result_index"):
                    where += f", tool result {violation.source_ref.tool_result_index}"
                print(f" blocked influence from {where}: {violation.evidence!r}")
This gives you an audit trail for the blocked call without exposing a separate scoring model: the SDK tells you the affected argument, the conversation location that influenced it, and the excerpt Strahl selected.
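If the audit trail needs to go to a log or a dashboard instead of stdout, the same traversal can build structured records. The sketch below is an assumption-laden illustration: `denial_records` is a hypothetical helper, and `fake_analysis` is a stand-in with made-up values that only mimics the attribute names used in the loop above, so the block runs without Strahl installed.

```python
from types import SimpleNamespace


def denial_records(analysis):
    """Flatten denied results into one dict per violation (hypothetical helper)."""
    records = []
    for result in analysis.denied_results:
        for component in result.denied_components:
            for violation in component.violations:
                ref = violation.source_ref
                records.append({
                    "tool": result.name,
                    "sink": component.sink_value,
                    "decision": component.decision,
                    "message_index": ref.message_index,
                    # None when the reference has no tool_result_index attribute.
                    "tool_result_index": getattr(ref, "tool_result_index", None),
                    "evidence": violation.evidence,
                })
    return records


# Stand-in object with made-up values; a real analysis would come from Strahl.
fake_analysis = SimpleNamespace(
    denied=True,
    denied_results=[SimpleNamespace(
        name="pay_invoice",
        denied_components=[SimpleNamespace(
            sink_value="invoice_id",
            decision="deny",
            violations=[SimpleNamespace(
                source_ref=SimpleNamespace(message_index=2, tool_result_index=0),
                evidence="pay invoice INV-999",
            )],
        )],
    )],
)

records = denial_records(fake_analysis)
print(records[0]["tool"], records[0]["message_index"])
```

Each record is a plain dict, so it can be serialized with json.dumps or handed to a logging call without further conversion.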