AI-powered workflow steps using Large Language Models.

Overview

Osmedeus provides two step types for LLM integration:
  • llm — Single-shot LLM calls: chat completions, tool calling, embeddings, multimodal content, structured outputs
  • agent — Agentic execution loop: iterative tool calling, sub-agents, memory management, planning stages, multi-goal execution

Configuration

Settings

In osm-settings.yaml, configure one or more providers under llm_providers. Providers are rotated automatically across requests:
llm:
  llm_providers:
    - provider: openai
      base_url: "https://api.openai.com/v1"
      auth_token: "sk-..."
      model: gpt-4
    - provider: anthropic
      base_url: "https://api.anthropic.com/v1"
      auth_token: "sk-ant-..."
      model: claude-3-opus
  max_tokens: 4096
  temperature: 0.7
  stream: false

Environment Variables

Environment variables override settings for the default provider:
export OSM_LLM_BASE_URL=https://api.openai.com/v1
export OSM_LLM_AUTH_TOKEN=sk-...
export OSM_LLM_MODEL=gpt-4

Chat Completion

Basic Usage

- name: analyze-results
  type: llm
  messages:
    - role: system
      content: You are a security analyst. Analyze findings concisely.
    - role: user
      content: |
        Analyze these vulnerabilities:
        {{readFile("{{Output}}/vulns.txt")}}
  exports:
    analysis: "{{analyze_results_content}}"
Export variables are based on the sanitized step name (hyphens replaced with underscores). A step named analyze-results produces exports analyze_results_llm_resp (full response object) and analyze_results_content (text content only).
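
The exported value can then be consumed by any later step. A minimal sketch (the step and file names here are illustrative, not part of the export contract):

```yaml
# Hypothetical follow-up step: write the exported analysis to disk
- name: save-analysis
  type: bash
  command: echo "{{analysis}}" > {{Output}}/analysis.md
```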

Message Roles

| Role | Description |
|---|---|
| system | System instructions |
| user | User input |
| assistant | Previous AI response |
| tool | Tool call result |

Multi-turn Conversation

- name: chat
  type: llm
  messages:
    - role: system
      content: You are a helpful security assistant.
    - role: user
      content: What is SQL injection?
    - role: assistant
      content: SQL injection is a code injection technique...
    - role: user
      content: How do I prevent it in Python?

Tool Calling

Define Tools

- name: intelligent-scan
  type: llm
  messages:
    - role: system
      content: You are a security scanner. Use tools to analyze targets.
    - role: user
      content: Analyze {{target}} for security issues.
  tools:
    - type: function
      function:
        name: port_scan
        description: Scan ports on a target
        parameters:
          type: object
          properties:
            target:
              type: string
              description: Target IP or hostname
            ports:
              type: string
              description: Port range (e.g., "1-1000")
          required: ["target"]

    - type: function
      function:
        name: vulnerability_scan
        description: Run vulnerability scan
        parameters:
          type: object
          properties:
            target:
              type: string
            templates:
              type: string
              enum: ["cves", "misconfigurations", "exposures"]
          required: ["target"]

Handle Tool Calls

Tool calls are exported within the _llm_resp object:
- name: ai-scan
  type: llm
  messages:
    - role: user
      content: Scan {{target}}
  tools:
    - type: function
      function:
        name: scan
        parameters: { ... }
  exports:
    full_response: "{{ai_scan_llm_resp}}"

- name: execute-tool
  type: function
  pre_condition: '{{full_response}} != ""'
  function: |
    // Parse and execute tool calls
    executeToolCalls("{{full_response}}")

Embeddings

Generate Embeddings

- name: embed-findings
  type: llm
  is_embedding: true
  embedding_input:
    - "SQL injection in login form"
    - "Cross-site scripting in search"
    - "Insecure direct object reference"
  exports:
    embeddings: "{{embed_findings_llm_resp}}"

Use with Files

- name: embed-vulns
  type: llm
  is_embedding: true
  embedding_input: "{{readLines('{{Output}}/vulns.txt')}}"
  exports:
    vuln_embeddings: "{{embed_vulns_llm_resp}}"

Structured Output

JSON Schema

- name: extract-findings
  type: llm
  messages:
    - role: user
      content: |
        Extract vulnerabilities from this report:
        {{readFile("{{Output}}/scan-report.txt")}}
  response_format:
    type: json_schema
    json_schema:
      name: vulnerabilities
      schema:
        type: object
        properties:
          findings:
            type: array
            items:
              type: object
              properties:
                title:
                  type: string
                severity:
                  type: string
                  enum: ["critical", "high", "medium", "low"]
                description:
                  type: string
        required: ["findings"]
  exports:
    structured_findings: "{{extract_findings_content}}"
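
Because the content export is a JSON string conforming to the schema, a later step can persist it for downstream processing. A hedged sketch (the file name is illustrative):

```yaml
# Hypothetical follow-up: persist the schema-validated JSON for later steps
- name: save-findings
  type: bash
  command: echo "{{structured_findings}}" > {{Output}}/findings.json
```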

Configuration Override

Per-Step Config

- name: local-analysis
  type: llm
  llm_config:
    provider: ollama
    model: llama2
    max_tokens: 2048
    temperature: 0.5
    stream: true
  messages:
    - role: user
      content: Analyze {{target}}

Extra Parameters

- name: creative-analysis
  type: llm
  messages:
    - role: user
      content: Write a security assessment for {{target}}
  extra_llm_parameters:
    temperature: 0.9
    top_p: 0.95
    frequency_penalty: 0.5

Multimodal Content

Image Analysis

- name: analyze-screenshot
  type: llm
  messages:
    - role: user
      content:
        - type: text
          text: Analyze this screenshot for security issues.
        - type: image_url
          image_url:
            url: "file://{{Output}}/screenshot.png"

Streaming

Both llm and agent steps support streaming output via the stream field:
- name: stream-analysis
  type: llm
  stream: true
  messages:
    - role: user
      content: Analyze {{target}} in detail.
The stream field overrides both llm_config.stream and the global config setting.

Agent Step Type

The agent step type provides an agentic LLM execution loop — the LLM iteratively calls tools, processes results, and reasons until completion. This is fundamentally different from the single-shot llm step.

Basic Agent

- name: recon-agent
  type: agent
  query: "Enumerate subdomains of {{Target}} and identify interesting services."
  system_prompt: "You are an expert security reconnaissance agent."
  max_iterations: 15
  agent_tools:
    - preset: bash
    - preset: read_file
    - preset: save_content
  exports:
    findings: "{{agent_content}}"
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes* | The task prompt for the agent |
| queries | string[] | Yes* | Multiple goals executed sequentially |
| system_prompt | string | No | System prompt for the agent |
| max_iterations | int | Yes | Maximum tool-calling loop iterations (must be > 0) |
| agent_tools | AgentToolDef[] | No | Tools available to the agent |

*Either query (single goal) or queries (multi-goal) is required, not both.

Preset Tools

Preset tools reference built-in osmedeus functions with auto-generated schemas:
agent_tools:
  - preset: bash
  - preset: read_file
  - preset: grep_regex
| Preset | Description |
|---|---|
| bash | Execute a shell command and return its output |
| read_file | Read the contents of a file |
| read_lines | Read a file and return its contents as an array of lines |
| file_exists | Check if a file exists at the given path |
| file_length | Count the number of non-empty lines in a file |
| append_file | Append content from source file to destination file |
| save_content | Write string content to a file (overwrites if exists) |
| glob | Find files matching a glob pattern |
| grep_string | Search a file for lines containing a string |
| grep_regex | Search a file for lines matching a regex pattern |
| http_get | Make an HTTP GET request and return the response |
| http_request | Make an HTTP request with specified method, headers, and body |
| jq | Query JSON data using jq expression syntax |
| exec_python | Run inline Python code and return stdout |
| exec_python_file | Run a Python file and return stdout |
| exec_ts | Run inline TypeScript code via bun and return stdout |
| exec_ts_file | Run a TypeScript file via bun and return stdout |
| run_module | Run an osmedeus module as a subprocess |
| run_flow | Run an osmedeus flow as a subprocess |
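
For instance, a small agent can combine the HTTP and file presets. A sketch under the assumption that the target serves a robots.txt (the query wording is illustrative):

```yaml
# Illustrative agent combining the http_get and save_content presets
- name: robots-probe
  type: agent
  query: "Fetch https://{{Target}}/robots.txt and save a summary of disallowed paths to {{Output}}/robots-summary.md"
  max_iterations: 5
  agent_tools:
    - preset: http_get
    - preset: save_content
```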

Custom Tools

Define custom tools with explicit schemas and JavaScript handlers:
agent_tools:
  - preset: bash
  - name: check_port
    description: "Check if a port is open on a host"
    parameters:
      type: object
      properties:
        host:
          type: string
          description: "Target hostname or IP"
        port:
          type: integer
          description: "Port number to check"
      required: ["host", "port"]
    handler: |
      exec("nc -zv -w3 " + args.host + " " + args.port)
The handler is a JavaScript expression. The parsed tool call arguments are available as the args object.

Multi-Goal Execution

Use queries to run the agent through multiple goals sequentially. Each goal is executed in order, and all results are collected:
- name: full-recon
  type: agent
  queries:
    - "Discover all subdomains of {{Target}}"
    - "Identify web services running on discovered subdomains"
    - "Check for common misconfigurations on each service"
  system_prompt: "You are a thorough security auditor."
  max_iterations: 20
  agent_tools:
    - preset: bash
    - preset: read_file
    - preset: save_content
  exports:
    all_results: "{{agent_goal_results}}"
    final_output: "{{agent_content}}"
The agent_goal_results export contains results from all goals as a JSON array.

Planning Stage

Add a planning phase before the main execution loop. The agent first generates a plan, then executes it:
- name: planned-scan
  type: agent
  query: "Perform a comprehensive security assessment of {{Target}}"
  plan_prompt: |
    Create a step-by-step plan for assessing {{Target}}.
    Consider: subdomain enumeration, service detection, vulnerability scanning.
  plan_max_tokens: 1000
  max_iterations: 25
  agent_tools:
    - preset: bash
    - preset: read_file
    - preset: save_content
  exports:
    plan: "{{agent_plan}}"
    results: "{{agent_content}}"
| Field | Type | Description |
|---|---|---|
| plan_prompt | string | Prompt for the planning phase (triggers plan generation before the main loop) |
| plan_max_tokens | int | Max tokens for the plan response |

Memory Management

Control conversation context size for long-running agents:
- name: long-running-agent
  type: agent
  query: "Perform deep reconnaissance on {{Target}}"
  max_iterations: 50
  memory:
    max_messages: 30
    summarize_on_truncate: true
    persist_path: "{{Output}}/agent/conversation.json"
    resume_path: "{{Output}}/agent/conversation.json"
  agent_tools:
    - preset: bash
    - preset: read_file
    - preset: save_content
| Field | Type | Default | Description |
|---|---|---|---|
| max_messages | int | 0 (unlimited) | Sliding window size; oldest non-system messages are dropped when exceeded |
| summarize_on_truncate | bool | false | Use the LLM to summarize dropped messages instead of silently discarding them |
| persist_path | string | (none) | Save conversation JSON after completion |
| resume_path | string | (none) | Load a prior conversation on start (enables continuation across runs) |
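
Since resume_path loads a saved conversation, pointing a later run at the same file continues where the previous run stopped. A minimal sketch (the query wording is illustrative):

```yaml
# Hypothetical second run resuming the persisted conversation
- name: resume-recon
  type: agent
  query: "Continue the reconnaissance where the previous session left off."
  max_iterations: 20
  memory:
    persist_path: "{{Output}}/agent/conversation.json"
    resume_path: "{{Output}}/agent/conversation.json"
  agent_tools:
    - preset: bash
    - preset: read_file
```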

Model Preferences

Specify preferred models to try in order; the step falls back to the default provider config if none are available:
- name: smart-agent
  type: agent
  query: "Analyze complex target architecture for {{Target}}"
  max_iterations: 10
  models:
    - claude-3-opus
    - gpt-4
    - claude-3-sonnet
  agent_tools:
    - preset: bash

Structured Output (Agent)

Enforce a JSON schema on the agent’s final output using output_schema:
- name: structured-agent
  type: agent
  query: "Find all open ports and services on {{Target}}"
  max_iterations: 15
  output_schema: '{"type":"object","properties":{"ports":{"type":"array","items":{"type":"object","properties":{"port":{"type":"integer"},"service":{"type":"string"},"version":{"type":"string"}}}},"summary":{"type":"string"}},"required":["ports","summary"]}'
  agent_tools:
    - preset: bash
    - preset: save_content
  exports:
    structured_results: "{{agent_content}}"
The schema is enforced on the final iteration via the OpenAI response_format parameter.
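
Since agent_content then holds schema-conforming JSON, a follow-up step can extract fields with standard tooling. A hedged sketch assuming jq is installed on the host (field names match the output_schema above):

```yaml
# Illustrative follow-up: list open ports from the structured JSON (requires jq)
- name: list-ports
  type: bash
  command: echo '{{structured_results}}' | jq -r '.ports[].port' > {{Output}}/open-ports.txt
```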

Sub-Agents

Define inline sub-agents that the parent agent can spawn via the auto-generated spawn_agent tool:
- name: coordinator
  type: agent
  query: "Assess {{Target}} using specialized sub-agents for each phase."
  system_prompt: "You are a coordinator agent. Delegate tasks to specialized sub-agents."
  max_iterations: 10
  max_agent_depth: 3
  agent_tools:
    - preset: read_file
    - preset: save_content
  sub_agents:
    - name: subdomain-scanner
      description: "Discovers subdomains for a target domain"
      system_prompt: "You are a subdomain enumeration specialist."
      max_iterations: 10
      agent_tools:
        - preset: bash
        - preset: save_content

    - name: vuln-checker
      description: "Checks for vulnerabilities on discovered services"
      system_prompt: "You are a vulnerability assessment specialist."
      max_iterations: 10
      agent_tools:
        - preset: bash
        - preset: read_file
      output_schema: '{"type":"object","properties":{"vulnerabilities":{"type":"array"}}}'
  exports:
    assessment: "{{agent_content}}"
When sub_agents is defined, a spawn_agent tool is automatically added with parameters:
  • agent — Name of the sub-agent to spawn (from the defined list)
  • query — The task to delegate
Sub-agents support recursive nesting (sub-agents can define their own sub_agents). Use max_agent_depth to control nesting depth (default: 3).

Stop Condition

A JavaScript expression evaluated after each iteration. If it returns true, the agent stops:
- name: targeted-scan
  type: agent
  query: "Find the admin panel for {{Target}}"
  max_iterations: 20
  stop_condition: 'agent_content.includes("admin") && iteration > 3'
  agent_tools:
    - preset: bash
    - preset: read_file
Available variables in the expression: agent_content (current response text), iteration (current iteration number).

Tool Tracing Hooks

JavaScript expressions executed before and after each tool call for logging or debugging:
- name: traced-agent
  type: agent
  query: "Scan {{Target}}"
  max_iterations: 10
  on_tool_start: 'log_info("Calling tool: " + tool_name + " with: " + tool_args)'
  on_tool_end: 'log_info("Tool " + tool_name + " returned: " + tool_result.substring(0, 200))'
  agent_tools:
    - preset: bash
    - preset: read_file
| Hook | Available Variables |
|---|---|
| on_tool_start | tool_name, tool_args |
| on_tool_end | tool_name, tool_args, tool_result |

Parallel Tool Calls

By default, agents allow the LLM to make multiple tool calls in parallel. Disable this for sequential execution:
- name: sequential-agent
  type: agent
  query: "Carefully test {{Target}} one step at a time"
  max_iterations: 10
  parallel_tool_calls: false
  agent_tools:
    - preset: bash

Agent Exports

All exports available from agent steps:
| Export | Type | Description |
|---|---|---|
| agent_content | string | Final text response from the agent |
| agent_history | JSON | Full conversation history |
| agent_iterations | int | Number of iterations executed |
| agent_total_tokens | int | Total tokens consumed |
| agent_prompt_tokens | int | Prompt tokens consumed |
| agent_completion_tokens | int | Completion tokens consumed |
| agent_tool_results | JSON | All tool call results |
| agent_plan | string | Plan content (when plan_prompt is used) |
| agent_goal_results | JSON | Results from each goal (when queries is used) |
exports:
  report: "{{agent_content}}"
  history: "{{agent_history}}"
  stats: "{{agent_iterations}}"
  plan: "{{agent_plan}}"

API Endpoint

OpenAI-compatible API:
# Chat completion
curl -X POST http://localhost:8002/osm/api/llm/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Analyze this vulnerability: ..."}
    ]
  }'

# Embeddings
curl -X POST http://localhost:8002/osm/api/llm/v1/embeddings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-ada-002",
    "input": ["text to embed"]
  }'

Providers

OpenAI

llm:
  llm_providers:
    - provider: openai
      base_url: "https://api.openai.com/v1"
      auth_token: "sk-..."
      model: gpt-4

Anthropic

llm:
  llm_providers:
    - provider: anthropic
      base_url: "https://api.anthropic.com/v1"
      auth_token: "sk-ant-..."
      model: claude-3-opus

Ollama (Local)

llm:
  llm_providers:
    - provider: ollama
      base_url: "http://localhost:11434"
      model: llama2

Azure OpenAI

llm:
  llm_providers:
    - provider: azure
      base_url: "https://your-resource.openai.azure.com"
      auth_token: "..."
      model: gpt-4

Multiple Providers (Rotation)

Configure multiple providers for automatic rotation:
llm:
  llm_providers:
    - provider: openai
      base_url: "https://api.openai.com/v1"
      auth_token: "sk-..."
      model: gpt-4
    - provider: anthropic
      base_url: "https://api.anthropic.com/v1"
      auth_token: "sk-ant-..."
      model: claude-3-opus
    - provider: ollama
      base_url: "http://localhost:11434"
      model: llama2

Workflow Functions

Use LLM functions directly in function steps without the full llm step type.

llm_invoke

Simple LLM call with a direct message:
- name: quick-summary
  type: function
  function: "llm_invoke('Summarize these findings: ' + readFile('{{Output}}/vulns.txt'))"
  exports:
    summary: "{{_result}}"

llm_invoke_custom

LLM call with a custom POST body template. Use {{message}} as a placeholder:
- name: custom-analysis
  type: function
  function: |
    llm_invoke_custom(
      'Analyze this target: {{Target}}',
      '{"model": "gpt-4", "temperature": 0.5, "messages": [{"role": "user", "content": "{{message}}"}]}'
    )

llm_conversations

Multi-turn conversation using role:content format:
- name: conversation
  type: function
  function: |
    llm_conversations(
      'system:You are a security analyst.',
      'user:What are common web vulnerabilities?',
      'assistant:Common web vulnerabilities include SQL injection, XSS, CSRF...',
      'user:How do I test for SQL injection?'
    )
  exports:
    response: "{{_result}}"

Use Cases

Vulnerability Analysis

- name: analyze-vulns
  type: llm
  messages:
    - role: system
      content: |
        You are a security expert. Analyze vulnerabilities and provide:
        1. Risk assessment
        2. Impact analysis
        3. Remediation steps
    - role: user
      content: "{{readFile('{{Output}}/nuclei-results.json')}}"

Report Generation

- name: generate-report
  type: llm
  messages:
    - role: user
      content: |
        Generate a security assessment report for {{target}}.

        Subdomains found: {{fileLength("{{Output}}/subs.txt")}}
        Live hosts: {{fileLength("{{Output}}/live.txt")}}
        Vulnerabilities: {{readFile("{{Output}}/vulns.txt")}}
  exports:
    report: "{{generate_report_content}}"

- name: save-report
  type: bash
  command: echo "{{report}}" > {{Output}}/report.md

Intelligent Filtering

- name: filter-false-positives
  type: llm
  messages:
    - role: system
      content: |
        Analyze these findings and mark false positives.
        Return JSON: {"valid": [...], "false_positives": [...]}
    - role: user
      content: "{{readFile('{{Output}}/findings.json')}}"
  response_format:
    type: json_object

Autonomous Reconnaissance Agent

- name: auto-recon
  type: agent
  query: |
    Perform reconnaissance on {{Target}}:
    1. Enumerate subdomains
    2. Check for live hosts
    3. Identify web technologies
    4. Save a summary report to {{Output}}/agent-report.md
  system_prompt: "You are an autonomous security reconnaissance agent with access to common security tools."
  max_iterations: 30
  memory:
    max_messages: 40
    summarize_on_truncate: true
    persist_path: "{{Output}}/agent/recon-memory.json"
  agent_tools:
    - preset: bash
    - preset: read_file
    - preset: save_content
    - preset: glob
    - preset: file_exists
  exports:
    recon_report: "{{agent_content}}"

Best Practices

  1. Use system prompts for consistent behavior
  2. Limit context size — summarize large inputs before passing them to the LLM
  3. Set max_iterations appropriately — higher for complex tasks, lower for simple queries
  4. Enable memory management for long-running agents to avoid context overflow
  5. Use structured output when you need to parse the response programmatically
  6. Consider local models (Ollama) for sensitive data that shouldn’t leave your network
  7. Use sub-agents to decompose complex tasks into specialized subtasks
  8. Add stop_condition when the agent has a clear success criterion
  9. Use plan_prompt for complex tasks that benefit from upfront planning
