Knowledge Graph health & migration

The KG consolidation pipeline is a queue-driven background worker. When it gets stuck — embedder timeouts, schema drift, slow disk — the agent stops being able to write decisions to the graph. Five MCP admin tools surface the pipeline’s state and let an operator unblock it without touching the database file:
| Tool | server.py line | Use it when… |
|---|---|---|
| okto_pulse_kg_health | 12484 | You want a single-call summary: queue depth, oldest pending age, dead-letter count, decay tick freshness, and relevance score health. |
| okto_pulse_kg_dead_letter_list | 12527 | Consolidations are failing and you need to see what bounced. |
| okto_pulse_kg_dead_letter_reprocess | 12570 | The underlying issue is fixed and you want to retry the failed entries. |
| okto_pulse_kg_migrate_schema | 12643 | You upgraded Pulse and need to bring the board’s graph.lbug to the current schema version. |
| okto_pulse_kg_tick_run_now | 12716 | You don’t want to wait for the daily decay tick to recompute relevance scores. |
Permissions are dotted-string flags in the granular registry. kg_health and kg_dead_letter_list are read-only and ride on the standard kg.query.* / kg.admin.settings_read flags that most presets ship with. The two tools that mutate state, kg_dead_letter_reprocess and kg_migrate_schema (along with the historical-consolidation CLI path), are gated by kg.admin.historical_consolidation and the broader kg.admin.* namespace. Operator presets layer these in deliberately.

Source: okto-pulse-core/src/okto_pulse/core/mcp/server.py:12484–12857 and core/infra/permissions.py:PERMISSION_REGISTRY under the kg.admin key. Citations: 80-pulse-feature-inventory.md:493–501. For the consolidation flow itself, see consolidation. For the schema, see overview.

okto_pulse_kg_health

The board’s pipeline health summary. The Pulse dashboard polls this every 30 seconds.
Input:
  board_id: "brd_abc123"
The MCP tool returns a 12-field aggregate computed in-process, so it is cheap to poll. It is implemented in okto-pulse-core/src/okto_pulse/core/services/kg_health_service.py:get_kg_health. Response shape, with example values:
Output:
{
  "queue_depth":             12,
  "oldest_pending_age_s":    41.8,
  "dead_letter_count":       0,
  "total_nodes":             4178,
  "default_score_count":     312,
  "default_score_ratio":     0.0747,
  "avg_relevance":           0.612,
  "top_disconnected_nodes":  [
    {"node_id": "ent_91", "node_type": "Entity",     "degree": 0},
    {"node_id": "lrn_03", "node_type": "Learning",   "degree": 0}
  ],
  "schema_version":          "1.0",
  "contradict_warn_count":   2,
  "last_decay_tick_at":      "2026-05-07T03:00:00Z",
  "nodes_recomputed_in_last_tick": 388
}
Field meanings (per docstring at server.py:12484 and the service implementation):
| Field | What it means |
|---|---|
| queue_depth | Pending consolidation rows in the SQLite queue. High values mean enqueues are landing faster than the worker drains them. |
| oldest_pending_age_s | Age, in seconds, of the oldest pending consolidation row. null when the queue is empty. |
| dead_letter_count | Rows that exceeded kg_queue_max_attempts (default 5). Inspect with kg_dead_letter_list. |
| total_nodes | Total node count in the board’s graph.lbug. |
| default_score_count / default_score_ratio | Nodes still at the default relevance_score (never recomputed). A ratio above ~0.7 means the decay tick isn’t keeping up; see kg_tick_run_now. |
| avg_relevance | Mean relevance_score across all nodes. |
| top_disconnected_nodes | Lowest-degree nodes (by edge count). Useful for spotting orphaned consolidations. |
| schema_version | Version of the health response schema, not of the graph. Use kg_schema_info to inspect the graph schema itself, or kg_migrate_schema to bring it up to date. |
| contradict_warn_count | Running count of contradict_penalty cap events. A spike means the curator should reconcile. |
| last_decay_tick_at | Timestamp of the most recent decay-tick run. |
| nodes_recomputed_in_last_tick | Number of nodes recomputed by the most recent decay tick. |
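The following Python sketch shows how an agent might act on these fields by turning a kg_health response into warnings. The 0.7 default_score_ratio threshold comes from the table above. QUEUE_DEPTH_LIMIT and OLDEST_PENDING_LIMIT_S are hypothetical values, not Pulse defaults, and triage_kg_health is our own name.

```python
from typing import Any, Dict, List

# From the field table: a ratio above ~0.7 means the decay tick is behind.
DEFAULT_SCORE_RATIO_LIMIT = 0.7
# Hypothetical alert thresholds (not Pulse defaults); tune them per board.
QUEUE_DEPTH_LIMIT = 100
OLDEST_PENDING_LIMIT_S = 600.0


def triage_kg_health(health: Dict[str, Any]) -> List[str]:
    """Turn a kg_health response into a list of human-readable warnings."""
    warnings: List[str] = []
    dead = health.get("dead_letter_count", 0)
    if dead > 0:
        warnings.append(f"{dead} dead-letter entries: inspect with kg_dead_letter_list")
    depth = health.get("queue_depth", 0)
    if depth > QUEUE_DEPTH_LIMIT:
        warnings.append(f"queue_depth {depth} exceeds {QUEUE_DEPTH_LIMIT}: worker is falling behind")
    oldest = health.get("oldest_pending_age_s")  # null (None) when the queue is empty
    if oldest is not None and oldest > OLDEST_PENDING_LIMIT_S:
        warnings.append(f"oldest pending row is {oldest:.0f}s old")
    if health.get("default_score_ratio", 0.0) > DEFAULT_SCORE_RATIO_LIMIT:
        warnings.append("default_score_ratio above 0.7: consider kg_tick_run_now")
    return warnings


example = {
    "queue_depth": 12,
    "oldest_pending_age_s": 41.8,
    "dead_letter_count": 0,
    "default_score_ratio": 0.0747,
}
print(triage_kg_health(example))  # []
```

With the example response, every field is within bounds, so the list comes back empty.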
CLI equivalent — note that the CLI uses a different layered check set rather than this 12-field aggregate:
okto-pulse verify-pipeline brd_abc123 --json
cli.py:683–744 (cmd_verify_pipeline) runs 5 layered checks: queue depth, graph file presence + node count, graph-vs-SQLite ref mirror, outbox staleness, global discovery file. Exit code 0 if healthy, 1 if any layer fails. Use the CLI for monitoring scripts; use kg_health from agents.
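Here is a thin Python wrapper around the CLI for monitoring scripts. It relies only on the documented exit code (0 healthy, 1 failing). This page does not describe the structure of the --json report, so the sketch parses it and returns it unchanged. The name verify_pipeline and the injectable run parameter (useful for testing) are our own.

```python
import json
import subprocess
from typing import Any, Callable, Tuple


def verify_pipeline(
    board_id: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Tuple[bool, Any]:
    """Run `okto-pulse verify-pipeline <board_id> --json`; return (healthy, report).

    healthy follows the documented exit code (0 = all layers pass, 1 = a layer
    failed). The report's JSON structure is not documented, so it is returned as-is.
    """
    proc = run(
        ["okto-pulse", "verify-pipeline", board_id, "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    report = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode == 0, report
```

A cron job or CI step can call verify_pipeline("brd_abc123") and alert on a False first element.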

okto_pulse_kg_dead_letter_list

Consolidation entries that exceeded kg_queue_max_attempts (default 5) land in the dead-letter table. Pulse never auto-reprocesses them — an operator must inspect and replay.
Input:
  board_id: "brd_abc123"
  limit:    50
  offset:   0
Output:
{
  "board_id": "brd_abc123",
  "total":    3,
  "entries": [
    {
      "id":              "dl_001",
      "source_type":     "spec",
      "source_id":       "spec_007",
      "first_failed_at": "2026-05-06T22:11:04Z",
      "attempts":        5,
      "last_error":      "embedder.timeout: sentence-transformers exceeded 30s",
      "session_id":      "kgs_01HV..."
    },
    {
      "id":              "dl_002",
      "source_type":     "card",
      "source_id":       "card_91",
      "first_failed_at": "2026-05-07T01:22:18Z",
      "attempts":        5,
      "last_error":      "graph.lock_timeout: failed to acquire write lock"
    }
  ]
}
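When the list is long, it helps to page through it with limit/offset and group entries by error type. The sketch below assumes a hypothetical call_tool(name, args) function that invokes an MCP tool and returns its JSON result as a dict. The grouping key (the text before the first colon of last_error) is only a convention suggested by the sample errors above.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical MCP client: (tool name, arguments) -> JSON result as a dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_dead_letters(call_tool: CallTool, board_id: str, page_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every dead-letter entry, paging with limit/offset until `total` is covered."""
    offset = 0
    while True:
        page = call_tool(
            "okto_pulse_kg_dead_letter_list",
            {"board_id": board_id, "limit": page_size, "offset": offset},
        )
        entries = page.get("entries", [])
        yield from entries
        offset += len(entries)
        if not entries or offset >= page.get("total", 0):
            break


def error_histogram(entries: Iterable[Dict[str, Any]]) -> Counter:
    """Count entries by the last_error prefix before the first ':' (e.g. 'embedder.timeout')."""
    return Counter(e.get("last_error", "").split(":", 1)[0] for e in entries)
```

For the two sample entries above, error_histogram reports one embedder.timeout and one graph.lock_timeout. That split tells you which root cause to fix first.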

okto_pulse_kg_dead_letter_reprocess

Move dead-letter entries back to the active queue for another attempt. Use after the underlying cause is resolved (embedder up, disk space available, schema migrated).
Input:
  board_id: "brd_abc123"
  entry_ids: ["dl_001", "dl_002"]    # or omit to reprocess all
Output:
{
  "board_id":  "brd_abc123",
  "requeued":  2,
  "skipped":   0,
  "entry_ids": ["dl_001", "dl_002"]
}
Reprocessing resets the attempts counter to 0 for each requeued entry. If the same root cause persists, the entries land back in dead-letter after another kg_queue_max_attempts failures.
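The attempts lifecycle can be modeled in a few lines. This is an illustrative model, not Pulse's worker code: QueueEntry, record_failure, and reprocess are hypothetical names. The sample entries above show attempts: 5 under the default cap of 5, so the sketch parks an entry once attempts reaches the cap. It also assumes that the skipped count reports ids that are not currently in dead-letter.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 5  # kg_queue_max_attempts default


@dataclass
class QueueEntry:
    entry_id: str
    attempts: int = 0
    dead_letter: bool = False


def record_failure(entry: QueueEntry, max_attempts: int = MAX_ATTEMPTS) -> None:
    """Count one failed consolidation attempt; park the entry once the cap is reached."""
    entry.attempts += 1
    if entry.attempts >= max_attempts:
        entry.dead_letter = True


def reprocess(entry: QueueEntry) -> bool:
    """Requeue a dead-lettered entry with attempts reset to 0; False means skipped."""
    if not entry.dead_letter:
        return False  # assumption: not in dead-letter -> counted as skipped
    entry.attempts = 0
    entry.dead_letter = False
    return True
```

After a reprocess, the entry gets a fresh budget of MAX_ATTEMPTS tries before it is parked again. This is why an unfixed root cause sends it straight back to dead-letter.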

okto_pulse_kg_migrate_schema

Run schema migrations on a board’s graph.lbug. Use it after upgrading okto-pulse-core to a release whose schema version is higher than that of the file on disk.
Input:
  board_id: "brd_abc123"
  target_version: "0.3.3"           # optional — defaults to the runtime's current schema version
  dry_run: false
Output:
{
  "board_id":      "brd_abc123",
  "from_version":  "0.3.1",
  "to_version":    "0.3.3",
  "migrations_applied": [
    {"id": "0001_add_belongs_to_multi_pair", "took_ms": 412},
    {"id": "0002_hnsw_metric_to_cosine",     "took_ms": 1840}
  ],
  "node_count_before": 4178,
  "node_count_after":  4178,
  "ok": true
}
Migrations are idempotent: a board already at target_version returns migrations_applied: [] and ok: true.
Always take a copy of ~/.okto-pulse/boards/{board_id}/graph.lbug before running a migration with dry_run: false. Migrations rewrite the file in place. The okto-pulse kg backfill --apply flow is a safer rebuild path when a migration corrupts data.
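You can script the dry-run, backup, and migrate order. The sketch below assumes that graph.lbug lives at the path shown above; the helper copies it whether it is a single file or a directory. It also assumes a hypothetical call_tool(name, args) MCP client, and that a dry-run response carries the same ok flag as a real run. Omitting target_version lets the tool fall back to the runtime's schema version.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Hypothetical MCP client: (tool name, arguments) -> JSON result as a dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def backup_graph(board_id: str, root: Optional[Path] = None) -> Path:
    """Copy boards/{board_id}/graph.lbug to a timestamped sibling; return the copy's path."""
    root = root or Path.home() / ".okto-pulse" / "boards"
    src = root / board_id / "graph.lbug"
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dst = src.with_name(f"graph.lbug.bak-{stamp}")
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)  # raises FileNotFoundError if the graph is missing
    return dst


def safe_migrate(call_tool: CallTool, board_id: str, root: Optional[Path] = None) -> Dict[str, Any]:
    """Dry-run the migration, back up graph.lbug, then apply the migration in place."""
    preview = call_tool("okto_pulse_kg_migrate_schema", {"board_id": board_id, "dry_run": True})
    if not preview.get("ok", False):  # assumption: dry runs report ok like real runs
        raise RuntimeError(f"dry run did not succeed: {preview}")
    backup = backup_graph(board_id, root)
    # target_version omitted: the tool defaults to the runtime's schema version.
    result = call_tool("okto_pulse_kg_migrate_schema", {"board_id": board_id, "dry_run": False})
    result["backup_path"] = str(backup)
    return result
```

If the migrated graph turns out to be damaged, the returned backup_path points at the copy to restore from. The backfill flow mentioned above is the alternative recovery path.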

okto_pulse_kg_tick_run_now

Trigger the decay tick worker immediately instead of waiting for the schedule (default daily, kg_decay_tick_interval_minutes = 1440). The tick recomputes relevance_score for nodes whose last_recomputed_at is older than kg_decay_tick_staleness_days (default 7).
Input:
  board_id: "brd_abc123"
Output:
{
  "board_id":         "brd_abc123",
  "started_at":       "2026-05-07T15:30:11Z",
  "completed_at":     "2026-05-07T15:30:14Z",
  "nodes_recomputed": 312,
  "nodes_skipped_fresh": 3866
}
The decay formula is documented in kg/workers/kg_decay_tick.py. Note that find_similar_decisions uses a separate search reranking formula at retrieval time — do not conflate the two (80-pulse-feature-inventory.md:957).
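The staleness rule that decides which nodes a tick recomputes is simple enough to sketch. This sketch covers only the selection step; it does not implement the decay formula in kg_decay_tick.py. The node dict shape is hypothetical. Treating a node with no last_recomputed_at as stale is an assumption.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]  # hypothetical shape: {"id": ..., "last_recomputed_at": datetime or None}


def partition_for_tick(
    nodes: List[Node], now: datetime, staleness_days: int = 7
) -> Tuple[List[Node], List[Node]]:
    """Split nodes into (to_recompute, skipped_fresh) using kg_decay_tick_staleness_days."""
    cutoff = now - timedelta(days=staleness_days)
    stale: List[Node] = []
    fresh: List[Node] = []
    for node in nodes:
        last = node.get("last_recomputed_at")
        # Older than the cutoff (or never recomputed, by assumption) -> recompute.
        if last is None or last < cutoff:
            stale.append(node)
        else:
            fresh.append(node)
    return stale, fresh
```

The two list lengths correspond to nodes_recomputed and nodes_skipped_fresh in the tool output. A lower staleness_days moves more nodes into the first list.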

Hot-reloadable settings

You don’t need to restart Pulse to change pipeline tuning:
| Setting group | Hot-reload mechanism |
|---|---|
| kg_queue_* (worker count, claim timeout, max attempts, alert threshold, recovery scan) | APScheduler re-reads the value with a 5-second debounce. No action required. |
| kg_decay_tick_* (interval, staleness, max age) | PUT /settings/runtime; applies at the next tick boundary. |
Source: 80-pulse-feature-inventory.md:790. The full settings table lives on the Knowledge Graph page and in the inventory.
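A debounce applies a changed value only after it has stopped changing for a set window, so a burst of edits produces one reload. The sketch below is a generic illustration of a 5-second debounce, not the APScheduler integration Pulse actually uses. DebouncedSetting is a hypothetical name, and the caller passes time in explicitly to keep the class testable.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class DebouncedSetting(Generic[T]):
    """Adopt a changed value only after it has been stable for `window_s` seconds."""

    def __init__(self, initial: T, window_s: float = 5.0) -> None:
        self.active: T = initial
        self.window_s = window_s
        self._pending: Optional[T] = None
        self._pending_since: Optional[float] = None

    def observe(self, value: T, now: float) -> T:
        """Feed the latest stored value (e.g. on each poll); return the value to use."""
        if value == self.active:
            # No change, or an edit was reverted before the window elapsed.
            self._pending, self._pending_since = None, None
        elif self._pending_since is None or value != self._pending:
            # A new edit (re)starts the window.
            self._pending, self._pending_since = value, now
        elif now - self._pending_since >= self.window_s:
            # The edit has been stable for the whole window: adopt it.
            self.active = value
            self._pending, self._pending_since = None, None
        return self.active
```

Each new edit restarts the window, so only the last value in a burst takes effect.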

CLI fallbacks

The MCP tools are the primary surface, but three CLI commands cover deeper recovery scenarios:

okto-pulse verify-pipeline <board_id>

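cli.py:683–744. Runs the 5 layered checks listed under okto_pulse_kg_health (a different check set from the MCP tool’s 12-field aggregate) and exits 1 on failure, which makes it useful in CI and monitoring scripts.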
okto-pulse verify-pipeline brd_abc123 --json

okto-pulse kg backfill <board_id>

cli.py:747–902. Runs the Layer 1 deterministic KG worker against every artifact on the board.
# Dry run: report what would be emitted, no writes
okto-pulse kg backfill brd_abc123

# Apply: enqueue all artifacts, drain the consolidation queue
okto-pulse kg backfill brd_abc123 --apply

# Filter to one artifact type
okto-pulse kg backfill brd_abc123 --apply --artifact-type spec
This is the recovery path when the graph is structurally out of sync (e.g., after a partial migration or a manual file restore). It rebuilds the deterministic skeleton; it does not replay cognitive-agent decisions.
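For scripted recovery, the dry-run, review, and apply sequence can be wrapped. The sketch below uses only the flags shown above. The confirm callback stands in for a human reviewing the dry-run report. This page does not document backfill's exit codes, so treating a nonzero exit from the dry run as a failure is an assumption.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[..., subprocess.CompletedProcess]


def backfill(board_id: str, artifact_type: Optional[str] = None, apply: bool = False,
             run: Runner = subprocess.run) -> subprocess.CompletedProcess:
    """Invoke `okto-pulse kg backfill`; a dry run unless apply=True."""
    argv: List[str] = ["okto-pulse", "kg", "backfill", board_id]
    if apply:
        argv.append("--apply")
    if artifact_type:
        argv += ["--artifact-type", artifact_type]
    return run(argv, capture_output=True, text=True, check=False)


def dry_run_then_apply(board_id: str, confirm: Callable[[str], bool],
                       artifact_type: Optional[str] = None,
                       run: Runner = subprocess.run) -> Optional[subprocess.CompletedProcess]:
    """Dry-run first; apply only if the dry run exits 0 and `confirm` approves its report."""
    preview = backfill(board_id, artifact_type, apply=False, run=run)
    if preview.returncode != 0 or not confirm(preview.stdout):
        return None  # nothing was written
    return backfill(board_id, artifact_type, apply=True, run=run)
```

Because confirm has no default, a caller has to state explicitly when the report is acceptable before anything is written.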

okto-pulse kg dedup-entities <board_id>

cli.py:908–936. Consolidates duplicate nodes that share the same (node_type, source_artifact_ref) pair.
okto-pulse kg dedup-entities brd_abc123 --dry-run
okto-pulse kg dedup-entities brd_abc123
kg dedup-entities writes by default. Always run with --dry-run first.
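To see in advance what the command will merge, the identification step can be sketched as a group-by on the (node_type, source_artifact_ref) key. This sketch only finds the duplicate groups. Which node survives, and how edges are re-pointed, is up to the CLI and is not modeled here. The node dict fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]


def find_duplicate_groups(nodes: List[Dict[str, Any]]) -> Dict[Key, List[str]]:
    """Group node ids by (node_type, source_artifact_ref); keep only groups with 2+ nodes."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for node in nodes:
        groups[(node["node_type"], node["source_artifact_ref"])].append(node["node_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Two Entity nodes pointing at the same spec form a group; an Entity and a Learning that share a source ref do not, because node_type is part of the key.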

Underlying REST endpoints

The MCP tools wrap REST endpoints exposed by the API server. They are documented here for completeness — most callers should prefer the MCP tools.
| Method | Path | Equivalent MCP tool |
|---|---|---|
| GET | /kg/health | kg_health |
| GET | /kg/queue/health | kg_health (subset) |
| POST | /kg/tick/run-now | kg_tick_run_now |
| GET | /kg/dead-letter/... | kg_dead_letter_list |
| POST | /kg/dead-letter/reprocess | kg_dead_letter_reprocess |
Source: 80-pulse-feature-inventory.md:729–735.

Common operational scenarios

“Consolidations stopped landing”

  1. Call kg_health — check queue_depth, oldest_pending_age_s, and dead_letter_count.
  2. If dead_letter_count > 0, call kg_dead_letter_list to see error reasons.
  3. Fix the root cause (embedder, disk, schema mismatch).
  4. Call kg_dead_letter_reprocess with the entry ids.
  5. Re-call kg_health. Expect dead_letter_count: 0 and queue_depth draining.
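The five steps above can be chained in a short script. This sketch rests on a few assumptions. call_tool(name, args) is a hypothetical MCP client, and root_cause_fixed is a callback standing in for the human judgment in step 3. The script requeues only the first page of 50 dead-letter entries; use offset to page further.

```python
from typing import Any, Callable, Dict, List

# Hypothetical MCP client: (tool name, arguments) -> JSON result as a dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def recover_consolidations(
    call_tool: CallTool,
    board_id: str,
    root_cause_fixed: Callable[[List[Dict[str, Any]]], bool],
) -> Dict[str, Any]:
    """Walk steps 1-5: check health, list dead letters, reprocess, re-check."""
    # Step 1: single-call summary.
    health = call_tool("okto_pulse_kg_health", {"board_id": board_id})
    if health.get("dead_letter_count", 0) == 0:
        return health  # nothing bounced; watch queue_depth and oldest_pending_age_s instead
    # Step 2: see what bounced.
    listing = call_tool(
        "okto_pulse_kg_dead_letter_list", {"board_id": board_id, "limit": 50, "offset": 0}
    )
    entries = listing.get("entries", [])
    if not entries:
        return health
    # Step 3: a human (or a check you trust) confirms the root cause is fixed.
    if not root_cause_fixed(entries):
        raise RuntimeError("fix the root cause before reprocessing")
    # Step 4: requeue the listed entries.
    call_tool(
        "okto_pulse_kg_dead_letter_reprocess",
        {"board_id": board_id, "entry_ids": [e["id"] for e in entries]},
    )
    # Step 5: re-check; expect dead_letter_count 0 and queue_depth draining.
    return call_tool("okto_pulse_kg_health", {"board_id": board_id})
```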

“Just upgraded Pulse, dashboard shows schema drift”

  1. Inspect the graph schema with kg_schema_info. If a schema error already points to drift, go straight to the migration.
  2. Take a backup of graph.lbug.
  3. Call kg_migrate_schema with default target_version (= runtime version).
  4. Re-check the graph schema. Expect it to match the runtime schema.

“Relevance scores look stale”

  1. Call kg_tick_run_now. Expect nodes_recomputed > 0.
  2. If nodes_recomputed stays at 0, every node’s last_recomputed_at falls inside the kg_decay_tick_staleness_days window. Lower that setting so more nodes qualify, or check that nodes are being touched at all.

“Need to rebuild the deterministic skeleton from scratch”

  1. CLI only: run okto-pulse kg backfill <board_id> (dry run) and review the report.
  2. okto-pulse kg backfill <board_id> --apply.
  3. Call kg_health — confirm graph_node_refs is balanced.

Next steps

Consolidation

The 7 transactional write primitives the queue is feeding.

Archive & retention

Cascading entity archive, supersedence as soft-archive, and KG retention policy.
Last modified on May 9, 2026