Knowledge Graph health & migration
The KG consolidation pipeline is a queue-driven background worker. When it gets stuck (embedder timeouts, schema drift, slow disk), the agent stops being able to write decisions to the graph. Five MCP admin tools surface the pipeline’s state and let an operator unblock it without touching the database file:

| Tool | Line (server.py) | Use it when… |
|---|---|---|
| okto_pulse_kg_health | 12484 | You want a single-call summary: queue depth, oldest pending age, dead-letter count, decay tick freshness, and relevance score health. |
| okto_pulse_kg_dead_letter_list | 12527 | Consolidations are failing and you need to see what bounced. |
| okto_pulse_kg_dead_letter_reprocess | 12570 | The underlying issue is fixed and you want to retry the failed entries. |
| okto_pulse_kg_migrate_schema | 12643 | You upgraded Pulse and need to bring the board’s graph.lbug to the current schema version. |
| okto_pulse_kg_tick_run_now | 12716 | You don’t want to wait for the daily decay tick to recompute relevance scores. |

kg_health and kg_dead_letter_list are read-only and ride on the standard kg.query.* / kg.admin.settings_read flags most presets ship with. The two that mutate state, kg_dead_letter_reprocess and kg_migrate_schema (and the historical-consolidation CLI path), are gated by kg.admin.historical_consolidation and the broader kg.admin.* namespace. Operator presets layer these in deliberately.

Source: okto-pulse-core/src/okto_pulse/core/mcp/server.py:12484–12857 and core/infra/permissions.py:PERMISSION_REGISTRY under the kg.admin key. Citations: 80-pulse-feature-inventory.md:493–501.

For the consolidation flow itself, see consolidation. For the schema, see overview.
okto_pulse_kg_health
The board’s pipeline health summary. The Pulse dashboard polls this every 30 seconds.
Source: okto-pulse-core/src/okto_pulse/core/services/kg_health_service.py:get_kg_health. Real shape (per server.py:12484 and the service implementation):

| Field | What it means |
|---|---|
| queue_depth | Pending consolidation rows in the SQLite queue. High values mean enqueues are landing faster than the worker drains them. |
| oldest_pending_age_s | Age, in seconds, of the oldest pending consolidation row. null when the queue is empty. |
| dead_letter_count | Rows that exceeded kg_queue_max_attempts (default 5). Inspect with kg_dead_letter_list. |
| total_nodes | Total node count in the board’s graph.lbug. |
| default_score_count / default_score_ratio | Nodes still at the default relevance_score (never recomputed). A ratio above ~0.7 means the decay tick isn’t keeping up; see kg_tick_run_now. |
| avg_relevance | Mean relevance_score across all nodes. |
| top_disconnected_nodes | Lowest-degree nodes (by edge count). Useful for spotting orphaned consolidations. |
| schema_version | Health response schema version. Use kg_migrate_schema or kg_schema_info when checking the graph schema itself. |
| contradict_warn_count | Running count of contradict_penalty cap events. A spike means the curator should reconcile. |
| last_decay_tick_at | Most recent decay-tick run. |
| nodes_recomputed_in_last_tick | Number of nodes recomputed by the most recent decay tick. |

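As a rough illustration of how an agent or script might read these fields, the sketch below turns a health payload into a list of warnings. It is not Pulse code: the function name, the sample payload, and the 10-minute age threshold are assumptions made for the example. Only the ~0.7 ratio comes from the table above.

```python
from typing import Any

# The ~0.7 ratio is the rough guide from the field table.
DEFAULT_SCORE_RATIO_WARN = 0.7
# Hypothetical threshold (10 minutes); tune it for your own board.
OLDEST_PENDING_WARN_S = 600


def summarize_health(health: dict[str, Any]) -> list[str]:
    """Return human-readable warnings for a kg_health-style payload."""
    warnings: list[str] = []

    if health.get("dead_letter_count", 0) > 0:
        warnings.append("dead-letter entries present: inspect with kg_dead_letter_list")

    # oldest_pending_age_s is null (None) when the queue is empty.
    oldest = health.get("oldest_pending_age_s")
    if oldest is not None and oldest > OLDEST_PENDING_WARN_S:
        warnings.append("oldest pending row is old: the worker may be stuck")

    if health.get("default_score_ratio", 0.0) > DEFAULT_SCORE_RATIO_WARN:
        warnings.append("decay tick is behind: consider kg_tick_run_now")

    return warnings


# Hand-written sample payload; the values are made up for illustration.
sample = {
    "queue_depth": 42,
    "oldest_pending_age_s": 1800,
    "dead_letter_count": 3,
    "total_nodes": 1000,
    "default_score_count": 800,
    "default_score_ratio": 0.8,
}
for line in summarize_health(sample):
    print(line)
```

Keeping the checks in a pure function over the payload makes the thresholds easy to adjust and test without a running board.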
cli.py:683–744 (cmd_verify_pipeline) runs 5 layered checks: queue depth, graph file presence + node count, graph-vs-SQLite ref mirror, outbox staleness, global discovery file. Exit code 0 if healthy, 1 if any layer fails. Use the CLI for monitoring scripts; use kg_health from agents.
okto_pulse_kg_dead_letter_list
Consolidation entries that exceeded kg_queue_max_attempts (default 5) land in the dead-letter table. Pulse never auto-reprocesses them — an operator must inspect and replay.
okto_pulse_kg_dead_letter_reprocess
Move dead-letter entries back to the active queue for another attempt. Use after the underlying cause is resolved (embedder up, disk space available, schema migrated).
Reprocessing resets the attempts counter back to 0 for each requeued entry. If the same root cause persists, entries will land back in dead-letter after another kg_queue_max_attempts failures.
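The dead-letter lifecycle described above can be modeled with a small in-memory toy. This is only a sketch of the semantics, not the Pulse implementation: the Entry and ToyQueue classes are hypothetical, and whether an entry is dead-lettered on reaching or on exceeding the limit is an assumption here.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 5  # mirrors the documented kg_queue_max_attempts default


@dataclass
class Entry:
    entry_id: str
    attempts: int = 0


@dataclass
class ToyQueue:
    """In-memory stand-in for the queue and dead-letter table (not Pulse code)."""

    active: list[Entry] = field(default_factory=list)
    dead_letter: list[Entry] = field(default_factory=list)

    def record_failure(self, entry: Entry) -> None:
        """Count a failed attempt; dead-letter the entry at the limit (assumed boundary)."""
        entry.attempts += 1
        if entry.attempts >= MAX_ATTEMPTS:
            self.active.remove(entry)
            self.dead_letter.append(entry)

    def reprocess(self, entry_ids: set[str]) -> int:
        """Requeue matching dead-letter entries with attempts reset to 0."""
        requeued = [e for e in self.dead_letter if e.entry_id in entry_ids]
        for entry in requeued:
            entry.attempts = 0
            self.dead_letter.remove(entry)
            self.active.append(entry)
        return len(requeued)


queue = ToyQueue(active=[Entry("c1")])
entry = queue.active[0]
for _ in range(MAX_ATTEMPTS):
    queue.record_failure(entry)  # the final failure moves it to dead-letter
dead_before = [e.entry_id for e in queue.dead_letter]
requeued = queue.reprocess({"c1"})
```

Because the counter restarts at 0, a requeued entry gets a full set of attempts again, which is why reprocessing before the root cause is fixed only delays the next dead-letter.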
okto_pulse_kg_migrate_schema
Run schema migrations on a board’s graph.lbug. Use after upgrading okto-pulse-core to a version with a higher schema version than the file on disk.
Calling it on a graph that is already at target_version returns migrations_applied: [] and ok: true.
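To make the no-op behavior concrete, here is a minimal sketch of an idempotent migration runner. The registry, the step functions, and the dict-based "graph" are all hypothetical stand-ins. Only the ok and migrations_applied keys are taken from the text above, and the real tool may report applied migrations in a different form.

```python
from typing import Callable


def _add_indexes(graph: dict) -> None:
    """Hypothetical step upgrading schema version 1 to 2."""
    graph.setdefault("indexes", [])


def _add_relevance_defaults(graph: dict) -> None:
    """Hypothetical step upgrading schema version 2 to 3."""
    graph.setdefault("relevance_defaults", {})


# Hypothetical registry: version N maps to the step that upgrades N-1 to N.
MIGRATIONS: dict[int, Callable[[dict], None]] = {
    2: _add_indexes,
    3: _add_relevance_defaults,
}
RUNTIME_VERSION = max(MIGRATIONS)


def migrate(graph: dict, target_version: int = RUNTIME_VERSION) -> dict:
    """Apply pending steps in order; a graph already at the target is a no-op."""
    applied: list[int] = []
    for version in range(graph["schema_version"] + 1, target_version + 1):
        MIGRATIONS[version](graph)
        graph["schema_version"] = version
        applied.append(version)
    return {"ok": True, "migrations_applied": applied}


graph = {"schema_version": 1}
first = migrate(graph)   # applies versions 2 and 3
second = migrate(graph)  # nothing left to apply
print(first)
print(second)
```

Driving the loop from the on-disk version rather than from a fixed list is what makes a repeated call safe.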
okto_pulse_kg_tick_run_now
Trigger the decay tick worker immediately instead of waiting for the schedule (default daily, kg_decay_tick_interval_minutes = 1440). The tick recomputes relevance_score for nodes whose last_recomputed_at is older than kg_decay_tick_staleness_days (default 7).
Source: kg/workers/kg_decay_tick.py. Note that find_similar_decisions uses a separate search reranking formula at retrieval time; do not conflate the two (80-pulse-feature-inventory.md:957).
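The staleness rule can be sketched as a simple cutoff filter. This illustrates only the selection step, under the assumption that nodes are plain dicts with id and last_recomputed_at keys. It does not reproduce the relevance formula or the actual worker code.

```python
from datetime import datetime, timedelta, timezone

STALENESS_DAYS = 7  # documented default for kg_decay_tick_staleness_days


def select_stale(
    nodes: list[dict],
    now: datetime,
    staleness_days: int = STALENESS_DAYS,
) -> list[str]:
    """Return ids of nodes whose last_recomputed_at is older than the cutoff."""
    cutoff = now - timedelta(days=staleness_days)
    return [n["id"] for n in nodes if n["last_recomputed_at"] < cutoff]


# Made-up nodes: one recomputed 2 days ago, one 10 days ago.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
nodes = [
    {"id": "fresh", "last_recomputed_at": now - timedelta(days=2)},
    {"id": "stale", "last_recomputed_at": now - timedelta(days=10)},
]
stale_ids = select_stale(nodes, now)
print(stale_ids)
```

With the default of 7 days only the second node qualifies, which is why a manual tick on a recently recomputed graph can legitimately touch nothing.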
Hot-reloadable settings
You don’t need to restart Pulse to change pipeline tuning:

| Setting group | Hot-reload mechanism |
|---|---|
| kg_queue_* (worker count, claim timeout, max attempts, alert threshold, recovery scan) | APScheduler re-reads the value with a 5-second debounce. No action required. |
| kg_decay_tick_* (interval, staleness, max age) | PUT /settings/runtime; applies on the next tick boundary. |

Citations: 80-pulse-feature-inventory.md:790. The full settings table lives in Knowledge Graph and the inventory.
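For the kg_decay_tick_* group, a runtime update might look like the sketch below, which builds the PUT request without sending it. The base URL, the flat JSON body, and the absence of authentication headers are all assumptions. Check the API server for the real payload shape before relying on this.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; use your Pulse API address


def build_runtime_settings_request(settings: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /settings/runtime request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/settings/runtime",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# The flat key/value body is an assumption, not a documented schema.
request = build_runtime_settings_request({"kg_decay_tick_staleness_days": 3})
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch stays offline.
```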
CLI fallbacks
The MCP tools are the primary surface, but three CLI commands cover deeper recovery scenarios:
okto-pulse verify-pipeline <board_id>
cli.py:683–744. Wraps the same 5 checks as kg_health but exits 1 on failure — useful in CI / monitoring scripts.
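In a monitoring script, the exit code is the only signal you need. The wrapper below is a sketch that shells out to the documented command and maps exit code 0 to healthy. The function names and the injectable runner are illustrative choices, added so the logic can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def pipeline_is_healthy(board_id: str, runner: Runner = run_cli) -> bool:
    """True when `okto-pulse verify-pipeline <board_id>` exits 0."""
    return runner(["okto-pulse", "verify-pipeline", board_id]) == 0


# Exercise the logic with fake runners instead of the real CLI.
seen: list[list[str]] = []


def fake_ok(argv: Sequence[str]) -> int:
    seen.append(list(argv))
    return 0


healthy = pipeline_is_healthy("board-1", runner=fake_ok)
unhealthy = pipeline_is_healthy("board-1", runner=lambda argv: 1)
print(healthy, unhealthy)
```

Passing check=False keeps subprocess.run from raising on a non-zero exit, so the caller decides how to alert.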
okto-pulse kg backfill <board_id>
cli.py:747–902. Runs the Layer 1 deterministic KG worker against every artifact on the board.
okto-pulse kg dedup-entities <board_id>
cli.py:908–936. Consolidates duplicate nodes per (node_type, source_artifact_ref).
Underlying REST endpoints
The MCP tools wrap REST endpoints exposed by the API server. They are documented here for completeness; most callers should prefer the MCP tools.

| Method | Path | Equivalent MCP tool |
|---|---|---|
| GET | /kg/health | kg_health |
| GET | /kg/queue/health | kg_health (subset) |
| POST | /kg/tick/run-now | kg_tick_run_now |
| GET | /kg/dead-letter/... | kg_dead_letter_list |
| POST | /kg/dead-letter/reprocess | kg_dead_letter_reprocess |

Citations: 80-pulse-feature-inventory.md:729–735.
Common operational scenarios
“Consolidations stopped landing”
- Call kg_health and check queue_depth, oldest_pending_age_s, and dead_letter_count.
- If dead_letter_count > 0, call kg_dead_letter_list to see error reasons.
- Fix the root cause (embedder, disk, schema mismatch).
- Call kg_dead_letter_reprocess with the entry ids.
- Re-call kg_health. Expect dead_letter_count: 0 and queue_depth draining.
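This recovery sequence can be sketched as a small function driven by an injected tool caller. Everything about the call shapes is an assumption made for illustration: the argument names (board_id, entry_ids), the entries and id response keys, and the fake caller are hypothetical and not taken from the Pulse MCP schema.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def unblock_consolidations(call_tool: ToolCaller, board_id: str, root_cause_fixed: bool) -> str:
    """Follow the health, list, fix, reprocess, re-check sequence."""
    health = call_tool("okto_pulse_kg_health", {"board_id": board_id})
    if health["dead_letter_count"] == 0:
        return "no dead letters: check queue_depth and oldest_pending_age_s"

    listing = call_tool("okto_pulse_kg_dead_letter_list", {"board_id": board_id})
    if not root_cause_fixed:
        return "fix the root cause before reprocessing"

    # Hypothetical response shape: a list of rows that each carry an id.
    entry_ids = [row["id"] for row in listing["entries"]]
    call_tool(
        "okto_pulse_kg_dead_letter_reprocess",
        {"board_id": board_id, "entry_ids": entry_ids},
    )
    after = call_tool("okto_pulse_kg_health", {"board_id": board_id})
    return "recovered" if after["dead_letter_count"] == 0 else "still failing"


# Fake caller standing in for a real MCP client.
state = {"dead": [{"id": "dl-1"}, {"id": "dl-2"}]}


def fake_call_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    if name == "okto_pulse_kg_health":
        return {"dead_letter_count": len(state["dead"])}
    if name == "okto_pulse_kg_dead_letter_list":
        return {"entries": list(state["dead"])}
    if name == "okto_pulse_kg_dead_letter_reprocess":
        state["dead"] = [e for e in state["dead"] if e["id"] not in args["entry_ids"]]
        return {"requeued": len(args["entry_ids"])}
    raise ValueError(f"unknown tool: {name}")


result = unblock_consolidations(fake_call_tool, "board-1", root_cause_fixed=True)
print(result)
```

The explicit root_cause_fixed flag keeps the operator in the loop, matching the rule that Pulse never auto-reprocesses dead-letter entries.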
“Just upgraded Pulse, dashboard shows schema drift”
- Inspect the graph schema with kg_schema_info, or run the migration tool when a schema error points to drift.
- Take a backup of graph.lbug.
- Call kg_migrate_schema with the default target_version (= runtime version).
- Re-check the graph schema. Expect it to match the runtime schema.
“Relevance scores look stale”
- Call kg_tick_run_now. Expect nodes_recomputed > 0.
- If it always reports 0 nodes recomputed, increase the kg_decay_tick_staleness_days lower bound or check that nodes are being touched at all.
“Need to rebuild the deterministic skeleton from scratch”
- CLI only: okto-pulse kg backfill <board_id> (dry-run), then review the output.
- okto-pulse kg backfill <board_id> --apply.
- Call kg_health and confirm graph_node_refs is balanced.
Next steps
Consolidation
The 7 transactional write primitives the queue is feeding.
Archive & retention
Cascading entity archive, supersedence as soft-archive, and KG retention policy.