arun.v

Agentic · Vehicle Telemetry · 2025

CollectMind

Agentic vehicle telemetry policy engine. A four-node LangGraph state machine validates policies against COVESA VSS and deploys them behind a confidence gate.

Lead Engineer
AI / ML Platform
Policy validation
COVESA VSS + ECU constraints
Deployment gate
confidence-scored
CI gates
VSS pass rate, schema completeness

Problem

Vehicle telemetry policy lives at an awkward intersection: the rules change frequently, the rules have to be validated against an industry standard (COVESA VSS) and the specific ECU capability set on each vehicle, and every policy decision has to be auditable down to the version that produced it. A pure LLM agent is too non-deterministic; a pure rules engine is too rigid for the way operators want to express intent.

Approach

Use the LLM at the boundary, never in the deployment loop.

  • Four-node LangGraph state machine. Orchestrator plans, Generator produces a typed policy, Validator checks it against COVESA VSS and the target ECU capabilities, and Deployer ships it behind a confidence gate. Each node is a checkpoint with rollback semantics.
  • Compile, do not interpret. Policies are compiled, validated, and versioned in an immutable PostgreSQL registry. The vehicle never executes a free-form LLM output. It executes a versioned policy that has passed the gates.
  • Confidence-gated deployment. Each generated policy carries a confidence score. Below the threshold, the policy is queued for human review rather than auto-deployed.
  • Streaming ingest, two-tier store. Telemetry flows in over Kafka, lands in Redis for the hot window, ages into TimescaleDB for the cold window. Isolation Forest watches for anomalies in real time so a misbehaving policy is caught before it becomes a trend.
  • CI gates. RAGAS-style automated checks on every PR: VSS pass rate, schema completeness, and policy round-trip stability. A regression on any gate fails the build.
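The control flow of the four-node graph can be sketched in plain Python. This is a dependency-free approximation of the idea, not LangGraph's API and not the project's code. The node bodies are stubs, and `PolicyState`, `CONFIDENCE_THRESHOLD` (0.85 here), and the stubbed generator output are hypothetical. It shows two things from the list above: a snapshot before each node so a failing node rolls back, and the Deployer's confidence gate.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical gate value; the real threshold is not stated.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class PolicyState:
    intent: str                          # operator's natural-language request
    plan: list = field(default_factory=list)
    policy: Optional[dict] = None        # typed policy produced by the Generator
    confidence: float = 0.0
    errors: list = field(default_factory=list)
    outcome: str = "pending"


def orchestrator(s: PolicyState) -> PolicyState:
    s.plan = ["generate", "validate", "deploy"]  # stub planner
    return s


def generator(s: PolicyState) -> PolicyState:
    # Stub for the LLM step; a real node would parse and type-check model output.
    s.policy = {"signal": "Vehicle.Speed", "sample_hz": 1.0}
    s.confidence = 0.9
    return s


def validator(s: PolicyState) -> PolicyState:
    # Stub check; a real node would validate against VSS and the target ECU.
    if not s.policy or "signal" not in s.policy:
        s.errors.append("missing signal")
    return s


def deployer(s: PolicyState) -> PolicyState:
    # Confidence gate: below the threshold, queue for human review instead of shipping.
    s.outcome = "deployed" if s.confidence >= CONFIDENCE_THRESHOLD else "human_review"
    return s


NODES: list[tuple[str, Callable[[PolicyState], PolicyState]]] = [
    ("orchestrator", orchestrator),
    ("generator", generator),
    ("validator", validator),
    ("deployer", deployer),
]


def run_graph(state: PolicyState) -> tuple[PolicyState, list[str]]:
    """Run nodes in order, snapshotting state before each one so a failure can roll back."""
    trace: list[str] = []
    for name, node in NODES:
        checkpoint = copy.deepcopy(state)
        state = node(state)
        trace.append(name)
        if state.errors:
            failed = list(state.errors)
            state = checkpoint          # roll back to the pre-node snapshot
            state.errors = failed       # keep the diagnostics for the audit trail
            state.outcome = f"rejected at {name}"
            break
    return state, trace
```

Calling `run_graph(PolicyState(intent="log vehicle speed at 1 Hz"))` walks all four nodes and ends in `"deployed"`, because the stubbed confidence clears the gate. A policy below the threshold would instead end in `"human_review"`.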

Stack

  • Agent / planner: Python, LangGraph, GPT-4o for the natural-language-to-typed-policy step.
  • Runtime: FastAPI, PostgreSQL for the policy registry, TimescaleDB for the telemetry cold store, Redis for hot windows, Apache Kafka for ingest.
  • ML: Isolation Forest for unsupervised telemetry anomaly detection.
  • Observability: structured logs tagged with policy version IDs, plus tracing across the four-node graph.
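Isolation Forest isolates points with random axis-aligned splits. Outliers separate in fewer splits, so a shorter average path length maps to a score closer to 1. The sketch below is a minimal from-scratch version of that standard algorithm in stdlib Python, meant only to show the mechanism and not to represent the project's detector. A production service would more likely use a library implementation such as scikit-learn's `IsolationForest`. The two-feature layout (hypothetically speed and coolant temperature) and all parameters are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Expected path length of an unsuccessful BST search over n points; used to normalize."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until isolated or too deep."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # a constant feature cannot be split
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        # Leaves that were not fully isolated add the expected remaining depth for their size.
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


class IsolationForestSketch:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees = []
        self.n = 0

    def fit(self, data):
        """data: list of equal-length numeric tuples; needs at least 2 points."""
        self.n = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.n))
        self.trees = [build_tree(self.rng.sample(data, self.n), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values closer to 1 indicate likelier anomalies."""
        mean_path = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c(self.n))
```

Fitting on a window of samples and calling `score` on each new sample is one way to run this in real time. The refit cadence is a design choice this sketch leaves open.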

Outcomes

  • Operators evolve telemetry policy without engineering involvement on the happy path.
  • Every deployed policy is validated against COVESA VSS and the target ECU before it leaves the registry.
  • Confidence-gated deployment means low-confidence outputs queue for human review rather than auto-shipping.
  • CI gates catch regressions before merge, not in the field.
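A CI gate of this kind reduces to comparing a PR's metrics against a baseline and failing the build on any regression. The sketch below assumes the metrics are computed upstream. The metric names, baseline values, the `tolerance` parameter, and the helpers `evaluate_gates` and `gate_exit_code` are hypothetical, since the scoring details of the real gates are not specified.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    baseline: float
    current: float
    passed: bool


def evaluate_gates(baseline: dict[str, float], current: dict[str, float],
                   tolerance: float = 0.0) -> list[GateResult]:
    """One result per baseline metric; higher is assumed better for every metric."""
    results = []
    for name, base in baseline.items():
        value = current.get(name, 0.0)  # a missing metric counts as a regression
        results.append(GateResult(name, base, value, value >= base - tolerance))
    return results


def gate_exit_code(baseline: dict[str, float], current: dict[str, float]) -> int:
    """Print a PASS/FAIL line per gate; return 1 if any gate regressed, else 0."""
    results = evaluate_gates(baseline, current)
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"{status} {r.name}: {r.current:.3f} (baseline {r.baseline:.3f})")
    return 0 if all(r.passed for r in results) else 1
```

A CI step would pass the return value to `sys.exit`, with a baseline such as `{"vss_pass_rate": 0.98, "schema_completeness": 1.0, "round_trip_stability": 0.99}`. Any non-zero exit fails the job, which blocks the PR.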

Lessons

  • The hardest part of an agentic system is not the agent. It is the registry of safe, vetted operations.
  • Determinism per policy version is a far stronger guarantee than "we use temperature 0".
  • Validation against an industry standard and the device-specific capability set is non-negotiable for any system that ships to fleet hardware.
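Determinism per policy version can be made concrete by deriving the version ID from the policy content itself and keeping registry rows append-only. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. The table layout and the helpers `publish` and `load` are hypothetical. In PostgreSQL, the same guarantee would typically come from a trigger that raises an exception, or from revoking UPDATE and DELETE privileges.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE policies (
    version_id TEXT PRIMARY KEY,   -- sha256 of the canonical policy JSON
    body       TEXT NOT NULL
);
CREATE TRIGGER policies_no_update BEFORE UPDATE ON policies
BEGIN SELECT RAISE(ABORT, 'policy versions are immutable'); END;
CREATE TRIGGER policies_no_delete BEFORE DELETE ON policies
BEGIN SELECT RAISE(ABORT, 'policy versions are immutable'); END;
"""


def canonical(policy: dict) -> str:
    # Sorted keys and fixed separators give one string per logical policy.
    return json.dumps(policy, sort_keys=True, separators=(",", ":"))


def version_id(policy: dict) -> str:
    return hashlib.sha256(canonical(policy).encode("utf-8")).hexdigest()


def open_registry() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def publish(conn: sqlite3.Connection, policy: dict) -> str:
    vid = version_id(policy)
    # Re-publishing identical content is a no-op and yields the same version ID.
    conn.execute("INSERT OR IGNORE INTO policies (version_id, body) VALUES (?, ?)",
                 (vid, canonical(policy)))
    conn.commit()
    return vid


def load(conn: sqlite3.Connection, vid: str) -> dict:
    row = conn.execute("SELECT body FROM policies WHERE version_id = ?", (vid,)).fetchone()
    if row is None:
        raise KeyError(vid)
    return json.loads(row[0])
```

Because the ID is a hash of the content, a vehicle that reports a version ID can be traced to exactly one policy body, independent of any sampling settings used when the LLM drafted it.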

Stack

Python · LangGraph · GPT-4o · FastAPI · PostgreSQL · TimescaleDB · Redis · Apache Kafka · Isolation Forest · RAGAS · COVESA VSS

Highlights

  • Four-node LangGraph state machine (Orchestrator, Generator, Validator, Deployer) autonomously generates, validates, versions, and deploys vehicle telemetry collection policies on a confidence-gated path.
  • Validator node checks every generated policy against COVESA VSS plus per-ECU capability constraints; only policies that pass both gates reach the immutable PostgreSQL registry.
  • Streaming ingest on Apache Kafka with Isolation Forest anomaly detection, Redis hot store, and TimescaleDB cold store handles fleet-scale telemetry without losing recent windows on the hot path.
  • CI-enforced RAGAS-style gates block PRs on VSS pass rate and schema completeness, so a regression cannot ship even if the upstream LLM regresses.
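The Validator's two checks can be sketched as a pure function over a signal catalog and an ECU capability set. Both data structures are hypothetical. The catalog is a tiny hand-written excerpt that uses VSS-style dotted paths and is not the real COVESA VSS specification, which is far larger. The `EcuCapabilities` and `SignalRequest` fields are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical excerpt of a VSS-style catalog: path -> (datatype, unit).
VSS_CATALOG = {
    "Vehicle.Speed": ("float", "km/h"),
    "Vehicle.Powertrain.TractionBattery.StateOfCharge.Current": ("float", "percent"),
    "Vehicle.Exterior.AirTemperature": ("float", "celsius"),
}


@dataclass(frozen=True)
class EcuCapabilities:
    supported_signals: frozenset[str]   # signals this ECU can actually report
    max_sample_hz: float                # highest sampling rate the ECU supports


@dataclass(frozen=True)
class SignalRequest:
    path: str
    sample_hz: float


def validate(requests: list[SignalRequest], ecu: EcuCapabilities) -> list[str]:
    """Return all violations; an empty list means the policy passes both gates."""
    errors = []
    for r in requests:
        if r.path not in VSS_CATALOG:
            errors.append(f"{r.path}: not a known VSS signal")
            continue  # ECU checks are meaningless for an unknown path
        if r.path not in ecu.supported_signals:
            errors.append(f"{r.path}: not exposed by this ECU")
        if not 0 < r.sample_hz <= ecu.max_sample_hz:
            errors.append(f"{r.path}: {r.sample_hz} Hz outside (0, {ecu.max_sample_hz}] Hz")
    return errors
```

Returning every violation instead of stopping at the first one gives the human reviewer, or the Generator on a retry, the full list of problems in a single pass.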
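The hot/cold split can be sketched as a time-bounded in-memory window that hands older samples to a cold sink. Here a `deque` stands in for the Redis hot window and a plain list stands in for the TimescaleDB cold store. Kafka consumption is omitted. `HotColdStore`, the 300-second window, and the `Sample` shape are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    ts: float      # epoch seconds
    signal: str    # e.g. a VSS path such as "Vehicle.Speed"
    value: float


@dataclass
class HotColdStore:
    window_s: float = 300.0                       # hypothetical hot-window length
    hot: deque = field(default_factory=deque)     # stand-in for the Redis hot window
    cold: list = field(default_factory=list)      # stand-in for the TimescaleDB cold store

    def ingest(self, sample: Sample) -> None:
        # Assumes in-order timestamps, e.g. per-vehicle ordering within one Kafka partition.
        self.hot.append(sample)
        self._age_out(now=sample.ts)

    def _age_out(self, now: float) -> None:
        # Move samples older than the window to the cold tier rather than dropping them.
        while self.hot and self.hot[0].ts < now - self.window_s:
            self.cold.append(self.hot.popleft())

    def recent(self, signal: str) -> list[Sample]:
        """Read from the hot tier only: the low-latency path for recent windows."""
        return [s for s in self.hot if s.signal == signal]
```

Aging samples out by moving them instead of deleting them keeps recent windows cheap to read while the full history remains queryable in the cold tier.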