World Model Infrastructure Lab
Building the infrastructure layer for world-model-driven AI.
I work on the systems layer behind next-generation AI agents: memory, retrieval, model routing, evaluation, local inference, and runtimes that help agents maintain state, simulate outcomes, and act reliably.
The future of AI is not just larger language models. It is infrastructure that lets models understand environments, reason across time, and interact with the world.
Operating Loop
Building the systems layer for agents that remember, simulate, and act.
01 Observe
02 Model
03 Simulate
04 Act
05 Evaluate
06 Update
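The six steps above can be read as a single control loop. The following is a minimal Python sketch under loudly stated assumptions: a toy environment in which each action moves the state by an unknown gain, and a hypothetical `WorldModel` whose only learned belief is that gain. The names `WorldModel` and `run_loop` are illustrative and do not come from any project listed here.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Hypothetical world model: one learned parameter plus an outcome log."""
    gain: float = 1.0                      # belief: how far one unit of action moves the state
    history: list = field(default_factory=list)

    def predict(self, state: float, action: int) -> float:
        # Expected next state if `action` were taken from `state`.
        return state + self.gain * action


def run_loop(target: float, start: float, true_gain: float = 2.0, max_steps: int = 10) -> WorldModel:
    """Toy environment: an action really moves the state by true_gain * action (unknown to the agent)."""
    env_state = start
    model = WorldModel()
    for _ in range(max_steps):
        observed = env_state                               # 01 Observe
        state_estimate = observed                          # 02 Model: belief about the current state
        # 03 Simulate: predict each candidate's outcome with the learned gain, keep the closest.
        action = min((-1, 0, 1), key=lambda a: abs(target - model.predict(state_estimate, a)))
        env_state = env_state + true_gain * action         # 04 Act
        error = abs(target - env_state)                    # 05 Evaluate
        model.history.append((action, error))
        if action != 0:                                    # 06 Update: re-estimate the gain from the outcome
            model.gain = (env_state - observed) / action
        if error == 0:
            break
    return model
```

For example, `run_loop(target=6, start=0)` corrects the model's gain estimate from 1.0 to 2.0 after the first action and reaches the target on the third step; the point is that the Update step changes what the next Simulate step predicts.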
The World Model Infrastructure Stack
World-model-driven AI needs more than a foundation model. It needs infrastructure for state, memory, retrieval, simulation, tool use, model routing, and evaluation. My work explores that connective tissue between models and reliable action.
Layer 01
World-Model Applications
Interfaces where persistent, stateful AI systems surface to users and operators.
Relevant work: Applied agent workflows · AI Toolkit utilities · Research notes
Layer 02
Agent Runtime
Execution layer for planner, implementer, and reviewer agents, plus orchestration across agent roles.
Relevant work: subagent-fleet · Claude Code workflows · MCP exploration
Layer 03
State + Memory Layer
Durable context about tasks, users, tools, and prior outcomes over long horizons.
Relevant work: awesome-agentic-memory · agentic memory research · photographic memory ideas
Layer 04
Retrieval + Context Layer
Backend-agnostic search, filtering, and recall that surface relevant context at runtime.
Relevant work: embenx · RAG workflows · vector backend abstraction
Layer 05
Simulation / Prediction Layer
Pre-action reasoning loops, sandboxed what-if evaluation, and future-state planning.
Relevant work: Evaluation prototypes · research agenda · planned field notes
Layer 06
Tool + Environment Interface
Connectors and protocols that let agents observe, read, write, and act safely.
Relevant work: MCP · Claude skills · AI Toolkit · automation workflows
Layer 07
Model Routing + Local/Cloud Inference
Routing policies across local Ollama nodes, hosted models, and specialized backends.
Relevant work: subagent-fleet · Ollama · LiteLLM · OpenRouter workflows
Layer 08
Observability + Evaluation
Operational visibility and behavior measurement for systems acting over time.
Relevant work: Prompt grader · structured agent evaluation · trace-driven workflows
Explore My Work Through Your Lens
Aditya's work sits in the infrastructure layer around world-model agents: memory, retrieval, model routing, local inference, and tool orchestration. The strongest research signal is the push from prompt chains toward inspectable, stateful systems.
Research Agenda
Agent State
How should AI agents maintain a durable model of users, tasks, tools, goals, and environments?
Memory Infrastructure
How should agents retrieve, compress, forget, and update knowledge over long horizons?
Simulation Loops
How can agents test possible actions before acting?
Model Routing
How should AI systems route between language models, vision models, local models, specialized tools, and simulators?
Evals for Agency
How do we evaluate systems that act over time instead of answering one prompt?
Local-First Intelligence
How can builders run powerful AI infrastructure without depending entirely on closed platforms?
Current Systems
Projects are framed here as research artifacts: each one explores a concrete question in the world-model stack and makes the systems layer more legible.
subagent-fleet
local inference · model routing · coding agents · Ollama · LiteLLM
Research Question
Can local machines become a practical compute fleet for AI coding agents?
System Built
An open-source local AI compute control plane that generates agent definitions, LiteLLM routing config, warmup flows, and a live dashboard from one declarative fleet topology.
Why It Matters
Persistent agent systems get expensive fast. Local-first routing turns spare Macs, workstations, and Ollama nodes into inspectable infrastructure instead of one opaque endpoint.
Status
Active experiment
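To make "one declarative fleet topology" concrete, here is a minimal sketch of turning a topology into a LiteLLM-proxy-style `model_list` plus a separate role-to-model map. The topology schema, host addresses, model tags, and the `build_routing` helper are hypothetical and are not subagent-fleet's actual format; only the `model_list` / `model_name` / `litellm_params` shape and the `ollama/` model prefix follow LiteLLM's proxy config conventions.

```python
import json

# Hypothetical declarative topology (schema, hosts, and model tags are illustrative only).
TOPOLOGY = {
    "nodes": [
        {"name": "mac-studio", "host": "10.0.0.5", "models": ["qwen2.5-coder:32b"]},
        {"name": "laptop", "host": "10.0.0.7", "models": ["qwen2.5-coder:32b", "llama3.1:8b"]},
    ],
    "roles": {
        "planner": "qwen2.5-coder:32b",
        "implementer": "qwen2.5-coder:32b",
        "reviewer": "llama3.1:8b",
    },
}


def build_routing(topology: dict, port: int = 11434) -> tuple[dict, dict]:
    """Return (LiteLLM-style config, role -> model_name map) from one topology."""
    model_list = []
    for node in topology["nodes"]:
        for model in node["models"]:
            model_list.append({
                # Repeating a model_name across nodes gives LiteLLM several deployments to balance over.
                "model_name": model,
                "litellm_params": {
                    "model": f"ollama/{model}",
                    "api_base": f"http://{node['host']}:{port}",  # 11434 is Ollama's default port
                },
            })
    served = {entry["model_name"] for entry in model_list}
    # Fail early if an agent role points at a model that no node serves.
    missing = {role: m for role, m in topology["roles"].items() if m not in served}
    if missing:
        raise ValueError(f"roles point at models no node serves: {missing}")
    return {"model_list": model_list}, dict(topology["roles"])


if __name__ == "__main__":
    config, roles = build_routing(TOPOLOGY)
    print(json.dumps(config, indent=2))
    print(json.dumps(roles, indent=2))
```

The role map is kept out of the LiteLLM config on purpose: in this sketch it would feed the generated agent definitions, while the `model_list` part, serialized to YAML, matches the shape the LiteLLM proxy reads.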
embenx
retrieval · memory layer · vector backends · MCP · hybrid search
Research Question
Can retrieval infrastructure for agents become backend-agnostic without losing practical control?
System Built
A Python retrieval library with a unified Collection API across 15+ vector backends, plus metadata filtering, reranking, hybrid search, temporal recall, and an MCP server.
Why It Matters
World-model-driven agents need a swappable and inspectable memory substrate. embenx reduces retrieval glue code while preserving the ability to choose the right storage backend per workload.
Status
Shipping / active
AI Toolkit
tool interface · prompt systems · evals · workflow tooling
Research Question
What lightweight tools make LLM workflows more inspectable and repeatable for builders?
System Built
A set of practical prompt and workflow tools, including a prompt grader, an intelligent prompt composer, and a thread generator, for turning vague inputs into more structured model interactions.
Why It Matters
Reliable AI systems need a disciplined interface layer. These tools sharpen prompts, evaluation criteria, and operator workflows before heavier agent runtime infrastructure is added.
Status
Shipping
awesome-agentic-memory
memory research · MCP · ecosystem map · agent frameworks
Research Question
What does the current memory ecosystem reveal about the missing systems layer for agentic AI?
System Built
A curated research and tooling map across agent memory frameworks, MCP servers, vector stores, graph backends, and emerging papers.
Why It Matters
An emerging category is hard to reason about until its ecosystem is mapped. This project condenses a fragmented memory landscape into a clearer infrastructure map.
Status
Active knowledge base
Field Notes
Essays and system notes that reinforce the thesis: AI is moving from chat interfaces toward stateful, operational systems that need better infrastructure.
planned field note
The Missing Infrastructure Layer for World-Model AI
Foundation models are not enough for reliable real-world agency. The next category is the infrastructure around them: state, memory, routing, simulation, and evals.
planned field note
From RAG to State: Why Agent Memory Is Not Just Retrieval
RAG retrieves facts. Agent memory needs to maintain evolving state about users, tools, goals, failures, and plans across time.
planned field note
Local-First AI Infrastructure for Agent Builders
As agent workflows become persistent and expensive, local inference and routing become an infrastructure advantage rather than a hobbyist optimization.
published system note
subagent-fleet: Local AI Compute Control Plane for Coding Agents
A local AI control plane can get materially closer to frontier coding quality than most people expect, while preserving privacy and operator control.
published system note
embenx Guide: The Ultimate Python Library for Vector Search
Retrieval logic should outlive any single vector backend, especially for agent memory systems that will evolve as workloads change.
Operating Principles
Useful AI systems need more than better prompts.
They need memory that can be inspected.
They need tools that can be audited.
They need models that can be routed.
They need state that can be updated.
They need evals that measure behavior over time.
They need local-first infrastructure so builders can experiment without waiting for permission.
Open Research Channel
Current threads I am actively pushing forward across the world-model stack.
Memory
Backend-agnostic retrieval, temporal recall, and agent memory abstractions that stay portable across storage layers.
Routing
Local-plus-cloud model routing policies for planner / implementer / reviewer agent roles and cost-aware execution.
Observability
Tracing, warmup visibility, and evaluation loops that make long-running agent behavior auditable.
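For the observability thread, a minimal tracing sketch shows what "auditable" can mean in practice: every agent step leaves a record with its name, attributes, duration, and outcome, including steps that fail. `span`, `traced`, and the in-memory `TRACE` list are hypothetical names; a real deployment would more likely export spans to a tracing backend such as OpenTelemetry.

```python
import functools
import time
from contextlib import contextmanager

TRACE: list[dict] = []  # hypothetical in-process trace sink


@contextmanager
def span(name: str, **attrs):
    """Record one step of agent work: name, attributes, duration, and outcome."""
    record = {"name": name, "attrs": attrs, "status": "ok"}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["status"] = f"error: {exc!r}"
        raise  # tracing observes failures; it never swallows them
    finally:
        record["duration_s"] = time.perf_counter() - start
        TRACE.append(record)


def traced(fn):
    """Decorator form of span() that names the record after the wrapped function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with span(fn.__name__):
            return fn(*args, **kwargs)
    return wrapper
```

Wrapping a warmup or review step with `@traced`, or a block with `with span("review", role="reviewer"):`, appends one record per step to `TRACE`, so a long-running session can be replayed step by step afterwards.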
System Boot Notes
initializing world model stack...
loading memory layer...
routing local + cloud models...
attaching tool interfaces...
starting simulation loop...
ready