The Modularity Divide: How AI Is Splitting Into Two Worlds

A quiet fracture is forming in the AI world—and it's not getting the attention it deserves.

The dominant narrative is simple: bigger models, better results. GPT-5 will be smarter than GPT-4. Gemini Ultra will surpass Claude. The competition, we're told, is a raw capability arms race. But underneath that surface story, something else is happening. A growing body of research, a wave of new developer tooling, and a groundswell of community sentiment are all pointing in the same direction: the future isn't one model to rule them all. It's modular systems of specialized components working together.

This isn't just an architectural preference. It's a fundamental split in how the AI community is thinking about intelligence, and the evidence is piling up fast.

The Problem With One Big Brain

If you've spent any time pushing frontier models to do complex, multi-step work, you've hit the ceiling. Ask even the best models to plan a research project, write code, debug it, run tests, and write a report, and they stumble. They hallucinate less than they used to, but they still lose track of subgoals, forget constraints from earlier steps, and make high-level planning mistakes that no amount of prompting seems to fix.

The instinct has been to throw more parameters at the problem. If the model can't hold the full plan in context, give it a bigger context window. If it makes mistakes at step 47 of a 50-step process, train it longer. This approach is yielding diminishing returns—and the research community knows it.

Consider what the latest batch of April 2026 arXiv papers reveals when you read them together. They're not all trying to build bigger brains. They're trying to build better architectures for thinking.

The Planning Architecture Papers

In LoHo-Manip (Long-Horizon Manipulation via Trace-Conditioned VLA Planning), researchers tackle a problem that stumps even capable vision-language-action models: multi-step robotic manipulation over long time horizons. Their solution is telling. Instead of asking one model to reason about the entire task, they split it into two decoupled components. A "manager" VLM handles high-level planning—subtask sequencing, progress tracking, done/remaining splits. An "executor" VLA handles local control, following a compact visual trace that tells it where to go next. The key insight: the manager is invoked in a receding-horizon manner, meaning it only ever plans the next chunk of work, not the whole task. Failed steps simply persist in the manager's subsequent outputs and the trace updates accordingly, which gives you closed-loop replanning without hand-crafted recovery logic.
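
To make that pattern concrete, here's a minimal Python sketch of the manager/executor loop. The manager_vlm and executor_vla objects and their method names are hypothetical stand-ins, not the paper's interfaces; what matters is the receding-horizon replanning and the done/remaining bookkeeping.

    # Minimal sketch of a receding-horizon manager/executor split.
    # `manager_vlm` and `executor_vla` are hypothetical placeholders.
    def run_episode(task, observe, manager_vlm, executor_vla, max_steps=100):
        done = []          # subtasks the manager has confirmed complete
        remaining = None   # the manager owns the done/remaining split
        for _ in range(max_steps):
            obs = observe()  # fresh observation before every planning call
            # Receding horizon: plan only the next chunk, conditioned on
            # what actually happened so far.
            plan = manager_vlm.plan(task=task, observation=obs,
                                    done=done, remaining=remaining)
            if plan.finished:
                return True
            remaining = plan.remaining_subtasks
            # The executor follows a compact visual trace for one subtask only.
            ok = executor_vla.execute(subtask=plan.next_subtask,
                                      trace=plan.visual_trace,
                                      observation=obs)
            if ok:
                done.append(plan.next_subtask)
            # On failure nothing special happens: the failed subtask stays in
            # `remaining`, and the next manager call replans around it.
        return False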

This is modular reasoning applied to robotics. The "brain" doesn't do everything—it delegates, tracks progress separately, and updates its plans based on what actually happened.

Meanwhile, From Research Question to Scientific Workflow (AGH University / Sano Centre) attacks the same problem from a different angle: turning a natural-language research question into an executable computational pipeline. The researchers propose a three-layer architecture. A semantic layer uses an LLM to interpret intent into structured parameters. A deterministic layer generates reproducible workflow DAGs. A knowledge layer encodes domain vocabulary and optimization strategies in expert-authored markdown files called "Skills." The critical move: LLM non-determinism is confined to intent extraction only. Everything downstream is deterministic code. The result is that identical intents always yield identical workflows—reproducibility preserved, without sacrificing the flexibility of natural-language interaction.
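
The shape of that guarantee is easy to sketch. In the toy version below, a hypothetical llm_extract_intent callable plays the semantic layer; the real system's layers and Skills files are much richer, but the boundary is the same: the LLM touches nothing past intent extraction.

    # Sketch of "non-determinism only at the boundary". `llm_extract_intent`
    # is a hypothetical stand-in for the semantic layer.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Intent:
        analysis: str            # e.g. "sequence_alignment"
        dataset: str             # e.g. "cohort_a"
        params: tuple[str, ...]  # normalized, hashable parameters

    def build_workflow(intent: Intent) -> list[dict]:
        """Deterministic layer: the same Intent always yields the same DAG."""
        return [
            {"id": "fetch",   "run": f"fetch {intent.dataset}", "deps": []},
            {"id": "analyze", "run": f"{intent.analysis} {' '.join(intent.params)}", "deps": ["fetch"]},
            {"id": "report",  "run": "render_report", "deps": ["analyze"]},
        ]

    def question_to_workflow(question: str, llm_extract_intent) -> list[dict]:
        # The only non-deterministic call; everything downstream is plain code,
        # so identical extracted intents always produce identical workflows.
        intent = llm_extract_intent(question)  # -> Intent
        return build_workflow(intent)

Because Intent is frozen and hashable, it can also double as a cache key: the same property that buys reproducibility buys cheap memoization.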

And then there's Nemobot, which revisits Claude Shannon's 1950 taxonomy of game-playing machines and extends it with modern LLMs. The paper describes a programmable framework where language models handle different categories of games—dictionary-based, mathematically solvable, heuristic-driven, and learning-based—each requiring fundamentally different reasoning strategies. The insight is that no single prompting scheme works across all categories. The framework programs around the LLM, using it as one component among many in a system designed for specific tasks.
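
In toy form, that programming-around-the-model looks something like the dispatcher below. The game objects and the optional llm helper are illustrative placeholders, not Nemobot's actual interfaces.

    # Toy dispatcher over Shannon-style game categories; the LLM is just one
    # component, and some categories don't use it at all.
    def choose_move(game, state, llm=None):
        if game.category == "dictionary":
            # Word games: hard-constrain the model to a legal lexicon.
            candidates = game.legal_words(state)
            return llm.pick(state, candidates) if llm else candidates[0]
        if game.category == "solvable":
            # Mathematically solved games (e.g. Nim): exact solution, no LLM.
            return game.optimal_move(state)
        if game.category == "heuristic":
            # Search plus an evaluation function; the model isn't the evaluator.
            return max(game.legal_moves(state),
                       key=lambda m: game.evaluate(game.apply(state, m)))
        if game.category == "learning":
            # Fall back to a learned or prompted policy.
            return llm.propose_move(state)
        raise ValueError(f"unknown game category: {game.category}")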

What connects all three? None of them is trying to build a bigger model. They're all trying to build better structure around models.

The Community Echo

The Hacker News crowd, never shy about taking contrarian positions, has been unusually aligned this week. The top story with 914 points and 777 comments is about Microsoft and OpenAI ending their exclusive partnership—a deal structure that was itself predicated on the "one mega-lab, one mega-model" worldview. The top comment frames it as a win for Google (TPUs, the commenter argues, will now be accessible to OpenAI). But the deeper story is infrastructure fragmentation: when the exclusive deal ends, OpenAI's compute strategy has to diversify, and so does everyone else's. The era of betting everything on a single model's road map is ending.

Meanwhile, Reddit's AI communities are processing the arrival of Qwen 3.6 27B and DeepSeek v4 with a familiar ambivalence. The models are impressive. The community is also clearly exhausted by the perpetual churn. "This is where we are right now," reads a top LocalLLaMA post with 3145 points, capturing a mood that's less "wow another model" and more "okay, but what do I actually build with this?" The question being asked, underneath the benchmark excitement, is: how do these models compose into real systems?

That question keeps coming back. A post on r/MachineLearning linking to the "Scientific Theory of Deep Learning" paper (arXiv:2604.21691) is generating real discussion about whether we even understand what we've built—and whether scale alone will get us to understanding. It's a different kind of modularity question: not architecture modularity, but explanatory modularity. Can we decompose neural networks into principles we can reason about, or are we stuck treating them as black boxes?

The Developer Infrastructure Wave

GitHub tells its own story of modularity. The most-starred repos this week aren't new models—they're infrastructure. Repomix (24K stars) packs entire repositories into single, AI-friendly files for consumption by other models. MCP for Beginners (16K stars) teaches Microsoft's Model Context Protocol—a standardized interface for connecting AI agents to external tools. NeMo (17K stars), NVIDIA's generative AI framework, is built explicitly for researchers who need to compose multimodal and speech pipelines. Deep-Research-Web-UI (2K stars) provides a web interface for running iterative, multi-step research agents that combine search, scraping, and language model inference.

These aren't model papers. They're wiring. They're what you build when you've accepted that a single model isn't enough, and you need to connect things.

The HN thread on Microsoft's VibeVoice—open-source frontier voice AI—is similarly revealing. Commenters immediately dissect it into component parts (STT/TTS/streaming TTS as separate problems), note where each component is strong or weak, and debate tradeoffs between model size, latency, and multilingual capability. Nobody's treating it as a single "voice AI" black box. The modularity is architectural and conversational.

What This Means

The modularity thesis isn't new—it's been brewing in the agentic AI space for a couple of years. But something feels different about where we are in April 2026. The tooling has matured enough that you can actually build these systems without a research team. The agentic architectures are starting to show real results in the wild. And the limits of pure scale are becoming apparent enough that even the frontier labs are diversifying.

This doesn't mean frontier models are going away. They'll keep getting better. They'll keep being important. But the interesting action is elsewhere: the systems that actually change how work gets done are the ones that figure out how components compose, with planning, execution, memory, tool use, and verification working together in closed loops.
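
In code terms, the recurring shape is a loop like the one below, where every component is a hypothetical placeholder rather than any particular framework's API.

    # A generic closed loop over the five roles named above.
    def closed_loop(goal, planner, tools, memory, verifier, max_iters=10):
        result = None
        for _ in range(max_iters):
            step = planner.next_step(goal, memory.recall(goal))  # planning
            result = tools[step.tool](**step.args)               # execution / tool use
            memory.store(step, result)                           # memory
            verdict = verifier.check(goal, result)               # verification
            if verdict.ok:
                return result
            goal = verdict.revised_goal or goal  # failures feed back into planning
        return result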

The pattern across the papers, the community discussions, the GitHub repos, and the HN threads is consistent: intelligence isn't a size problem. It's a composition problem. The question isn't how big can we make the model. It's how do we architect systems where the right parts do the right work.

That's the split that's forming. And whoever figures out the composition problem first is going to have a very interesting few years ahead.


Sources

Academic Papers

  • LoHo-Manip: Long-Horizon Manipulation via Trace-Conditioned VLA Planning — arXiv, Apr 2026 — manager/executor architecture for long-horizon robotic manipulation
  • From Research Question to Scientific Workflow — AGH University / Sano Centre, arXiv, Apr 2026 — three-layer semantic/deterministic/knowledge architecture for executable pipelines
  • Nemobot — arXiv, Apr 2026 — extends Shannon's 1950 taxonomy of game-playing machines with modern LLMs
  • Scientific Theory of Deep Learning — arXiv:2604.21691

Hacker News Discussions

  • Microsoft and OpenAI end their exclusive partnership — 914 points, 777 comments
  • Microsoft VibeVoice, open-source frontier voice AI

Reddit Communities

  • r/LocalLLaMA — "This is where we are right now" — 3145 points; reactions to Qwen 3.6 27B and DeepSeek v4
  • r/MachineLearning — discussion of the "Scientific Theory of Deep Learning" paper (arXiv:2604.21691)

GitHub Projects

  • Repomix — GitHub, Apr 2026 — 24K stars; AI-friendly repository packing for model consumption
  • MCP for Beginners — Microsoft, Apr 2026 — 16K stars; Model Context Protocol curriculum for agent tooling
  • NeMo — NVIDIA, Apr 2026 — 17K stars; composable generative AI framework for multimodal/speech pipelines
  • Deep-Research-Web-UI — Apr 2026 — 2K stars; iterative multi-step research agent combining search and LLM inference