Hi, Johan here.
My 2026 coding setup is getting a little ridiculous.
I used to think the big upgrade was going from VS Code to Cursor, then from Cursor to Claude Code, then from Claude Code to Codex, and so on. But the real shift was not the editor. The real shift was realizing I do not want a smarter autocomplete. I want a small operating system for agents.
Which is how I ended up using OMC and OMX a lot.
OMC is the Claude Code side. It gives Claude Code workflows, skills, teams, slash commands, HUD stuff, and a bunch of "please stop making me manually orchestrate this" quality-of-life layers. The repo literally says not to learn Claude Code, just use OMC, which is funny but also kind of accurate.
OMX is the Codex side. It keeps Codex as the engine, then adds the work layer around it: deep-interview, ralplan, team, ralph, project guidance, logs, memory, state in .omx/, all that good operator stuff.
Translation: OMC is how I make Claude Code less like a one-off chat box. OMX is how I make Codex less like a one-off chat box. Both are basically saying the same thing:
Stop prompting. Start operating.
The Ralph Loop Changed My Brain
The biggest feature for me is Ralph.
Before Ralph, my coding flow was: ask agent to do thing, agent does 80% of thing, I find the weird missing 20%, I sigh, I fix it, repeat. Classic "wow AI is amazing and also why am I babysitting a very confident intern" energy.
Ralph changes the posture. The point is not "answer my prompt." The point is "continue until the task is complete and verified." OMC has Ralph as an in-session persistence mode. OMX has $ralph as part of the canonical workflow surface. The exact mechanics differ, but the vibe is the same: do the work, check the work, fix the work, do not silently stop at a half-finished result.
That sounds simple, but it changes how I write tasks. I no longer think only about the first prompt. I think about the loop the agent is about to live inside.
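To make "the loop the agent lives inside" concrete, here is a minimal sketch of the do/check/fix cycle. This is not how OMC or OMX implement Ralph; `run_until_verified`, `CheckResult`, and the toy `attempt`/`verify` functions are hypothetical names I made up for illustration, and a real agent call would replace the stub.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CheckResult:
    passed: bool
    feedback: str = ""


def run_until_verified(
    attempt: Callable[[str], str],
    verify: Callable[[str], CheckResult],
    task: str,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Repeat attempt -> verify -> retry until the check passes or rounds run out."""
    prompt = task
    for round_no in range(1, max_rounds + 1):
        result = attempt(prompt)
        check = verify(result)
        if check.passed:
            return result, round_no
        # Feed the failure back in instead of stopping at a half-finished result.
        prompt = f"{task}\n\nPrevious attempt failed verification: {check.feedback}"
    raise RuntimeError(f"task not verified after {max_rounds} rounds")


# Toy stand-ins (assumptions): a stub "agent" that only finishes after feedback.
def toy_attempt(prompt: str) -> str:
    return "done" if "failed verification" in prompt else "draft"


def toy_verify(result: str) -> CheckResult:
    if result == "done":
        return CheckResult(True)
    return CheckResult(False, "tests were not run")


print(run_until_verified(toy_attempt, toy_verify, "add the feature"))
# ('done', 2)
```

The part that matters is the `max_rounds` cap plus the explicit failure: the loop either ends verified or ends loudly, never quietly at 80%.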
Harness Engineering
The phrase I keep coming back to is harness engineering:
Anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.
This is honestly the whole game.
If the agent forgets to run tests, I do not just type "run tests next time" and pray. I add a verification rule. If it edits files outside the task, I add ownership boundaries. If it hallucinates library APIs, I force docs retrieval into the flow. If it writes AI slop, I add a cleanup pass. If it keeps losing context, I write the decision into a skill, note, or project instruction.
The first time an agent makes a mistake, it is a bug. The second time, it is my fault for not building the harness.
That is a painful sentence because it means my job did not get easier. It got weirder. I spend less time typing code and more time designing the conditions where code can appear safely.
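As one example of turning a mistake into a rule, the "edits files outside the task" case can be checked mechanically. This is a rough sketch, assuming you already have the list of changed paths (say, from your version control diff) and a list of directories the task is allowed to touch. `out_of_scope` is a hypothetical helper, not part of OMC or OMX.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def out_of_scope(changed: Iterable[str], allowed_dirs: Iterable[str]) -> List[str]:
    """Return the changed paths that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for path in changed:
        p = PurePosixPath(path)
        # A path is in scope if it is an allowed dir or sits somewhere beneath one.
        if not any(p == a or a in p.parents for a in allowed):
            violations.append(path)
    return violations


changed_files = [
    "src/billing/api.py",
    "src/billing_old/legacy.py",  # similar prefix, different directory
    "src/auth/login.py",
]
print(out_of_scope(changed_files, ["src/billing"]))
# ['src/billing_old/legacy.py', 'src/auth/login.py']
```

Comparing path components instead of raw string prefixes is deliberate: `src/billing_old` should not sneak through just because it starts with `src/billing`.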
Aristotle Is My Math Side Quest
The other thing I use a lot is Aristotle from Harmonic.
Aristotle is not the same category as Claude Code or Codex. It is more like a formal math/proof brain. Harmonic describes it as an agent that can work on Lean projects, prove or formalize English math problems, and run for a long time without needing someone to poke it every five minutes. Their report also talks about gold-medal-level IMO performance and Monte Carlo search over proof steps.
Which, of course, immediately made me think: wait, is agentic coding just proof search with worse variable names?
You propose a path. Reality rejects it. You add a lemma. You run the checker. You backtrack. You try again. Eventually you either prove the thing or discover the original problem was nonsense. That is basically half my week now.
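That propose/check/backtrack rhythm is basically depth-first search with a checker in the middle. Here is a toy sketch of the shape. To be clear, this is not how Aristotle works; `search`, `propose`, and `is_valid` are hypothetical names, and the "proof" is just reaching a target number.

```python
from typing import Callable, List, Optional, Sequence


def search(
    state: int,
    propose: Callable[[int], Sequence[int]],
    is_goal: Callable[[int], bool],
    is_valid: Callable[[int], bool],
    depth: int,
) -> Optional[List[int]]:
    """Depth-first search: propose a step, check it, backtrack on rejection."""
    if is_goal(state):
        return [state]
    if depth == 0:
        return None
    for nxt in propose(state):
        if not is_valid(nxt):  # "reality rejects it"
            continue
        path = search(nxt, propose, is_goal, is_valid, depth - 1)
        if path is not None:
            return [state] + path
    return None  # dead end: the caller backtracks and tries its next proposal


# Toy problem (assumption): get from 1 to 10 using "+3" or "*2", never exceeding 10.
path = search(
    1,
    propose=lambda n: [n + 3, n * 2],
    is_goal=lambda n: n == 10,
    is_valid=lambda n: n <= 10,
    depth=4,
)
print(path)
# [1, 4, 7, 10]
```

Returning `None` is the "original problem was nonsense" branch: every proposal got rejected, so there is nothing left to try within the depth budget.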
I Am Becoming an Agent Operator
The funny part is that I still call this vibe coding, but my actual setup is becoming less vibe-y every month.
My day now looks like:
- write the task like a spec, not a wish
- choose the right agent surface
- let it run
- watch where it fails
- turn the failure into a rule, test, hook, or skill
That is why "agent operator" feels more accurate than "programmer" sometimes. I am still technical. I still need taste. I still need to know when the model is confidently driving into a wall. But the leverage is shifting from writing every line to shaping the feedback loop.
And honestly? I kind of love it. It feels like managing a tiny software team where everyone is brilliant, sleep-deprived, overconfident, and needs extremely specific instructions.
Appendix: The Harness Regret Algorithm
Because I cannot end a post without making the situation unnecessarily mathematical, here is the tiny algorithm I have been using in my head.
Every repeated agent mistake has a cost:
$$C(m)$$
where $m$ is the mistake type. Maybe it skipped tests. Maybe it used a fake API. Maybe it made the UI look like a SaaS landing page from 2021. Painful either way.
A harness improvement is worth building when the future pain it removes is bigger than the cost of building it, $H(m)$:
$$C(m) > H(m)$$
If $C(m) > H(m)$, install the harness. Add the test. Write the skill. Add the checklist. Make the verifier annoying. Future you will be grateful, even if present you wants to go outside and touch grass.
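If you want to put rough numbers on that comparison, here is a tiny sketch. The big assumption, mine and not part of the algorithm above, is that future pain can be estimated as expected repeats times cost per repeat, with everything measured in the same unit (minutes here). `worth_building` and its parameters are hypothetical names.

```python
def worth_building(
    expected_repeats: float,
    cost_per_mistake: float,
    build_cost: float,
) -> bool:
    """True when the pain a harness removes exceeds what it costs to build.

    Assumes future pain = expected_repeats * cost_per_mistake (same unit as build_cost).
    """
    return expected_repeats * cost_per_mistake > build_cost


# Agent skips tests about 10 more times, 15 minutes lost each; the hook takes 60 to build.
print(worth_building(expected_repeats=10, cost_per_mistake=15, build_cost=60))
# True

# A one-off mistake is not worth an hour of harness work.
print(worth_building(expected_repeats=1, cost_per_mistake=15, build_cost=60))
# False
```

The estimates will be bad. That is fine, because the point is to force the question "will this happen again?" before you shrug and move on.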
So the actual 2026 vibe-coding equation is:
$$\text{useful output} = \frac{\text{agent capability}}{\text{cost of repeated mistakes}}$$
Everyone wants to maximize the numerator. More agents. Bigger models. More parallelism. Cool, sure.
But the denominator is where the money is.