Agentic workflow shift: artifacts, memory, and the spectrogram war

After my previous workflow post, I moved from "agent chats + intuition" to a much stricter system: artifact-driven development with explicit memory and repeatable validation.

I still switch between tools, but now each one has a role, and the hand-offs are intentional.

What changed in my agentic workflow

The biggest shift is that I stopped treating context as something I can keep in my head. Sessions end, context windows get compacted, and details get lost. So now I externalize everything important.

In PeakTrace, the base contract is in AGENTS.md, plus concrete execution rules in .agent/rules/mcp-usage.md and the Serena memories (style_and_conventions, task_completion_checklist, spectrogram journal, etc.).

In .cursor/mcp.json I keep the MCP wiring (serena + context7), but Serena is the one I rely on as the daily driver for code intelligence and memory continuity.

From there, this is my current split:

Serena is my main MCP backbone. It solves the memory continuity problem much better for code work: symbol-level navigation, persistent memory, and strict discipline to read previous context before touching fragile areas.
Claude Code with Opus is my heavy reasoning and design partner. I use it for architecture sketching, complex review, and inference on messy problems where framing matters as much as coding.
Codex does most of the production work now: implementation, targeted fixes, code review loops, and bug resolution. For hard debugging and risky refactors, I run it at the xhigh effort setting.
I also run small local experiments with opencode + ollama + gemma4 to offload simple tasks. It works for lightweight chores, but efficiency is still mediocre compared to cloud models for non-trivial tasks.

So in practice, Opus is often the "strategist" and Codex the "primary executor", though I like to pit them against each other on performance audits and complex refactoring tasks. Serena is the memory + navigation substrate that keeps the whole flow coherent.

Artifacts became non-optional

The second big shift: I treat artifacts as core engineering infrastructure, not documentation overhead.

I maintain:

plans/ for approved active plans
plans/done/ for implemented plans (moved there once work starts/finishes)
audits/ for problem analysis and focused investigation reports, often done by several models in parallel (gpt-5.3-codex xhigh vs claude opus 4.5 high vs gemini-3.1-pro-preview) and then reconciled by Codex
living docs in docs/ (not changelog-style append dumps)
Serena memories for session rules and long-running context

This changed how agents behave. Instead of re-discovering the same facts in every session, they start from the current system state. And because there is a defined lifecycle for plans and audits, I can avoid parallel-plan collisions and stale assumptions.
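The plan lifecycle above can be sketched in a few lines of Rust. This is a minimal illustration, not PeakTrace's actual tooling: the function names archive_plan and active_plans are hypothetical, and the sketch assumes a plan is archived once its work is finished.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Move `plans/<file_name>` to `plans/done/<file_name>` under `root`.
/// Refuses to overwrite a plan that was already archived under the same name.
/// (Hypothetical helper; the directory names come from the layout above.)
pub fn archive_plan(root: &Path, file_name: &str) -> io::Result<PathBuf> {
    let active = root.join("plans").join(file_name);
    let done_dir = root.join("plans").join("done");
    let archived = done_dir.join(file_name);

    if !active.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no active plan at {}", active.display()),
        ));
    }
    if archived.exists() {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            format!("plan already archived at {}", archived.display()),
        ));
    }

    fs::create_dir_all(&done_dir)?;
    fs::rename(&active, &archived)?;
    Ok(archived)
}

/// List active plans: regular files directly under `plans/`, sorted by name.
/// The `done/` subdirectory is skipped because it is not a regular file.
pub fn active_plans(root: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(root.join("plans"))? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            names.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    names.sort();
    Ok(names)
}
```

An agent that calls something like active_plans at session start sees only the plans that are still live, which is the property that prevents parallel-plan collisions.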

For complex DSP/UI zones, I keep dedicated learning guides as living references. This is very important for me personally: I need a place where the current truth is explained in a way I can re-learn quickly, not just execute blindly.

Spectrogram example: quality first, then performance

Spectrogram has been my longest struggle area. Not because one bug was hard, but because it was a chain of interacting problems: quality artifacts, temporal precision, wrap behavior, and then render cost.

The breakthrough was procedural, not magical:

Persistent action log (.serena/memories/spectrogram/dev_journal.md) to track change -> hypothesis -> observed result -> verdict.
Cross-agent load sharing where reasoning/design and implementation/review were separated by strengths.
Hard constraints documented and treated as invariants, so we stop repeating known-bad ideas.

On the quality side, we went through issues like stale wrap spans, left-edge duplication, seam artifacts, and scroll desync. The journal preserved causal history, so when a fix looked "reasonable" but had already failed (or caused stutter), we caught it fast.
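To make the change -> hypothesis -> observed result -> verdict structure concrete, here is a minimal sketch of a journal entry and of the "has this already failed?" lookup. The type names, the Verdict variants, and the rendered layout are assumptions for illustration; the real dev_journal.md format may differ.

```rust
use std::fmt;

/// Outcome of one experiment (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Kept,
    Reverted,
    Inconclusive,
}

/// One journal record: change -> hypothesis -> observed result -> verdict.
#[derive(Debug, Clone)]
pub struct JournalEntry {
    pub change: String,
    pub hypothesis: String,
    pub observed: String,
    pub verdict: Verdict,
}

impl fmt::Display for JournalEntry {
    /// Render the entry as a small plain-text block that can be appended to a journal file.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "- change: {}", self.change)?;
        writeln!(f, "  hypothesis: {}", self.hypothesis)?;
        writeln!(f, "  observed: {}", self.observed)?;
        write!(f, "  verdict: {:?}", self.verdict)
    }
}

/// Return earlier entries that were reverted and whose change description
/// mentions `keyword` (case-insensitive substring match).
pub fn previously_failed<'a>(journal: &'a [JournalEntry], keyword: &str) -> Vec<&'a JournalEntry> {
    let needle = keyword.to_lowercase();
    journal
        .iter()
        .filter(|e| e.verdict == Verdict::Reverted && e.change.to_lowercase().contains(&needle))
        .collect()
}
```

A substring match is deliberately crude. The point is only that a structured verdict field lets an agent check a proposed fix against history before trying it again.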

On the precision side, we aligned the pipeline around stable invariants (chronological ingest, strict merge policy ownership, DSP-owned display mapping, stable scroll/update ordering). That removed a lot of "looks okay in one scene, breaks in another" behavior.
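An invariant is most useful when it is executable. As a minimal sketch, assuming "chronological ingest" means that frame timestamps never go backwards, a guard at the ingest boundary could look like this. ChronoIngest and OutOfOrder are hypothetical names, and the u64 timestamp is an assumption.

```rust
/// Error returned when a frame arrives with a timestamp older than the last accepted one.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfOrder {
    pub last: u64,
    pub got: u64,
}

/// Tracks the last accepted timestamp and rejects anything older.
#[derive(Debug, Default)]
pub struct ChronoIngest {
    last_ts: Option<u64>,
    accepted: usize,
}

impl ChronoIngest {
    /// Accept `ts` if it is not older than the previous timestamp.
    /// Equal timestamps are allowed; only going backwards is an error.
    pub fn push(&mut self, ts: u64) -> Result<(), OutOfOrder> {
        if let Some(last) = self.last_ts {
            if ts < last {
                return Err(OutOfOrder { last, got: ts });
            }
        }
        self.last_ts = Some(ts);
        self.accepted += 1;
        Ok(())
    }

    /// Number of timestamps accepted so far.
    pub fn accepted(&self) -> usize {
        self.accepted
    }
}
```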

Then came performance.

We introduced a small performance harness framework built around component scenarios, deterministic fixtures, warm-up rules, stage profiling, and artifact output (raw.json, matrix.csv, summary.md, plus audit reports). This was built into the normal dev flow, not treated as occasional benchmarking.
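The core loop of such a harness is small. The sketch below uses only std::time::Instant and does not use timetrap's API; the function names, the Stage alias, and the CSV column layout are all assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// A named stage and the work it performs (hypothetical shape).
pub type Stage<'a> = (&'static str, Box<dyn FnMut() + 'a>);

/// Run every stage `warmup` times without recording anything, then
/// `measured` times while recording the wall-clock duration of each stage.
pub fn run_scenario(
    warmup: usize,
    measured: usize,
    stages: &mut [Stage<'_>],
) -> Vec<(&'static str, Vec<Duration>)> {
    // Warm-up: same work, timings discarded.
    for _ in 0..warmup {
        for (_, work) in stages.iter_mut() {
            (*work)();
        }
    }

    let mut results: Vec<(&'static str, Vec<Duration>)> = stages
        .iter()
        .map(|(name, _)| (*name, Vec::with_capacity(measured)))
        .collect();

    // Measured iterations: one sample per stage per iteration.
    for _ in 0..measured {
        for (i, (_, work)) in stages.iter_mut().enumerate() {
            let start = Instant::now();
            (*work)();
            results[i].1.push(start.elapsed());
        }
    }
    results
}

/// Flatten the samples into CSV text with one row per (stage, iteration).
pub fn matrix_csv(results: &[(&'static str, Vec<Duration>)]) -> String {
    let mut out = String::from("stage,iteration,micros\n");
    for (stage, samples) in results {
        for (i, d) in samples.iter().enumerate() {
            out.push_str(&format!("{},{},{}\n", stage, i, d.as_micros()));
        }
    }
    out
}
```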

That gave two very important benefits:

We could reason stage-by-stage (G* scopes), not just "it feels faster/slower".
Agents could compare artifact sets and discuss regressions with concrete evidence.
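Comparing two artifact sets can be reduced to a per-stage median check. The following is a minimal sketch under assumed names and an assumed data shape (stage name mapped to samples in microseconds); the real artifacts carry more than this, and the stage names used with it are placeholders, not the actual G* scopes.

```rust
use std::collections::BTreeMap;

/// Median of a sample set in microseconds; `None` when there are no samples.
pub fn median_micros(samples: &[u64]) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    })
}

/// A stage whose candidate median exceeds the baseline median by more than the tolerance.
#[derive(Debug, PartialEq, Eq)]
pub struct Regression {
    pub stage: String,
    pub baseline_us: u64,
    pub candidate_us: u64,
}

/// Compare per-stage medians. `tolerance` is a fraction, e.g. 0.10 for 10%.
/// Stages missing from either run, or with a zero baseline, are skipped.
pub fn find_regressions(
    baseline: &BTreeMap<String, Vec<u64>>,
    candidate: &BTreeMap<String, Vec<u64>>,
    tolerance: f64,
) -> Vec<Regression> {
    let mut found = Vec::new();
    for (stage, base_samples) in baseline {
        let base = median_micros(base_samples);
        let cand = candidate.get(stage).and_then(|s| median_micros(s));
        if let (Some(base), Some(cand)) = (base, cand) {
            if base > 0 && cand as f64 > base as f64 * (1.0 + tolerance) {
                found.push(Regression {
                    stage: stage.clone(),
                    baseline_us: base,
                    candidate_us: cand,
                });
            }
        }
    }
    found
}
```

Using the median rather than the mean keeps a single slow outlier iteration from being reported as a regression.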

Part of this setup uses my own crate timetrap.

It already helps, and I expect to evolve it more, because the new perf-testing challenges are exposing what should be improved next.

Learning stack I now rely on

Three docs are now central for me:

docs/spectrogram_development_learning_guide.md
docs/performance.md
docs/optimizations.md

Together they serve two roles:

Operational reference for agents and for future me.
Learning material so I actually understand the system better over time.

This matters because I don't want "vibe-coding" where the project grows but my understanding shrinks.

Why this matters to me

My current philosophy is simple: AI models are trained on the cumulative output of humans at huge scale. The average model behavior is not perfect, but it aggregates far more patterns than any one engineer can hold.

So for me the right strategy is not denial, and not blind delegation either.

I want to use AI aggressively, but also learn from it, validate it, and keep real ownership of architecture and trade-offs.

I want to:

learn
develop
automate
speed up
evolve with the technology

and still understand what the code is doing.

That is the shift.

To be continued...