AI Content Pipeline Deep Dive (4/5): Editing
written by Stefan Christoph
15-minute read
This is Part 4 of a five-part series. Previous parts: Ingestion, Research, Collaborative Writing. This post covers the quality assurance phase that most solo writers skip entirely.
Steps are numbered continuously across the series — editing starts at Step 9 because research, ideation, and writing came first.
Where this post sits in the pipeline — Stage 4 of 5: Editing.
Why Automated Editing Matters
You cannot review your own work effectively.
You know what you meant, so you read what you meant instead of what you wrote. Every writer knows this. Few solo writers solve it. The traditional solution is an editor, a human who reads your work with fresh eyes, challenges your assumptions, catches your blind spots. That works if you publish monthly. It does not scale to three posts a week.
My content pipeline (described in the series overview) replaces the human editor with five systematic passes, each catching a different category of issues. Together, they are more consistent than most human editors. Not more capable on any single pass, but relentless about running every check every time. They never get tired. They never skip steps under deadline pressure. They never let familiarity soften their feedback.
I ran an audit of 32 of my own posts from March through May 2026 against the AI smell checklist below. Two posts scored A. Four scored D. The pipeline was added in mid-April, and the improvement is visible in the data: early posts average C-D, later posts average B-C. Automation makes thoroughness sustainable.
Step 9: Challenge and Improve
The first editing pass generates the hardest questions a reader could ask, then uses the answers to strengthen the article.
The full constraint set from my pipeline:
- Generate 10 challenging questions a critical reader might ask
- Cover: methodology gaps, missing nuance, unstated assumptions, alternative interpretations
- Provide elaborated answers to each question — honest about limitations
- Save questions + answers in challenging-questions.md
- Review answers and identify insights that strengthen the article (typically 4-6 improvements)
- Weave insights into the article proactively:
  - Acknowledge limitations upfront
  - Add missing context
  - Strengthen weak arguments
  - Address sharpest critiques before readers raise them
- Do NOT make the article defensive — tone: honest and confident
- Prioritize improvements for:
  - Sample size / methodology caveats
  - Unstated assumptions
  - Missing comparisons to alternatives
  - Safety / failure mode implications
- Present changes as summary table (Q&A insight → what changed)
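If you want Step 9's output in a predictable shape, here is a minimal Python sketch of the file side of it. It assumes the model's questions and answers have already been parsed into a list; the `Challenge` type and the `render_challenges` and `save_challenges` names are hypothetical, not taken from my pipeline. It writes `challenging-questions.md` with the Q&A pairs first and the Q&A insight → what changed table last.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Challenge:
    """One challenging question, its honest answer, and the article change it led to."""
    question: str
    answer: str
    change: Optional[str] = None  # None when the answer revealed no gap worth fixing


def render_challenges(challenges: list[Challenge]) -> str:
    """Render challenging-questions.md: all Q&A pairs, then the summary table."""
    if len(challenges) != 10:
        raise ValueError(f"expected 10 questions, got {len(challenges)}")
    lines = ["# Challenging Questions", ""]
    for i, c in enumerate(challenges, start=1):
        lines += [f"## Q{i}: {c.question}", "", c.answer, ""]
    # Summary table (Q&A insight -> what changed), only for questions that changed the article.
    lines += ["## Changes", "", "| Q&A insight | What changed |", "| --- | --- |"]
    for i, c in enumerate(challenges, start=1):
        if c.change:
            lines.append(f"| Q{i}: {c.question} | {c.change} |")
    return "\n".join(lines) + "\n"


def save_challenges(draft_dir: str, challenges: list[Challenge]) -> Path:
    """Write the rendered file into the draft folder and return its path."""
    path = Path(draft_dir) / "challenging-questions.md"
    path.write_text(render_challenges(challenges), encoding="utf-8")
    return path
```

Keeping the table to the questions that actually changed something makes it easy to see whether a run produced the usual 4-6 improvements.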
What This Looks Like in Practice
When I wrote “The 732-Byte Wake-Up Call” about the Linux kernel exploit and AI-driven vulnerability discovery, the challenging questions included:
- “You claim this changes the security equilibrium. But hasn’t every new tool shifted the balance? What makes AI different from, say, fuzzing?”
- “Your argument assumes defenders can’t use the same AI tools. Why?”
- “You recommend custom seccomp profiles. What’s the operational cost for a team of 5?”
The answers revealed gaps. The “what makes AI different” question led me to add a paragraph about the asymmetry: attackers need to find one bug, defenders need to prevent all of them. AI amplifies the attacker’s advantage more than the defender’s because thoroughness matters more for offense than defense. That insight was not in my original draft. The challenging question surfaced it.
For “Cognitive Debt,” the questions pushed me to distinguish between cognitive offloading (strategic, you maintain the ability to verify) and cognitive surrender (you accept without engaging). That distinction became the conceptual backbone of the piece. Without the challenging questions, it would have been a vaguer argument about “teams not understanding their code.”
The key discipline: the questions must be genuinely hard. “What are the benefits of this approach?” is not a challenging question. “Your sample size is 10 items — how do you know this generalizes?” is. The agent generates questions that a skeptical peer reviewer would ask, not questions that a supportive colleague would ask.
Step 10: Reader FAQ
A separate document that tests whether the post explains what it claims.
The full constraint set:
- Generate 8-12 questions a reader would naturally ask after reading
- Cover: "how does this compare to X?", "can I use this for Y?", "what about Z risk?", practical application questions, "what's the connection to [related topic]?"
- Answers reference both blog content AND original source material with specific citations
- Answers go deeper than the blog post itself: this is where additional research pays off
- Save as faq.md in the draft folder
- Include Sources section listing all referenced materials
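The quality-gate part of the FAQ can be partly checked by script. A minimal Python sketch, assuming faq.md uses `## ` headings for questions, `[n]` inline citations, and a `## Sources` section whose entries start with `[n]`; that layout and the `check_faq` name are assumptions for illustration. It only checks structure. Whether the answers actually go deeper than the post still needs a human read.

```python
import re

QUESTION = re.compile(r"^## (.+)$", re.MULTILINE)
CITATION = re.compile(r"\[(\d+)\]")
SOURCES = re.compile(r"^## Sources[ \t]*$", re.MULTILINE)


def check_faq(markdown: str) -> list[str]:
    """Return the problems found; an empty list means the FAQ passes this gate."""
    problems = []
    body, *rest = SOURCES.split(markdown, maxsplit=1)
    sources = rest[0] if rest else ""
    if not rest:
        problems.append("missing Sources section")
    # With one capturing group, split() yields [preamble, q1, a1, q2, a2, ...].
    parts = QUESTION.split(body)
    questions, answers = parts[1::2], parts[2::2]
    if not 8 <= len(questions) <= 12:
        problems.append(f"expected 8-12 questions, found {len(questions)}")
    for question, answer in zip(questions, answers):
        if not CITATION.search(answer):
            problems.append(f"no citation in answer to: {question.strip()}")
    # Every [n] cited in an answer must appear as an entry in Sources.
    listed = set(re.findall(r"^\[(\d+)\]", sources, re.MULTILINE))
    for n in sorted(set(CITATION.findall(body)) - listed, key=int):
        problems.append(f"[{n}] cited but not listed in Sources")
    return problems
```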
The FAQ is not the same as challenging questions. Challenging questions strengthen the article internally. They find weak arguments and fix them. The FAQ is an external-facing companion document. It serves three purposes:
- Quality gate. If a FAQ question reveals the post never defined a key concept, that is a gap to fix before publishing.
- Comment preparation. When readers ask questions on LinkedIn, I often have pre-researched answers ready. The FAQ is my cheat sheet.
- Talk material. If I present on the topic later, the FAQ becomes the Q&A prep document.
The “answers go deeper than the blog post” constraint is deliberate. The blog post is the argument. The FAQ is the supporting evidence that did not fit the narrative. A reader who wants to go deeper finds it here. A reader who just wants the argument gets the post.
Step 11: The AI Smell Check
This is the step people ask about most. Here is the full checklist — copy it, use it, extend it with your own tells as you discover them.
The 13-Category Checklist
The complete AI smell check from my pipeline — copy-paste ready:
AI Smell Check — 13 Categories
Scan the entire draft (excluding code blocks) for:
1. Em dashes (—)
Count in prose (exclude headings, definition lists, source refs).
Red flag: >10 in a 2,500-word article.
Fix: Replace excess with commas, periods, colons, or parentheses.
Target: keep 5-8 for natural emphasis.
2. AI vocabulary
Flag: "delve", "tapestry", "landscape", "leverage", "robust",
"facilitate", "seamlessly", "elevate", "comprehensive",
"straightforward", "fascinating", "remarkable", "groundbreaking",
"nuanced", "navigate", "paradigm", "holistic", "intricate",
"cornerstone", "testament"
Fix: Replace with simpler alternatives.
Exception: "harness" is OK as a technical noun (agent harness,
test harness). Flag only when used as verb meaning "to utilize."
3. Bold sentence headers
Pattern: "**Bold phrase.** Rest of sentence..."
This is a recognizable ChatGPT habit.
Fix: Restructure as regular paragraphs or use ### subheadings.
4. Significance inflation
Flag: "monumental", "transformative", "game-changing",
"revolutionary", "unprecedented", "seismic shift"
Fix: Replace with measured language backed by evidence.
5. Transition word overuse
Flag: "Furthermore", "Additionally", "Moreover",
"It is important to note", "This highlights",
"In today's world", "It goes without saying"
Threshold: max 1 per section.
6. Excessive hedging
Flag: "studies suggest", "experts believe",
"it could be argued", "many would agree"
— without named sources.
Fix: Either cite a specific source or state an opinion directly.
7. Filler phrases
Flag: "It is important to note that",
"This highlights the importance of",
"It cannot be overstated",
"When all is said and done"
Fix: Remove entirely or rework.
8. Template intros
Flag: "In today's rapidly evolving..."
Fix: Rewrite with specific, concrete openers.
9. Hollow conclusions
Flag: "Ultimately, a balanced approach..."
Fix: Replace with actionable or honest statements.
10. Sentence length uniformity
Measure: standard deviation of sentence word counts.
Red flag: StdDev < 5 words (metronomic rhythm).
Target: StdDev > 5 (natural variation — short punches
mixed with longer developing sentences).
11. Paragraph length uniformity
Measure: standard deviation of paragraph word counts.
Red flag: StdDev < 10 words (uniform 4-6 sentence blocks).
Target: StdDev > 10 (mix of single-sentence paragraphs
with longer developed ones).
12. Rule of threes overuse
Pattern: "A, B, and C" repeated throughout.
Fix: Vary enumeration patterns. Use two items. Use four.
Break the rhythm.
13. Collaborative language leaks
Flag: "I hope this helps", "Let me know if you'd like",
"Feel free to" in published prose.
These are chatbot patterns, not author patterns.
Targets:
- <10 em dashes in prose
- 0 AI vocabulary words
- 0 bold sentence headers
- 0 significance inflation
- 0 filler phrases
- Good sentence and paragraph length variance
Process:
- Read entire post in ONE pass
- Identify ALL instances across all 13 categories
- Present summary table (category, count, examples, fix)
- Apply all fixes in a single write operation
- Re-verify counts dropped to acceptable levels
- One pass, not incremental patching
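Several categories are countable, so a script makes a useful first pass before the model's review. A minimal Python sketch, assuming Markdown input: it strips fenced code blocks and heading lines, then counts em dashes, bold sentence headers, and the phrase lists copied from the checklist. The function names are hypothetical. Simplifications worth knowing: definition lists and source references are not excluded from the em dash count, hedging phrases are flagged whether or not a named source follows, and overlapping phrases ("It is important to note" and "It is important to note that") are counted in both categories, as the checklist lists them. Categories 10 and 11 need length statistics, and category 12 (rule of threes) is left to the model.

```python
import re
from collections import Counter

# Phrase lists copied from the checklist above. Matching is case-insensitive and
# prefix-based, so "leverage" also catches "leveraged" and "leveraging".
PHRASES = {
    "ai_vocabulary": [
        "delve", "tapestry", "landscape", "leverage", "robust", "facilitate",
        "seamlessly", "elevate", "comprehensive", "straightforward", "fascinating",
        "remarkable", "groundbreaking", "nuanced", "navigate", "paradigm",
        "holistic", "intricate", "cornerstone", "testament",
    ],
    "significance_inflation": [
        "monumental", "transformative", "game-changing", "revolutionary",
        "unprecedented", "seismic shift",
    ],
    "transitions": [
        "furthermore", "additionally", "moreover", "it is important to note",
        "this highlights", "in today's world", "it goes without saying",
    ],
    # Flagged unconditionally; a human checks whether a named source follows.
    "hedging": ["studies suggest", "experts believe", "it could be argued", "many would agree"],
    "filler": [
        "it is important to note that", "this highlights the importance of",
        "it cannot be overstated", "when all is said and done",
    ],
    "template_intros": ["in today's rapidly evolving"],
    "hollow_conclusions": ["ultimately, a balanced approach"],
    "collaborative_leaks": ["i hope this helps", "let me know if you'd like", "feel free to"],
}

FENCE = re.compile(r"^```.*?^```[ \t]*$", re.MULTILINE | re.DOTALL)
BOLD_HEADER = re.compile(r"^\*\*[^*\n]+?[.:]\*\*[ \t]+\S", re.MULTILINE)
WORD = re.compile(r"[A-Za-z0-9']+")


def prose_only(markdown: str) -> str:
    """Drop fenced code blocks and heading lines; normalize curly apostrophes."""
    text = FENCE.sub("", markdown).replace("\u2019", "'")
    return "\n".join(line for line in text.splitlines() if not line.lstrip().startswith("#"))


def count_phrases(text: str, phrases: list[str]) -> Counter:
    """Count case-insensitive occurrences of each phrase (plus any word suffix)."""
    counts = Counter()
    for phrase in phrases:
        hits = re.findall(r"\b" + re.escape(phrase) + r"\w*", text, re.IGNORECASE)
        if hits:
            counts[phrase] = len(hits)
    return counts


def smell_check(markdown: str) -> dict:
    """Count the mechanically detectable categories of the checklist."""
    text = prose_only(markdown)
    words = len(WORD.findall(text))
    dashes = text.count("\u2014")
    report = {
        "em_dashes": dashes,
        "words_per_em_dash": round(words / dashes, 1) if dashes else None,
        "bold_sentence_headers": len(BOLD_HEADER.findall(text)),
    }
    report.update({cat: count_phrases(text, phrases) for cat, phrases in PHRASES.items()})
    return report
```

The counts feed the summary table. Deciding which em dash becomes a comma and which becomes a period still takes judgment.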
Why Each Category Matters
The goal is not to fool AI detection tools. It is to avoid the “generic and impersonal” uncanny valley that makes readers disengage. They may not consciously identify AI tells, but they register the text as corporate filler rather than a specific person with specific opinions.
A WriteHuman analysis of 80,141 humanization pairs (April 2026) suggests the real 2026 AI tells are connective phrasing and sentence structure, not the famous vocabulary list [1]. Their data shows:
- “Ensuring” is the single strongest word-level AI tell — over-represented 4.3x in AI text versus human-edited text
- “Rather than” is the strongest multi-word tell — 17,251 occurrences in AI inputs versus 6,859 in humanized outputs
- “X plays a crucial/critical/important role in shaping Y” is the most formulaic sentence shape ChatGPT produces
- Em dashes are a weaker tell than the 2024 narrative claimed; only 18.5% of AI inputs contain one, far from the “everything is em dashes” caricature
My own audit data only partly matches this. When I checked 32 of my posts against the 13 categories, em dashes were still the #1 problem (26 of 32 posts exceeded the target), with bold sentence headers (18 posts) and AI vocabulary (16 posts) close behind. The worst offender by absolute count had 74 em dashes in 3,146 words. One every 43 words. That is not emphasis. That is a tic.
Category Deep Dives
Em dashes are the most subtle tell. Humans use them occasionally for emphasis or parenthetical asides. AI uses them as a universal connector, replacing commas, colons, and periods indiscriminately. When you see 15 em dashes in a 2,000-word post, your subconscious registers “machine-generated” even if you cannot articulate why. My audit found posts with ratios as bad as 1 em dash per 35 words. The fix is mechanical: replace most with the punctuation mark that actually fits the grammatical relationship.
AI vocabulary is the most obvious tell. “Delve” appears in approximately 0.001% of human-written text and approximately 5% of AI-generated text. It is a statistical fingerprint. But the WriteHuman data reveals a subtler layer: hedging verbs like “ensures,” “highlights,” “supports,” and “reflects” are even stronger signals than the famous vocabulary words. ChatGPT reaches for these when it is padding an idea to sound considered. A human would just say what the thing does.
Sentence length uniformity is the most overlooked tell. Human writing has rhythm. Short sentences punch. Longer sentences develop ideas, add nuance, build toward a conclusion that the short sentence then delivers. AI produces metronomic uniformity: sentence lengths cluster in a narrow band, the standard deviation is tiny, and it reads like a textbook rather than a person thinking on the page. The WriteHuman data fits this picture. Their humanizer actually produces slightly longer average sentences (23.3 vs 22.9 words), because natural connective tissue adds length. The difference is variance, not brevity.
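Categories 10 and 11 are simple to measure. A minimal Python sketch, assuming plain text with blank lines between paragraphs. The sentence splitter is naive (it breaks after every `.`, `!`, or `?`), so abbreviations like "e.g." will skew counts slightly. The checklist does not say whether it means sample or population standard deviation; this uses the sample version from Python's `statistics` module. The function names are hypothetical.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence. Naive split after . ! ? so abbreviations over-split."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s.strip()]


def paragraph_lengths(text: str) -> list[int]:
    """Word counts per paragraph, where paragraphs are separated by blank lines."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [len(p.split()) for p in paragraphs if p.strip()]


def spread(lengths: list[int]) -> float:
    """Sample standard deviation; 0.0 when there are fewer than two values."""
    return statistics.stdev(lengths) if len(lengths) > 1 else 0.0


def rhythm_report(text: str) -> dict:
    s_sd = spread(sentence_lengths(text))
    p_sd = spread(paragraph_lengths(text))
    return {
        "sentence_stdev": round(s_sd, 1),
        "paragraph_stdev": round(p_sd, 1),
        "metronomic_sentences": s_sd < 5,   # category 10 red flag
        "uniform_paragraphs": p_sd < 10,    # category 11 red flag
    }
```

Three five-word sentences in a row give a standard deviation of 0, the metronomic case; a one-word sentence followed by an 18-word one gives roughly 12.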
Bold sentence headers are the most recognizable ChatGPT pattern. The format **Key insight.** Here's what that means in practice... shows up constantly in ChatGPT output and rarely in human writing. If your post has these, readers who use ChatGPT daily will clock it immediately. My audit found one post with 14 instances. That post scored D.
A Note on False Positives
The “0 AI vocabulary words” target is aspirational. “Landscape” in a geography article is fine. “Leverage” in a finance context is natural. “Harness” as a technical noun (agent harness, test harness) is legitimate — my audit found 46 instances across 10 posts, most of them valid technical usage.
In practice, about 10% of flags get overridden after human review. The aggressive target ensures nothing slips through unexamined. It is easier to approve a legitimate use than to catch a missed one.
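One way to keep the aggressive target while cutting down on overrides is to encode the known-legitimate contexts up front. A minimal Python sketch: `DOMAIN_ALLOWLIST` and `HARNESS_NOUN_MODIFIERS` are hypothetical names with example entries, and the "harness" rule is a look-at-the-previous-word heuristic, not part-of-speech tagging, so it will still miss some cases in both directions. Matching is on exact word forms.

```python
import re
from typing import Optional

# Hypothetical per-domain allowlist: words that are ordinary vocabulary in that field.
DOMAIN_ALLOWLIST = {
    "geography": {"landscape"},
    "finance": {"leverage"},
}

# Words that, directly before "harness", suggest the noun sense (agent harness, test harness).
HARNESS_NOUN_MODIFIERS = {"agent", "test", "evaluation", "a", "the", "our", "this"}

WORD = re.compile(r"[A-Za-z']+")


def flag_words(text: str, vocabulary: set, domain: Optional[str] = None) -> list:
    """Return the words that still need a human decision, in order of appearance."""
    allowed = DOMAIN_ALLOWLIST.get(domain, set())
    tokens = [m.group(0).lower() for m in WORD.finditer(text)]
    flags = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        if tok in ("harnessed", "harnessing"):
            flags.append(tok)  # verb-only forms are always worth a look
        elif tok in ("harness", "harnesses"):
            if prev not in HARNESS_NOUN_MODIFIERS:
                flags.append(tok)  # likely the verb "to harness"
        elif tok in vocabulary and tok not in allowed:
            flags.append(tok)
    return flags
```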
The Meta Question
Yes, I ran this post through its own AI smell check. The irony of a post about AI tells containing AI tells would be too perfect. Current stats after the check: 7 em dashes in prose, 0 AI vocabulary words, 0 bold sentence headers, 0 significance inflation, sentence length StdDev of 8.2 words.
Step 11b: Critical Reader Pass
A different lens than challenging questions. This pass reads as someone who was not there.
The full constraint set:
- Read full post and list issues where an outsider would stumble
- Check for:
  - Undefined terms or jargon used without context
  - Unclear antecedents ("we" without stating who)
  - Broken paragraph transitions (double "But", orphaned connectors)
  - Unreferenced sources (listed but never cited inline)
  - Vague descriptions assuming reader knowledge they don't have
  - Claims that need grounding (numbers stated as fact vs. "for example")
- Present findings as numbered list with problematic text + fix
- Get user approval before applying changes
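Two items on that list are mechanical enough to script before the human read. A minimal Python sketch, assuming inline citations written as `[n]`, a Sources section whose entries start with `[n]` (the format this post uses), and blank lines between paragraphs. "Double But" is interpreted here as two consecutive paragraphs that both open with "But"; that reading and the function names are assumptions for illustration.

```python
import re

SOURCES_HEADING = re.compile(r"^(?:#+[ \t]*)?Sources[ \t]*$", re.MULTILINE)
CITATION = re.compile(r"\[(\d+)\]")
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def unreferenced_sources(document: str) -> list[str]:
    """Source numbers listed in the Sources section but never cited in the body."""
    parts = SOURCES_HEADING.split(document, maxsplit=1)
    if len(parts) < 2:
        return []  # no Sources section, nothing to compare
    body, sources = parts
    listed = re.findall(r"^\[(\d+)\]", sources, re.MULTILINE)
    cited = set(CITATION.findall(body))
    return [n for n in listed if n not in cited]


def double_but(document: str) -> list[int]:
    """Indices of paragraphs that open with 'But' right after another 'But' paragraph."""
    paragraphs = [p.strip() for p in PARAGRAPH_BREAK.split(document) if p.strip()]
    return [
        i for i in range(1, len(paragraphs))
        if paragraphs[i].startswith("But ") and paragraphs[i - 1].startswith("But ")
    ]
```

Both return locations rather than fixes, which matches the approval step: the findings list goes to a person before anything changes.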
The distinction matters. Challenging questions (Step 9) test argument strength: is the logic sound? Is the evidence sufficient? The critical reader pass tests communication clarity: can someone outside your head follow the argument?
You can have a strong argument that is poorly communicated. A term you use without definition because it is obvious to you. A “we” that refers to your team but reads as “we, the industry.” A transition that connects two paragraphs in your mind but not on the page.
This pass catches the issues that make a reader stop and re-read a sentence. Not because the idea is complex, but because the expression is unclear. Every re-read is a small failure of communication, and enough of them make the reader give up.
Step 11c: TL;DR Generation
The final editing step adds a summary for time-pressed readers.
The full constraint set:
- Add blockquote TL;DR immediately after the title
- 3-5 sentences covering: what happened, key insight, main takeaway
- Must accurately reflect FINAL post content (run after all edits)
- Must NOT be a teaser — give the full conclusion
- Format: > **TL;DR:** [summary text]
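Most of these constraints can be checked after the fact. A minimal Python sketch, assuming the title is the first non-empty line and the TL;DR is a blockquote starting with `**TL;DR:**`; the teaser phrase list and the `check_tldr` name are illustrative, and sentence splitting is naive. The "reflect FINAL content" rule cannot be checked this way. It is enforced by ordering, running this step last.

```python
import re

PREFIX = "**TL;DR:**"
# Phrases that turn a summary into a teaser; extend with your own.
TEASER = re.compile(r"\b(read on|find out|keep reading|stay tuned)\b", re.IGNORECASE)


def check_tldr(markdown: str) -> list[str]:
    """Check the Step 11c constraints that can be checked mechanically."""
    lines = [line for line in markdown.splitlines() if line.strip()]
    quote = []
    for line in lines[1:]:  # lines[0] is the title
        if not line.startswith(">"):
            break
        quote.append(line.lstrip("> ").strip())
    summary = " ".join(quote)
    if not summary.startswith(PREFIX):
        return ["TL;DR blockquote must directly follow the title"]
    summary = summary[len(PREFIX):].strip()
    problems = []
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary) if s]
    if not 3 <= len(sentences) <= 5:
        problems.append(f"expected 3-5 sentences, found {len(sentences)}")
    if TEASER.search(summary):
        problems.append("reads like a teaser")
    return problems
```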
The “must not be a teaser” constraint is deliberate. A TL;DR that says “read on to find out…” is useless. The TL;DR should give busy readers the full value. If they want depth, they read the rest. If they do not, they still got the insight. This is a gift to your reader, not a marketing hook.
The “must reflect FINAL content” constraint is equally important. The TL;DR runs last because every previous editing step changes the article. A TL;DR written before the challenging questions step would not reflect the insights that step added. A TL;DR written before the AI smell check might contain the very tells the check removed.
The Editing Pipeline in Sequence
The five editing passes (Steps 9, 10, 11, 11b, and 11c) run in sequence before a post moves on to visuals and publishing.
Every post goes through all five steps. No exceptions. No “this one is short, I’ll skip the FAQ.” No “I’m confident about this argument, I’ll skip the challenging questions.” The automation makes thoroughness sustainable at three posts per week. Without it, I would skip at least three of these steps on every post because “I don’t have time.” With it, every post gets more rigorous review than most professionally edited content.
Making It Replicable
You do not need my specific tooling. You need the checklist and a capable model. The full automated pipeline takes 8-12 minutes of wall-clock time per post (running in the background while you do other work). Compare that to the 45-90 minutes a thorough manual editing pass takes, if you are honest about actually doing it every time.
Here is the minimal version:
After finishing a draft, prompt: “Generate 10 challenging questions a skeptical reader would ask about this post. Then answer each one honestly.” Review the answers. Which ones reveal gaps? Fix those gaps.
Prompt: “Generate 8-12 questions a reader would naturally ask after reading this. Answer each one with specific citations.” Save this as your FAQ.
Paste the 13-category checklist above and prompt: “Scan this draft against all 13 categories. Present a summary table with category, count, examples, and proposed fixes.” Apply the fixes.
Prompt: “Read this as someone with no context about the author or topic. Where would you stumble? List unclear terms, broken transitions, and vague claims.” Fix what matters.
Write the TL;DR last. It should reflect the final version, not the first draft.
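The five steps can be wired together in a few lines. A minimal Python sketch: the first four prompts are the ones above, the TL;DR prompt is an illustrative wording based on the Step 11c constraints, and `ask_model` and `revise` are hypothetical callbacks you supply, one wrapping whatever model client you use and one where you review the output and apply fixes. The loop order is the point: every step runs, and the TL;DR always sees the latest draft.

```python
from typing import Callable

# (prompt, draft) -> model response. Wire this to whatever model client you use.
AskModel = Callable[[str, str], str]
# (step, draft, response) -> revised draft. This is where you review and apply fixes.
Revise = Callable[[str, str, str], str]


def minimal_prompts(checklist: str) -> list[tuple[str, str]]:
    """The five steps in the order they must run; the TL;DR step is always last."""
    return [
        ("challenge", "Generate 10 challenging questions a skeptical reader would ask "
                      "about this post. Then answer each one honestly."),
        ("faq", "Generate 8-12 questions a reader would naturally ask after reading this. "
                "Answer each one with specific citations."),
        ("smell_check", checklist + "\n\nScan this draft against all 13 categories. Present a "
                        "summary table with category, count, examples, and proposed fixes."),
        ("critical_reader", "Read this as someone with no context about the author or topic. "
                            "Where would you stumble? List unclear terms, broken transitions, "
                            "and vague claims."),
        ("tldr", "Write a 3-5 sentence TL;DR of this post that gives the full conclusion, "
                 "not a teaser."),
    ]


def run_minimal_pipeline(draft: str, checklist: str, ask_model: AskModel,
                         revise: Revise) -> tuple[str, dict[str, str]]:
    """Run every step on the latest draft; no step can be skipped."""
    reports: dict[str, str] = {}
    for step, prompt in minimal_prompts(checklist):
        response = ask_model(prompt, draft)
        reports[step] = response
        draft = revise(step, draft, response)
    return draft, reports
```

For the FAQ step, `revise` would typically save the response as faq.md and return the draft unchanged, since the FAQ is a companion document rather than part of the post.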
Model quality matters less than prompt specificity here. A mediocre model with the 13-category checklist catches more issues than a frontier model asked to “please review my writing.” The checklist constrains the model’s attention to specific, measurable patterns rather than letting it generate vague feedback like “consider varying your sentence structure.”
The Compound Effect
I have been running this pipeline since mid-April 2026. The audit of 32 posts shows the trajectory clearly:
- March posts (before the pipeline): average grade C-D. 74 em dashes in one post. 14 bold sentence headers in another.
- Late April posts (pipeline active): average grade B-C. Em dashes under 15. Bold headers eliminated.
- May posts (pipeline refined): average grade B. Occasional A.
The pipeline does not just fix individual posts. It trains the writing process. After running the AI smell check 30 times, you internalize the patterns. You stop writing “Furthermore” in the first place. You catch yourself reaching for an em dash and choose a period instead. The automation is a scaffold that eventually changes the underlying habit.
But I still run it every time. Because the day you think you have internalized the patterns is the day you publish a post with 43 em dashes and wonder why it reads like a press release.
What’s Next
I’m working towards Part 5, which covers the final phase, Publishing: image generation, LinkedIn teasers, deployment automation, and the analytics feedback loop that informs future content decisions. Expect it around Tuesday, June 16.
Sources
[1] WriteHuman, “The Real Signature of AI Writing Isn’t the Em-Dash Anymore” (April 2026) — analysis of 80,141 humanization pairs. writehuman.ai
[2] The AI Content Pipeline: How I Publish 3x a Week Without a Content Team
[4] Cognitive Debt: The Hidden Cost of AI-Generated Code
[5] Code Quality Is the New Infrastructure
[6] Your AI Judge Needs a Judge
About the Author
Stefan Christoph is a Principal Solutions Architect at AWS, focused on agentic AI, media & entertainment, and helping builders move from demo to production. He writes about AI architecture, developer productivity, and the future of software.
This is a personal blog. Opinions expressed here are my own and do not represent the views or positions of my employer.
❤️ Created with the support of AI (Kiro)