Open Framework Spec / June 2026

Industry Research Framework

A framework for source-backed longform industry research and publishable writing by AI agents. It defines how agents preserve state, discipline evidence, draft in stages, run review loops, and clean final prose.

Scope Contract · Research Brief · Task State · Recovery · Source Registry · Claim Discipline · Staged Drafting · Review Loop · Reader Revision
00

30-Second Quickstart

Start from the authoritative skill file, then load references only when the task needs them.

Read SKILL.md first. Run research scope calibration to confirm the output, reader, depth, evidence standard, and coverage; then create the state, log, and data files before broad source collection.

Load references conditionally: workflow for setup or recovery, analysis lenses for method choice, subagent guidance before delegation, writing style before drafting, and quality gates before completion.

A1

Use With Your Agent

This repository is meant to be handed to an agent as a lightweight research protocol, not installed as a heavy product.

Default path

Give the repository URL to your agent

Ask the agent to read SKILL.md first, run the research brief gate before source collection, and load files under references/ only when the current stage needs them.

Adapters

Use the agent-specific notes

Codex, Claude, Gemini CLI, Cursor, ChatGPT-style agents, OpenClaw, and Hermes Agent have short setup notes in the agents directory.

A2

Evaluation Loop

The repository includes a small conformance loop for checking whether agents actually follow the framework.

What ships

Cases, source pack, rubric

The evals directory contains task cases, rubrics, and a sanitized AI knowledge source pack for workflow testing.

Runner

Run offline checks

Use scripts/run_evals.py to create eval skeletons and score output artifacts for state files, claim discipline, quality gates, and reader cleanup.

01

Motivation: Five Failure Modes

Longform research agents tend to fail in repeatable ways. The framework exists to make those failures harder to repeat.

Failure 01

Topic Overfitting

A method distilled from one project is wrongly treated as a universal frame.

Failure 02

Process Leakage

The final article reads like a work log instead of a finished author's report.

Failure 03

Evidence Drift

Sources, claims, uncertainty, and judgment collapse into one undifferentiated argument.

Failure 04

False Completion

A partial milestone is reported as final completion before coverage, review, and reader revision are done.

Failure 05

Depth Collapse

Source counts and coverage checklists pass, but the finished report is too short or compressed for the expected research depth.

02

Scope Contract

This repository is an execution framework for research deliverables, not a theory system or product architecture.

Inside

Process

Research scope calibration, staged execution, source processing, drafting, review, revision, and final cleanup.

Inside

State

Task state, progress, findings, assumptions, decisions, and direction tracking.

Inside

Audit

Source, claim, uncertainty, coverage, depth, and reader-quality checks.

Keep domain ontologies, universal taxonomies, intermediate representations, scoring systems, embeddings, knowledge graphs, dashboards, CLIs, databases, automation pipelines, and product architecture outside this repository unless they are explicitly split into a separate project.

If work starts drifting into those layers, preserve the current research deliverable path and record the idea as a future extension.

03

Behavioral Constraints

Hard rules of the framework, each induced from real failure modes in long research work.

  1. Deliverable first. If the output is an article or report, do not drift into system design.
  2. Research brief gate. If critical information is missing, ask one compact clarification batch before collection.
  3. State before scale. Write task state before expanding source collection.
  4. Evidence is not prose. Registries and audit labels stay backstage.
  5. Depth budget before drafting. Define expected depth, rough length band, and unit-level expansion plan.
  6. Staged execution. Plan, collect, analyze, draft, review, revise, then continue.
  7. Optional lenses only. Framing/category and horizontal-vertical analysis are tools, not default structure.
  8. Review closes the loop. Every finding becomes a revision action, downgraded claim, or limitation.

04

Architecture

The main agent owns the thesis and final judgment. The research backend preserves evidence, and the publishing frontend turns it into publishable prose.

Main Agent: thesis, structure, final judgment

Research Backend

State files, source registry, claim registry, uncertainty list, review logs, access failures.

Publishing Frontend

Thesis, analytical sections, synthesis, counter-evidence, reader-facing references, final prose cleanup.

05

State File System

State is written to files so the task can recover after context loss and avoid reconstructing progress from memory.

state/

  • task_spec.md
  • progress.json
  • findings.jsonl
  • directions_tried.json
  • iteration_log.jsonl

logs/

  • work.jsonl
  • review.jsonl

data/

  • source_registry.csv
  • claims_registry.csv
  • uncertainty_registry.csv

06

Recovery And Guardrails

Recover from context loss through state files, and stop loops before they become false progress.

Recovery

Resume from state

Read task_spec, progress, recent findings, iteration logs, and tried directions before taking action after a restart.

Guardrail

Stop empty collection

If three consecutive source passes add no relevant evidence, stop that direction and draft or pivot.

Guardrail

Extract claims

If sources grow while claims stay thin, pause collection and convert evidence into claims before searching more.

07

Research Brief Gate

Before collecting sources, ask only for missing information that changes scope, output, evidence, or depth.

If the request lacks decision-critical information, ask one compact batch of questions. The batch must include expected length or depth when it is missing.

If the request is already clear, proceed and record assumptions in task_spec.md instead of asking ritual questions.

08

Operating Loop

Each stage produces bounded progress, then updates state before the next stage begins.

| Step | Action | Output |
| --- | --- | --- |
| 1 | Run the research brief gate, then plan the scope, inputs, output, and done criteria. | Stage plan |
| 2 | Collect or process only the sources needed for that stage. | Source notes |
| 3 | Convert sources into claims, uncertainty, and analysis notes. | Claim registry |
| 4 | Draft a bounded section or unit. | Section draft |
| 5 | Review for evidence, coverage, structure, skepticism, and prose. | Review log |
| 6 | Revise the section and registries. | Clean draft |
| 7 | Update progress and define the next stage. | Next action |

If one cycle adds no new evidence, case, counterexample, framework, or judgment, increment stale_count. If stale_count >= 2, pivot the structural angle rather than searching harder inside the same frame.

For longform deliverables, source counts, claim counts, link counts, and file size are backend health signals only. They cannot replace a depth review.

09

Analysis Lens Scheduling

Pick one primary lens and at most two secondary lenses unless the user explicitly requests a multi-method report.

Framing/Category

Positioning, legitimacy, category creation, public meaning, media translation.

Horizontal-Vertical

Timeline depth plus current competitor or substitute comparison.

Adoption

User behavior, workflow change, replacement, friction.

Capital

Pricing, revenue, valuation, funding, cost structure, margins.

Organization/Talent

Operating model, hiring, leadership, talent flow.

Counter-Case

Strongest alternative explanation and failure modes.

10

Subagent Scheduling Patterns

Subagents inspect or challenge bounded work. They do not own the thesis or rewrite the whole report.

| Role | Use |
| --- | --- |
| Requirement Mapper | Turn user requirements into a completion checklist. |
| Evidence Auditor | Check support, source access, and confidence boundaries. |
| Coverage Auditor | Find missing companies, actors, periods, source categories, or themes. |
| Skeptical Reviewer | Find hype, PR laundering, weak causality, and missing counter-cases. |
| Reader Critic | After factual checks, improve clarity, flow, cognitive load, and report feel. |

11

Engineering Constraints

The framework turns quality checks into mechanical habits instead of end-of-project reconstruction.

  • Claim boundary. Every important hard claim needs a confidence boundary.
  • Registry cadence. After every 20 important facts, figures, or judgments, update the source and claim registries.
  • Depth gate. A final report must meet the depth budget; registry completeness alone is not completion.
  • Official source limit. Official materials show stated position; they do not prove adoption.
  • Media source limit. Media materials show public framing; they need corroboration for hard facts.
  • Community source limit. User/community evidence shows reception; it is not automatically representative.
  • Reader review limit. Reader review may improve flow and clarity, but must not invent facts.

12

Validation And Limits

The framework improves reliability, but it does not make an agent immune to bad sources, weak reasoning, or unsupported claims.

Completion Gate

Before final delivery

The research brief gate is complete or assumptions are recorded; required coverage is complete or explicitly bounded; major claims trace back to sources; counter-evidence, recovery state, and depth expectations have been addressed.

Limits

Honest disclosure

Subagent review is a check, not external truth. Optional lenses can overfit the report if used mechanically.

13

Full SKILL.md

The authoritative instruction file, included here for copying into other agent environments.

---
name: industry-research-framework
description: Framework for longform, source-backed industry research and publishable writing by AI agents. Use when an agent must plan, clarify, collect, verify, analyze, draft, review, revise, and finalize a substantial industry, market, company, product, technology, policy, or ecosystem research article/report across multiple stages and many sources. Prescribes a scope contract, research brief gate, task state, source/claim/uncertainty registries, depth budgeting, staged execution, optional analysis lenses, subagent review, reader-quality revision, and final prose cleanup. Do not use for quick factual answers, simple summaries, citation formatting only, spreadsheet-only work, or purely creative writing.
---

# Industry Research Framework

This skill is a framework for longform industry research and publishable writing. It ships no scraper, data source, or fixed report template; instead it prescribes conventions for how an AI agent persists state, separates evidence from prose, avoids topic drift, schedules review, and turns a large research backend into a clean reader-facing article or report.

## 1. Motivation

Longform research agents tend to fail in five recurring ways:

1. Topic overfitting: a method distilled from one project is wrongly treated as a universal frame.
2. Process leakage: the final article reads like a work log, with phrases such as "the user provided" or "the material shows".
3. Evidence drift: sources, claims, uncertainty, and author judgment collapse into one undifferentiated argument.
4. False completion: a partial milestone is reported as final completion before coverage, review, and reader-quality revision are done.
5. Depth collapse: a report satisfies source counts and coverage checklists but is too short, compressed, or thin for the user's expected research depth.

Every mechanism in this framework targets one of those failures.

## 2. Scope Contract

This skill is an execution framework for producing substantial research deliverables. It is not a theory system, product architecture, or universal modeling language.

Keep inside this skill:

1. Process: research scope calibration, staged execution, source processing, drafting, review, revision, and final cleanup.
2. State: task state, progress, findings, assumptions, decisions, and direction tracking.
3. Audit: source, claim, uncertainty, coverage, depth, and reader-quality checks.

Keep outside this skill unless the user explicitly asks for a separate system design project:

1. Domain ontologies, universal taxonomies, or generalized modeling languages.
2. Intermediate representations, scoring systems, embeddings, knowledge graphs, or ranking engines.
3. Dashboards, CLIs, databases, automation pipelines, or product architecture.
4. Methodology manifestos that do not directly improve the current research deliverable.

If a task starts drifting into the excluded layers, preserve the current deliverable path, record the idea as a future extension, and do not expand the workflow.

## 3. Behavioral Constraints

1. Deliverable first: if the requested output is an article or report, do not drift into system design, prompt design, or workflow exposition.
2. Research brief gate before collection: ask one compact clarification batch when decision-critical information is missing.
3. State before scale: for long tasks, write task state to files before expanding source collection.
4. Evidence is not prose: registries, logs, audit labels, and access failures stay backstage unless the user requests an audit appendix.
5. Depth budget before drafting: record expected depth, rough length band, unit-level expansion plan, and what "too short" would mean for this task.
6. Staged execution: plan, collect, analyze, draft, review, revise, and update state before moving to the next unit.
7. Section-level progress: write complex work by section, company, case, period, or argument; do not generate the whole report in one pass.
8. Optional lenses only: framing/category analysis, horizontal-vertical analysis, capital analysis, and adoption analysis are tools, not default structure.
9. Review closes the loop: every audit finding must become a revision action, a downgraded claim, or an explicit limitation.
10. Reader review comes last: improve readability only after factual, coverage, structure, and depth checks are stable.
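
Constraint 9 can be checked mechanically. The following is a minimal sketch, assuming each line of `logs/review.jsonl` is a JSON object with a `resolution` field; that field name and its three allowed values are hypothetical labels for the three routes named above, not a schema the framework defines.

```python
import json

# Hypothetical labels for the three routes a finding may take.
ALLOWED_RESOLUTIONS = {"revision_action", "downgraded_claim", "limitation"}


def unrouted_findings(review_jsonl: str) -> list[dict]:
    """Return review findings whose resolution is missing or not an allowed route."""
    open_items = []
    for line in review_jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        finding = json.loads(line)
        if finding.get("resolution") not in ALLOWED_RESOLUTIONS:
            open_items.append(finding)
    return open_items
```

An empty result suggests the review loop is closed for that log; anything returned still needs a route before the section moves on.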

## 4. Architecture

    Main Agent
    owns thesis, structure, final judgment

    Research Backend   Publishing Frontend
    state files        thesis / sections
    source registry    mechanisms / synthesis
    claim registry     counter-evidence
    uncertainty list   reader-facing references
    review logs        final prose cleanup

Subagents may inspect or challenge bounded parts of the backend, but the main agent owns the argument and final prose.

## 5. State Files

For substantial work, create:

    {task}/state/
      task_spec.md            # objective, reader, output, scope, depth, evidence standard, assumptions
      progress.json           # stage, completed units, open issues, stale_count
      findings.jsonl          # append-only findings and judgments
      directions_tried.json   # directions already attempted
      iteration_log.jsonl     # stage summaries

    {task}/logs/
      work.jsonl              # execution decisions
      review.jsonl            # review findings and routed fixes

    {task}/data/
      source_registry.csv
      claims_registry.csv
      uncertainty_registry.csv

Use state files to recover after context loss. Do not rely on chat history as the only memory.
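
As an illustration, the layout above could be scaffolded with a short script. This is a sketch under stated assumptions: the file names come from the layout, but the initial contents (including the `progress.json` keys) and the function name `init_task_dir` are hypothetical, and the CSV files are left empty because no column schema is specified here.

```python
import json
from pathlib import Path

# File names follow the layout above; initial contents are assumptions.
STATE_FILES = {
    "state/task_spec.md": "# Task Spec\n",
    "state/progress.json": json.dumps(
        {"stage": "plan", "completed_units": [], "open_issues": [], "stale_count": 0},
        indent=2,
    ),
    "state/findings.jsonl": "",
    "state/directions_tried.json": "[]",
    "state/iteration_log.jsonl": "",
    "logs/work.jsonl": "",
    "logs/review.jsonl": "",
    "data/source_registry.csv": "",
    "data/claims_registry.csv": "",
    "data/uncertainty_registry.csv": "",
}


def init_task_dir(task_root: Path) -> list[Path]:
    """Create missing state, log, and data files; never overwrite existing ones."""
    created = []
    for rel_path, initial in STATE_FILES.items():
        path = task_root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(initial, encoding="utf-8")
            created.append(path)
    return created
```

Skipping files that already exist keeps the script safe to re-run after a restart, which matters because the state is only useful if it is never silently reset.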

### Context Recovery Protocol

When resuming after context loss, session restart, or handoff:

1. Read `state/task_spec.md` for objective, scope, reader, output, depth, evidence standard, and assumptions.
2. Read `state/progress.json` for current stage, completed units, open issues, stale_count, and next action.
3. Read the latest entries in `state/findings.jsonl` and `state/iteration_log.jsonl` to recover the recent direction.
4. Read `state/directions_tried.json` to avoid repeating failed or exhausted paths.
5. Resume from the matching step in the operating loop.

Do not re-run completed stages. Do not re-ask the research brief if `task_spec.md` already records the answers.
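
The read order above can be sketched as a single loader. The function names, the returned dictionary keys, and the default of five recent entries are assumptions for illustration; only the file names come from the protocol.

```python
import json
from pathlib import Path


def read_jsonl_tail(path: Path, n: int = 5) -> list[dict]:
    """Return the last n JSON objects from a JSONL file, or [] if it is missing."""
    if not path.exists():
        return []
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return [json.loads(ln) for ln in lines[-n:]]


def load_recovery_context(task_root: Path, tail: int = 5) -> dict:
    """Read state files in the order of the recovery protocol (steps 1 to 4)."""
    state = task_root / "state"
    return {
        "task_spec": (state / "task_spec.md").read_text(encoding="utf-8"),
        "progress": json.loads((state / "progress.json").read_text(encoding="utf-8")),
        "recent_findings": read_jsonl_tail(state / "findings.jsonl", tail),
        "recent_iterations": read_jsonl_tail(state / "iteration_log.jsonl", tail),
        "directions_tried": json.loads(
            (state / "directions_tried.json").read_text(encoding="utf-8")
        ),
    }
```

Step 5, resuming from the matching step in the operating loop, stays a judgment call for the agent; the loader only gathers what that judgment needs.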

## 6. Research Brief Gate

Before collection, decide whether the request contains enough decision-critical information. If not, ask one compact batch of questions before starting. The batch should usually contain 3-7 questions and must cover expected length or depth when it is missing.

Ask only for missing critical information:

- research object and scope boundaries
- target reader and decision context
- output format, language, and publishing context
- expected depth, rough length band, or depth level
- must-cover units, exclusions, and priority areas
- required sources or materials, source exclusions, and evidence standard
- time period, geography, deadline, and whether charts/tables are expected

If the user has already supplied enough context, do not ask ritual questions. Proceed, record assumptions in `task_spec.md`, and mark unresolved non-critical details as assumptions or uncertainties.

If critical details remain unanswered after one clarification batch, make conservative assumptions, record them, and begin with a bounded Stage 1 instead of stalling.
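
The gate logic can be sketched as a small decision function. The field names below are hypothetical labels for the checklist above, and treating every field as decision-critical is a simplification; in practice the agent decides which gaps actually change scope, output, evidence, or depth.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ResearchBrief:
    # Hypothetical field names mirroring the checklist above.
    research_object: Optional[str] = None
    target_reader: Optional[str] = None
    output_format: Optional[str] = None
    expected_depth: Optional[str] = None
    must_cover: Optional[str] = None
    evidence_standard: Optional[str] = None
    time_and_geography: Optional[str] = None


def missing_fields(brief: ResearchBrief) -> list[str]:
    """List the brief fields the request has not answered yet."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


def next_action(brief: ResearchBrief, batches_asked: int) -> str:
    """Ask at most one clarification batch, then proceed on recorded assumptions."""
    if not missing_fields(brief):
        return "proceed"
    if batches_asked == 0:
        return "ask_one_batch"
    return "proceed_with_assumptions"
```

The `batches_asked` counter encodes the one-batch limit: a second round of questions is never offered, so unanswered gaps become recorded assumptions instead of a stall.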

## 7. Operating Loop

For each stage:

1. Run the research brief gate, then plan the scope, inputs, output, and done criteria.
2. Collect or process only the sources needed for that stage.
3. Convert sources into claims, uncertainty, and analysis notes.
4. Draft a bounded section or unit.
5. Review the section for evidence, coverage, structure, skepticism, and prose.
6. Revise the section and registries.
7. Update progress and define the next stage.

If one cycle adds no new evidence, case, counterexample, framework, or judgment, increment `stale_count`. If `stale_count >= 2`, pivot the structural angle rather than merely searching harder.
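
A minimal sketch of the stale counter follows. The function name and the shape of `cycle_additions` are assumptions, and so is the reset to zero after a productive cycle: the rule above only says when to increment.

```python
def update_stale_count(stale_count: int, cycle_additions: dict) -> tuple[int, bool]:
    """Return (new_stale_count, should_pivot) after one cycle.

    cycle_additions maps kinds of progress (evidence, case, counterexample,
    framework, judgment) to how many new items the cycle added.
    """
    added_something = any(count > 0 for count in cycle_additions.values())
    # Assumption: a productive cycle resets the counter to zero.
    new_count = 0 if added_something else stale_count + 1
    return new_count, new_count >= 2
```

For example, `update_stale_count(1, {"evidence": 0})` returns `(2, True)`, which signals a pivot of the structural angle rather than another search pass.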

For longform deliverables, do not use source count, claim count, link count, or file size as completion substitutes. They are backend health signals, not proof that the finished report has enough depth. Before final assembly, compare the draft against the depth budget and expand thin units before reader review.

## 8. Source And Claim Discipline

Classify sources by what they can prove:

- official materials show stated position, intent, product surface, or formal policy
- primary data supports measurable claims when definitions and collection methods are clear
- expert materials explain reasoning, context, and interpretation
- media materials show public framing but need corroboration for hard facts
- user/community evidence shows reception but is not automatically representative
- counter-evidence limits, weakens, or falsifies the main claim

Classify claims separately:

- verified fact
- source claim
- interpretation
- author judgment
- speculation

Every important hard claim should have a confidence boundary. Do not turn company PR, investor hopes, or media amplification into fact.
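
One way to keep the two classifications separate is to model them as distinct types. The sketch below is illustrative: the enum values mirror the lists above, while the `Claim` fields and the `audit_claim` checks are assumptions about how a registry row might be validated.

```python
from dataclasses import dataclass
from enum import Enum


class SourceClass(Enum):
    OFFICIAL = "official"
    PRIMARY_DATA = "primary_data"
    EXPERT = "expert"
    MEDIA = "media"
    COMMUNITY = "community"
    COUNTER_EVIDENCE = "counter_evidence"


class ClaimType(Enum):
    VERIFIED_FACT = "verified_fact"
    SOURCE_CLAIM = "source_claim"
    INTERPRETATION = "interpretation"
    AUTHOR_JUDGMENT = "author_judgment"
    SPECULATION = "speculation"


@dataclass
class Claim:
    text: str
    claim_type: ClaimType
    source_classes: list[SourceClass]
    confidence_boundary: str = ""


def audit_claim(claim: Claim) -> list[str]:
    """Flag verified facts that break the discipline rules above."""
    issues = []
    if claim.claim_type is not ClaimType.VERIFIED_FACT:
        return issues  # softer claim types are not held to the hard-claim rules
    if not claim.confidence_boundary.strip():
        issues.append("missing confidence boundary")
    if not claim.source_classes:
        issues.append("no supporting source")
    elif set(claim.source_classes) == {SourceClass.MEDIA}:
        issues.append("media-only support needs corroboration")
    return issues
```

A claim that fails the audit does not have to be deleted; downgrading it from verified fact to source claim is often the honest fix.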

## 9. Analysis Lens Scheduling

Choose the lens that fits the research question:

- framing/category analysis: positioning, legitimacy, category creation, public meaning, and media translation
- horizontal-vertical analysis: timeline depth plus current competitor/substitute comparison
- adoption analysis: user behavior, workflow change, replacement, friction
- capital analysis: pricing, revenue, valuation, funding, cost structure, margins
- organization/talent analysis: operating model, hiring, leadership, talent flow
- policy/legitimacy analysis: regulation, compliance, trust, geopolitical or institutional pressure
- counter-case analysis: strongest alternative explanation and failure modes

Pick one primary lens and at most two secondary lenses unless the user explicitly requests a multi-method report.

Read `references/optional-analysis-lenses.md` when choosing lenses. Read `references/horizontal-vertical-analysis.md` only after that lens has been selected.
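
The one-primary, at-most-two-secondary rule is simple enough to validate before drafting. In this sketch the lens identifiers are shortened forms of the list above, and the function name and `multi_method` flag are hypothetical.

```python
# Shortened identifiers for the lenses listed above.
LENSES = {
    "framing/category", "horizontal-vertical", "adoption", "capital",
    "organization/talent", "policy/legitimacy", "counter-case",
}


def validate_lens_plan(
    primary: str, secondary: list[str], multi_method: bool = False
) -> list[str]:
    """Return problems with a lens plan; an empty list means the plan is acceptable."""
    problems = []
    for lens in [primary, *secondary]:
        if lens not in LENSES:
            problems.append(f"unknown lens: {lens}")
    if primary in secondary:
        problems.append("primary lens repeated as secondary")
    # multi_method stands for an explicit user request for a multi-method report.
    if len(secondary) > 2 and not multi_method:
        problems.append("more than two secondary lenses without a multi-method request")
    return problems
```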

## 10. Subagent Scheduling

Use subagents only for bounded work:

- requirement mapping
- source discovery for separate regions, actors, or source classes
- evidence-chain verification
- coverage audit
- skeptical review
- structure review
- reader-quality review after the draft is stable

A subagent prompt must include objective, files or sections to inspect, output format, PASS/FAIL criteria, and boundaries. Subagents should not rewrite the whole report or own the thesis.

Read `references/subagents-and-review-loop.md` before delegation.
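
The five required prompt parts can be enforced with a small structure that refuses to render an incomplete prompt. The class name, field names, and output layout are assumptions for illustration; the framework only states which parts a prompt must include.

```python
from dataclasses import dataclass


@dataclass
class SubagentTask:
    objective: str
    inspect_targets: list[str]  # files or sections to inspect
    output_format: str
    pass_fail_criteria: str
    boundaries: str

    def missing(self) -> list[str]:
        """Names of required parts that are empty."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Build the prompt text, or raise if any required part is missing."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"subagent prompt is missing: {', '.join(gaps)}")
        targets = "\n".join(f"- {t}" for t in self.inspect_targets)
        return (
            f"Objective: {self.objective}\n"
            f"Inspect:\n{targets}\n"
            f"Output format: {self.output_format}\n"
            f"PASS/FAIL criteria: {self.pass_fail_criteria}\n"
            f"Boundaries: {self.boundaries}\n"
        )
```

Failing loudly on an empty `boundaries` field is deliberate: an unbounded subagent is the most likely one to rewrite the report or drift into owning the thesis.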

## 11. Finalization

The final article or report should contain reader-facing material only:

- conclusion-first insights when useful
- scope note
- analytical sections organized by argument, case, period, or mechanism
- synthesis across units
- counter-evidence and uncertainty expressed cleanly
- implications
- reader-facing reference appendix

Remove:

- visible source IDs
- audit labels
- file paths
- "the user provided"
- "the material shows"
- "this source supplements"
- "this section passed audit"
- excessive caveats that weaken rather than clarify judgment
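
Part of this removal list can be scanned for automatically. The sketch below uses the four quoted phrases as given; the source ID pattern (bracketed IDs such as `[S12]`) and the file path pattern are assumed shapes, since no ID scheme is defined here, and should be adapted to the real registries. Excessive caveats still need a human or reader-review judgment.

```python
import re

# Phrases quoted in the removal list above.
LEAK_PHRASES = [
    "the user provided",
    "the material shows",
    "this source supplements",
    "this section passed audit",
]
# Assumed shapes for backstage residue; adjust to the registry's real ID scheme.
SOURCE_ID = re.compile(r"\[[SC]\d+\]")
FILE_PATH = re.compile(r"\b(?:state|logs|data)/[\w.-]+")


def find_process_leaks(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that leak backstage process."""
    leaks = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in LEAK_PHRASES:
            if phrase in lowered:
                leaks.append((number, f"phrase: {phrase}"))
        if SOURCE_ID.search(line):
            leaks.append((number, "visible source ID"))
        if FILE_PATH.search(line):
            leaks.append((number, "file path"))
    return leaks
```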

## 12. Validation And Limits

Before declaring completion:

1. The research brief gate was completed or assumptions were recorded.
2. Required coverage is complete or limitations are explicit.
3. Major claims trace back to sources or uncertainty records.
4. Facts, source claims, interpretations, and author judgments remain distinct.
5. Counter-evidence has been addressed.
6. The draft meets the depth budget or explicitly explains why the original expected depth is no longer appropriate.
7. Reader review has been run after factual, coverage, structure, and depth review.
8. The final prose reads like an author's report, not an agent process report.

Limits:

1. The framework reduces citation and evidence errors; it does not eliminate them.
2. Subagent review is a check, not external truth.
3. Optional lenses can overfit the report if used mechanically.
4. State files help recovery, but they only work if updated during the task, not reconstructed after the fact.

## 13. Execution Guardrails

Use these guardrails to prevent loops, overcollection, and scope drift:

1. Source collection: if three consecutive searches or source passes add no relevant evidence, stop collecting in that direction, update `directions_tried.json`, and draft or pivot.
2. Claim extraction: if `source_registry.csv` grows while `claims_registry.csv` stays thin, pause collection and extract claims before gathering more sources.
3. Review loop: cap full review-revise cycles at two per section unless the user asks for more; record unresolved issues as limitations or follow-up tasks.
4. Depth check: before reader review, compare the draft against the depth budget and expand thin units before optimizing prose.
5. Scope expansion: if new work falls outside `task_spec.md`, record it as a proposed extension and ask before expanding the project.
6. Subagent review: prompts must ask the reviewer to actively look for issues; if no issue is found, the reviewer must state what evidence supports PASS.
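
Guardrails 1 and 2 lend themselves to simple checks. This is a sketch under stated assumptions: the function names are hypothetical, the registries are assumed to have a header row, and the `min_ratio` threshold for "thin" claims is an invented default, since the guardrail does not quantify it.

```python
import csv
from pathlib import Path


def should_stop_direction(relevant_per_pass: list[int], limit: int = 3) -> bool:
    """True when the last `limit` passes each added no relevant evidence."""
    recent = relevant_per_pass[-limit:]
    return len(recent) == limit and all(count == 0 for count in recent)


def count_rows(csv_path: Path) -> int:
    """Count data rows in a registry CSV, excluding the assumed header row."""
    if not csv_path.exists():
        return 0
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = [row for row in csv.reader(handle) if row]
    return max(len(rows) - 1, 0)


def should_pause_collection(sources: int, claims: int, min_ratio: float = 0.5) -> bool:
    """True when claims per source fall below min_ratio (an assumed threshold)."""
    return sources > 0 and claims / sources < min_ratio
```

These counts are the backend health signals described in the operating loop: useful for deciding when to stop collecting, but not evidence that the report is deep enough.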

## References

- Read `references/research-workflow.md` only when starting a substantial project, creating state files, or resuming after context loss.
- Read `references/optional-analysis-lenses.md` only when the research question needs an explicit analysis lens decision.
- Read `references/horizontal-vertical-analysis.md` only when horizontal-vertical analysis has been selected.
- Read `references/subagents-and-review-loop.md` only before delegating work or running a review loop.
- Read `references/writing-style.md` only when entering drafting, final cleanup, or reader-driven revision.
- Read `references/quality-gates.md` only before declaring a stage or final deliverable complete.
- Read `references/postmortem-lessons.md` only when adapting this framework or diagnosing repeated task drift.