How to Use Codex Goals for Longer, Evidence-Based Work

June 27, 2026

By Synthex

Codex is easy to use for a small task.

Ask it to inspect a file, fix one bug, explain one error, or write one test, and the shape is simple: you ask, Codex works, you review the result.

The harder cases are different. A performance problem may need several attempts. A flaky test may need reproduction before repair. A research task may need a final report that separates confirmed evidence from uncertainty. In those cases, the first prompt is often not where things go wrong.

The problem is that the objective gets fuzzy after a few turns.

This guide is based on OpenAI's Cookbook example on using Goals in Codex. The practical idea is simple: a Goal gives Codex a persistent objective for the current thread, so it can keep working toward a defined outcome instead of waiting for you to keep saying "continue."

What you'll learn

What a Codex Goal is in plain language.
When a Goal is better than a normal prompt.
How to start, pause, resume, view, and clear a Goal.
How to write Goals with a clear finish line.
Why strong Goals need evidence, not just intention.
How Goals differ from memory, AGENTS.md, and automation.
What to avoid when a task is too vague or too small.

What this is really about

A normal prompt tells Codex what to do next.

A Goal tells Codex what should be true when the work is finished.

That difference matters.

If you ask:

Fix the flaky checkout test.

Codex can try one repair, report what happened, and wait.

If you set:

/goal Make the checkout test pass reliably on repeated local runs without changing the public checkout behavior.

Codex has a more durable target. It can reproduce the failure, inspect the test, try a contained fix, rerun the test, and decide whether the evidence is good enough.

A Goal is not "do work forever." A Goal is a scoped finish line for the current thread.

The best mental model is:

Mode	What it means
Normal prompt	Do this next thing, then wait
Goal	Keep working toward this defined outcome until it is complete, paused, blocked, cleared, or stopped by budget

Goals are most useful when the next step depends on what Codex learns while working.

What a Codex Goal is

A Goal is a persistent objective attached to a Codex thread.

That means it belongs to the conversation where the work is happening. It is not global memory. It is not a permanent project rule. It is not the same as AGENTS.md.

Use a Goal when the thread needs a clear completion contract:

What should be true at the end.
How Codex should check that it is true.
What must not break along the way.
What Codex should do if the work becomes blocked.

For example:

/goal Reduce report generation time below 2 seconds, verified by the local benchmark, while keeping the existing export tests green.

That gives Codex three useful anchors:

Outcome: report generation below 2 seconds.
Verification: the local benchmark.
Constraint: export tests must still pass.

Without those anchors, Codex may improve something, but it has no reliable way to know whether the work is actually finished.

When to use a Goal

Use a Goal when the task has a finish line but the route is uncertain.

Good candidates:

Performance tuning.
Flaky test investigation.
Bug hunts that require reproduction.
Dependency upgrades with tests and follow-up fixes.
Multi-step refactors with a clear verification command.
Research work that needs a final evidence-backed report.
Long audits where findings need to be checked against source material.

A Goal is useful when you would otherwise keep writing:

Keep going.
Try the next fix.
Run the test again.
Check the benchmark.
Continue until this is actually done.

That kind of repeated instruction is a sign that the task needs a persistent objective.

When not to use a Goal

Do not use a Goal for every Codex task.

A normal prompt is better for:

A one-line edit.
A simple explanation.
A short code review.
A quick command.
A single file cleanup.
A question where you want one answer and then a stop.

Also avoid Goals when the finish line is too vague.

Weak:

/goal Make this better

Better:

/goal Rewrite the onboarding page so it explains setup in five steps, keeps the existing headings, and passes the current Markdown lint check.

The second version gives Codex something it can inspect. The first one mostly gives Codex a mood.

How to start and manage a Goal

Goals require a Codex build that supports the /goal command. If /goal is not recognized, update Codex first.

For the CLI, the OpenAI Cookbook example lists Goals as available starting in Codex 0.128.0. You can update and check your version with:

npm install -g @openai/codex@latest
codex --version

Or with Homebrew:

brew update
brew upgrade --cask codex
codex --version

To set a Goal, use /goal followed by the desired outcome:

/goal Make the import script produce a clean review CSV and verify it with the sample input file.

To manage the Goal:

Command	What it does
`/goal`	View the current Goal
`/goal pause`	Pause the active Goal
`/goal resume`	Resume a paused Goal
`/goal clear`	Remove the current Goal

The important part is control. A Goal gives Codex continuation context, but you can still pause, resume, or clear it.

How to write a strong Goal

A strong Goal is not long because it is fancy. It is specific because Codex needs to know what counts as done.

The strongest Goals usually include six parts.

Part	Plain meaning	Example
Outcome	What should be true at the end	`Reduce p95 latency below 120 ms`
Verification surface	How Codex should prove it	`verified by the checkout benchmark`
Constraints	What must not regress	`while keeping correctness tests green`
Boundaries	Where Codex may work	`use only the checkout service and related tests`
Iteration policy	How to choose the next attempt	`after each run, compare results and try the smallest defensible change`
Blocked stop condition	When to stop and report	`if the benchmark cannot run, report the blocker and next input needed`

Here is the basic pattern:

/goal [desired end state], verified by [specific evidence], while preserving [constraints]. Use [allowed files, tools, or boundaries]. Between iterations, [how to choose the next useful action]. If blocked, [what to report].

You do not need to use this exact grammar every time. The point is to define the finish line and the evidence.
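Because the pattern has fixed slots, it can help to treat the six parts as required fields while drafting. The sketch below is a hypothetical Python helper (`GoalSpec` and `build_goal` are illustrative names, not part of Codex or any Codex API) that fills the pattern and refuses to build a Goal with an empty part. You would still paste the resulting text into Codex yourself.

```python
from dataclasses import dataclass


@dataclass
class GoalSpec:
    """The six parts of a strong Goal, kept as plain strings (hypothetical helper)."""
    outcome: str
    verification: str
    constraints: str
    boundaries: str
    iteration_policy: str
    blocked_condition: str


def build_goal(spec: GoalSpec) -> str:
    """Assemble the six parts into one /goal line following the basic pattern."""
    # Refuse to build a Goal with an empty part: a missing finish line or
    # missing evidence is exactly what makes a Goal weak.
    for name, value in vars(spec).items():
        if not value.strip():
            raise ValueError(f"Goal part '{name}' is empty")
    return (
        f"/goal {spec.outcome}, verified by {spec.verification}, "
        f"while preserving {spec.constraints}. "
        f"Use {spec.boundaries}. "
        f"Between iterations, {spec.iteration_policy}. "
        f"If blocked, {spec.blocked_condition}."
    )


if __name__ == "__main__":
    spec = GoalSpec(
        outcome="Reduce p95 latency below 120 ms",
        verification="the checkout benchmark",
        constraints="the existing correctness tests",
        boundaries="only the checkout service and related tests",
        iteration_policy="compare results and try the smallest defensible change",
        blocked_condition="report the blocker and the next input needed",
    )
    print(build_goal(spec))
```

The empty-part check is the design point: a Goal without its verification surface or blocked stop condition is the weak kind this guide warns about.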

Weak vs strong examples

Performance

Weak:

/goal Improve performance

Strong:

/goal Reduce dashboard load time below 1.5 seconds on the local benchmark, while keeping the existing export and filter tests green. Use the dashboard route, data-loading helpers, and related tests only. After each attempt, record what changed and what the benchmark showed. If the benchmark cannot run, stop with the blocker and the next input needed.

Why it works:

The target is measurable.
The verification surface is named.
The constraints are visible.
Codex knows when to stop instead of guessing.
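A named verification surface is what makes "below 1.5 seconds" checkable. As a minimal sketch, assuming the load path can be called as a plain Python function (the `load_dashboard` stand-in and the `meets_latency_target` helper are hypothetical; a real project would use its own benchmark), a check might time several runs and compare the median to the target:

```python
import statistics
import time
from typing import Callable


def meets_latency_target(
    run_once: Callable[[], object],
    target_seconds: float,
    repeats: int = 5,
) -> tuple[bool, float]:
    """Time run_once several times and compare the median to the target.

    Using the median rather than a single run keeps one slow or fast
    outlier from deciding whether the Goal is met.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    return median < target_seconds, median


if __name__ == "__main__":
    # Hypothetical stand-in for the real dashboard load path.
    def load_dashboard() -> None:
        sum(range(100_000))

    ok, median = meets_latency_target(load_dashboard, target_seconds=1.5)
    print(f"median {median:.4f}s, target met: {ok}")
```

Taking the median of several runs is a deliberate choice: a single timing is noisy, and the Goal should not be declared met on one lucky run.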

Flaky tests

Weak:

/goal Fix the flaky test

Strong:

/goal Make the payment confirmation test pass reliably across 10 repeated local runs without weakening the assertion. Reproduce the failure first, inspect timing or state issues, make the smallest contained fix, and report the before/after evidence. If the failure cannot be reproduced, stop with the commands tried and the remaining uncertainty.

Why it works:

It asks for reproduction before repair.
It protects the assertion from being watered down.
It defines what "reliably" means.
It has a clean blocked condition.
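"Reliably across 10 repeated local runs" only works as evidence if the runs are actually counted. A minimal sketch, assuming the test can be launched as a command that exits with status 0 on success; `run_repeatedly` is a hypothetical helper, and the example command is a random stand-in for the real payment confirmation test:

```python
import subprocess
import sys


def run_repeatedly(command: list[str], runs: int = 10) -> dict[str, int]:
    """Run a test command several times and count passes and failures.

    A zero exit code counts as a pass. Nothing here retries or hides
    failures: the counts are the evidence.
    """
    passed = failed = 0
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            passed += 1
        else:
            failed += 1
    return {"runs": runs, "passed": passed, "failed": failed}


if __name__ == "__main__":
    # Hypothetical command that fails about 20% of the time; substitute the
    # project's real invocation of the payment confirmation test.
    cmd = [sys.executable, "-c", "import random, sys; sys.exit(random.random() < 0.2)"]
    summary = run_repeatedly(cmd, runs=10)
    print(summary)
    print("reliable" if summary["failed"] == 0 else "still flaky")
```

Running it once before the fix and once after gives the before/after evidence the Goal asks for.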

Documentation

Weak:

/goal Write docs for this feature

Strong:

/goal Produce a beginner-friendly docs page that explains setup, configuration, and two common mistakes for this feature. Verify that all commands match the current CLI behavior and that the page builds locally. If a command cannot be verified, label it as uncertain instead of presenting it as confirmed.

Why it works:

It names the final artifact.
It defines what the page should cover.
It asks Codex to check commands instead of inventing confidence.

Research

Weak:

/goal Research this paper

Strong:

/goal Produce an evidence-backed summary of the paper using the available source materials. Build a claim inventory, separate confirmed claims from approximate support, label blocked claims clearly, and end with a short report that lists remaining uncertainty.

Why it works:

It does not pretend every claim can be proven.
It asks for a structured final artifact.
It keeps uncertainty visible.
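The four labels this guide uses for research output (confirmed, approximate, blocked, uncertain) map naturally onto a small claim inventory. A minimal sketch with hypothetical names (`Support`, `Claim`, `render_report`) and placeholder claims rather than real findings:

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Support(Enum):
    CONFIRMED = "confirmed"
    APPROXIMATE = "approximate"
    BLOCKED = "blocked"
    UNCERTAIN = "uncertain"


@dataclass
class Claim:
    text: str
    support: Support
    evidence: str  # where the support came from, or why it is missing


def render_report(claims: list[Claim]) -> str:
    """Group claims by support level so uncertainty stays visible."""
    grouped: dict[Support, list[Claim]] = defaultdict(list)
    for claim in claims:
        grouped[claim.support].append(claim)
    lines = []
    for level in Support:  # fixed order: confirmed first, uncertain last
        lines.append(f"{level.value.upper()} ({len(grouped[level])})")
        for claim in grouped[level]:
            lines.append(f"- {claim.text} [{claim.evidence}]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder claims, not real findings.
    inventory = [
        Claim("Claim 1", Support.CONFIRMED, "matches the source table"),
        Claim("Claim 2", Support.APPROXIMATE, "source gives a range, not a value"),
        Claim("Claim 3", Support.BLOCKED, "supplementary data not available"),
    ]
    print(render_report(inventory))
```

Printing every level, even empty ones, keeps gaps visible: "UNCERTAIN (0)" is itself a statement a reviewer can question.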

What changes when a Goal is active

When a Goal is active, Codex can keep the objective in view across turns.

That does not mean it should ignore you. It means the thread has a stronger sense of what it is trying to finish.

Three things change.

1. The objective stays visible

If a test fails, Codex can compare the failure to the Goal.

If a benchmark improves but misses the target, Codex can keep going.

If a research path hits missing data, Codex can adjust the evidence plan without losing the final report standard.

2. Continuation becomes structured

Goals are designed to continue when the thread is idle, not to run in parallel with whatever else is happening.

Codex should continue only when the thread is ready for more work. If there is queued user input, active work, or an interruption, the Goal should not bulldoze through it.

This matters because useful autonomy needs boundaries. Continuation should happen at safe points, after Codex has evidence from the previous step.
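One way to picture the continuation rule is as a gate checked before each new step. The sketch below is a conceptual model only, not how Codex implements Goals; `GoalStatus`, `ThreadState`, and `should_continue` are hypothetical names, and the statuses mirror the end states in the mode table earlier (complete, paused, blocked, cleared, stopped by budget):

```python
from dataclasses import dataclass
from enum import Enum, auto


class GoalStatus(Enum):
    ACTIVE = auto()
    PAUSED = auto()
    COMPLETE = auto()
    BLOCKED = auto()
    CLEARED = auto()
    BUDGET_STOPPED = auto()


@dataclass
class ThreadState:
    queued_user_input: bool = False
    work_in_progress: bool = False
    interrupted: bool = False


def should_continue(status: GoalStatus, thread: ThreadState) -> bool:
    """Continue only for an active Goal on an idle, uninterrupted thread."""
    if status is not GoalStatus.ACTIVE:
        # Paused, finished, blocked, cleared, or out of budget: do not continue.
        return False
    # The user's queued input, running work, or an interruption takes priority.
    return not (thread.queued_user_input or thread.work_in_progress or thread.interrupted)
```

The point of the model is the ordering: the user and any in-flight work always win, and the Goal only fills idle time.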

3. Completion has to be checked

A Goal should not be treated as complete because Codex feels done.

It should be complete because the objective was checked against concrete evidence:

Tests passed.
A benchmark reached the target.
A build succeeded.
A generated artifact exists.
A report separates confirmed, approximate, blocked, and uncertain claims.

The evidence is what makes the Goal trustworthy.
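The same idea can be written as a rule: a Goal is complete only when every check passed and left evidence behind, and running out of budget is reported as a stop, never as success. A minimal sketch with hypothetical names (`Check`, `goal_outcome`) and made-up example values:

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str       # what was checked, e.g. "export tests"
    passed: bool    # did the check succeed?
    evidence: str   # command output, file path, or measured value


def goal_outcome(checks: list[Check], budget_exhausted: bool) -> str:
    """Decide how a Goal ends from recorded evidence, not from a sense of being done."""
    # No checks at all, or a passing check with no recorded evidence, is not proof.
    proven = bool(checks) and all(c.passed and c.evidence.strip() for c in checks)
    if proven:
        return "complete"
    if budget_exhausted:
        # Budget is a stopping condition, not success.
        return "stopped by budget: report progress, blockers, and next step"
    return "keep working"


if __name__ == "__main__":
    # Hypothetical example values.
    checks = [
        Check("export tests", passed=True, evidence="test command exited 0"),
        Check("benchmark below 2 s", passed=False, evidence="median 2.4 s"),
    ]
    print(goal_outcome(checks, budget_exhausted=True))
```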

Goals are not memory, AGENTS.md, or automations

These concepts are related, but they are not interchangeable.

Feature	Scope	Use it for
Goal	Current thread	A persistent outcome for one long task
Memory	Cross-thread recall, when available	Stable preferences or context that may help future work
AGENTS.md	Project or folder guidance	Durable instructions Codex should follow in that workspace
Automation	Scheduled or recurring work	Running a known workflow later or repeatedly

Use a Goal when the current thread needs to keep working toward a defined outcome.

Use AGENTS.md when the rule should apply every time Codex works in a folder.

Use memory for stable personal or project preferences, if memory is available and appropriate.

Use automation only after the workflow is already clear enough to run on a schedule.

A practical workflow for beginners

If Goals feel abstract, use this order.

Step 1: Write the task in normal language

Start messy:

I want Codex to keep working on this flaky test until it either fixes it with evidence or can explain what is blocking progress.

Step 2: Ask Codex to turn it into a Goal

Use Codex to draft the Goal before activating it:

Help me turn this into a strong /goal. Include the outcome, verification surface, constraints, iteration policy, and blocked stop condition.

Step 3: Tighten the evidence

Before you start, check whether the Goal answers:

What does done mean?
How will Codex prove it?
What should not change?
What files, tools, or data are allowed?
When should Codex stop and ask?
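If you want a rough pre-flight check before activating a Goal, a few keyword signals can flag questions a draft may have skipped. This is a crude heuristic sketch (`SIGNALS` and `missing_answers` are hypothetical, and the patterns are guesses about common wording), so a flag means "look again", not "the Goal is wrong". It leaves out "What does done mean?" because that is hard to detect from keywords:

```python
import re

# Hypothetical keyword signals for four of the Step 3 questions.
# Wording varies a lot, so these are hints, not a validator.
SIGNALS = {
    "How will Codex prove it?": r"\bverif(y|ied)\b|\bbenchmark\b|\btests?\b|\bbuilds?\b",
    "What should not change?": r"\bwithout\b|\bwhile\b|\bpreserv|\bkeep",
    "What files, tools, or data are allowed?": r"\buse\b|\bonly\b",
    "When should Codex stop and ask?": r"\bif blocked\b|\bstop\b|\bcannot\b",
}


def missing_answers(goal: str) -> list[str]:
    """Return the Step 3 questions whose signals do not appear in the draft."""
    text = goal.lower()
    return [question for question, pattern in SIGNALS.items() if not re.search(pattern, text)]


if __name__ == "__main__":
    print(missing_answers("/goal Make this better"))
```

Run against the Step 4 Goal, it flags only the boundaries question, which is a fair catch: that Goal never says which files Codex may touch.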

Step 4: Activate the Goal

Then paste the cleaned-up version:

/goal Make the flaky payment confirmation test pass reliably across repeated local runs without weakening the assertion. Reproduce the failure first, make the smallest contained fix, rerun the relevant tests, and stop with evidence or a clear blocker.

Step 5: Review the evidence

Do not review only the final sentence. Review the proof:

Which commands ran?
Which files changed?
Which tests passed?
Which claims are still uncertain?
Did Codex stay inside the boundaries?

This is where Goals become genuinely useful. They do not remove your review role. They make the review target clearer.

Common misunderstandings

"A Goal means Codex can work without supervision"

No.

A Goal gives Codex a persistent objective. It does not remove your responsibility to review changes, inspect evidence, and manage permissions.

"A longer Goal is always better"

No.

A strong Goal is specific, not bloated. If the Goal becomes a giant requirements document, Codex may struggle to tell which parts are essential. Keep it focused on outcome, evidence, constraints, and blockers.

"Goals are only for code"

Not necessarily.

They are especially useful for coding tasks because tests and benchmarks make verification easier. But the same pattern can help with research, audits, documentation, and file-heavy work when the final artifact has a clear evidence standard.

"If Codex reaches the budget, the Goal is complete"

No.

A budget limit is a stopping condition, not proof of success. If Codex runs out of budget or time, the useful output is a progress summary, blockers, and the next best step.

"A Goal can fix a vague task"

Only if you make the Goal less vague.

"/goal Make this better" is still weak. A Goal should define what better means, how to check it, and what should stay intact.

What to do first

Try Goals on a contained task before using them on something large.

Good first Goals:

/goal Make the failing formatter test pass without changing the expected output format. Verify by running the formatter test file and report any remaining uncertainty.

/goal Produce a short migration checklist for this package update, verified against the current changelog and local package files. Separate required changes, optional cleanup, and unknowns.

/goal Reduce the sample report export time below 3 seconds using the local sample data, while preserving the existing exported columns and row order.

Avoid starting with:

/goal Improve the whole app

That is too broad to audit.

Start with one narrow task, one verification surface, and one clear constraint. Let the Goal teach you the rhythm: work, check, continue, or stop honestly.

Final takeaway

Codex Goals are for work where the objective should persist longer than one prompt.

Use them when the task has a clear finish line, but the route to that finish line may require investigation. Write the Goal like a compact contract: outcome, evidence, constraints, boundaries, iteration policy, and blocked stop condition.

The point is not to make Codex run endlessly. The point is to keep the work tied to evidence until it is either complete or honestly blocked.
