A watchdog snapshot/diff system for Odoo production

You push a fix for one bug. Something unrelated regresses. You don't notice for three days because the regression is silent: a price field that recalculates wrong, a stock-availability flag that flips for a thousand SKUs, an attribute that gets dropped on the next listing publish. The watchdog pattern: snapshot the relevant state before the change, snapshot after, diff. Anything unexpected in the diff gates the deploy. This is the discipline that lets a one-person shop touch a live production system without breaking it.

$The silent-regression problem

Odoo has an unusually large blast radius for the size of a typical change. The ORM's compute chain means a one-field edit can trigger recomputes across millions of records. Because any installed module can override product.template.write(), the method resolution order (MRO) chain means a connector's override fires on every product write, including writes you didn't make. And the interplay between cron jobs, async queues, and synchronous writes means a change applied at 14:02 might not surface its consequences until the next cron fires at 14:35.

Standard development discipline (unit tests, staging deploys, code review) catches regressions that have a clear failure mode. The ones that bite look like normal operation: the prices are still numbers, the listings are still active, the dashboards are still green. The bug is that the numbers shifted by 12%, or 14% of the catalog quietly went out of stock on Amazon, or every product write since the deploy has silently failed one branch of the connector. None of these trip an alarm.

The fix isn't more tests. It's an explicit before/after comparison of the state you care about.

$The pattern: snapshot, change, snapshot, diff

Four steps wrapped around any production-affecting change:

Take a snapshot of state on every model that the change could affect.
Apply the change.
Take a second snapshot of the same state.
Diff. Inspect every unexpected row.
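The four steps fit naturally into a Python context manager, so the post-change snapshot and the diff run even if the change itself raises partway through. A minimal sketch, assuming two caller-supplied callables: `take_snapshot(label)` returns a snapshot path (in Odoo, something like the `snapshot_state` function later in this post, closed over `env`) and `diff_snapshots(pre, post)` returns whatever report you want to inspect. The `watchdog` name and both parameters are hypothetical.

```python
import contextlib


@contextlib.contextmanager
def watchdog(take_snapshot, diff_snapshots, label):
    """Hypothetical wrapper: snapshot, run the wrapped change, snapshot, diff.

    take_snapshot(label) -> snapshot path (caller-supplied)
    diff_snapshots(pre_path, post_path) -> diff report (caller-supplied)
    """
    result = {"pre": take_snapshot(f"pre-{label}")}
    try:
        # The change is applied inside the caller's with-block.
        yield result
    finally:
        # Runs on success AND on failure: a change that died halfway
        # may still have modified state, and that is what the diff should show.
        result["post"] = take_snapshot(f"post-{label}")
        result["report"] = diff_snapshots(result["pre"], result["post"])
```

Usage is `with watchdog(snap, diff, "price-fix") as w: run_fix()`, then read `w["report"]`. Taking the post snapshot in `finally` is the design choice that matters: a data-fix script that crashes after updating half its targets has still changed production.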

"State" is a deliberate term. Not "all fields of all records," which is too much, but the specific business-relevant fields that, if they shift, cost the operator money or trust.

For an Odoo + multi-marketplace deployment, this is the state I snapshot before and after most changes:

# Master data
product_template: list_price, standard_price, sale_ok, active
product_product: default_code, barcode, free_qty, qty_available, lst_price, active

# Marketplace listings
marketplace_listing: state, qty, price, lead_time_to_ship, last_synced_at
fl_marketplaces_amazon_listing: product_type, fulfillment_channel, sku, asin
fl_marketplaces_ebay_listing: listing_id, category_id, dispatch_time

# Pricing
product_pricelist_item: fixed_price, percent_price, applied_on, active

# Stock
stock_quant: quantity, reserved_quantity, location_id

# Configuration
ir_default: field_id, json_value
ir_config_parameter: key, value

Anything else (logs, mail messages, queue history) is noise. The snapshot holds only fields whose values matter to revenue or operations.

$What a snapshot looks like

A snapshot is a deterministically ordered dump of the relevant columns into a flat file, written with Python and the Odoo ORM and output as JSONL so it can be diffed line by line:

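import json
from datetime import datetime, timezone

def snapshot_state(env, label):
    """Write a state snapshot to /var/snapshots/{timestamp}-{label}.jsonl and return the path."""
    snapshot_specs = {
        "product.template": ["id", "default_code", "list_price", "standard_price", "sale_ok", "active"],
        "product.product": ["id", "default_code", "barcode", "free_qty", "qty_available", "lst_price", "active"],
        "marketplace.listing": ["id", "state", "qty", "price", "lead_time_to_ship", "channel", "sku"],
        # ... etc
    }
    # Second-resolution UTC timestamp, e.g. 2026-05-12T14:00:00
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    path = f"/var/snapshots/{stamp}-{label}.jsonl"
    with open(path, "w") as out:
        for model_name, field_names in snapshot_specs.items():
            # active_test=False: include archived records, otherwise archiving a record
            # shows up as a vanished line instead of an active: true -> false change.
            Model = env[model_name].with_context(active_test=False)
            # search_read returns plain values (many2one as [id, display_name]),
            # ordered by id so the line-oriented diff is stable.
            for row in Model.search_read([], field_names, order="id"):
                row["_model"] = model_name
                out.write(json.dumps(row, default=str, sort_keys=True) + "\n")
    return path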

Sorting by ID is critical: the diff is line-oriented, so the ordering must be deterministic. A snapshot of a 50,000-product catalog takes ~12 seconds; on a 500,000-product catalog, ~90 seconds. Both are fast enough to run before and after every change.

$The diff

JSONL is line-diff-able directly:

diff -u /var/snapshots/2026-05-12T14:00:00-pre-fix.jsonl /var/snapshots/2026-05-12T14:08:00-post-fix.jsonl > /tmp/state.diff
wc -l /tmp/state.diff

Two patterns to look for in the diff:

Expected changes. The records you intended to touch. If the change was "update list_price on 47 products," the diff has ~94 changed lines (47 removed, 47 added), plus the header and context lines that diff -u wraps around them. Verify the IDs match the target set.
Unexpected changes. Records you did NOT intend to touch. Anything here is suspicious. Investigate every line before approving the change as deployed.
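Checking every changed line against the target set by hand gets tedious past a few dozen records. A sketch of the same comparison in Python, assuming snapshot lines shaped like the JSONL rows above (each carrying `_model` and `id`); `load_snapshot` and `classify_changes` are illustrative names, not part of Odoo or any library. It groups changed record IDs by model, splits them into expected and unexpected, and counts which fields changed, so a pattern like "every unexpected row is last_synced_at" is visible at a glance.

```python
import json
from collections import Counter, defaultdict


def load_snapshot(lines):
    """Index JSONL snapshot rows by (model, id), skipping blank lines."""
    rows = {}
    for line in lines:
        if line.strip():
            row = json.loads(line)
            rows[(row["_model"], row["id"])] = row
    return rows


def classify_changes(pre_lines, post_lines, expected_ids_by_model):
    """Return (expected, unexpected, field_counts) for two snapshots.

    expected / unexpected: model name -> set of changed record ids.
    field_counts: (model, field) -> number of records where that field changed;
    records present in only one snapshot count under "<created>" or "<missing>".
    """
    pre, post = load_snapshot(pre_lines), load_snapshot(post_lines)
    expected, unexpected = defaultdict(set), defaultdict(set)
    field_counts = Counter()
    for key in pre.keys() | post.keys():
        old, new = pre.get(key), post.get(key)
        if old == new:
            continue  # record unchanged
        model, rec_id = key
        target = expected_ids_by_model.get(model, set())
        (expected if rec_id in target else unexpected)[model].add(rec_id)
        if old is None:
            field_counts[(model, "<created>")] += 1
        elif new is None:
            field_counts[(model, "<missing>")] += 1
        else:
            for name in old.keys() | new.keys():
                if old.get(name) != new.get(name):
                    field_counts[(model, name)] += 1
    return dict(expected), dict(unexpected), field_counts
```

Pass open file objects, for example inside `with open(pre_path) as pre, open(post_path) as post:`. The `unexpected` dict has the same `{model: set_of_ids}` shape that `rollback_from_snapshot` takes as `affected_ids_by_model`.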

The first time I ran this on a production change, the diff had 47 expected lines and 1,247 unexpected lines. The unexpected ones were all marketplace.listing.last_synced_at shifting forward because a cron fired during the change window. Innocent. But the same diagnostic that surfaced the innocent cron-firing pattern is the one that catches a connector accidentally overwriting lead_time_to_ship across the catalog. Same shape of diff. Different consequence.

$The diff-noise problem

Some fields drift continuously and pollute the diff:

write_date, __last_update: modified on every write; last_synced_at: modified on every sync
qty_available for stock-managed products: modified by every stock move
state on marketplace listings: flips between publishing and published during a publish cycle

Either exclude these from the snapshot, or run snapshots when these fields are stable (publish cycle complete, no in-flight stock moves). The pattern I prefer is to exclude the obvious noise fields and let the diff surface the rest, then inspect each surprising line.
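One way to exclude noise without re-taking snapshots is to rewrite each snapshot without the noisy fields and diff the filtered copies. A sketch under assumptions: the noise list mirrors the fields named above and is keyed by model, with `"*"` (a convention invented here) meaning every model; `strip_noise` is a hypothetical helper. Dropping qty_available for every product.product row is coarser than "stock-managed products only", so narrow it if that distinction matters in your catalog.

```python
import json

# Hypothetical noise list, keyed by model; "*" applies to every model.
NOISE_FIELDS = {
    "*": {"last_synced_at", "write_date", "__last_update"},
    # Coarse: drops qty_available for every variant, stock-managed or not.
    "product.product": {"qty_available"},
}


def strip_noise(lines, noise=NOISE_FIELDS):
    """Yield JSONL lines with noise fields removed and keys in a stable order."""
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        drop = noise.get("*", set()) | noise.get(row["_model"], set())
        for name in drop:
            row.pop(name, None)
        yield json.dumps(row, sort_keys=True) + "\n"
```

Write the filtered lines for both snapshots to new files (for example `out.writelines(strip_noise(src))`), then run the same `diff -u` on the filtered copies. The raw snapshots stay untouched, so they still work as rollback fixtures.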

$What "rollback" actually means

The pre-change snapshot is also the rollback fixture. If the post-change diff is unacceptable, restore the affected records by writing their pre-change values back through the ORM:

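import collections
import json

def rollback_from_snapshot(env, pre_snapshot_path, affected_ids_by_model):
    """Restore pre-change field values for affected records ({model_name: {ids}})."""
    by_model = collections.defaultdict(list)
    with open(pre_snapshot_path) as f:
        for line in f:
            row = json.loads(line)
            if row["id"] in affected_ids_by_model.get(row["_model"], set()):
                by_model[row["_model"]].append(row)
    for model_name, rows in by_model.items():
        Model = env[model_name]
        for row in rows:
            rec_id = row.pop("id")
            row.pop("_model", None)
            vals = {}
            for name, value in row.items():
                field = Model._fields[name]
                # Computed, non-stored fields without an inverse (e.g. free_qty) can't be written back.
                if field.compute and not field.store and not field.inverse:
                    continue
                # Assumes many2one values were snapshotted as [id, display_name] (search_read format).
                if field.type == "many2one" and isinstance(value, list):
                    value = value[0]
                vals[name] = value
            Model.browse(rec_id).write(vals)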

This is not a database-level rollback. It's a targeted reset of the affected records' specific fields. Database-level rollback (point-in-time recovery on PostgreSQL) is the bigger hammer reserved for catastrophic data corruption. Most regressions don't need it; they need a 3-line ORM write to put 47 prices back where they were.

$When to run the watchdog

Six change types where I always snapshot:

Any deploy that touches a module that owns a model in the snapshot spec
Any data-fix script (one-off SQL or ORM batch update)
Any cron-job-config change (frequency, parameters)
Any Odoo version upgrade or module upgrade (-u run)
Any connector publish action that touches more than 100 listings at once
Any change to a base/core configuration parameter (ir_config_parameter)

The cost: ~90 seconds per snapshot on a 500K-product catalog, twice per change. Three minutes of operator time, weighed against a silent regression that runs in production for three days.

$Where this fits in the practice stack

The watchdog is one piece of the AI-augmented engineering stack: knowledge vault, persistent memory, module-lock coordination, watchdog snapshots, hard rules. See the practice-stack post for the full picture. The snapshot/diff specifically pairs well with the test suite: tests catch the regressions you anticipated, the watchdog catches the ones you didn't.

On any production-affecting Odoo work, the snapshot ritual is non-negotiable. The few minutes it takes are recovered the first time the diff catches something unexpected, and it always catches something, eventually.