2026-03-22 · 8-min read · Odoo · Image pipeline

Recovering 32,500 Odoo image derivatives after an ORM cascade.

I converted 8,623 master images from WebP and PNG-with-alpha to PNG-RGB to satisfy Amazon's catalog requirements. The conversion inadvertently triggered Odoo's image-derivative chain, which cascade-deleted 34,488 derivative attachments: the kind of cascade that takes a working catalog and silently empties it of every product thumbnail. The obvious ORM-based recovery ran at 11 minutes per thousand products. Wrong path. The right fix was a PIL-based bulk insert that generated the derivatives itself and wrote them directly to ir_attachment, bypassing the ORM compute chain. Roughly 32,500 derivatives recovered without service disruption.

$The setup

Amazon's catalog rejects WebP and PNG-with-alpha master images. The Seller Central API will accept the upload, but hours later, once Amazon's image processor has evaluated it, the listing shows the dreaded "image suppressed" status. Your listings go live, customers can't see the product, and you lose the Buy Box.

On a 7,000-listing catalog where most masters were WebP (efficient for the original e-commerce site but rejected by Amazon), the fix was clear: batch-convert all 8,623 masters to PNG-RGB. PIL handles the conversion. Write the result back to ir_attachment via the Odoo ORM. Move on.

What actually happened was not move-on.

$The cascade

Odoo's image system is layered: a product.template or product.product has a primary image_1920 field. When you write that field, Odoo automatically generates a chain of derivative resolutions: image_1024, image_512, image_256, and image_128. Each is stored as an ir_attachment record linked to the parent.

The chain is computed lazily on access (via _compute_image_thumbnail et al.) but cached as ir_attachment rows once computed. When you UPDATE the master image, Odoo invalidates and deletes the derivatives; they'll be regenerated on next access.
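The invalidate-then-regenerate behavior described above can be modeled in a few lines. This is a toy model only, not Odoo's implementation: `ToyProduct` and its methods are hypothetical, and the "derivative" is just a tagged copy of the master bytes.

```python
DERIVATIVE_FIELDS = ("image_1024", "image_512", "image_256", "image_128")


class ToyProduct:
    """Toy stand-in for a record with a master image and cached derivatives."""

    def __init__(self, master: bytes):
        self._master = master
        self._cache = {}        # field name -> derivative bytes
        self.regenerations = 0  # how many times a derivative was rebuilt

    def write_master(self, master: bytes) -> None:
        # Writing the master drops every cached derivative at once.
        self._master = master
        self._cache.clear()

    def read(self, field: str) -> bytes:
        # Rebuild lazily on the first access after invalidation.
        if field not in self._cache:
            self.regenerations += 1
            self._cache[field] = field.encode() + b":" + self._master
        return self._cache[field]


product = ToyProduct(b"v1")
for name in DERIVATIVE_FIELDS:
    product.read(name)          # warm the cache: 4 regenerations
product.write_master(b"v2")     # one write empties the whole cache
print(len(product._cache), product.regenerations)
```

Multiply the single `write_master` call by every master in the catalog and the cache is empty everywhere at the same moment.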

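That's fine when you update one image at a time. It's catastrophic when you UPDATE 8,623 masters in a batch transaction, because every derivative for every product is invalidated at once, and none of them comes back until a request asks for it.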

For 34,488 derivatives at 100ms each: ~57 minutes of CPU time if regenerated serially. In practice, with concurrent requests competing for ORM access, the recovery is much slower, and during that window, every product page that needs a thumbnail is generating it on-demand, blocking the request.
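The serial estimate is plain arithmetic; a small helper makes it easy to rerun with a different per-derivative cost. The function name is hypothetical and the 100 ms figure is the one quoted above.

```python
def serial_regen_minutes(derivative_count: int, ms_each: float) -> float:
    """Minutes of CPU time to regenerate every derivative one after another."""
    return derivative_count * ms_each / 1000 / 60


print(round(serial_regen_minutes(34_488, 100), 1))  # 57.5
```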

The dashboard goes pale. Marketplace API calls start timing out. The on-call channel lights up.

$The wrong path I tried first

My first instinct was to manually re-fire Odoo's derivative-compute chain through the ORM. For each product, touch the image field (e.g., re-write it to itself) to trigger the derivative regeneration.

# Wrong path. Don't do this on a large catalog.
def regenerate_derivatives_via_orm(env, product_ids):
    for p_id in product_ids:
        product = env["product.template"].browse(p_id)
        # Touch the master to trigger derivative recompute.
        product.image_1920 = product.image_1920
        env.cr.commit()

This works correctly. It just takes 11 minutes per 1,000 products. For 8,623 products: roughly 95 minutes of wall time, during which the production database is under sustained write load and the on-call channel is still lit up.

Worse: the operation is single-threaded against the ORM. You can't easily parallelize it because each product.image_1920 = product.image_1920 triggers the full ORM stack (write overrides, compute fields, audit log entries, the works). Trying to parallelize causes lock contention on ir_attachment.

11 minutes per thousand was unacceptable. Time for the right path.

$The right path: PIL bulk-insert direct to ir_attachment

The insight: Odoo's derivative computation is just PIL resizing the master and writing the result to ir_attachment. We can do that ourselves, in parallel, bypassing the ORM, and it's much faster.
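The resize itself follows a simple rule: shrink to fit a square box, keep the aspect ratio, never upscale. A rough sketch of that rule, using a hypothetical `fit_within` helper; PIL's own rounding may differ by a pixel in edge cases.

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) down to fit a max_dim box, keeping aspect ratio.
    Images already inside the box keep their original size."""
    scale = min(max_dim / width, max_dim / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Target dimensions for a 1920x1080 master at each derivative size.
for max_dim in (1024, 512, 256, 128):
    print(max_dim, fit_within(1920, 1080, max_dim))
```

Nothing in that rule depends on the ORM, which is why it can be moved out of it.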

The shape:

from PIL import Image
import io, base64
from concurrent.futures import ThreadPoolExecutor

# Derivative sizes Odoo uses.
SIZES = [
    ("image_1024", 1024),
    ("image_512", 512),
    ("image_256", 256),
    ("image_128", 128),
]

def generate_derivatives(master_bytes):
    """Resize the master image into all derivative sizes.
    Returns a dict of size_name -> bytes."""
    img = Image.open(io.BytesIO(master_bytes))
    img = img.convert("RGB")  # Strip alpha to match Amazon's required format.
    out = {}
    for size_name, max_dim in SIZES:
        thumb = img.copy()
        thumb.thumbnail((max_dim, max_dim), Image.LANCZOS)
        buf = io.BytesIO()
        thumb.save(buf, "PNG", optimize=True)
        out[size_name] = buf.getvalue()
    return out

def bulk_insert_derivatives(cr, product_id, master_bytes):
    """Write all derivatives for a product directly to ir_attachment.
    Uses a single transaction with a single connection cursor.
    Bypasses ORM compute-chain entirely."""
    derivatives = generate_derivatives(master_bytes)
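    for size_name, blob in derivatives.items():
        # Audit against the live schema before running: ir_attachment column
        # names (notably datas) differ between Odoo versions, and ON CONFLICT
        # only works if a unique index exists on exactly
        # (res_model, res_field, res_id).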
        cr.execute("""
            INSERT INTO ir_attachment
                (name, res_model, res_field, res_id, type, datas,
                 file_size, mimetype, create_uid, create_date,
                 write_uid, write_date)
            VALUES (
                %s, 'product.template', %s, %s, 'binary', %s,
                %s, 'image/png', 1, NOW(),
                1, NOW()
            )
            ON CONFLICT (res_model, res_field, res_id)
            DO UPDATE SET datas = EXCLUDED.datas,
                          file_size = EXCLUDED.file_size,
                          write_date = NOW()
        """, (
            size_name,
            size_name,
            product_id,
            base64.b64encode(blob),
            len(blob),
        ))

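Key choices and why each matters: the master is converted to RGB before resizing, so every derivative comes out without an alpha channel, matching the format Amazon requires. All derivatives for a product go through one cursor in one transaction, so a product never ends up with a partial set. The ORM compute chain is bypassed entirely, which is where the speed comes from. And the ON CONFLICT ... DO UPDATE clause makes each write an upsert, so re-running the recovery overwrites existing rows instead of duplicating them.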
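One of those choices, the ON CONFLICT upsert, is worth isolating because it only works when a unique index covers exactly the conflict-target columns. The sketch below uses SQLite (3.24 or newer) as a stand-in for PostgreSQL, with a simplified hypothetical `attachment` table, not the real ir_attachment schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attachment (
        res_model TEXT, res_field TEXT, res_id INTEGER,
        datas BLOB, file_size INTEGER
    )
""")
# The upsert needs a unique index on exactly the conflict-target columns.
conn.execute("""
    CREATE UNIQUE INDEX attachment_field_uniq
    ON attachment (res_model, res_field, res_id)
""")

UPSERT = """
    INSERT INTO attachment (res_model, res_field, res_id, datas, file_size)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (res_model, res_field, res_id)
    DO UPDATE SET datas = excluded.datas, file_size = excluded.file_size
"""


def upsert_derivative(conn, product_id, field, blob):
    """Insert a derivative row, or overwrite the existing one for that key."""
    conn.execute(UPSERT, ("product.template", field, product_id, blob, len(blob)))


upsert_derivative(conn, 42, "image_512", b"first")
upsert_derivative(conn, 42, "image_512", b"second, longer")  # same key
rows = conn.execute("SELECT datas, file_size FROM attachment").fetchall()
print(rows)  # [(b'second, longer', 14)]
```

Without the unique index, the second call fails instead of overwriting, so it is worth confirming the index exists on the target database before relying on the upsert.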

$Parallelizing the bulk-insert

The PIL resize is CPU-bound. The database INSERT is I/O-bound. The two are independent. Parallelize the PIL work, batch the inserts:

from concurrent.futures import as_completed

# SQL_INSERT is the INSERT ... ON CONFLICT statement from
# bulk_insert_derivatives; fetch_master_bytes is a helper (not shown) that
# returns {product_id: master image bytes} for a batch.
def recover_catalog(env, product_ids, batch_size=100):
    cr = env.cr
    with ThreadPoolExecutor(max_workers=8) as ex:
        for i in range(0, len(product_ids), batch_size):
            batch = product_ids[i:i + batch_size]
            # Fetch master bytes for the batch.
            masters = fetch_master_bytes(cr, batch)  # {product_id: bytes}

            # Generate derivatives in parallel (CPU-bound).
            futures = {
                ex.submit(generate_derivatives, m_bytes): p_id
                for p_id, m_bytes in masters.items()
            }
            # Write each result to the DB as it completes. Only this thread
            # touches the cursor; the workers do PIL work only.
            for fut in as_completed(futures):
                p_id = futures[fut]
                derivatives = fut.result()
                for size_name, blob in derivatives.items():
                    cr.execute(SQL_INSERT, (
                        size_name, size_name, p_id,
                        base64.b64encode(blob), len(blob),
                    ))
            # One commit per batch keeps each transaction short.
            cr.commit()
            print(f"Batch {i // batch_size + 1}: {len(batch)} products")

With 8 PIL workers and 100-product batches, the catalog recovery ran at ~3 seconds per 1,000 products of CPU + ~1 second of DB write. Total wall time for ~32,500 derivatives across 8,623 products: about 4 minutes. Compare that with the roughly 95 minutes the ORM-based recovery would have taken.

While the bulk insert runs, customer requests hitting product.image_512 get cache hits as soon as the batch containing that row commits. The recovery is observable in real time as the on-call channel quiets down.

$What NOT to do

A few attempted shortcuts that don't work:

$The transferable lesson

Odoo's ORM is excellent for transactional business logic. It's bad at bulk image operations on production catalogs at scale. The pattern that works is: bypass the ORM for the bulk path, write direct SQL, generate the side-effect artifacts (derivatives) yourself in parallel.

This is also the pattern that works for: bulk variation-family corrections on Amazon, bulk price re-pushes after a feed rejection, bulk inventory reconciliation after a Walmart sync error. Anywhere "thousands of records, one field change per record, side-effects need to fire" is the spec, the ORM is the wrong tool. Direct SQL plus explicit side-effect handling is the right one.

The cost is that you lose Odoo's validation safety net. That cost is acceptable if you've snapshotted state first, have a watchdog that can flag anomalies, and have audited the SQL against a real schema. Without those guardrails, the direct-SQL pattern is a footgun.
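As one illustration of the watchdog guardrail, a post-run check can flag products whose derivative set is incomplete. This is a minimal sketch under assumptions: `find_incomplete_products` is a hypothetical helper, and `rows` stands for (res_id, res_field) pairs read from the attachment table by whatever query fits the deployed schema.

```python
EXPECTED_FIELDS = {"image_1024", "image_512", "image_256", "image_128"}


def find_incomplete_products(rows, product_ids):
    """rows: iterable of (res_id, res_field) pairs from the attachment table.
    Returns {product_id: sorted list of missing derivative fields}."""
    present = {p_id: set() for p_id in product_ids}
    for res_id, res_field in rows:
        if res_id in present and res_field in EXPECTED_FIELDS:
            present[res_id].add(res_field)
    return {
        p_id: sorted(EXPECTED_FIELDS - fields)
        for p_id, fields in present.items()
        if fields != EXPECTED_FIELDS
    }


rows = [
    (1, "image_1024"), (1, "image_512"), (1, "image_256"), (1, "image_128"),
    (2, "image_512"),
]
print(find_incomplete_products(rows, [1, 2]))
```

An empty result after the run is the signal that every product got its full set; anything else names the exact rows to redo.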

By David H. Frost · Frost Labs LLC