Digital Product Passport Data Placement: What Goes On-Chain, What Must Not, and What Has to Be Encrypted

The hardest architectural decision in a Digital Product Passport is not which blockchain to anchor against. It is deciding what the blockchain must never see.

Most DPP designs I review have this backwards. The team optimises the chain selection — cost per transaction, finality time, EVM-vs-alternative, rollup-vs-L1. The data model is treated as an afterthought: "we'll put the important fields on-chain for credibility." A year later, a European lawyer reads the system and files a GDPR complaint. Or a retailer notices the passport leaks enough commercial data for a competitor to reconstruct the sourcing arrangement. Or the anchoring bill arrives and the team discovers they are paying four cents per event to immortalise a supplier's name in an immutable public ledger that their lawyer now wants them to unpublish.

Data placement is the architectural decision that either holds the whole system together or quietly poisons it from day one. It is also the decision teams seem least prepared to make, because it sits at the intersection of cryptography, regulatory law, cost engineering, and product design, and there is no single vendor who sells "the right answer". This piece is the version of the briefing I would give a team starting a DPP build, if they asked what has to be decided before they write any on-chain code. It assumes you've already worked through the wider architectural shape — the five DPP layers, the first-mile problem, role-based verification — which I covered in the pillar piece on the first-mile problem. What follows here is the data layer.

The Mistake at the Centre of Most DPP Designs

The reflex is simple: put more on-chain because it feels more trustworthy. GPS coordinates of the harvest. Supplier names. Lab results. Photo hashes inline. Weight and grade and certification IDs in the transaction payload. The logic sounds right — anyone reading the ledger can see the evidence, and nobody can change it.

The logic is wrong in four ways, and the ways compound.

First, blockchain provides integrity, not confidentiality. A public chain is a permanent, replicated, queryable public record. Anyone — competitor, journalist, regulator, hostile nation-state — can download the history and read it. There is no erase. There is no "restrict to these roles". Anything that ends up on-chain is, for practical purposes, forever public, even if it's in an obscure L2 that seems unvisited today.

Second, GDPR applies to this literally. A supplier's name is personal data, and a hash of that name usually still is: names are low-entropy, so anyone can hash candidate names and compare, which links the hash back to an individual with little practical effort. The right to erasure (the "right to be forgotten") is not a requirement a public ledger can honour; once the data is on-chain, you cannot delete it. The only defensible architecture is one where personal data never reaches the chain in the first place. Teams that don't design for this are building systems that are likely to be unlawful in the EU from the day they launch.

Third, commercial confidentiality is violated silently. A supply-chain passport contains, structurally, the names of every participant in the chain, the prices at every handoff, the volumes moved, the quality grades assigned, the timing of every event. Exposing all of that publicly gives competitors a detailed picture of the sourcing operation. Few teams making DPPs actually want this, but they ship it anyway because they didn't think through what an immutable public ledger does to commercial privacy.

Fourth, the cost compounds. Even cheap L2 anchoring — sub-cent per transaction in the best case — adds up across a supply chain that generates thousands of events per batch. Putting full event payloads on-chain, rather than compact commitments, multiplies the bill by one or two orders of magnitude without improving any property the system actually needs.

The clean architectural response is to invert the default. Start from the assumption that nothing goes on-chain, and then move specific artifacts to specific storage tiers only when the tier's properties match what the artifact needs.

Three Storage Tiers, and What They Each Do

A production DPP architecture separates data across three storage tiers. The distinction isn't technology-level (though IPFS tends to show up a lot); it's about what each tier guarantees and what audiences can read it.

This three-tier model sits downstream of the capture layer covered in the offline-first traceability piece. Every event arriving here already carries its trust tier — the field defined upstream that distinguishes individual-signed, gateway-signed, and device-signed evidence. The storage architecture inherits those tiers and surfaces them to verifiers.

[Figure: envelope encryption with role-wrapped keys and the three-tier DPP key hierarchy. Content-encryption keys are wrapped per role for selective decryption; organisation root keys delegate to role keys, which delegate to actor and device keys; storage is split across on-chain, public IPFS, and encrypted IPFS.]

Tier 1 — On-chain commitments. The ledger carries only hash commitments, references, and role attestations. Concretely: event ID, batch ID, actor public-key hash, role ID, signature, payload hash, previous event hash, timestamp, trust tier, references to public-IPFS and encrypted-IPFS CIDs. No personal data. No free text. No coordinates. No commercial terms. The on-chain record is a cryptographic fingerprint, not a description. Anyone in the world can verify that a specific event was signed by a specific key at a specific time by a specific role — and nothing else.
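A minimal sketch of what building such a commitment could look like, assuming the payload is hashed as canonical JSON with SHA-256. The field names, the `Commitment` class, and the `build_commitment` helper are illustrative assumptions, not a prescribed schema, and signing is left out.

```python
import hashlib
import json
from dataclasses import dataclass


def canonical_hash(payload: dict) -> str:
    """SHA-256 over a deterministic JSON encoding (sorted keys, no whitespace)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class Commitment:
    """Hypothetical on-chain record: hashes and references only, no payload values."""
    event_id: str
    batch_id: str
    actor_key_hash: str
    role_id: str
    payload_hash: str
    prev_event_hash: str
    timestamp: int
    trust_tier: str
    public_cid: str
    encrypted_cid: str
    signature: str = ""  # filled in by the signer; signing is out of scope here


def build_commitment(payload: dict, actor_pubkey: bytes, **refs) -> Commitment:
    """Reduce a full event payload to its fingerprint plus the supplied references."""
    return Commitment(
        actor_key_hash=hashlib.sha256(actor_pubkey).hexdigest(),
        payload_hash=canonical_hash(payload),
        **refs,
    )
```

The point of the shape is that nothing from the payload itself survives into the record: a verifier can recompute `payload_hash` from the off-chain data, but cannot go the other way.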

Tier 2 — Public off-chain data. Product-level traceability summaries, certifications, aggregated claims, anything a consumer or retailer legitimately needs to see. Stored on content-addressed public storage (plaintext IPFS is the default, though CDN-fronted S3 works for programmes that don't want IPFS operational overhead). Addressable by hash, with the hash committed on-chain. The content is public, but the integrity guarantee comes from the on-chain commitment.

Tier 3 — Encrypted off-chain data. Everything that is personal, commercially sensitive, or regulator-only. Supplier identity, precise GPS, device-level metadata, lab results with personal annotations, price terms, commercial agreements. Stored as encrypted blobs, with the encryption scheme and key management designed so that different verifiers can decrypt different slices. A regulator with the right role key decrypts their slice. A retailer with a different role key decrypts a non-overlapping slice. The CID is committed on-chain for integrity. The content is unreadable without the appropriate key material.

The field-level mapping decision becomes concrete. For a harvest event:

On-chain: event ID, batch ID, actor pubkey hash, role ID, signature, payload hash, previous event hash, timestamp, trust tier, CID references.
Public IPFS: aggregated harvest summary at cooperative granularity — region, week, total tonnage, grade distribution.
Encrypted IPFS: individual supplier ID, precise GPS (coarsened before leaving the device in any case), exact weight, photo evidence, any personal annotations.

The same split applies to lab events, custody handoffs, quality checks, and every other event type. Once you commit to the three-tier model, the question "where does this field go" becomes answerable by policy rather than debate. This is not just a storage optimisation. It is the architectural consequence of a DPP regime that expects different audiences — consumer, retailer, auditor, regulator — to access different evidentiary layers of the same passport.
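One way to make "answerable by policy" literal is to keep the tier mapping as data and refuse any field that has no declared tier. The sketch below assumes hypothetical field names for a harvest event; `Tier`, `HARVEST_POLICY`, and `split_by_tier` are illustrative, not part of any standard.

```python
from enum import Enum


class Tier(Enum):
    ON_CHAIN = "on_chain_commitment"
    PUBLIC = "public_off_chain"
    ENCRYPTED = "encrypted_off_chain"


# Hypothetical field-level policy for a harvest event.
HARVEST_POLICY = {
    "event_id": Tier.ON_CHAIN,
    "role_id": Tier.ON_CHAIN,
    "trust_tier": Tier.ON_CHAIN,
    "region": Tier.PUBLIC,
    "harvest_week": Tier.PUBLIC,
    "supplier_id": Tier.ENCRYPTED,
    "gps": Tier.ENCRYPTED,
    "weight_kg": Tier.ENCRYPTED,
}


def split_by_tier(event: dict, policy: dict) -> dict:
    """Route each field to its declared tier; fail loudly on undeclared fields."""
    undeclared = sorted(set(event) - set(policy))
    if undeclared:
        raise ValueError(f"fields with no declared tier: {undeclared}")
    buckets = {tier: {} for tier in Tier}
    for name, value in event.items():
        buckets[policy[name]][name] = value
    return buckets
```

Raising on undeclared fields is the design choice that matters: a new field cannot reach any storage tier until someone has decided where it belongs.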

The Encryption Scheme That Makes Role-Based Access Work

The piece of the architecture that most teams fumble is how Tier 3 actually works. "Encrypt and decrypt with role keys" sounds simple in a design document and gets complicated fast in production.

The working pattern is envelope encryption with role-wrapped keys, which is standard in cloud KMS systems and translates cleanly to DPP architectures.

Each encrypted-IPFS blob is encrypted with a symmetric key (AES-256-GCM is the default) generated fresh per blob. Call this the content-encryption key (CEK). The CEK encrypts exactly one blob, and its plaintext form is discarded once the blob is encrypted and the CEK has been wrapped as described next.

The CEK is then wrapped — encrypted — once for each role that should be able to read the blob. The wrapping is done with the public key of a role-level keypair. If three roles should have access — the regulator, the cooperative manager, and the buyer — the blob carries three wrapped copies of the CEK, one per role public key.

To decrypt, a holder of a role private key unwraps the CEK for their role, then uses the CEK to decrypt the blob. The scheme has useful properties. Adding a new role to an existing blob just means wrapping the CEK one more time; because the plaintext CEK was discarded, an existing role holder has to unwrap it first. Revoking a role requires re-encrypting the blob under a fresh CEK and publishing a new CID. The old CID cannot be unpublished, and anyone who already held the old key can still read the old blob; that is the honest limit of what off-chain encrypted storage can do. The ciphertext itself is the same no matter how many roles can read it; only the set of wrapped CEKs changes.
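The wrap and unwrap flow can be sketched with the third-party `cryptography` package. This is a sketch under assumptions: RSA-OAEP is my stand-in for "the public key of a role-level keypair" (the scheme does not depend on RSA specifically), the envelope is a plain dict, and keys live in memory rather than in an HSM or MPC custody. The helper names are hypothetical.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def _oaep():
    return padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )


def new_role_keypair():
    """Throwaway in-memory RSA key; real role keys would sit in HSM or MPC custody."""
    return rsa.generate_private_key(public_exponent=65537, key_size=2048)


def encrypt_for_roles(plaintext: bytes, role_public_keys: dict) -> dict:
    """Encrypt once with a fresh CEK, then wrap that CEK once per role."""
    cek = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, used once with this CEK
    ciphertext = AESGCM(cek).encrypt(nonce, plaintext, None)
    wrapped = {role: pub.encrypt(cek, _oaep()) for role, pub in role_public_keys.items()}
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_ceks": wrapped}


def decrypt_as_role(envelope: dict, role: str, role_private_key) -> bytes:
    """Unwrap the CEK for one role, then decrypt the blob with it."""
    cek = role_private_key.decrypt(envelope["wrapped_ceks"][role], _oaep())
    return AESGCM(cek).decrypt(envelope["nonce"], envelope["ciphertext"], None)


def add_role(envelope: dict, holder_role: str, holder_private_key,
             new_role: str, new_public_key) -> dict:
    """An existing role holder unwraps the CEK and wraps it for one more role."""
    cek = holder_private_key.decrypt(envelope["wrapped_ceks"][holder_role], _oaep())
    wrapped = dict(envelope["wrapped_ceks"])
    wrapped[new_role] = new_public_key.encrypt(cek, _oaep())
    return {**envelope, "wrapped_ceks": wrapped}
```

Note that `add_role` leaves `ciphertext` untouched, which is the property described above, and that there is deliberately no `remove_role`: dropping a wrapped CEK from a new envelope does nothing about copies of the old one.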

On top of the symmetric scheme sits the role-to-actor mapping. A role (REGULATOR, COOPERATIVE_MANAGER, BUYER) is represented by an on-chain registry entry that carries the role's public key. Individual actors holding that role are mapped to the role via a separate on-chain record, with valid_from and valid_until timestamps so access can be narrowed to specific windows. When an actor wants to read a blob, the system first verifies (off-chain or through a backend service) that the actor currently holds the role; only then is the CEK unwrapped with the role private key, whether the actor holds that key directly or, for threshold-custodied roles, requests the unwrap from the custodians.

The role private key itself is not usually held as a monolithic secret. For any high-stakes role, it's distributed across multiple custodians using a threshold-controlled custody scheme — MPC-based key-share management, HSM-backed quorum, or a comparable arrangement — so that no single custodian can unilaterally enable decryption of a role-wrapped blob. This is the same pattern that institutional custody providers use for signing and key-unwrap operations on high-value assets, transposed into the DPP context.

The Key Hierarchy That Keeps This Manageable

A DPP with even modest complexity ends up managing dozens of role keys, hundreds of actor keys, and thousands of device keys. Without a clean hierarchy, key management devolves into a spreadsheet problem that nobody wants to own.

The working three-tier hierarchy:

Organisation root key. One per tenant (sourcing organisation, cooperative umbrella, brand). Held under high-assurance custody — hardware security module, offline ceremony, or threshold MPC with multiple geographically distributed custodians. Used only to sign role-registry updates and emergency revocations. Rotated rarely and with ceremony.
Role keys. Per tenant, per role. The keys that wrap CEKs and sign role-level attestations. Issued under the organisation root, with on-chain provenance. Rotated on a schedule (I use annual as a working default) with explicit rollover periods so that blobs encrypted under the old role key remain readable during transition.
Actor/device keys. Per individual actor or capture device. The keys that sign individual events in the field. Issued under a specific role for a specific validity window. Rotated aggressively — when a device is lost, when a supervisor leaves, when a cooperative member's tenure ends. Short validity windows reduce the blast radius of any compromise.

The hierarchy has a practical property that matters operationally. When a tablet is lost in the field, the actor key tied to that device is revoked by pushing a small on-chain update; all future events signed by that key are rejected. But the events that device already signed remain verifiable, because the signature was valid at the time it was made and the on-chain record of the key's validity window shows it was authorised then. This holds only if the signing time is established independently of the device, for example by when the event was anchored; otherwise a stolen key could be used to backdate events. Key compromise doesn't invalidate history. That's a different property from "we throw away everything the compromised key ever touched", which is what naive schemes produce.
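The revocation rule reduces to a small check against the key's recorded window. A minimal sketch, assuming timestamps are Unix seconds and that `signed_at` comes from a source the device cannot forge (such as anchoring time); `KeyRecord` and `was_authorised` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical registry entry for one actor or device key."""
    key_id: str
    valid_from: int              # Unix seconds, inclusive
    valid_until: int             # Unix seconds, exclusive
    revoked_at: Optional[int] = None


def was_authorised(record: KeyRecord, signed_at: int) -> bool:
    """True if the key was inside its window and not yet revoked at signed_at."""
    if not (record.valid_from <= signed_at < record.valid_until):
        return False
    return record.revoked_at is None or signed_at < record.revoked_at
```

Under this rule an event signed before `revoked_at` stays valid after the revocation is published, while anything signed later is rejected, which is the history-preserving behaviour described above.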

Selective Disclosure and the DID/VC Path

Once role-wrapped encryption is in place, the question of how different audiences see different slices of a passport resolves cleanly. A complementary pattern worth putting on the table is Decentralised Identifiers (DIDs) and Verifiable Credentials (VCs) as the access-layer abstraction on top of the role model. (I cover that model end to end — identifiers, credentials, wallet and presentation protocols, and key recovery — in a separate guide to decentralised identifiers.)

The relevant VC standards (W3C VC Data Model 2.0, with BBS+ signatures for selective disclosure) allow an actor to present a subset of their attributes to a verifier without revealing the full credential. In a DPP context, this means a supplier can prove they hold a role without revealing their name, or a regulator can prove they have authority to read certain data without identifying the specific reviewer. That selective-disclosure property is what makes DID/VC a natural fit for multi-audience passports, where the same evidence has to satisfy very different reader privileges.

The caveat is tooling maturity. DID methods vary widely in production-readiness, and BBS+ tooling is still new enough that most teams build on simpler patterns first and layer the DID/VC abstraction on later. The architectural commitment to make now is that the role model is primary and the identity layer sits on top of it — not the other way around, which traps teams in identity-stack dependencies before the rest of the architecture is stable.

Merkle Batching: Why You Are Not Anchoring Every Event

If every single event hits the chain as its own transaction, the anchoring bill becomes the dominant cost in the system at any meaningful volume. The working pattern is to batch events and anchor only the Merkle root.

Concretely: events accumulate at the backend for a batching window (one minute, five minutes, one hour — varies by programme). At the end of the window, the backend constructs a Merkle tree over the events, computes the root, and submits a single transaction anchoring the root. An off-chain index maintains the Merkle proofs so that any single event can later be verified against the anchored root.
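The batching step can be sketched with nothing but SHA-256. The details here are my assumptions rather than a required format: leaves and interior nodes are hashed with different prefix bytes (the convention RFC 6962 uses) so a leaf can never be mistaken for a node, and an unpaired node is promoted to the next level unchanged.

```python
import hashlib


def _leaf(event: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + event).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(events: list) -> list:
    """Return every level of the tree, leaves first; the root is levels[-1][0]."""
    if not events:
        raise ValueError("cannot anchor an empty batch")
    level = [_leaf(e) for e in events]
    levels = [level]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def inclusion_proof(levels: list, index: int) -> list:
    """List of (sibling_hash, sibling_is_left) pairs from leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # promoted nodes have no sibling at this level
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(event: bytes, proof: list, root: bytes) -> bool:
    acc = _leaf(event)
    for sibling, sibling_is_left in proof:
        acc = _node(sibling, acc) if sibling_is_left else _node(acc, sibling)
    return acc == root
```

Only `levels[-1][0]` goes on-chain. The `levels` structure is the off-chain index: it is what lets the backend produce a proof for any single event later.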

The cost math is straightforward. Anchoring a 32-byte root on a typical L2 (Arbitrum, Optimism, Base, Polygon) costs a few cents in the current fee environment. Anchoring 1,000 events per window as individual transactions would cost roughly a thousand times more, and the bill scales linearly with volume. The Merkle batching pattern keeps on-chain cost per window roughly constant regardless of event volume, at the price of up to one batching window of latency between capture and anchoring.
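To put rough numbers on that comparison, here is a back-of-envelope calculation. The fee of 0.03 per transaction and the five-minute window are illustrative assumptions, not measured prices, and the sketch assumes a root transaction costs the same as a single-event transaction.

```python
def daily_anchoring_cost(events_per_window: int, windows_per_day: int,
                         fee_per_tx: float) -> dict:
    """Compare one transaction per event with one Merkle root per window."""
    per_event = events_per_window * windows_per_day * fee_per_tx
    batched = windows_per_day * fee_per_tx
    return {"per_event": per_event, "batched": batched, "ratio": per_event / batched}


# Assumed inputs: 1,000 events per 5-minute window (288 windows/day), fee 0.03.
costs = daily_anchoring_cost(events_per_window=1000, windows_per_day=288, fee_per_tx=0.03)
```

With those inputs the per-event approach comes to about 8,640 per day against about 8.64 for batching. The ratio is simply the number of events per window, which is why batching matters more as volume grows.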

The pattern has a useful property for the offline-capture case. Events captured in the field don't get anchored until they sync. That means the anchoring batch can cover events captured over a multi-day window, which is exactly what happens when a supervisor's tablet syncs at the end of a weekly route. The architecture absorbs the latency naturally.

The main complication is the Merkle proof index. Once the root is on-chain, you need to be able to reconstruct the inclusion proof for any specific event later. That requires a persistent off-chain index — the Merkle tree structure, or enough of it to generate proofs on demand. The index is not cryptographically sensitive (it contains only hashes and structure), but it is operationally critical. Lose the index, and on-chain anchoring becomes worthless for individual-event verification. Plan for the index to be backed up, replicated, and audited just like any other production data store.

Two Paths: MVP vs Fully Encrypted

In practice, teams don't usually ship a fully encrypted role-wrapped architecture on day one. The gap between "good enough for pilot" and "production-grade for regulators" is wide enough that a staged rollout makes sense, and being honest about which stage a given deployment is at saves a lot of marketing-versus-reality conflict later. The right path also depends on which capture mix the programme actually uses — a programme dominated by supervisor tablets has different encryption pressure than one mixing SMS gateways with IoT nodes, and each path's evidence profile is laid out in the capture-layer piece.

The two working paths I've seen teams take:

Path A — MVP, operationally honest. On-chain carries commitments and references. Public off-chain carries what consumers need. Private data lives in a traditional encrypted database (PostgreSQL with column-level encryption, or similar) held by the sourcing organisation, with access controlled by backend permissions. No role-wrapped keys. No encrypted IPFS. The private data is still protected — by the organisation's security perimeter, not by cryptography visible on the chain. Regulators can request access through the organisation's legal channel. This is sufficient for pilots, for early certifications, and for programmes where the buying side hasn't yet demanded cryptographic role-based access.

Path B — Production, fully crypto-native. On-chain commitments. Public IPFS for consumer-readable data. Encrypted IPFS with role-wrapped keys for everything personal or sensitive. Full three-tier key hierarchy. DID/VC abstraction where useful. Role registry on-chain with valid-from/valid-until windows. This is what the system eventually has to become if it wants to serve the full range of verifiers — regulators, retailers, auditors, consumers — with cryptographically enforced role boundaries.

Most programmes ship Path A for the first 12–18 months and migrate to Path B as regulatory requirements harden and buyer demands mature. That migration is easier if Path A was designed with Path B in mind — specifically, if the field-level mapping decisions (what goes to which tier) were made correctly from day one, even if the encryption layer was still backend-enforced rather than cryptographically enforced.

The migration that kills programmes is the one that retrofits data minimisation into a Path A system that put everything on-chain because "blockchain = trust". That migration is not a migration; it's a rebuild. Avoid it by treating data placement as a Day One architectural commitment, independent of which encryption path you ship first.

Retention, Coarsening, and the Things You Don't Write Down

Two smaller but consequential design points before I close.

Retention: any personal or sensitive data held in Tier 3 needs a declared retention policy. Seven years is the default I work with for commercial supply-chain data, which sits within the range of tax and contract retention requirements across the EU (some member states require longer for certain records, so check the jurisdictions you operate in). After retention expires, the encrypted blobs are destroyed: not the on-chain commitment (that's permanent), but the decryption key and the ciphertext. A verifier looking at post-retention data sees an on-chain commitment to a blob that no longer exists. The audit trail's existence is preserved; the underlying evidence is gone. One honest caveat: if the encrypted blob was already widely pinned or cached by third parties (the normal behaviour of public IPFS networks), "destruction" means losing managed access and key material, not a guarantee that every byte has vanished from the network. The architectural defence is that without the keys, the ciphertext is noise; the operational discipline is to minimise what gets pinned externally in the first place, and to treat key destruction as the real mechanism of retention enforcement. This is the cleanest way I know to reconcile blockchain permanence with regulatory retention limits.
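Key destruction as the retention mechanism can be sketched as a purge over a key store. This rests on an assumption worth stating loudly: the wrapped CEKs are held in a managed store keyed by CID, not embedded in the publicly pinned blob, because otherwise there is no per-blob key to destroy. `WrappedKeyStore` is hypothetical, and deleting a dict entry is not secure erasure; a real deployment would destroy the material inside its KMS or HSM.

```python
from dataclasses import dataclass, field
from typing import Optional

SEVEN_YEARS_S = 7 * 365 * 24 * 3600  # working default from the text, in seconds


@dataclass
class WrappedKeyStore:
    """Hypothetical managed store: CID -> (wrapped CEKs, creation time)."""
    _entries: dict = field(default_factory=dict)

    def put(self, cid: str, wrapped_ceks: dict, created_at: int) -> None:
        self._entries[cid] = (wrapped_ceks, created_at)

    def get(self, cid: str) -> Optional[dict]:
        entry = self._entries.get(cid)
        return entry[0] if entry else None

    def purge_expired(self, now: int, retention_s: int = SEVEN_YEARS_S) -> list:
        """Drop key material past retention; return the CIDs now undecryptable."""
        expired = [cid for cid, (_, created) in self._entries.items()
                   if now - created >= retention_s]
        for cid in expired:
            del self._entries[cid]
        return expired
```

After a purge, the on-chain commitment for an expired CID still verifies as a historical fact, but nobody, including the operator, can turn the ciphertext back into evidence.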

Coarsening: any data that can be coarsened before it leaves the device should be. Precise GPS becomes district-level. Exact timestamps become hour-rounded unless finer precision is needed. Identity fields become role references rather than names wherever the downstream verifier only needs the role. The rule of thumb: if the verifier's legitimate question can be answered at coarser granularity, that's the only granularity that ever leaves the device. This reduces Tier 3 volume, reduces the blast radius of any compromise, and reduces the GDPR surface.
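On-device coarsening can be very little code. In this sketch, rounding coordinates to one decimal place (a grid of roughly 11 km in latitude) is my stand-in for a real district lookup, timestamps are Unix seconds, and the event field names are hypothetical.

```python
def coarsen_gps(lat: float, lon: float, decimals: int = 1) -> tuple:
    """Round coordinates to a coarse grid; a stand-in for district-level lookup."""
    return (round(lat, decimals), round(lon, decimals))


def coarsen_timestamp(ts: int) -> int:
    """Round a Unix timestamp down to the start of its hour."""
    return ts - ts % 3600


def coarsen_event(event: dict) -> dict:
    """Keep only what a verifier needs: coarse place, coarse time, role not name."""
    lat, lon = coarsen_gps(event["lat"], event["lon"])
    return {
        "lat": lat,
        "lon": lon,
        "captured_at": coarsen_timestamp(event["captured_at"]),
        "actor_role": event["actor_role"],  # actor_name is deliberately dropped
    }
```

Because `coarsen_event` builds a new dict from an allow-list rather than deleting fields from the original, a field added to the capture schema later does not leak by default.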

What This Means for DPP Builders

A few concrete takeaways.

First, make the tier mapping a Day One deliverable. For every field in your data model, declare the tier (on-chain commitment, public off-chain, encrypted off-chain) before you write any anchoring code. This single exercise surfaces more design problems than any other architecture-review artefact I know of.

Second, default to "not on-chain". The burden of proof is on any field that wants to live on the ledger. If the field's properties (public, integrity-critical, non-personal) don't clearly justify on-chain storage, it goes off-chain.

Third, build the role registry and the key hierarchy before you build the encryption layer. The temptation is to ship encryption first and sort out the role model later. That inverts the dependency; the encryption layer depends on the role model, not the other way around.

Fourth, separate the MVP encryption path from the production encryption path deliberately. Ship Path A if you need to ship fast, but design the tier mapping to Path B standards from the start. The migration from A to B is tractable if the data placement is right, and impossible if it isn't.

Fifth, treat retention and coarsening as architectural commitments, not product-manager checkboxes. The regulatory environment that ESPR-adjacent programmes will operate in takes both seriously, and retrofitting either into a mature system is painful.

If you're scoping the data layer of a DPP — designing the on-chain/off-chain split, the role hierarchy, or the envelope-encryption scheme — or if you're migrating from a Path A backend-enforced model toward production-grade role-wrapped encryption, I work with a small number of teams per quarter on data-placement architecture reviews and key-hierarchy design. The fastest way to reach me is through the contact page.

None of this is the interesting part of a DPP for engineers. The interesting part is always the chain, the cryptography, the wallet stack. But the part that determines whether the system holds up under examination — whether the passport it produces is a credible, defensible, legally-operable record or a collection of unexplainable liabilities — is this one. Data placement is where DPPs live or die. Everything else is commentary.