Apache Iceberg 1.11.0 Release: Deletion Vectors, Variant Type, and V3 Maturity

The Apache Iceberg community has released version 1.11.0, a significant milestone for the V3 table format. Prior releases laid the groundwork for the V3 specification; this release signals that its advanced features are hardened, stabilized, and ready for high-scale production use.

If you are running large-scale data lakehouses, you are likely intimately familiar with the operational pain points of the past: positional delete bloat, the headache of semi-structured data performance, and rigid ecosystem file formats.

This release systematically targets those bottlenecks. Let's break down exactly what has changed under the hood, how it plays out in your Spark pipelines, and what it looks like in practice.

Deletion Vectors Ready for Prime Time

If you run high-frequency transactional workloads (streaming ingest, CDC pipelines, or heavy MERGE INTO operations) on Iceberg v2 tables, you have probably run into the dreaded positional delete file accumulation problem.

In the traditional Merge-on-Read (MoR) model, each commit that updates or deletes rows writes new positional delete files. Over time, a single data file can end up associated with dozens of small delete files. At read time, Spark has to open, read, and cross-reference all of them just to mask out the deleted rows. Queries slow down and metadata grows until a heavy compaction job finally cleans it up.

Iceberg 1.11.0 stabilizes Deletion Vectors (DVs) to address exactly this problem. Instead of accumulating separate, fragmented delete files, the engine records deleted row positions in a compressed Roaring bitmap stored as a blob in a Puffin file.

Within a snapshot, each data file has at most one deletion vector. Because Iceberg files are immutable, deleting more rows does not modify the bitmap in place: the writer merges the existing vector with the new positions and commits a replacement. At read time, Spark loads that single bitmap and applies it as a mask, instead of opening and cross-referencing many small delete files.
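To make the difference concrete, here is a minimal Python sketch of the two bookkeeping models. It is a toy illustration, not Iceberg's implementation: the `DataFile` class and all function names are hypothetical, and plain Python sets stand in for Roaring bitmaps and Puffin blobs.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    rows: list
    # V2-style model: every commit adds another small positional delete file.
    position_delete_files: list = field(default_factory=list)
    # V3-style model: at most one deletion vector per data file (a set stands in for the bitmap).
    deletion_vector: set = field(default_factory=set)


def delete_v2(data_file, positions):
    """Append one more positional delete file for this commit."""
    data_file.position_delete_files.append(set(positions))


def delete_v3(data_file, positions):
    """Build a new vector that merges the previous one with the new positions."""
    data_file.deletion_vector = data_file.deletion_vector | set(positions)


def read_v2(data_file):
    """Union every delete file before masking rows (one lookup per delete file)."""
    deleted = set()
    for delete_file in data_file.position_delete_files:
        deleted |= delete_file
    return [row for pos, row in enumerate(data_file.rows) if pos not in deleted]


def read_v3(data_file):
    """Mask rows with the single deletion vector."""
    return [row for pos, row in enumerate(data_file.rows) if pos not in data_file.deletion_vector]


data_file = DataFile("orders-0001.parquet", rows=["a", "b", "c", "d", "e"])
for commit_positions in ([0], [2], [4]):
    delete_v2(data_file, commit_positions)
    delete_v3(data_file, commit_positions)

print(len(data_file.position_delete_files))  # 3 delete files to reconcile at read time
print(read_v2(data_file), read_v3(data_file))  # both readers return the same surviving rows
```

Both readers return the same rows; the difference is that the V2-style reader has to reconcile three delete files after three commits, while the V3-style reader consults one vector.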

To put this to work in your Spark pipelines, the table needs to be on format version 3 and use merge-on-read for its row-level operations. In Spark SQL, both are set through table properties:

-- Create a V3 table that uses merge-on-read for row-level changes
CREATE TABLE prod.logistics.orders (
    order_id BIGINT,
    customer_id STRING,
    status STRING,
    updated_at TIMESTAMP
)
USING iceberg
TBLPROPERTIES (
    'format-version' = '3',
    'write.delete.mode' = 'merge-on-read',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read' -- MERGE INTO defaults to copy-on-write unless this is set
);
-- No delete-format property is needed: V3 tables store position deletes as deletion vectors in Puffin files

When you execute standard operational queries, Spark manages the underlying Roaring bitmaps automatically without littering your S3 or ADLS storage paths with tiny files:

-- On a V3 table with merge-on-read merges, replaced rows are recorded in one deletion vector per data file
MERGE INTO prod.logistics.orders target
USING staging.orders source
ON target.order_id = source.order_id
WHEN MATCHED THEN 
  UPDATE SET target.status = source.status, target.updated_at = source.updated_at;

Native Variant Type for Semi-Structured Data

Handling flexible JSON data in a data lake has always felt like a lose-lose tradeoff. You either store the JSON as a raw string, which keeps your schema completely flexible but forces the CPU to parse the full string for every single row during a query, or you completely flatten the schema, which breaks your downstream pipelines the second an upstream app changes a nested field.

The native Variant type is designed to give you the best of both worlds. It is a first-class binary encoding for semi-structured data that preserves structural flexibility while letting engines locate a nested field without parsing the whole document. It also supports shredding, where a writer extracts frequently accessed nested paths into typed sub-columns under the hood while the table still exposes a single Variant column. Shredded columns are what make filter pushdown and file skipping on nested fields possible.
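The idea behind shredding can be sketched in a few lines of Python. This is a conceptual toy under stated assumptions, not the Variant binary encoding: the `events` data and every function name are hypothetical, and a plain Python list stands in for a shredded sub-column.

```python
import json

# Hypothetical events whose payloads are stored as raw JSON strings.
events = [
    {"event_id": "e1", "payload": json.dumps({"customer": {"region": "APAC"}, "device": {"os": "ios"}})},
    {"event_id": "e2", "payload": json.dumps({"customer": {"region": "EMEA"}, "device": {"os": "android"}})},
    {"event_id": "e3", "payload": json.dumps({"customer": {"region": "APAC"}})},
]


def get_path(document, path):
    """Walk a nested dict and return None when any key is missing."""
    current = document
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def filter_raw_strings(rows, path, wanted):
    """String storage: every payload is parsed before the filter can run."""
    return [r["event_id"] for r in rows if get_path(json.loads(r["payload"]), path) == wanted]


def shred(rows, path):
    """Write time: extract one hot path into its own column, once."""
    return [get_path(json.loads(r["payload"]), path) for r in rows]


def filter_shredded(rows, shredded_column, wanted):
    """Read time: compare against the extracted column without touching the payload."""
    return [r["event_id"] for r, value in zip(rows, shredded_column) if value == wanted]


region_column = shred(events, ["customer", "region"])
print(filter_raw_strings(events, ["customer", "region"], "APAC"))
print(filter_shredded(events, region_column, "APAC"))
```

Both filters select the same events. The shredded version pays the parsing cost once at write time, which is the tradeoff shredding makes for paths that are queried often.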

For data engineers writing Spark pipelines, this means you can ingest, store, and query complex payloads natively without writing piles of explode-and-flatten boilerplate.

-- Creating a table with a flexible payload column
CREATE TABLE prod.ecommerce.events (
    event_id STRING,
    event_time TIMESTAMP,
    payload VARIANT
) 
USING iceberg
TBLPROPERTIES ('format-version' = '3');

When you query nested data within that variant column, the engine can extract just the fields you ask for, and where those paths have been shredded it can apply your filters before scanning the full document structure, reducing disk I/O and processing time:

-- Extract nested fields with variant_get (Spark 4.0+); whether the filter is pushed down depends on shredding
SELECT event_id, variant_get(payload, '$.device.os', 'string') AS os_type
FROM prod.ecommerce.events
WHERE variant_get(payload, '$.customer.region', 'string') = 'APAC';

Decoupling with the File Format API

Historically, Iceberg's engine integrations had hardcoded paths to handle specific physical file formats like Parquet, ORC, and Avro. If the broader data ecosystem developed a new format optimized for highly specific workloads, integrating it into Iceberg required rewriting large chunks of engine-specific code.

Iceberg 1.11.0 finalizes the File Format API, decoupling the query engines from the underlying physical storage layouts via a clean plugin model.

This structural shift opens up two massive avenues for future architectural designs:

AI and GPU-Optimized Formats: It paves the way for seamless integration of modern formats (like Lance or Vortex) engineered specifically for machine learning, vector embeddings, and random-access patterns.
Column Families: It allows for vertically split storage layouts where different groups of columns can be read or updated independently, minimizing write amplification and keeping metadata footers incredibly small.
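As a rough illustration of what a plugin model buys you, the Python sketch below routes reads and writes through a registry keyed by format name. It is not the Iceberg File Format API: `REGISTRY`, `register_format`, and both format classes are hypothetical, and JSON Lines and CSV stand in for real columnar formats.

```python
import csv
import io
import json


class JsonLinesFormat:
    """Stand-in format plugin: one JSON object per line."""

    def write(self, rows):
        return "\n".join(json.dumps(row) for row in rows).encode("utf-8")

    def read(self, data):
        return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]


class CsvFormat:
    """Stand-in format plugin: header row plus comma-separated values."""

    def write(self, rows):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue().encode("utf-8")

    def read(self, data):
        return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


# The "engine" only knows this registry, never a concrete format.
REGISTRY = {}


def register_format(name, plugin):
    REGISTRY[name] = plugin


def write_table(rows, format_name):
    return REGISTRY[format_name].write(rows)


def read_table(data, format_name):
    return REGISTRY[format_name].read(data)


register_format("jsonl", JsonLinesFormat())
register_format("csv", CsvFormat())

rows = [{"order_id": "1", "status": "shipped"}]
for name in REGISTRY:
    round_trip = read_table(write_table(rows, name), name)
    print(name, round_trip == rows)
```

Adding a third format here means registering one more object; `write_table` and `read_table` do not change. That is the property the decoupling is meant to provide for engine integrations.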

Infrastructure and Runtime Upgrades

Beyond the core V3 features, this release includes critical housekeeping to modernize the runtime environment:

Dropping Legacy Spark Support: The codebase continues to tighten its integrations, prioritizing modern Spark optimizations while dropping support for Spark 3.4.
JDK 17 Baseline: Support for Java 11 has been dropped. The Iceberg build and runtime environment now natively target JDK 17, bringing better runtime performance, better container garbage collection, and modern language features to your distributed clusters.
Nanosecond Precision: You now have native support for nanosecond-precision timestamps (timestamp_ns and timestamptz_ns), which is a massive win if you are managing high-frequency financial data or precision IoT logging.
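To see why the extra precision matters, consider this small Python sketch. The two event timestamps are made-up values, and the helper names are hypothetical; the point is only what happens when nanosecond values are truncated to microseconds.

```python
from datetime import datetime, timezone

# Two hypothetical events 400 nanoseconds apart, as integer nanoseconds since the Unix epoch.
first_ns = 1_700_000_000_123_456_100
second_ns = 1_700_000_000_123_456_500


def to_micros(epoch_ns):
    """Truncate to microseconds, as a microsecond-precision timestamp column would."""
    return epoch_ns // 1_000


def format_ns(epoch_ns):
    """Render an ISO-8601 UTC string that keeps all nine fractional digits."""
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.strftime("%Y-%m-%dT%H:%M:%S") + f".{nanos:09d}Z"


# At microsecond precision the two events collapse to the same instant.
print(to_micros(first_ns) == to_micros(second_ns))

# At nanosecond precision their order is preserved.
print(first_ns < second_ns)
print(format_ns(first_ns))
print(format_ns(second_ns))
```

With microsecond storage the two events become indistinguishable, so their ordering cannot be recovered later; nanosecond columns keep both values intact.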

Wrapping Up

Apache Iceberg 1.11.0 is less about experimental concepts and more about engineering maturity. With Deletion Vectors and the Variant type both stabilized, the historical compromise between fast streaming ingestion and fast analytical queries is narrowing considerably. If you are currently designing or optimizing a lakehouse platform on Spark, moving to 1.11.0 and adopting the V3 table format is the logical next step for your roadmap.

All of these advancements deliver real impact only when your lakehouse platform actually puts them to work. Dataverses already supports Iceberg v1.11 with full V3 capabilities across our managed lakehouse platform, so you get:

Deletion Vectors out of the box - Roaring bitmap-based delete handling without manual compaction tuning.
Native Variant type support - Semi-structured JSON payloads with predicate pushdown, no flattening required.
Automatic V3 optimization - Your tables are configured with best-practice V3 settings the moment you ingest data.
Zero-ops compaction - Dataverses continuously monitors and optimizes your Iceberg tables, eliminating positional delete bloat automatically.

While the open-source community celebrates this milestone, Dataverses customers are already running production workloads on it - benefiting from faster queries, lower storage overhead, and a truly no-ops Iceberg experience.

Ready to experience Iceberg v1.11 at its full potential? Start your free 14-day trial today and see what a managed, V3-optimized lakehouse can do for your data pipelines.
