feat(spark): bridge scaffold generator (5/6) by timsaucer · Pull Request #108 · apache/datafusion-java

timsaucer · 2026-06-12T11:39:16Z

Stacked PR series (6 parts) splitting up the Spark DataSource V2 connector.
All six target `main`. They build on each other, so review and merge them in order; until the earlier parts merge, this PR's diff also includes their changes.

1. build: Cargo workspace + native-common extraction (1/6) #104
2. feat(spark): datafusion-spark-bridge Rust SDK (2/6) #105
3. feat(spark): connector Java SPI module (3/6) #106
4. feat(spark): DataSource V2 connector, Scala (4/6) #107
➤ 5. feat(spark): bridge scaffold generator (5/6) #108 (this PR)
6. feat(examples): end-to-end Spark bridge demo (6/6) #109

Purpose

Add `new_bridge.py` and its template: a Python-stdlib-only generator that stamps out a standalone Maven + Cargo bridge project wired to the SDK. The generated project contains a Rust cdylib with `export_bridge!` and a demo provider, the Java SPI classes, a shaded-jar pom that bundles the cdylib, and a PySpark smoke test. Start a new connector in one command.
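Here is a rough picture of what "stamping out" a project means, as a minimal stdlib-only Python sketch. This is not the actual `new_bridge.py`. The `{{name}}` placeholder syntax, the `render`/`stamp_template` helpers, and the `crate_name` parameter are all hypothetical. The sketch uses `{{...}}` tokens rather than `string.Template`'s `${...}` so that Maven properties such as `${project.version}` in the generated `pom.xml` pass through untouched.

```python
import pathlib
import tempfile


def render(text, params):
    """Replace {{key}} tokens; leave everything else (including Maven's ${...}) untouched."""
    for key, value in params.items():
        text = text.replace("{{" + key + "}}", value)
    return text


def stamp_template(template_dir, out_dir, params):
    """Copy every file under template_dir into out_dir, rendering paths and contents."""
    template_dir = pathlib.Path(template_dir)
    out_dir = pathlib.Path(out_dir)
    if out_dir.exists() and any(out_dir.iterdir()):
        raise FileExistsError(f"refusing to overwrite non-empty directory {out_dir}")
    written = []
    for src in sorted(template_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(template_dir)
        # Render each path segment, so a directory named {{crate_name}} is renamed too.
        rendered = pathlib.Path(*(render(part, params) for part in rel.parts))
        dst = out_dir / rendered
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Assumes every template file is UTF-8 text (no binary assets).
        dst.write_text(render(src.read_text(encoding="utf-8"), params), encoding="utf-8")
        written.append(rendered.as_posix())
    return written


def demo():
    """Stamp a two-file toy template into a temp dir and return what was written."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        tpl = root / "template"
        (tpl / "native" / "{{crate_name}}").mkdir(parents=True)
        (tpl / "native" / "{{crate_name}}" / "lib.rs").write_text("// {{crate_name}}\n", encoding="utf-8")
        (tpl / "pom.xml").write_text(
            "<artifactId>{{crate_name}}</artifactId><version>${project.version}</version>",
            encoding="utf-8",
        )
        files = stamp_template(tpl, root / "out", {"crate_name": "my_bridge"})
        pom = (root / "out" / "pom.xml").read_text(encoding="utf-8")
    return sorted(files), pom
```

The sketch renders path segments as well as file contents. That is what lets a template directory like `native/{{crate_name}}/` become the user's crate directory. Refusing to write into a non-empty target keeps a rerun from silently overwriting an existing bridge.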

🤖 Generated with Claude Code

…e-common

Move the standalone `native` crate into a root Cargo workspace and extract shared JNI plumbing (error->exception mapping, Tokio runtime singleton, StreamingReader) into a new `datafusion-jni-common` crate under `native-common/`. `native/src/errors.rs` moves to `native-common/src/errors.rs`; the nine native modules now import error/runtime helpers from `datafusion_jni_common`. Build glue follows: single root `Cargo.lock`, `.cargo/config.toml` redirects output to `rust-target/`, Makefile/CI/poms updated to build `--workspace` and target `-p datafusion-jni`. Core javadoc build commands updated to match. Pure refactor; no behavior change. First of a 6-PR stack splitting the Spark DataSource V2 connector work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

New `spark/bridge` workspace crate providing the `export_bridge!` macro that generates the six JNI entry points a Spark connector bridge exposes (`providerSchemaIpc`, `createScan`, `partitionCount`, `executeStreamPartition`, `executeStream`, `closeScan`). Includes the options decoder, scan planning/execution glue, and the Arrow type-widening layer (wraps any `TableProvider` for Spark type compatibility). Self-contained SDK with no Java/Scala coupling. Depends only on `datafusion-jni-common`. Second of the 6-PR connector stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
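For orientation, here is a Python sketch of the C symbol names the JVM looks up for these six methods when it links `native` methods by name. The mangling rules come from the JNI specification: a `Java_` prefix, `.` to `_`, `_` to `_1`, `;` to `_2`, `[` to `_3`, and any other character to `_0xxxx`. Two parts of the sketch are assumptions. First, it assumes `export_bridge!` exports symbols under these names rather than registering methods via `RegisterNatives`. Second, the Java class name is hypothetical. Overloaded-method long names (`__` plus the mangled signature) are not covered.

```python
# The six entry points listed in the commit message above.
ENTRY_POINTS = [
    "providerSchemaIpc",
    "createScan",
    "partitionCount",
    "executeStreamPartition",
    "executeStream",
    "closeScan",
]

# JNI escape sequences for characters that are legal in Java names but not in C symbols.
_ESCAPES = {"_": "_1", ";": "_2", "[": "_3"}


def mangle(name):
    """Mangle a fully qualified class name or a method name per the JNI spec."""
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")  # package separators become single underscores
        elif ch in _ESCAPES:
            out.append(_ESCAPES[ch])
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Any other character becomes _0xxxx per UTF-16 code unit, lowercase hex.
            data = ch.encode("utf-16-be")
            out.extend("_0" + data[i:i + 2].hex() for i in range(0, len(data), 2))
    return "".join(out)


def jni_symbol(class_name, method):
    """Short (non-overloaded) JNI symbol name: Java_<class>_<method>."""
    return f"Java_{mangle(class_name)}_{mangle(method)}"


def bridge_symbols(class_name):
    """Symbol names for all six bridge entry points on a (hypothetical) Java class."""
    return [jni_symbol(class_name, m) for m in ENTRY_POINTS]
```

For example, a hypothetical `com.example.my_bridge.NativeBridge` class yields `Java_com_example_my_1bridge_NativeBridge_createScan`. On Linux, running `nm -D` on the built cdylib is one way to check which symbols it actually exports.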

Introduce the `spark` Maven module and the pure-Java contracts a bridge implements: `BridgeProviderFactory` (no-arg factory + `scanBackend()`), `ScanBackend` (delegates to the bridge's JNI methods), `NativeLibraryLoader` (cdylib extraction/loading), `OptionsCodec` (cross-language options encoder), `PartitionInfo` (one entry per Spark task), and `ReportedPartitioning` (optional shuffle-elision declaration). Compiles standalone with no Scala main yet. Includes the two SPI-only tests (`OptionsCodecTest`, `BridgeProviderFactoryDefaultsTest`). Third of the 6-PR stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
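`NativeLibraryLoader` itself is Java. The Python sketch below only illustrates the extraction half of "cdylib extraction/loading". It assumes the shaded jar stores the library at a hypothetical `native/<platform file name>` path. The per-OS file names are the ones Rust gives a `cdylib`: `lib<name>.so`, `lib<name>.dylib`, or `<name>.dll`. Extraction to a temp file is needed because the JVM's `System.load` takes an absolute filesystem path and cannot load a library from inside a jar.

```python
import os
import platform
import shutil
import tempfile
import zipfile


def platform_library_name(base, system=None):
    """File name a Rust cdylib named `base` gets on each OS."""
    system = system or platform.system()
    if system == "Windows":
        return f"{base}.dll"
    if system == "Darwin":
        return f"lib{base}.dylib"
    return f"lib{base}.so"  # Linux and other ELF platforms


def extract_native_library(jar_path, base, resource_dir="native", system=None):
    """Copy the bundled cdylib out of the jar into a fresh temp dir and return its path."""
    name = platform_library_name(base, system)
    entry = f"{resource_dir}/{name}"  # hypothetical layout inside the shaded jar
    with zipfile.ZipFile(jar_path) as jar:
        if entry not in jar.namelist():
            raise FileNotFoundError(f"{entry} is not bundled in {jar_path}")
        out_dir = tempfile.mkdtemp(prefix=f"{base}-")
        out_path = os.path.join(out_dir, name)
        with jar.open(entry) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    # On the JVM, the loader would now call System.load(out_path) with this absolute path.
    return out_path


def demo():
    """Build a fake jar holding a Linux cdylib, extract it, and clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        jar_path = os.path.join(tmp, "my-bridge.jar")
        with zipfile.ZipFile(jar_path, "w") as jar:
            jar.writestr("native/libmy_bridge.so", b"\x7fELF not really a library")
        extracted = extract_native_library(jar_path, "my_bridge", system="Linux")
    with open(extracted, "rb") as f:
        payload = f.read()
    shutil.rmtree(os.path.dirname(extracted))
    return os.path.basename(extracted), payload
```

To keep the sketch short, it leaves out architecture-specific resource paths and reuse of an already-extracted copy.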

The connector implementation on top of the Java SPI and the bridge SDK: DatafusionSource/Table/Scan/ScanBuilder DSv2 wiring, per-partition columnar read path (FfiStream + Arrow->Spark batch conversion), V2 predicate pushdown (`SparkPredicateTranslator`), shared-scan mode with a per-executor refcounted cache (`SharedScanCache`, `SharedScanPartitionReader`, `NativeSharedScanResources`, `PinnedSessionConfig`), and `SupportsReportPartitioning` for shuffle elision. These pieces share the `DatafusionScanMode` sealed trait and the scan builder, so they land together. Includes the connector test suite and the module README. DataSourceRegister SPI file registers `DatafusionSource`. Fourth of the 6-PR stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
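At its core, the per-executor refcounted cache is acquire/release bookkeeping. Below is a Python sketch of that idea; it is not the Scala `SharedScanCache`. The `RefCountedCache` class, the string key, and the `factory`/`close` callbacks are hypothetical. The sketch assumes a shared scan should stay alive for as long as at least one partition reader on the executor holds it.

```python
import threading


class RefCountedCache:
    """Keyed cache whose entries are built on first acquire and closed on last release."""

    def __init__(self, factory, close):
        self._factory = factory  # key -> expensive resource (e.g. a native scan)
        self._close = close      # resource -> None, frees the resource
        self._lock = threading.Lock()
        self._entries = {}       # key -> [resource, refcount]

    def acquire(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                # Built under the lock so two tasks never create the same resource twice.
                entry = [self._factory(key), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] > 0:
                return
            del self._entries[key]
        # Close outside the lock so a slow teardown does not stall other tasks.
        self._close(entry[0])

    def __len__(self):
        with self._lock:
            return len(self._entries)


def demo():
    created, closed = [], []

    def factory(key):
        created.append(key)
        return f"scan-for-{key}"

    cache = RefCountedCache(factory, closed.append)
    first = cache.acquire("query-1")   # task A on this executor builds the scan
    second = cache.acquire("query-1")  # task B reuses it
    cache.release("query-1")
    open_after_one_release = len(cache) == 1 and not closed
    cache.release("query-1")           # last reader gone: resource is closed
    return first is second, created, closed, open_after_one_release, len(cache)
```

Building the resource while holding the lock trades some contention for a guarantee that the same scan is never created twice. Closing it outside the lock keeps a slow teardown from blocking unrelated tasks.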

Add `spark/scaffold/new_bridge.py` plus the `bridge-template/` it stamps out: a standalone Maven+Cargo bridge project wired to the datafusion-spark-bridge SDK. The project includes a Rust cdylib with `export_bridge!` + a demo in-memory provider, the four Java classes, the DataSourceRegister service file, a shaded-jar pom that bundles the cdylib, and a pyspark smoke test. Stdlib-only generator. Standalone tooling. Fifth of the 6-PR stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
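The stamped project registers its source through the `DataSourceRegister` service file, so one cheap sanity check on the built shaded jar is to read that file back. A Python sketch follows. The provider class name is hypothetical. The parsing rules follow the `java.util.ServiceLoader` provider-configuration format: one class name per line, `#` starts a comment, and surrounding whitespace and duplicates are ignored.

```python
import io
import zipfile

# Spark resolves format short names through this java.util.ServiceLoader file.
SERVICE_ENTRY = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"


def parse_service_file(text):
    """Parse a ServiceLoader provider-configuration file: one class name per line."""
    providers = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # '#' starts a comment; whitespace is ignored
        if name and name not in providers:    # ServiceLoader ignores duplicates
            providers.append(name)
    return providers


def registered_sources(jar_bytes):
    """Return the DataSourceRegister implementations a jar advertises (empty if none)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        if SERVICE_ENTRY not in jar.namelist():
            return []
        return parse_service_file(jar.read(SERVICE_ENTRY).decode("utf-8"))


def demo():
    """Build an in-memory jar with a hypothetical provider and read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr(SERVICE_ENTRY, "# stamped by the scaffold\ncom.example.my_bridge.MyBridgeSource\n")
    return registered_sources(buf.getvalue())
```

This check matters for shaded jars in particular. When several bundled jars ship the same service file, only one copy survives unless the shade configuration merges them; the Maven Shade plugin's `ServicesResourceTransformer` exists for this purpose.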

timsaucer · 2026-06-12T12:10:34Z

Heads-up from the foundation split (01-workspace-foundation): the RAT exclude `<exclude>spark/scaffold/bridge-template/**</exclude>` was removed from the root `pom.xml` there because it forward-referenced files that do not exist until this PR. This PR (05) is the first to add `spark/scaffold/bridge-template/**`. These are scaffold templates that `spark/scaffold/new_bridge.py` stamps into user projects, so they must not carry ASF headers.

Action required: re-add the exclude to the root `pom.xml` RAT `<excludes>` block in this PR, or `apache-rat:check` will flag the template files as unapproved once 01 is rebased in.

```xml
<!-- Bridge scaffold templates: stamped into USER projects by
     spark/scaffold/new_bridge.py, which must not impose ASF headers on them -->
<exclude>spark/scaffold/bridge-template/**</exclude>
```
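Below is a stdlib-only Python sketch of a local pre-push check that the exclude actually made it into the root `pom.xml`. The function names are hypothetical. The sketch assumes the usual layout, in which `<exclude>` entries sit under the `apache-rat-plugin` `<plugin>` element. `apache-rat:check` itself remains the authoritative gate.

```python
import xml.etree.ElementTree as ET

TEMPLATE_EXCLUDE = "spark/scaffold/bridge-template/**"


def _local(tag):
    """Strip the Maven POM namespace, e.g. '{http://maven.apache.org/POM/4.0.0}plugin' -> 'plugin'."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def rat_excludes(pom_xml):
    """Collect <exclude> entries configured on apache-rat-plugin in a POM string."""
    root = ET.fromstring(pom_xml)
    excludes = []
    for plugin in root.iter():
        if _local(plugin.tag) != "plugin":
            continue
        artifact = next((c.text for c in plugin if _local(c.tag) == "artifactId"), None)
        if (artifact or "").strip() != "apache-rat-plugin":
            continue
        for el in plugin.iter():
            if _local(el.tag) == "exclude" and el.text:
                excludes.append(el.text.strip())
    return excludes


# Minimal POM fragment mirroring the exclude requested above.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build><plugins><plugin>
    <groupId>org.apache.rat</groupId>
    <artifactId>apache-rat-plugin</artifactId>
    <configuration><excludes>
      <!-- Bridge scaffold templates -->
      <exclude>spark/scaffold/bridge-template/**</exclude>
    </excludes></configuration>
  </plugin></plugins></build>
</project>"""
```

To use it, run `rat_excludes(pathlib.Path("pom.xml").read_text())` from the repo root and check that `TEMPLATE_EXCLUDE` is in the result.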

timsaucer and others added 5 commits June 12, 2026 13:23

timsaucer marked this pull request as draft June 12, 2026 11:41

timsaucer wants to merge 5 commits into apache:main from timsaucer:split/05-bridge-scaffold
