
feat(spark): datafusion-spark-bridge Rust SDK (2/6) #105

Draft: timsaucer wants to merge 2 commits into apache:main from timsaucer:split/02-spark-bridge

@timsaucer (Member) commented Jun 12, 2026

Stacked PR series (6 parts) — splitting the Spark DataSource V2 connector.
All six target main. They build on each other, so review and merge in order — until the earlier parts merge, this PR's diff includes their changes too.

  1. build: Cargo workspace + native-common extraction (1/6) #104 — Cargo workspace + native-common extraction
  2. ➤ feat(spark): datafusion-spark-bridge Rust SDK (2/6) #105 — datafusion-spark-bridge Rust SDK
  3. feat(spark): connector Java SPI module (3/6) #106 — Spark connector Java SPI
  4. feat(spark): DataSource V2 connector, Scala (4/6) #107 — Spark DataSource V2 connector (Scala)
  5. feat(spark): bridge scaffold generator (5/6) #108 — Bridge scaffold generator
  6. feat(examples): end-to-end Spark bridge demo (6/6) #109 — End-to-end examples

Purpose

Add the datafusion-spark-bridge Rust SDK — the foundation a domain connector ("bridge") builds on. Its export_bridge! macro generates the JNI entry points the JVM side calls (schema probe, scan create/execute/close); it also bundles the options decoder, scan glue, and the Arrow type-widening layer for Spark type compatibility.
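Spark SQL has no unsigned integer types and no half-precision float, so a provider whose Arrow schema uses them needs a widening step before Spark can read it. A minimal sketch of a schema-level widening rule, assuming a hypothetical `ArrowType` enum in place of arrow's `DataType`; the bridge's actual rules are not described in this PR, so the mappings below (notably `UInt64` to `Decimal128(20, 0)`) are illustrative choices, not its behavior.

```rust
/// Hypothetical stand-in for arrow's `DataType`; only the variants
/// this sketch needs are modelled.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Float16,
    Float32,
    Float64,
    Utf8,
    Decimal128(u8, i8),
    List(Box<ArrowType>),
}

/// Map a type Spark cannot represent onto a Spark-compatible type that
/// holds every value exactly (assumed rules, not the bridge's).
fn widen_for_spark(t: &ArrowType) -> ArrowType {
    match t {
        // Spark has no unsigned integers: widen to the next signed width.
        ArrowType::UInt8 => ArrowType::Int16,
        ArrowType::UInt16 => ArrowType::Int32,
        ArrowType::UInt32 => ArrowType::Int64,
        // u64::MAX has 20 decimal digits, so Decimal(20, 0) holds it exactly.
        ArrowType::UInt64 => ArrowType::Decimal128(20, 0),
        // Spark has no half-precision float.
        ArrowType::Float16 => ArrowType::Float32,
        // Recurse so nested element types are widened too.
        ArrowType::List(inner) => ArrowType::List(Box::new(widen_for_spark(inner))),
        // Everything else is treated as already Spark-compatible.
        other => other.clone(),
    }
}
```

Widening to the next wider signed type keeps every value exact at the cost of larger columns, which is why already-compatible types pass through untouched in the catch-all arm.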

Self-contained; depends only on native-common.

🤖 Generated with Claude Code

timsaucer and others added 2 commits June 12, 2026 13:23
…e-common

Move the standalone `native` crate into a root Cargo workspace and extract
shared JNI plumbing (error->exception mapping, Tokio runtime singleton,
StreamingReader) into a new `datafusion-jni-common` crate under `native-common/`.
`native/src/errors.rs` moves to `native-common/src/errors.rs`; the nine native
modules now import error/runtime helpers from `datafusion_jni_common`.

Build glue follows: single root `Cargo.lock`, `.cargo/config.toml` redirects
output to `rust-target/`, Makefile/CI/poms updated to build `--workspace` and
target `-p datafusion-jni`. Core javadoc build commands updated to match.

Pure refactor; no behavior change. First of a 6-PR stack splitting the Spark
DataSource V2 connector work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
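The "Tokio runtime singleton" in this commit is the standard lazy, process-wide initialization pattern: every JNI call reuses one runtime rather than building its own. A std-only sketch with `std::sync::OnceLock` (stable since Rust 1.70); `RuntimeStandIn` is a hypothetical placeholder for `tokio::runtime::Runtime`, and `INIT_CALLS` exists only to show that the initializer runs once. `OnceLock` guarantees that even when several threads race on first use, only one initializer executes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical placeholder for `tokio::runtime::Runtime`.
struct RuntimeStandIn {
    worker_threads: usize,
}

/// Counts how many times the initializer actually ran (demo only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Process-wide runtime, created lazily on first use.
static RUNTIME: OnceLock<RuntimeStandIn> = OnceLock::new();

fn runtime() -> &'static RuntimeStandIn {
    RUNTIME.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        // Real code would build the runtime here, e.g. with
        // tokio::runtime::Builder::new_multi_thread().enable_all().build().
        RuntimeStandIn {
            worker_threads: std::thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(1),
        }
    })
}
```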
New `spark/bridge` workspace crate providing the `export_bridge!` macro that
generates the six JNI entry points a Spark connector bridge exposes
(providerSchemaIpc, createScan, partitionCount, executeStreamPartition,
executeStream, closeScan). Includes the options decoder, scan planning/execution
glue, and the Arrow type-widening layer (wraps any TableProvider for Spark type
compatibility).

Self-contained SDK with no Java/Scala coupling. Depends only on
datafusion-jni-common. Second of the 6-PR connector stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
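The JVM side typically holds native scan state as an opaque `long` handle, and the generated entry points translate between those handles and Rust objects. A std-only sketch of that shape, assuming a hypothetical `Bridge` trait and a hypothetical `export_bridge_sketch!` macro (not the SDK's `export_bridge!`). Real JNI signatures (`JNIEnv`, `#[no_mangle] extern "system"` functions named `Java_<class>_<method>`) are omitted so the block compiles without the `jni` crate, and only rough analogues of five of the six entry points are modelled (`executeStream` is left out).

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Mutex;

/// Hypothetical trait a domain bridge implements (not the SDK's real API).
trait Bridge: Default {
    type Scan: Send + 'static;
    fn schema(&self) -> Vec<String>;
    fn create_scan(&self, options: &HashMap<String, String>) -> Result<Self::Scan, String>;
    fn partition_count(scan: &Self::Scan) -> usize;
    fn execute_partition(scan: &Self::Scan, partition: usize) -> Result<Vec<i64>, String>;
}

/// Hypothetical macro: generates handle-based entry points for one bridge.
/// A real export would wrap each fn as a JNI symbol and throw on `Err`.
macro_rules! export_bridge_sketch {
    ($bridge:ty) => {
        static NEXT_HANDLE: AtomicI64 = AtomicI64::new(1);
        static SCANS: Mutex<BTreeMap<i64, <$bridge as Bridge>::Scan>> =
            Mutex::new(BTreeMap::new());

        fn provider_schema() -> Vec<String> {
            <$bridge>::default().schema()
        }
        /// Plans a scan and returns an opaque handle for the JVM to hold.
        fn create_scan(options: &HashMap<String, String>) -> Result<i64, String> {
            let scan = <$bridge>::default().create_scan(options)?;
            let handle = NEXT_HANDLE.fetch_add(1, Ordering::SeqCst);
            SCANS.lock().unwrap().insert(handle, scan);
            Ok(handle)
        }
        fn partition_count(handle: i64) -> Result<usize, String> {
            let scans = SCANS.lock().unwrap();
            let scan = scans.get(&handle).ok_or_else(|| format!("unknown scan handle {}", handle))?;
            Ok(<$bridge as Bridge>::partition_count(scan))
        }
        fn execute_partition(handle: i64, partition: usize) -> Result<Vec<i64>, String> {
            let scans = SCANS.lock().unwrap();
            let scan = scans.get(&handle).ok_or_else(|| format!("unknown scan handle {}", handle))?;
            <$bridge as Bridge>::execute_partition(scan, partition)
        }
        /// Drops the scan state; closing an unknown or closed handle is a no-op.
        fn close_scan(handle: i64) {
            SCANS.lock().unwrap().remove(&handle);
        }
    };
}

/// Toy bridge: rows 0..n spread round-robin over two partitions.
#[derive(Default)]
struct RangeBridge;

struct RangeScan {
    rows: i64,
    partitions: usize,
}

impl Bridge for RangeBridge {
    type Scan = RangeScan;
    fn schema(&self) -> Vec<String> {
        vec!["value".to_string()]
    }
    fn create_scan(&self, options: &HashMap<String, String>) -> Result<RangeScan, String> {
        let rows = options
            .get("rows")
            .ok_or("missing option: rows")?
            .parse::<i64>()
            .map_err(|e| e.to_string())?;
        Ok(RangeScan { rows, partitions: 2 })
    }
    fn partition_count(scan: &RangeScan) -> usize {
        scan.partitions
    }
    fn execute_partition(scan: &RangeScan, partition: usize) -> Result<Vec<i64>, String> {
        if partition >= scan.partitions {
            return Err(format!("partition {} out of range", partition));
        }
        Ok((0..scan.rows)
            .filter(|v| (*v as usize) % scan.partitions == partition)
            .collect())
    }
}

export_bridge_sketch!(RangeBridge);
```

Keeping scans in a handle table, rather than handing raw pointers to the JVM, means a stale or double-closed handle yields an error instead of undefined behavior.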
@timsaucer (Member, Author) commented:

Repo-layout docs follow-up (from #104 review): PR #104 trimmed docs/source/contributor-guide/development.md to only what the foundation ships (native, native-common). When this PR lands spark/bridge, re-add its slice of that doc:

  • add spark/bridge to the Cargo.toml workspace-members bullet;
  • add a spark/bridge/ bullet describing the datafusion-spark-bridge Rust SDK (widening, scan machinery, export_bridge!).

timsaucer added a commit to timsaucer/datafusion-java that referenced this pull request Jun 12, 2026
Address review feedback on the workspace-foundation PR:

- development.md: trim the repo-layout section to the crates this PR
  actually ships (native, native-common). It was forward-referencing
  spark/, spark/bridge, datafusion-spark-bridge, and examples/native --
  none of which exist until later PRs in the stack -- and called the
  member list "three" while listing four. Later PRs (apache#105/apache#106/apache#107/apache#109)
  carry notes to re-add their own slice when those dirs land.

- rat_exclude_files.txt: the Rust lockfile moved to the workspace root,
  so the stale native/Cargo.lock entry left the root Cargo.lock with no
  RAT exclude for the source-tarball check (check-rat-report.py). Point
  it at Cargo.lock.

- native-common: dedupe the panic-payload downcast -- StreamingReader::next
  now calls errors::panic_message instead of repeating the String/&str
  match inline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
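A panic payload is a `Box<dyn Any + Send>` that usually holds either a `&'static str` (from `panic!("literal")`) or a `String` (from a formatted `panic!`), which is why the downcast is worth centralizing. A minimal sketch of such a helper; the real `errors::panic_message` may have a different signature and fallback text.

```rust
use std::any::Any;

/// Extract a human-readable message from a panic payload, such as the
/// `Err` returned by `std::panic::catch_unwind`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        // panic!("literal")
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        // panic!("formatted {}", x)
        s.clone()
    } else {
        // e.g. std::panic::panic_any(42)
        "panic with non-string payload".to_string()
    }
}
```

Pass the payload as `&*err` (or `err.as_ref()`). Passing `&err` also compiles, because `Box<dyn Any + Send>` is itself `Any + Send`, but the downcasts then see the `Box` and always fall through to the fallback.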
@timsaucer (Member, Author) commented:

Fold-in from #104 review — avro feature consistency

native-common/src/errors.rs gates the DataFusionError::AvroError(_) => IoException arm behind native-common's own avro feature (avro = ["datafusion/avro"]). That feature is independent of whether the unified datafusion in a given build graph actually has avro on.

In this PR, spark/bridge/Cargo.toml pulls:

datafusion = { workspace = true }                       # no avro
datafusion-jni-common = { path = "../../native-common" } # no avro feature

That's correct today — both are avro-off, so AvroError can't be produced and the missing arm doesn't matter. Each cdylib also statically links its own native-common copy, so native's avro-on graph doesn't leak in.

The footgun: the moment the bridge's datafusion gains avro (directly, or transitively via some dep) without also enabling datafusion-jni-common's avro feature, an Avro read error will fall through native-common's _ => DataFusionException catch-all instead of mapping to IoException — a silent misclassification, no compile error.
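The misclassification is easy to reproduce in miniature. A std-only sketch, assuming a hypothetical `EngineError` enum standing in for `DataFusionError` and a hypothetical `JavaException` target: the `Avro` variant exists unconditionally (as it would once the unified datafusion has avro on), while the arm that handles it is gated on this crate's own `avro` feature. Built without that feature, the catch-all absorbs Avro errors and the compiler raises no error or warning.

```rust
/// Hypothetical stand-in for `DataFusionError`.
#[allow(dead_code)]
#[derive(Debug)]
enum EngineError {
    Io(std::io::Error),
    Avro(String),
    Plan(String),
}

/// Hypothetical Java exception classes the JNI layer could throw.
#[derive(Debug, PartialEq)]
enum JavaException {
    Io,
    DataFusion,
}

fn to_java_exception(err: &EngineError) -> JavaException {
    match err {
        EngineError::Io(_) => JavaException::Io,
        // Compiled only when *this* crate's `avro` feature is on,
        // regardless of whether the engine can produce Avro errors.
        #[cfg(feature = "avro")]
        EngineError::Avro(_) => JavaException::Io,
        // Catch-all: with `avro` off, Avro errors land here silently.
        _ => JavaException::DataFusion,
    }
}
```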

To fold in here (or before this lands):

  • Keep the invariant explicit: a consumer must enable datafusion-jni-common/avro iff its datafusion has avro.
  • Consider a compile-time guard in native-common (e.g. compile_error! when datafusion/avro is detected on but native-common's avro is off) so the two can't drift silently.
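One shape the guard could take, assuming a hypothetical `datafusion_avro` cfg that something in the build (for example a build script printing `cargo:rustc-cfg=datafusion_avro`) sets whenever the unified datafusion has avro enabled. How native-common would detect that reliably is the open question, since Cargo exposes a crate's own features to its `cfg` but not those of its dependencies. Given such a cfg, `compile_error!` turns silent drift into a build failure:

```rust
// Hypothetical guard: fail the build when the engine has Avro support but
// this crate's Avro -> IoException mapping arm is compiled out.
#[cfg(all(datafusion_avro, not(feature = "avro")))]
compile_error!(
    "datafusion has `avro` enabled but datafusion-jni-common's `avro` feature is off; \
     enable datafusion-jni-common/avro so Avro errors map to IoException"
);

/// The same invariant as a value, usable in a unit test or startup check.
const AVRO_MAPPING_CONSISTENT: bool = !cfg!(datafusion_avro) || cfg!(feature = "avro");
```

Only this direction needs a guard: enabling native-common's `avro` already turns on `datafusion/avro` through `avro = ["datafusion/avro"]`.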

Ref: #104 (1/6) review.

@andygrove (Member) commented:

Please rebase this one when you can @timsaucer

@timsaucer (Member, Author) commented:

Thanks @andygrove. I’m going to leave it in draft for a bit while I review Dewey’s counterproposal.
