feat(spark): datafusion-spark-bridge Rust SDK (2/6) #105
…e-common Move the standalone `native` crate into a root Cargo workspace and extract shared JNI plumbing (error->exception mapping, Tokio runtime singleton, StreamingReader) into a new `datafusion-jni-common` crate under `native-common/`. `native/src/errors.rs` moves to `native-common/src/errors.rs`; the nine native modules now import error/runtime helpers from `datafusion_jni_common`. Build glue follows: single root `Cargo.lock`, `.cargo/config.toml` redirects output to `rust-target/`, Makefile/CI/poms updated to build `--workspace` and target `-p datafusion-jni`. Core javadoc build commands updated to match. Pure refactor; no behavior change. First of a 6-PR stack splitting the Spark DataSource V2 connector work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New `spark/bridge` workspace crate providing the `export_bridge!` macro that generates the six JNI entry points a Spark connector bridge exposes (providerSchemaIpc, createScan, partitionCount, executeStreamPartition, executeStream, closeScan). Includes the options decoder, scan planning/execution glue, and the Arrow type-widening layer (wraps any TableProvider for Spark type compatibility). Self-contained SDK with no Java/Scala coupling. Depends only on datafusion-jni-common. Second of the 6-PR connector stack. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
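To illustrate the general idea of one macro invocation expanding into several exported entry points, here is a minimal sketch. It is not the SDK's actual macro: the `ScanBridge` trait, the `export_bridge_sketch!` macro, `DemoBridge`, and the function names are all hypothetical, only two of the six entry points are modeled, and plain `extern "C"` functions stand in for real JNI signatures.

```rust
/// Hypothetical trait a connector implements; not the SDK's real API.
pub trait ScanBridge {
    /// Number of partitions the planned scan exposes.
    fn partition_count() -> i32;
    /// Releases scan state; returns true if a scan was closed.
    fn close_scan(handle: i64) -> bool;
}

/// Hypothetical stand-in for `export_bridge!`: expands to one `extern "C"`
/// function per entry point, each delegating to the bridge type.
/// A real JNI export would also need `Java_<class>_<method>` symbol names,
/// a no-mangle attribute, and `JNIEnv`/`JClass` parameters.
macro_rules! export_bridge_sketch {
    ($bridge:ty, $partition_count:ident, $close_scan:ident) => {
        pub extern "C" fn $partition_count() -> i32 {
            <$bridge as ScanBridge>::partition_count()
        }

        pub extern "C" fn $close_scan(handle: i64) -> bool {
            <$bridge as ScanBridge>::close_scan(handle)
        }
    };
}

/// Toy bridge used only for this illustration.
pub struct DemoBridge;

impl ScanBridge for DemoBridge {
    fn partition_count() -> i32 {
        4
    }

    fn close_scan(handle: i64) -> bool {
        // Treat 0 as "no scan" in this toy model.
        handle != 0
    }
}

// One invocation generates both entry points for `DemoBridge`.
export_bridge_sketch!(DemoBridge, demo_partition_count, demo_close_scan);
```

The connector author writes only the trait implementation and the single macro call; the boilerplate per entry point lives in the macro body.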
Repo-layout docs follow-up (from #104 review): PR #104 trimmed
Address review feedback on the workspace-foundation PR:
- development.md: trim the repo-layout section to the crates this PR actually ships (native, native-common). It was forward-referencing spark/, spark/bridge, datafusion-spark-bridge, and examples/native -- none of which exist until later PRs in the stack -- and called the member list "three" while listing four. Later PRs (apache#105/apache#106/apache#107/apache#109) carry notes to re-add their own slice when those dirs land.
- rat_exclude_files.txt: the Rust lockfile moved to the workspace root, so the stale native/Cargo.lock entry left the root Cargo.lock with no RAT exclude for the source-tarball check (check-rat-report.py). Point it at Cargo.lock.
- native-common: dedupe the panic-payload downcast -- StreamingReader::next now calls errors::panic_message instead of repeating the String/&str match inline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
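For readers unfamiliar with the panic-payload downcast mentioned in the last bullet, the pattern can be sketched with the standard library alone. The name `panic_message` comes from the commit message, but its signature, the fallback string, and the `run_guarded` caller below are assumptions made for illustration, not the crate's actual code.

```rust
use std::any::Any;
use std::panic::{catch_unwind, UnwindSafe};

/// Extracts a readable message from a panic payload.
/// Payloads are usually `&'static str` (literal messages) or `String`
/// (formatted messages); anything else gets a generic fallback.
pub fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        // Assumed fallback text; the real helper may word this differently.
        "unknown panic payload".to_string()
    }
}

/// Hypothetical caller: runs `f`, converting a panic into an `Err(String)`
/// through the shared helper instead of matching on the payload inline.
pub fn run_guarded<F>(f: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + UnwindSafe,
{
    catch_unwind(f).map_err(|payload| panic_message(&*payload))
}
```

Keeping the downcast in one helper means every call site that catches a panic reports it the same way.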
Fold-in from #104 review — avro feature consistency
In this PR:
datafusion = { workspace = true } # no avro
datafusion-jni-common = { path = "../../native-common" } # no avro feature
That's correct today: both are avro-off, so
The footgun: the moment the bridge's
To fold in here (or before this lands):
Ref: #104 (1/6) review.
Please rebase this one when you can @timsaucer
Thanks @andygrove. I’m going to leave it in draft for a bit while I review Dewey’s counter proposal.
Purpose
Add the `datafusion-spark-bridge` Rust SDK, the foundation a domain connector ("bridge") builds on. Its `export_bridge!` macro generates the JNI entry points the JVM side calls (schema probe, scan create/execute/close); it also bundles the options decoder, scan glue, and the Arrow type-widening layer for Spark type compatibility.

Self-contained; depends only on `native-common`.

🤖 Generated with Claude Code