.NET bindings for Apache DataFusion, a fast, extensible query engine built on Apache Arrow for high-performance analytical query processing.
Note: This is an independent community project and is not officially associated with or endorsed by the Apache Software Foundation or the Apache DataFusion project.
| Component | Feature | Status | Notes |
|---|---|---|---|
| Runtime | Tokio runtime | ✅ | Configurable threads, supports multiple instances |
| | Logger | ✅ | Configurable with log levels |
| Session | Create session context | ✅ | |
| | Execute SQL queries | ✅ | Returns DataFrame, supports parameters |
| Data Sources | CSV (read/write) | ✅ | `RegisterCsvAsync` |
| | Parquet (read/write) | ✅ | `RegisterParquetAsync` |
| | JSON (read/write) | ✅ | `RegisterJsonAsync` |
| | RecordBatch tables | ✅ | `RegisterBatch` |
| Object Store | Local filesystem | ✅ | |
| | Amazon S3 | ✅ | |
| | Azure Blob Storage | ✅ | |
| | Google Cloud Storage | ✅ | |
| | In-memory | ✅ | |
| DataFrame | Count rows | ✅ | `CountAsync()` |
| | Get schema | ✅ | `GetSchema()` → Arrow `Schema` |
| | Collect all data | ✅ | `CollectAsync()` → RecordBatches |
| | Stream results | ✅ | `ExecuteStreamAsync()` → `IAsyncEnumerable` |
| | Show/print | ✅ | `ShowAsync()`, `ToStringAsync()` |
| Arrow | Apache Arrow support | ✅ | Via the Apache.Arrow NuGet package |
| | Zero-copy support | ✅ | |
| ADO.NET | ADO.NET interface | ✅ | Enables ADO.NET libraries such as Dapper |
| Advanced | UDF registration | ❌ | |
| | Catalog management | ❌ | |
| | Table providers | ❌ | |
| Platforms | Linux x64 | ✅ | |
| | Linux arm64 | ✅ | |
| | Windows x64 | ✅ | |
| | macOS arm64 | ✅ | |
✅ Implemented 🟡 Partially implemented ❌ Not yet implemented
```shell
dotnet add package DataFusionSharp
```

For ADO.NET integration, also install the companion package:

```shell
dotnet add package DataFusionSharp.Data
```

```csharp
using DataFusionSharp;

// Create the runtime, which manages the Tokio runtime and native resources
using var runtime = DataFusionRuntime.Create();

// Create a session context, which manages query execution and state
using var context = runtime.CreateSessionContext();

// Register a CSV file as a table (CSV, Parquet, and JSONL are supported)
await context.RegisterCsvAsync("orders", "path/to/orders.csv");
// await context.RegisterParquetAsync("orders", "path/to/orders.parquet");
// await context.RegisterJsonAsync("orders", "path/to/orders.json");

// Execute a SQL query
using var df = await context.SqlAsync(
    """
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
    """);

// Display results in the console
await df.ShowAsync();

// Access the schema
var schema = df.GetSchema();
foreach (var field in schema.FieldsList)
{
    // Process each schema field (name, type, etc.)
    Console.WriteLine($"{field.Name}: {field.DataType.Name}");
}

// Collect all results as Arrow record batches
using var collectedData = await df.CollectAsync();
foreach (var batch in collectedData.Batches)
{
    // Process each Arrow RecordBatch
    Console.WriteLine($"Collected batch with {batch.Length} rows");
}

// Stream results as Arrow record batches
using var stream = await df.ExecuteStreamAsync();
await foreach (var batch in stream)
{
    // Process each streamed RecordBatch
    Console.WriteLine($"Streamed batch with {batch.Length} rows");
}
```

See examples/ for more details.
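The quick-start loops stop short of reading actual values out of a batch. The sketch below shows one way to do that using only the Apache.Arrow package. It assumes the `total` column arrives as Float64 (Arrow's `DoubleArray`); the real type depends on how `amount` is inferred from the CSV. `SumDoubleColumn` is a hypothetical helper, and the hand-built batch merely stands in for one returned by `CollectAsync()` or `ExecuteStreamAsync()`.

```csharp
using System;
using Apache.Arrow;
using Apache.Arrow.Types;

// Hypothetical helper: sums a Float64 column by name, skipping null slots.
// Assumes the column is physically a DoubleArray; other types need their own cast.
static double SumDoubleColumn(RecordBatch recordBatch, string columnName)
{
    var fields = recordBatch.Schema.FieldsList;
    for (int index = 0; index < fields.Count; index++)
    {
        if (fields[index].Name != columnName)
            continue;

        var column = (DoubleArray)recordBatch.Column(index);
        double sum = 0;
        for (int i = 0; i < column.Length; i++)
        {
            // GetValue returns null for null slots, which are skipped
            if (column.GetValue(i) is double value)
                sum += value;
        }
        return sum;
    }
    throw new ArgumentException($"Column '{columnName}' not found", nameof(columnName));
}

// Build a small in-memory batch standing in for a query result
var totals = new DoubleArray.Builder().AppendRange(new[] { 10.5, 20.0, 4.5 }).Build();
var schema = new Schema.Builder()
    .Field(f => f.Name("total").DataType(DoubleType.Default).Nullable(true))
    .Build();
using var batch = new RecordBatch(schema, new IArrowArray[] { totals }, totals.Length);

Console.WriteLine(SumDoubleColumn(batch, "total")); // prints 35
```

In the quick start, the same helper could be applied to each batch in `collectedData.Batches`. If `amount` is inferred as an integer column, `sum(amount)` will likely come back as Int64, in which case the cast should target `Int64Array` instead.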
```csharp
using DataFusionSharp.Data;
using Dapper;

// Wrap any SessionContext (here, `context` from the quick start) as a standard DbConnection
await using var connection = context.AsConnection();

// Use Dapper (or any ADO.NET library) as usual
var results = await connection.QueryAsync<OrderSummary>(
    """
    SELECT customer_name AS CustomerName, COUNT(*) AS OrderCount
    FROM orders
    WHERE status = @status
    GROUP BY customer_name
    """,
    new { status = "Completed" });

record OrderSummary(string CustomerName, long OrderCount);
```

- .NET 8.0 or later
- Supported platforms:
  - Linux (x64, arm64)
  - Windows (x64)
  - macOS (arm64)
To build from source, you need:

- .NET 10.0 SDK or later (how to install: https://learn.microsoft.com/en-us/dotnet/core/install/)
- Rust 1.93+ (how to install: https://rustup.rs)
- Protobuf compiler `protoc` (how to install: https://protobuf.dev/installation/)
1. Clone the repository:

   ```shell
   git clone https://github.com/nazarii-piontko/datafusion-sharp.git
   cd datafusion-sharp
   ```

2. Build the project:

   ```shell
   dotnet build -c Release
   ```

   This will automatically:
   - Compile the Rust native library (via cargo)
   - Build the .NET library
   - Link the native library into the managed library

3. Run the tests:

   ```shell
   dotnet test -c Release
   ```
Full documentation is available at nazarii-piontko.github.io/datafusion-sharp.
- src/DataFusionSharp/ - Core .NET library with managed wrappers
- src/DataFusionSharp.Data/ - ADO.NET provider
- native/ - Rust FFI layer bridging .NET to Apache DataFusion
- tests/DataFusionSharp.Tests/ - Integration tests
- tests/DataFusionSharp.Benchmark/ - Performance benchmarks with native reference implementation
- examples/ - Example usage and sample data
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
DataFusionSharp is licensed under the Apache License 2.0. See LICENSE.txt for details.
This project contains bindings to Apache DataFusion, which is also licensed under the Apache License 2.0. See NOTICE.txt for attribution details.
This project builds on the work of:

- Apache DataFusion - The underlying query engine
- Apache Arrow - Columnar memory format
- The Apache Software Foundation

Related projects:

- Apache DataFusion - Rust implementation
- datafusion-python - Python bindings
- datafusion-java - Java bindings
Apache®, Apache DataFusion™, Apache Arrow™, and the Apache feather logo are trademarks of The Apache Software Foundation.