DataFusionSharp

.NET bindings for Apache DataFusion, a fast, extensible query engine built on Apache Arrow for high-performance analytical query processing.

Note: This is an independent community project and is not officially associated with or endorsed by the Apache Software Foundation or the Apache DataFusion project.

Features

| Component    | Feature                | Status | Notes                                             |
|--------------|------------------------|--------|---------------------------------------------------|
| Runtime      | Tokio runtime          |        | Configurable threads, supports multiple instances |
|              | Logger                 |        | Configurable with log levels                      |
| Session      | Create session context |        |                                                   |
|              | Execute SQL queries    |        | Returns DataFrame, supports parameters            |
| Data Sources | CSV (read/write)       |        | RegisterCsvAsync                                  |
|              | Parquet (read/write)   |        | RegisterParquetAsync                              |
|              | JSON (read/write)      |        | RegisterJsonAsync                                 |
|              | RecordBatch tables     |        | RegisterBatch                                     |
| Object Store | Local filesystem       |        |                                                   |
|              | Amazon S3              |        |                                                   |
|              | Azure Blob Storage     |        |                                                   |
|              | Google Cloud Storage   |        |                                                   |
|              | InMemory               |        |                                                   |
| DataFrame    | Count rows             |        | CountAsync()                                      |
|              | Get schema             |        | GetSchema() → Arrow Schema                        |
|              | Collect all data       |        | CollectAsync() → RecordBatches                    |
|              | Stream results         |        | ExecuteStreamAsync() → IAsyncEnumerable           |
|              | Show/print             |        | ShowAsync(), ToStringAsync()                      |
| Arrow        | Apache Arrow support   |        | Via Apache.Arrow nuget package                    |
|              | Zero copy support      |        |                                                   |
| ADO.NET      | ADO.NET interface      |        | ADO.NET interface to support libs like Dapper     |
| Advanced     | UDF registration       |        |                                                   |
|              | Catalog management     |        |                                                   |
|              | Table providers        |        |                                                   |
| Platforms    | Linux x64              |        |                                                   |
|              | Linux arm64            |        |                                                   |
|              | Windows x64            |        |                                                   |
|              | macOS arm64            |        |                                                   |

✅ Implemented 🟡 Partially implemented ❌ Not yet implemented

Installation

dotnet add package DataFusionSharp

For ADO.NET integration, also install the companion package:

dotnet add package DataFusionSharp.Data

Quick Start

using DataFusionSharp;

// Create runtime, which manages Tokio runtime and native resources
using var runtime = DataFusionRuntime.Create();

// Create session context, which manages query execution and state
using var context = runtime.CreateSessionContext();

// Register a CSV file as a table (supports CSV, Parquet, JSONL)
await context.RegisterCsvAsync("orders", "path/to/orders.csv");
// await context.RegisterParquetAsync("orders", "path/to/orders.parquet");
// await context.RegisterJsonAsync("orders", "path/to/orders.json");

// Execute SQL query
using var df = await context.SqlAsync(
    """
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
    """);

// Display results to console
await df.ShowAsync();

// Access schema
var schema = df.GetSchema();
foreach (var field in schema.FieldsList)
    Console.WriteLine($"{field.Name}: {field.DataType.Name}"); // Process schema field (name, type, etc.)

// Collect as Arrow batches
using var collectedData = await df.CollectAsync();
foreach (var batch in collectedData.Batches)
    Console.WriteLine($"{batch.Length} rows, {batch.ColumnCount} columns"); // Process Arrow RecordBatch...

// Collect as stream of Arrow batches
using var stream = await df.ExecuteStreamAsync();
await foreach (var batch in stream)
    Console.WriteLine($"{batch.Length} rows, {batch.ColumnCount} columns"); // Process streamed RecordBatch...

See examples/ for more details.
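
As a self-contained variant of the Quick Start, the sketch below first writes a small CSV file to a temporary path and then queries it, so it does not depend on existing data. It uses only calls named in this README (RegisterCsvAsync, SqlAsync, CountAsync, ExecuteStreamAsync). The sample rows and the temporary file name are made up for illustration. The return type of CountAsync() is not stated here, so the value is simply printed, and each streamed item is assumed to be an Apache.Arrow RecordBatch.

```csharp
using System;
using System.IO;
using DataFusionSharp;

// Write a tiny sample CSV (illustrative data, not part of the project).
var csvPath = Path.Combine(Path.GetTempPath(), $"orders-{Guid.NewGuid():N}.csv");
await File.WriteAllTextAsync(csvPath, "customer_id,amount\n1,10.5\n1,4.5\n2,7.0\n");

try
{
    using var runtime = DataFusionRuntime.Create();
    using var context = runtime.CreateSessionContext();

    // Register the temporary file under the table name used in the query.
    await context.RegisterCsvAsync("orders", csvPath);

    using var df = await context.SqlAsync(
        """
        SELECT customer_id, sum(amount) AS total
        FROM orders
        GROUP BY customer_id
        """);

    // Number of rows the query produces (one per distinct customer_id).
    var rowCount = await df.CountAsync();
    Console.WriteLine($"Result rows: {rowCount}");

    // Stream the result batch by batch instead of collecting everything at once.
    using var stream = await df.ExecuteStreamAsync();
    await foreach (var batch in stream)
    {
        // Assumption: each item is an Apache.Arrow RecordBatch.
        Console.WriteLine($"Batch: {batch.Length} rows, {batch.ColumnCount} columns");
    }
}
finally
{
    // Runtime and context are disposed at the end of the try block, before cleanup.
    File.Delete(csvPath);
}
```

Streaming is the better fit when the result may not fit comfortably in memory; CollectAsync() is simpler when it does.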

ADO.NET / Dapper

using DataFusionSharp.Data;
using Dapper;

// Wrap any SessionContext (here `session`, e.g. created via runtime.CreateSessionContext()) as a standard DbConnection
await using var connection = session.AsConnection();

// Use Dapper (or any ADO.NET library) as usual
var results = await connection.QueryAsync<OrderSummary>(
    """
    SELECT customer_name AS CustomerName, COUNT(*) AS OrderCount
    FROM orders
    WHERE status = @status
    GROUP BY customer_name
    """,
    new { status = "Completed" });

record OrderSummary(string CustomerName, long OrderCount);
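
Because the connection is described as a standard DbConnection, it should also be usable without Dapper. The following is a minimal sketch using only the base System.Data.Common types. The file path and the customer_name column are placeholders, and whether this provider needs an explicit OpenAsync() call is an assumption, so the code opens the connection only if it is not already open.

```csharp
using System;
using System.Data;
using System.Data.Common;
using DataFusionSharp;
using DataFusionSharp.Data;

using var runtime = DataFusionRuntime.Create();
using var session = runtime.CreateSessionContext();

// Placeholder path; the file is assumed to contain a customer_name column.
await session.RegisterCsvAsync("orders", "path/to/orders.csv");

await using var connection = session.AsConnection();

// Dapper opens closed connections itself; with plain ADO.NET we do it here.
// Assumption: the provider follows the usual DbConnection state semantics.
if (connection.State != ConnectionState.Open)
    await connection.OpenAsync();

await using DbCommand command = connection.CreateCommand();
command.CommandText =
    """
    SELECT customer_name, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_name
    """;

await using DbDataReader reader = await command.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
    // GetValue avoids assuming which CLR type the provider maps each column to.
    Console.WriteLine($"{reader.GetValue(0)}: {reader.GetValue(1)}");
}
```

Reading through DbDataReader keeps the code independent of any mapping library, at the cost of handling column positions and types by hand.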

Requirements

  • .NET 8.0 or later
  • Supported platforms:
    • Linux (x64, arm64)
    • Windows (x64)
    • macOS (arm64)

Building from Source

Prerequisites

Build Steps

  1. Clone the repository:

    git clone https://github.com/nazarii-piontko/datafusion-sharp.git
    cd datafusion-sharp
  2. Build the project:

    dotnet build -c Release

    This will automatically:

    • Compile the Rust native library (via cargo)
    • Build the .NET library
    • Link the native library into the managed library
  3. Run tests:

    dotnet test -c Release

Documentation

Full documentation is available at nazarii-piontko.github.io/datafusion-sharp.

Project Structure

Contributing

Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.

License

DataFusionSharp is licensed under the Apache License 2.0. See LICENSE.txt for details.

This project contains bindings to Apache DataFusion, which is also licensed under Apache License 2.0. See NOTICE.txt for attribution details.

Acknowledgments

Related Projects


Apache®, Apache DataFusion™, Apache Arrow™, and the Apache feather logo are trademarks of The Apache Software Foundation.
