18 changes: 18 additions & 0 deletions docs/community/ai-tooling-policy.md
@@ -0,0 +1,18 @@
# Guidelines for AI-assisted Contributions

The Apache Wayang community welcomes the use of AI and generative tooling as part of the contribution process, provided contributors follow the guidelines below. These guidelines align with the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html).

1. **Verify licensing compliance.** AI-generated code may inadvertently reproduce copyrighted material. Before submitting, ensure the output does not include content that conflicts with the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0) or the [ASF 3rd Party Licensing Policy](https://www.apache.org/legal/resolved.html).

2. **Understand what you submit.** If you cannot explain why the code works, do not submit it. You are accountable for bugs, security issues, and license violations in your contribution.

3. **Disclose AI tool usage in commits.** When any part of a contribution was generated or significantly assisted by an AI tool, include a `Generated-by:` token in the commit message. For example:
```
Fix null pointer in JdbcExecutor

Generated-by: GitHub Copilot
```

4. **Keep PR discussions human.** When participating in PR discussions, e.g., code review comments, questions, clarifications, and responses, the content must be written by a human, not generated by an AI tool. If an AI tool (such as GitHub Copilot) posts a comment, it must be clearly attributed as such and not presented as the contributor's own words. This ensures that code review remains a genuine exchange between people, preserving the quality, accountability, and community trust that Apache Wayang depends on.
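
As an illustration of guideline 3, the check below looks for a `Generated-by:` line in the trailer block of a commit message. It is only a sketch: the helper name `generated_by_tools` and the rule that trailers sit in the last blank-line-separated paragraph are assumptions made for this example, not part of the policy or of any Wayang tooling.

```python
def generated_by_tools(commit_message: str) -> list[str]:
    """Return the tool names found in Generated-by: trailers (hypothetical helper)."""
    # Assumption: trailers live in the last blank-line-separated paragraph.
    paragraphs = [p for p in commit_message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a subject line alone carries no trailer block
    tools = []
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "generated-by" and value.strip():
            tools.append(value.strip())
    return tools


# The commit message from guideline 3.
message = "Fix null pointer in JdbcExecutor\n\nGenerated-by: GitHub Copilot\n"
print(generated_by_tools(message))
```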

*These guidelines will be updated as AI tooling and the legal landscape around it continue to evolve. Questions or suggestions can be raised on the [dev mailing list](https://wayang.apache.org/docs/community/mailinglist).*
24 changes: 12 additions & 12 deletions docs/guide/getting-started.md → docs/guide/examples.md
@@ -1,7 +1,7 @@
---
-title: Getting started
-sidebar_position: 2
-id: getting-started
+title: Installation and Examples
+sidebar_position: 10
+id: examples
---
<!--

@@ -23,10 +23,10 @@ id: getting-started
-->

## Requirements
Apache Wayang requires Java 11 and Scala 2.12. To build Wayang from source, you also need Apache Maven. Some of the processing platforms that Wayang supports may have additional installation requirements of their own.

### Get Wayang
Apache Wayang is available through Maven Central. To use it in a Maven-based project, add the following dependency to your project's POM file:
```xml
<dependency>
<groupId>org.apache.wayang</groupId>
@@ -69,7 +69,7 @@ If you need to rebuild Wayang, e.g., to use a different Scala version, you can s
```

### Configure Wayang
Wayang needs to know the capabilities of your processing platforms and how to interact with them. A default configuration is available for initial testing, but a properties file tailored to your setup is generally preferable. To use your own configuration, launch your application via
```shell
$ java -Dwayang.configuration=url://to/my/wayang.properties ...
```
@@ -79,7 +79,7 @@ Essential configuration settings:
* `wayang.core.log.enabled (= true)`: whether to log execution statistics to allow learning better cardinality and cost estimators for the optimizer
* `wayang.core.log.executions (= ~/.wayang/executions.json)`: where to log execution times of operator groups
* `wayang.core.log.cardinalities (= ~/.wayang/cardinalities.json)`: where to log cardinality measurements
* `wayang.core.optimizer.instrumentation (= org.apache.wayang.core.profiling.OutboundInstrumentationStrategy)`: where to measure cardinalities in Wayang plans; other options are `org.apache.wayang.core.profiling.NoInstrumentationStrategy` and `org.apache.wayang.core.profiling.FullInstrumentationStrategy`
* `wayang.core.optimizer.reoptimize (= false)`: whether to progressively optimize Wayang plans
* `wayang.basic.tempdir (= file:///tmp)`: where to store temporary files, in particular for inter-platform communication
* Java Streams
@@ -109,10 +109,10 @@ Essential configuration settings:
* `wayang.postgres.cpu.mhz (= 2700)`: clock frequency, in MHz, of the processor PostgreSQL runs on
* `wayang.postgres.cpu.cores (= 2)`: number of cores PostgreSQL runs on
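
To make the settings above concrete, here is a minimal sketch that writes a few of them to a properties file and builds the matching `-Dwayang.configuration=...` JVM argument. The keys and values are the defaults listed above; the file location, the helper name `write_wayang_properties`, and the choice of a `file://` URL are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Defaults copied from the list above; adjust them to your own setup.
SETTINGS = {
    "wayang.core.log.enabled": "true",
    "wayang.core.optimizer.reoptimize": "false",
    "wayang.basic.tempdir": "file:///tmp",
    "wayang.postgres.cpu.mhz": "2700",
    "wayang.postgres.cpu.cores": "2",
}


def write_wayang_properties(path: Path, settings: dict[str, str]) -> str:
    """Write `key = value` lines and return the JVM flag pointing at the file."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Path.as_uri() yields a file:// URL and requires an absolute path.
    return f"-Dwayang.configuration={path.resolve().as_uri()}"


# Hypothetical location: a fresh temporary directory.
config_path = Path(tempfile.mkdtemp()) / "wayang.properties"
jvm_flag = write_wayang_properties(config_path, SETTINGS)
print(jvm_flag)
```

The printed flag can then be placed on the `java` command line in the same position as the `-Dwayang.configuration=...` argument shown earlier.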

To define your applications, use Wayang's Scala or Java API, found in the `wayang-api` module. See the examples below.

## Cost Functions
Wayang provides a utility to learn cost functions from historical execution data. Specifically, Wayang can learn configurations for load profile estimators (that estimate CPU load, disk load etc.) for both operators and UDFs, as long as the configuration provides a template for those estimators.

As an example, the `JavaMapOperator` draws its load profile estimator configuration via the configuration key `wayang.java.map.load`.
Now, it is possible to specify a load profile estimator template in the configuration under the key `<original key>.template`, e.g.:
@@ -122,7 +122,7 @@ wayang.java.map.load.template = {\
```
"cpu":"?*in0"\
}
```
This template describes a load profile estimator that requires at least one input cardinality and one output cardinality. It models CPU load as directly proportional to the input cardinality (`?*in0`). More complex functions are possible.

In particular, you can use
* the variables `in0`, `in1`, ... and `out0`, `out1`, ... to incorporate the input and output cardinalities, respectively;
@@ -131,12 +131,12 @@ In particular, you can use
* the functions `min(x0, x1, ...)`, `max(x0, x1, ...)`, `abs(x)`, `log(x, base)`, `ln(x)`, `ld(x)`;
* and the constants `e` and `pi`.
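
To show how such a template turns into a number, the sketch below fills each `?` with a given value and evaluates the result for concrete cardinalities. It is not Wayang's expression parser: the helper names `fill_template` and `estimate_load` are hypothetical, Python's `eval` stands in for the real evaluator, and only the functions and constants named above are provided.

```python
import math

# Functions and constants named in the list above, mapped to Python equivalents.
NAMES = {
    "min": min,
    "max": max,
    "abs": abs,
    "log": lambda x, base: math.log(x, base),
    "ln": lambda x: math.log(x),
    "ld": lambda x: math.log2(x),
    "e": math.e,
    "pi": math.pi,
}


def fill_template(template: str, values: list[float]) -> str:
    """Replace the question marks, left to right, with the given values."""
    parts = template.split("?")
    if len(parts) - 1 != len(values):
        raise ValueError("number of values must match the number of '?'")
    filled = parts[0]
    for value, rest in zip(values, parts[1:]):
        filled += repr(float(value)) + rest
    return filled


def estimate_load(template: str, values: list[float], **cardinalities: float) -> float:
    """Evaluate a filled template for cardinalities such as in0=..., out0=..."""
    expression = fill_template(template, values)
    # Builtins are disabled; only the names above and the cardinalities resolve.
    return eval(expression, {"__builtins__": {}}, {**NAMES, **cardinalities})


# "?*in0" with ? = 2.5 and an input cardinality of 1000.
print(estimate_load("?*in0", [2.5], in0=1000))
```

Because `eval` executes arbitrary expressions, a helper like this should only ever be given templates you wrote yourself.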

Apache Wayang provides templates for all execution operators. For your user-defined functions (UDFs), however, you need to specify the cost functions yourself; these are likewise based on configuration parameters. This involves creating an initial specification and a template for each UDF.
Once execution data has been collected, you can run:
```shell
java ... org.apache.wayang.profiler.ga.GeneticOptimizerApp [configuration URL [execution log]]
```
This tool tries to find values for the question marks (`?`) in the load profile estimator templates that fit the collected execution data and the pre-defined configuration entries for the load profile estimators. You can then copy the resulting values directly into your configuration.
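
For intuition about what "fitting the question marks" means, the sketch below fits the single `?` in `?*in0` to a handful of measurements. It does not reproduce what `GeneticOptimizerApp` does internally; it swaps in plain least squares for a one-parameter template, and the helper name `fit_cpu_factor` and the sample measurements are made up for illustration.

```python
def fit_cpu_factor(samples: list[tuple[float, float]]) -> float:
    """Least-squares value of ? in "?*in0" for (in0, measured load) pairs."""
    # Minimizing sum((a*in0 - load)^2) gives a = sum(in0*load) / sum(in0^2).
    numerator = sum(in0 * load for in0, load in samples)
    denominator = sum(in0 * in0 for in0, _ in samples)
    if denominator == 0:
        raise ValueError("need at least one sample with a non-zero input cardinality")
    return numerator / denominator


# Made-up measurements: (input cardinality, observed CPU load).
samples = [(100, 250.0), (200, 500.0), (400, 1000.0)]
factor = fit_cpu_factor(samples)

# The fitted value replaces the question mark in the template.
print(f'"cpu":"{factor}*in0"')
```

Templates with several question marks or non-linear terms have no such closed form, which is one reason a search-based tool is useful there.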

## Examples
