I've been looking through how we generate the data sharing clauses, and what we'd need in order to do it at transformation time instead of lowering time. Most of it seems quite straightforward (with the caveat that we are trusting users not to modify the tree afterwards in a way that results in wrong behaviour, but we've said we're ok with that).
I haven't checked whether the transformations do anything special for these; I've only looked at what happens at lowering, as that is probably what we'd want to change.
First up, the ACC directives don't use any of the logic yet, but implementing their own logic should be easy to keep separate whilst using the data_sharing_attribute_mixin without changes - this simply returns lists of private, firstprivate and need_sync variables, so we'd just need to implement something that uses these results according to the OpenACC rules.
OMPLoopDirective doesn't currently do any of the clause generation. In theory it supports private, lastprivate and reduction clauses, though with how we currently use it for GPUs, I think we're better off keeping the default declaration behaviour and enabling automatic atomic generation when we're ready.
OMPDoDirective doesn't generate any private clauses (that I can see) at lowering, but it does automatic DSL-related kernel reduction generation, and those override any other reductions created (which I suspect is fine). I'll come back to these later.
OMPTaskTrans does a few failure checks to do with calls/kernels that aren't inlined, and then generates its clauses, but these can be done at transformation time if we're saying changes to the tree are user issues.
OMPParallelDirective does the most work at lowering:
- Multiple reductions on same variable - disallowed
- Check that, if there are any reductions, all children of the schedule are the same type - this is one limitation that I don't understand, and it could limit MaximalParallelRegionTrans. @sergisiso @arporter Do either of you know why we need this check? Could we be more precise and instead check whether any of the reductions are in children that aren't OMPDoDirective/OMPLoopDirective?
- Generate reproducible reductions before inferring data sharing attributes
- Generate data sharing clauses
- Lowering
- Add LFRic/DSL reductions
- Reproducible sum loop addition.
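
To make the question about the reduction check concrete, here is a toy Python sketch contrasting the two rules. It does not use any PSyclone classes: `Child`, `current_check` and `proposed_check` are hypothetical stand-ins, and the "current" rule is only my reading of the check as described above, so treat it as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    """Hypothetical stand-in for one child of the parallel region's schedule."""
    kind: str                                       # e.g. "OMPDoDirective"
    reductions: list = field(default_factory=list)  # reduction variable names


# Node types that are allowed to carry reductions in the proposed rule.
WORKSHARING = {"OMPDoDirective", "OMPLoopDirective"}


def current_check(children):
    """Assumed current rule: if any reduction exists, all children share one type."""
    has_reduction = any(c.reductions for c in children)
    same_type = len({c.kind for c in children}) <= 1
    return not has_reduction or same_type


def proposed_check(children):
    """Suggested rule: only reject reductions outside OMPDo/OMPLoop children."""
    return all(c.kind in WORKSHARING for c in children if c.reductions)


# A mixed region of the kind MaximalParallelRegionTrans might produce.
region = [
    Child("OMPDoDirective", ["asum"]),
    Child("OMPBarrierDirective"),
    Child("OMPLoopDirective"),
]
print(current_check(region), proposed_check(region))
```

For this mixed region the assumed current rule rejects it while the narrower rule accepts it, which is the case I'm asking about.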
As far as I can tell, the main limitation on doing all of this at transformation time vs lowering time is the DSL reductions. The code explicitly says we have to do this post lowering, which means we can't do the reproducible sum loops until after lowering too. Since this is done after everything else, I suspect we can just leave that until lowering and do the data sharing clauses separately?
Edit: The only other thing is what to do if a user (or something else) doesn't use the transformation to create the nodes - I think I'd just add a standalone call in the directive to do this, and it's up to the user/developer to make sure they call the relevant code once they have finished creating and adding the nodes.
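
A rough sketch of what I mean by the standalone call, using toy classes only. `ToyParallelDirective`, `update_data_sharing_clauses` and `toy_parallel_trans` are hypothetical names, not existing PSyclone API, and the "local variables become private" rule is a placeholder for the real inference.

```python
class ToyParallelDirective:
    """Hypothetical stand-in for a directive node; not the PSyclone class."""

    def __init__(self, body_vars):
        self.body_vars = list(body_vars)  # (name, is_local) pairs
        self.clauses = None               # None means "not generated yet"

    def update_data_sharing_clauses(self):
        """Hypothetical standalone entry point that (re)computes the clauses."""
        # Placeholder rule: variables local to the region become private.
        private = sorted(name for name, is_local in self.body_vars if is_local)
        self.clauses = {"private": private}
        return self.clauses


def toy_parallel_trans(directive):
    """A transformation would call the entry point itself once the node is built."""
    directive.update_data_sharing_clauses()
    return directive


# Path 1: node created through the transformation, clauses already generated.
d1 = toy_parallel_trans(ToyParallelDirective([("i", True), ("a", False)]))

# Path 2: node built by hand, so the caller has to trigger generation explicitly.
d2 = ToyParallelDirective([("j", True), ("tmp", True), ("b", False)])
d2.update_data_sharing_clauses()
```

Keeping the logic in one method on the directive means both paths share the same code, and the only difference is who is responsible for calling it.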