Job configuration for Galaxy 19.01 or higher
Simple configuration
The following is a simple job configuration sample that you can use to get started.
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="galaxycloudrunner">
<destination id="local" runner="local"/>
<destination id="galaxycloudrunner" runner="dynamic">
<param id="type">python</param>
<param id="function">cloudlaunch_pulsar_burst</param>
<param id="rules_module">galaxycloudrunner.rules</param>
<param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
<!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
<param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
<!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
<param id="pulsar_runner_id">pulsar</param>
<!-- Destination to fallback to if no nodes are available -->
<param id="fallback_destination_id">local</param>
<!-- Pick next available server and resubmit if an unknown error occurs -->
<resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
</destination>
</destinations>
<tools>
<tool id="upload1" destination="local"/>
</tools>
</job_conf>
In this simple configuration, all jobs are routed to GalaxyCloudRunner by default. This works as follows:
- If a Pulsar node is available, the rule returns that node.
- If multiple Pulsar nodes are available, they are returned in round-robin order.
- You can add or remove Pulsar nodes at any time. However, node lookups are cached (currently for 5 minutes) to avoid repeatedly querying the server, so GalaxyCloudRunner takes a short while to detect a change. This matters for node addition and, in particular, removal. When you add a node, there can be a delay of a few minutes before it is picked up. When you remove a node, jobs may be routed to the dead node until the cache expires. We therefore recommend resubmitting failed jobs through the resubmit tag, as shown in the example. See Additional Configuration and Limitations for how to change the cache period.
- If no node is available, the rule returns the fallback_destination_id, if specified, and the job is routed there. If no fallback_destination_id is specified, the job is re-queued until a node becomes available.
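The selection behaviour described in the list above can be sketched in a few lines of Python. This is an illustration only, not GalaxyCloudRunner's actual code: the class NodePicker, the fetch_nodes callable, and the use of None to mean "re-queue the job" are hypothetical names chosen for this example. Only the 5-minute cache period, the round-robin order, and the fallback behaviour come from the text.

```python
import itertools
import time


class NodePicker:
    """Hypothetical sketch: round-robin over a cached list of Pulsar nodes."""

    def __init__(self, fetch_nodes, cache_seconds=300, clock=time.monotonic):
        self._fetch_nodes = fetch_nodes      # callable returning a list of node ids
        self._cache_seconds = cache_seconds  # 5 minutes, as described above
        self._clock = clock                  # injectable so the sketch can be tested
        self._expires_at = None
        self._nodes = []
        self._cycle = iter(())

    def _refresh_if_stale(self):
        # Query the server only when the cached node list has expired.
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._nodes = list(self._fetch_nodes())
            self._cycle = itertools.cycle(self._nodes)
            self._expires_at = now + self._cache_seconds

    def pick(self, fallback_destination_id=None):
        """Return a node, the fallback id, or None (meaning: re-queue the job)."""
        self._refresh_if_stale()
        if self._nodes:
            return next(self._cycle)
        return fallback_destination_id
```

Because the node list is refreshed only when the cache expires, a node removed on the server can still be returned until then, which is why the resubmit tag is recommended.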
To burst or not to burst?
In the above example, all jobs are routed to the GalaxyCloudRunner by default. However, it is often the case that jobs should be routed to the remote cloud nodes only if the local queue is full. To support this scenario, we recommend a configuration like the following.
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
<plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="burst_if_queued">
<destination id="local" runner="local"/>
<destination id="burst_if_queued" runner="dynamic">
<param id="type">burst</param>
<param id="from_destination_ids">local,drmaa</param>
<param id="to_destination_id">galaxycloudrunner</param>
<param id="num_jobs">2</param>
<param id="job_states">queued</param>
</destination>
<destination id="galaxycloudrunner" runner="dynamic">
<param id="type">python</param>
<param id="function">cloudlaunch_pulsar_burst</param>
<param id="rules_module">galaxycloudrunner.rules</param>
<param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
<!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
<param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
<!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
<param id="pulsar_runner_id">pulsar</param>
<!-- Destination to fallback to if no nodes are available -->
<param id="fallback_destination_id">local</param>
<!-- Pick next available server and resubmit if an unknown error occurs -->
<resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
</destination>
</destinations>
<tools>
<tool id="upload1" destination="local"/>
</tools>
</job_conf>
In this example, jobs are first routed to the burst_if_queued destination, which uses the built-in burst rule to determine whether cloud bursting should occur. The rule examines how many jobs in the from_destination_ids are in the given state (queued in this case) and, if there are more than num_jobs, routes the job to the to_destination_id destination (galaxycloudrunner in this case). If bursting should not occur, it routes the job to the first destination in the from_destination_ids list. This provides a simple way to scale out to Pulsar nodes only when a chosen queue has a backlog of jobs. You may need to experiment with these values to find ones that work best for your requirements.
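As a rough illustration of the burst decision, the following Python sketch mirrors the description above. It is not Galaxy's implementation: the function name choose_destination and the queued_counts mapping are hypothetical. It also assumes that the job counts of all listed destinations are summed, and that "above num_jobs" means a strict comparison.

```python
def choose_destination(queued_counts, from_destination_ids, to_destination_id, num_jobs):
    """Return the destination id that a burst-style rule would pick.

    queued_counts maps a destination id to the number of its jobs that are
    in the watched state (for example, queued).
    """
    # Assumption: the backlog is the total across all watched destinations.
    backlog = sum(queued_counts.get(dest, 0) for dest in from_destination_ids)
    if backlog > num_jobs:
        return to_destination_id
    # No bursting: use the first destination in the list.
    return from_destination_ids[0]
```

With the values from the configuration, choose_destination({"local": 3}, ["local", "drmaa"], "galaxycloudrunner", 2) selects galaxycloudrunner, while a backlog of two or fewer jobs stays on local.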
Advanced bursting
In this final example, we show how a complex chain of rules can be used to exert fine-grained control over the job routing process.
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
<plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="burst_if_queued">
<destination id="local" runner="local"/>
<destination id="burst_if_queued" runner="dynamic">
<param id="type">burst</param>
<param id="from_destination_ids">local,drmaa</param>
<param id="to_destination_id">burst_if_size</param>
<param id="num_jobs">2</param>
<param id="job_states">queued</param>
</destination>
<destination id="burst_if_size" runner="dynamic">
<param id="type">python</param>
<param id="function">to_destination_if_size</param>
<param id="rules_module">galaxycloudrunner.rules</param>
<param id="max_size">1g</param>
<param id="to_destination_id">galaxycloudrunner</param>
<param id="fallback_destination_id">local</param>
</destination>
<destination id="galaxycloudrunner" runner="dynamic">
<param id="type">python</param>
<param id="function">cloudlaunch_pulsar_burst</param>
<param id="rules_module">galaxycloudrunner.rules</param>
<param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
<!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
<param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
<!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
<param id="pulsar_runner_id">pulsar</param>
<!-- Destination to fallback to if no nodes are available -->
<param id="fallback_destination_id">local</param>
<!-- Pick next available server and resubmit if an unknown error occurs -->
<resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
</destination>
</destinations>
<tools>
<tool id="upload1" destination="local"/>
</tools>
</job_conf>
Jobs are first routed to the burst_if_queued destination, which uses the built-in burst rule to determine whether bursting should occur. If it should, the job is then routed to the burst_if_size destination, which checks the total size of the input files. If the total is less than 1 GB, the job is routed to the galaxycloudrunner destination. If not, it is routed to the local queue.
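The size check at the second link of the chain can be sketched as follows. This is a hedged illustration rather than the code behind to_destination_if_size: the helper names parse_size and route_by_size are hypothetical, and the sketch assumes that a suffix such as g denotes a binary unit (1g = 1024^3 bytes), which the text does not specify.

```python
# Assumption: size suffixes are binary multiples.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text):
    """Convert a size string such as '1g' or '512m' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)


def route_by_size(input_sizes, max_size, to_destination_id, fallback_destination_id):
    """Pick a destination from the total size (in bytes) of the input files."""
    total = sum(input_sizes)
    if total < parse_size(max_size):
        return to_destination_id
    return fallback_destination_id
```

For example, two inputs of 500 MiB and 100 MiB with max_size set to 1g would go to to_destination_id, while a single 2 GiB input would go to the fallback destination.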
Job configuration for Galaxy versions lower than 19.01
Simple configuration
The following is a simple job configuration sample that you can use to get started.
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="galaxycloudrunner">
<destination id="local" runner="local"/>
<destination id="galaxycloudrunner" runner="dynamic">
<param id="type">python</param>
<param id="function">cloudlaunch_pulsar_burst_compat</param>
<param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
<!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
<param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
<!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
<param id="pulsar_runner_id">pulsar</param>
<!-- Destination to fallback to if no nodes are available -->
<param id="pulsar_fallback_destination_id">local</param>
</destination>
</destinations>
<tools>
<tool id="upload1" destination="local"/>
</tools>
</job_conf>
In this simple configuration, all jobs are routed to GalaxyCloudRunner by default. This works as follows:
- If a Pulsar node is available, the rule returns that node.
- If multiple Pulsar nodes are available, they are returned in round-robin order.
- You can add or remove Pulsar nodes at any time. However, node lookups are cached (currently for 5 minutes) to avoid repeatedly querying the server, so GalaxyCloudRunner takes a short while to detect a change. This matters for node addition and, in particular, removal. When you add a node, there can be a delay of a few minutes before it is picked up. When you remove a node, jobs may be routed to the dead node until the cache expires. Normally we would recommend resubmitting failed jobs through a resubmit tag, but Galaxy versions prior to 19.01 do not support resubmissions for Pulsar, so you may need to set the cache period to zero to handle this scenario. See Additional Configuration and Limitations for how to change the cache period.
- If no node is available, the rule returns the pulsar_fallback_destination_id, if specified, and the job is routed there. If no pulsar_fallback_destination_id is specified, the job is re-queued until a node becomes available.
Note that you must manually add the Galaxy rule, as described in Configuring Galaxy versions lower than 19.01.
To burst or not to burst?
In the above example, all jobs are routed to the GalaxyCloudRunner by default. However, it is often the case that jobs should be routed to the remote cloud nodes only if the local queue is full. To support this scenario, we recommend a configuration like the following.
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
<plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="galaxycloudrunner">
<destination id="local" runner="local"/>
<destination id="galaxycloudrunner" runner="dynamic">
<param id="type">python</param>
<param id="function">cloudlaunch_pulsar_burst_compat</param>
<param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
<!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
<param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
<!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
<param id="pulsar_runner_id">pulsar</param>
<!-- Destination to fallback to if no nodes are available -->
<param id="pulsar_fallback_destination_id">local</param>
<param id="burst_enabled">true</param>
<param id="burst_from_destination_ids">local,drmaa</param>
<param id="burst_num_jobs">2</param>
<param id="burst_job_states">queued</param>
</destination>
</destinations>
<tools>
<tool id="upload1" destination="local"/>
</tools>
</job_conf>
Galaxy versions prior to 19.01 do not support chaining dynamic rules, so we provide a single monolithic rule that handles both scenarios.
Note the burst_enabled flag, which activates the bursting rule. This rule determines whether cloud bursting should occur. It examines how many jobs in the burst_from_destination_ids are in the given state (queued in this case) and bursts to Pulsar only if there are more than burst_num_jobs. If bursting should not occur, it routes the job to the first destination in the burst_from_destination_ids list. This provides a simple way to scale out to Pulsar nodes only when a chosen queue has a backlog of jobs. You may need to experiment with these values to find ones that work best for your requirements.
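To show how the flat burst_* parameters of the monolithic rule could be interpreted, here is a small Python sketch. It is an assumption-laden illustration, not the code of cloudlaunch_pulsar_burst_compat: the helper names is_enabled and burst_settings are hypothetical, and the sketch assumes that parameter values arrive as strings and that list-valued parameters are comma-separated.

```python
def is_enabled(params, flag):
    """Treat a param as switched on only if its text value is 'true'."""
    return str(params.get(flag, "false")).strip().lower() == "true"


def burst_settings(params):
    """Collect the burst_* params into one dict, or return None if bursting is off."""
    if not is_enabled(params, "burst_enabled"):
        return None
    return {
        # Comma-separated ids, in priority order.
        "from_destination_ids": [
            dest.strip() for dest in params["burst_from_destination_ids"].split(",")
        ],
        "num_jobs": int(params["burst_num_jobs"]),
        "job_states": [
            state.strip() for state in params["burst_job_states"].split(",")
        ],
    }
```

When burst_enabled is absent or not true, the sketch returns None, which corresponds to the simple configuration in which every job goes straight to Pulsar.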
Advanced bursting
In this final example, we expand this compound rule to also filter jobs by size.
<?xml version="1.0"?>
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
<plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
</plugins>
<destinations default="galaxycloudrunner">
<destination id="local" runner="local"/>
<destination id="galaxycloudrunner" runner="dynamic">
<param id="type">python</param>
<param id="function">cloudlaunch_pulsar_burst_compat</param>
<param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
<!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
<param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
<!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
<param id="pulsar_runner_id">pulsar</param>
<!-- Destination to fallback to if no nodes are available -->
<param id="pulsar_fallback_destination_id">local</param>
<param id="burst_enabled">true</param>
<param id="burst_from_destination_ids">local,drmaa</param>
<param id="burst_num_jobs">2</param>
<param id="burst_job_states">queued</param>
<param id="dest_if_size_enabled">true</param>
<param id="dest_if_size_max_size">1g</param>
<param id="dest_if_size_fallback_destination_id">local</param>
</destination>
</destinations>
<tools>
<tool id="upload1" destination="local"/>
</tools>
</job_conf>
Set the dest_if_size_enabled flag to true to filter by size. This ensures that a job is routed to Pulsar only if the total size of its input files is less than 1 GB. Otherwise, the job is routed to dest_if_size_fallback_destination_id, which in this case is the local queue.