Wednesday, 29 May 2019

Fixing dataset download problems for uWSGI+nginx Galaxy configuration

We recently experienced problems downloading datasets via a web browser from one of our local Galaxy instances, which runs release 18.09 and uses a uWSGI+nginx configuration.

While small files (e.g. of the order of Mb) downloaded without problems, larger files (e.g. of the order of Gb) would fail with a dialog box appearing in the user's web browser complaining that "the source file can't be read". (The Galaxy logs also reported an IOError from the uwsgi_response_write_body_do() function.)

The initial problem seemed to be with the temporary directory being used for managing the download on the server. Explicitly setting uwsgi_temp_path in the nginx configuration seemed to help, for example:

uwsgi_temp_path /tmp/uwsgi;

This got rid of the dialog box but the larger downloads still failed without completing. Although the user's browser didn't give any more information, the Galaxy logs now reported a timeout error. To address this we explicitly set the uWSGI timeout limits in the nginx configuration, e.g.:

uwsgi_read_timeout 600s;
uwsgi_write_timeout 600s;

The choice of 600s (10 minutes) was arbitrary but seemed long enough to allow the downloads to complete.

Finally, as the temporary area on the server is quite small, we also explicitly set the maximum size of temporary files to 1Mb:

uwsgi_max_temp_file_size 1024k;

Together these addressed the download problem in our local instance.
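You can check whether a large download really completes, rather than being cut short part-way, by fetching it outside the browser. The script then compares the bytes received against the Content-Length header. The following is a minimal Python sketch that uses only the standard library. check_download is a hypothetical helper and is not part of Galaxy. It also assumes the dataset's download link can be fetched without a browser login session, for example for a dataset made accessible via link.

```python
import http.client
import urllib.request


def check_download(url, chunk_size=1 << 20, timeout=600):
    """Stream a download to nowhere and report whether it arrived in full.

    Returns (bytes_received, bytes_expected, complete). bytes_expected is
    None if the server sent no Content-Length header, in which case
    completeness can't be judged and complete is reported as True.
    """
    received = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
        expected = int(length) if length is not None else None
        try:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                received += len(chunk)
        except http.client.IncompleteRead as exc:
            # Raised e.g. when a chunked transfer is cut off part-way
            received += len(exc.partial)
            return received, expected, False
    complete = expected is None or received == expected
    return received, expected, complete
```

A result where received is smaller than expected points to a truncated transfer. A read that stalls for longer than timeout seconds raises an exception instead of returning.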

Thursday, 30 August 2018

Mercurial-based tool installation issues in Galaxy 18.05

Recently I've encountered a subtle problem with tool installation after upgrading our local production instances to Galaxy release 18.05, which I'd like to document here in case it comes up again in future.

The problem manifests itself when attempting to install a tool from the main toolshed via the admin interface: after clicking Install, the tool installation status goes almost immediately to Error. Further inspection reveals that the tool repository hasn't been cloned to the local filesystem, and no dependencies are installed.

Frustratingly this fails to leave any error messages in the logs which might help to diagnose the cause. However attempting the install via the Galaxy API (using nebulizer) did return an error message:

Error cloning repository: [Errno 2] No such file or directory

I was able to track this down to the clone_repository function in lib/tool_shed/util/, where the error is issued when something goes wrong with the hg clone ... command used in the tool installation process. hg is the name of the Mercurial version control command, and essentially the problem was that Galaxy couldn't find this command.

Our local Galaxy installations are configured to use supervisor with uWSGI, with the Galaxy dependencies installed into a Python virtualenv. Since this virtualenv included Mercurial, I wondered why hg wasn't being picked up from there for the tool installation process.

Marius van den Beek offered some helpful insights via the galaxy-dev mailing list which clarified the situation:
Recent galaxy releases are using the `hg` command that should be automatically installed along with other galaxy dependencies.
If you're running galaxy in a virtualenv then that virtualenv should have the `hg` script in the bin folder.
Depending on how you start galaxy you may need to add the virtualenv's `bin` folder to the `PATH`.
Based on this it turned out that I needed to add an 'environment' parameter to the supervisor.ini file for Galaxy, to specify the virtualenv to use and add its bin directory to the PATH - something like:

environment = VIRTUAL_ENV="/srv/galaxy/venv",PATH="/srv/galaxy/venv/bin:%(ENV_PATH)s"

(This parameter is mentioned in the installation documentation, in the Scaling and Load Balancing section, but only for configuring handler processes. However since our instances are using the uwsgi + mules strategy, it didn't occur to me that it would still be needed.)
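Before restarting, you can see which hg Galaxy will pick up by resolving it against the same PATH that the supervisor line above constructs. The following is a minimal Python sketch. The helper names are hypothetical, and /srv/galaxy/venv is just the example virtualenv path from the environment line.

```python
import os
import shutil


def supervisor_path(venv, env_path):
    """Build the PATH that the supervisor 'environment' line produces:
    the virtualenv's bin directory first, then the inherited PATH."""
    return os.pathsep.join([os.path.join(venv, "bin"), env_path])


def hg_for_galaxy(venv, env_path):
    """Return (path to the hg that would run, whether it is the virtualenv copy).

    shutil.which() only returns files that are executable, so a
    non-executable venv/bin/hg is skipped, just as the shell would skip it."""
    hg = shutil.which("hg", path=supervisor_path(venv, env_path))
    venv_bin = os.path.abspath(os.path.join(venv, "bin"))
    in_venv = hg is not None and os.path.dirname(os.path.abspath(hg)) == venv_bin
    return hg, in_venv
```

For example, hg_for_galaxy("/srv/galaxy/venv", os.environ["PATH"]) might report something like /usr/bin/hg with in_venv False. That would mean a system copy is masking the problem, as described in the asides below.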

Restarting Galaxy with the updated supervisor.ini file enabled tool installation to work without problems again.

Some closing asides:

  • The problem can be masked if Mercurial is installed elsewhere on the system and is on the Galaxy user's PATH (for example /usr/bin/hg)
  • If there is a system version but it is very old (for example Scientific Linux 7 has Mercurial 1.7) then it can cause a slightly different error in clone_repository, but the outcome and fix should be the same as above
  • Since first encountering this issue I've come across a strange variant, whereby Mercurial is installed in the Galaxy virtualenv and supervisor is correctly configured but the tool installations still fail immediately. In this case for some unknown reason it turned out that the hg script in the virtualenv wasn't executable - adding 'execute' permission fixed this one.
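The missing execute permission in that last case is normally fixed by hand with chmod. The following Python sketch does an equivalent check and fix: it adds execute permission for each class (user, group, other) that can already read the file. ensure_executable is a hypothetical helper, and the path in the usage note below assumes the example virtualenv location used earlier.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for every class that already has read permission.

    Returns True if the mode was changed, False if it was already fine."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode
    for read_bit, exec_bit in ((stat.S_IRUSR, stat.S_IXUSR),
                               (stat.S_IRGRP, stat.S_IXGRP),
                               (stat.S_IROTH, stat.S_IXOTH)):
        if mode & read_bit:
            wanted |= exec_bit
    if wanted != mode:
        os.chmod(path, wanted)
        return True
    return False
```

Usage: ensure_executable("/srv/galaxy/venv/bin/hg"), run as the user who owns the virtualenv.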

Tuesday, 25 April 2017

Securing Galaxy with HTTPS running with Nginx using Let’s Encrypt


To secure communication between a Galaxy instance and its users it is best to enable HTTPS on the Galaxy web server, to ensure that all data transmissions between Galaxy and the end user (including sensitive information such as usernames and passwords) are encrypted. This can be done by obtaining and installing SSL/TLS certificates on the server.

The simplest approach in the past was to use self-signed certificates as a way to enable HTTPS while avoiding the cost of purchasing certificates from a commercial Certificate Authority (CA) (for example by using the make-dummy-certs utility found in e.g. /usr/ssl/certs). The downside of this approach is that when a user first tries to access the server their web browser will complain that the certificates are not trusted, and they would typically have to create a one-off security exception before they can access the Galaxy service.

More recently however, a free Certificate Authority called Let’s Encrypt has been set up, which issues free certificates as part of its stated mission to “secure the web”. This blog post gives an overview of how we obtained and installed certificates from Let's Encrypt to enable HTTPS for our production Galaxy instances, using their automated cert-bot client utility.

Before beginning

The procedure described below uses the 'webroot' plugin of cert-bot, which is a general method recommended for obtaining certificates for web servers running nginx. cert-bot also has a plugin for nginx, but at the time of writing this is still at alpha-release stage, so I didn't use it for our Galaxy servers.

For Apache-based servers you can use a dedicated plugin, which offers a more automated procedure than the one described here.

Also, although it targets a different operating system to ours and many of the details are now out-of-date, DigitalOcean's how-to guide is still a useful resource and was immensely helpful to me for understanding the overall process.

Finally, please note that the procedure and its details are likely to change over time. Make sure you check the documentation before carrying out any of these operations on your own infrastructure!

Step 1: Install cert-bot (Let’s Encrypt client) on the server

To begin you need to ensure the Let's Encrypt cert-bot utility is available on the server, to perform the job of obtaining and installing the certificates.

The documentation recommends that if possible you should use the cert-bot package provided by the package manager for your system (e.g. yum, apt etc). However if one isn't available (or is unsuitable, e.g. because it's out-of-date) then you can install the client using the certbot-auto wrapper script instead. This is the approach I used, putting certbot-auto into /usr/local/bin on the server running Galaxy and nginx.

(Note that certbot-auto takes the same arguments as the cert-bot utility, the only difference is that if necessary it will download and update itself first each time it's run.)

  • Aside: there is also a cert-bot package available via the Python Package Index (PyPI). When I first performed this procedure I noted that the documentation emphasised that cert-bot should not be installed via 'pip install', but now I can't find any reference to this. However I would still avoid installing from PyPI for the time being.

Step 2: Get certificates using the 'webroot' method

cert-bot provides a number of different ways to obtain certificates depending on the webserver software being used. The 'webroot' protocol used here is less automated than some of the other procedures but is still quite straightforward, and works by placing a special file on your webserver which Let's Encrypt can attempt to fetch in order to verify the server name and details that are supplied when the cert-bot client is run.

First we need to set up a special directory called .well-known, where Let's Encrypt will place its file:
  • Create a directory called .well-known in the document root of the server (the default for nginx is /usr/share/nginx/html but the actual path can be found by looking up the value of the root directive in the server configuration), e.g.:

    mkdir /usr/share/nginx/html/.well-known

    Optionally also add a dummy index file to help check that the directory is visible via web browser later, e.g.:

    cat >/usr/share/nginx/html/.well-known/index.html <<EOF
    Hello world!
    EOF
  • Add a new location block inside the server block in the nginx configuration file, to allow access to the .well-known directory:

    location ~ /.well-known {
        allow all;
    }
  • Restart nginx and check that the .well-known directory is visible (e.g. by pointing a web browser at it)
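You can also script the visibility check from the last step. The following is a minimal Python sketch. It assumes the optional dummy index.html above was created and that MYDOMAIN is reachable over plain HTTP from wherever the script runs. well_known_visible is a hypothetical helper.

```python
import urllib.error
import urllib.request


def well_known_visible(base_url, timeout=10):
    """Fetch <base_url>/.well-known/index.html and return (HTTP status, body text)."""
    url = base_url.rstrip("/") + "/.well-known/index.html"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # e.g. 403 if the location block is missing, 404 if the path is wrong
        return exc.code, ""
```

If everything is in place, well_known_visible("http://MYDOMAIN") should return status 200 with the "Hello world!" text.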
Then we need to run certbot-auto (or cert-bot) interactively to generate and install the certificates:
  • sudo certbot-auto certonly --webroot -w /usr/share/nginx/html -d MYDOMAIN
where MYDOMAIN is the domain name of your Galaxy server (e.g. "").

  • Aside: note that this bootstraps certbot, including checking for the system packages that it requires; you'll be prompted to install any that it thinks are missing via the system package manager e.g. yum.

certbot will then prompt you to agree to Let's Encrypt's terms and conditions and ask you to provide an email address which will be used for notices and for lost key recovery.

If all goes well then this should produce a set of certificate files under /etc/letsencrypt/archive (with links to these from /etc/letsencrypt/live/):
  • cert.pem (your domain's certificate)
  • chain.pem (the Let's Encrypt chain certificate)
  • fullchain.pem (cert.pem and chain.pem combined)
  • privkey.pem (your certificate's private key)
  • IMPORTANT: you should ensure that the certificate files are backed up to a secure and safe location!
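One way to make this backup is to archive the whole /etc/letsencrypt tree, which also holds the account details and renewal configuration, and then copy the archive to a safe place off the server. The following is a minimal Python sketch using the standard tarfile module. The function name and the default destination /root/backups are assumptions made for illustration.

```python
import os
import tarfile
import time


def backup_letsencrypt(src="/etc/letsencrypt", dest_dir="/root/backups"):
    """Write a timestamped .tar.gz of the Let's Encrypt directory and return its path.

    tarfile stores the symlinks under live/ as links (dereference=False by
    default), so the archive mirrors the original layout. The archive
    contains private keys, so it is created readable by its owner only."""
    os.makedirs(dest_dir, mode=0o700, exist_ok=True)
    name = "letsencrypt-%s.tar.gz" % time.strftime("%Y%m%d-%H%M%S")
    out = os.path.join(dest_dir, name)
    fd = os.open(out, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh, tarfile.open(fileobj=fh, mode="w:gz") as tar:
        tar.add(src, arcname=os.path.basename(src.rstrip("/")))
    return out
```

Run it as root, then move the resulting archive off the server.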
Step 3: configure TLS/SSL on nginx using the certificates

To enable HTTPS we need to configure nginx to listen on port 443 with SSL enabled, and to use the certificates from Let's Encrypt. This is done by adding the following to the server block in the nginx configuration file, for example:

server {
    listen 443 ssl;
    ssl_certificate /etc/letsencrypt/live/MYDOMAIN/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/MYDOMAIN/privkey.pem;
    # ... the rest of the existing Galaxy server configuration ...
}

(Again, your actual domain name should be substituted for MYDOMAIN above.)

It's also a good idea to block or redirect HTTP traffic, so that users don't accidentally send data via an insecure connection - for example to redirect all HTTP requests to HTTPS:

server {
    listen 80 default;
    server_name MYDOMAIN;
    rewrite ^ https://$server_name$request_uri? permanent;
}

Once nginx is restarted you can check using your browser that HTTPS is working for your Galaxy instance; you can also use the Qualys SSL Labs website to check your server configuration.

(NB this is useful for flagging up other issues which you might wish to address!)

Step 4: set up automated certificate renewal

Finally: since all certificates issued by Let's Encrypt expire after 90 days, Let's Encrypt recommends that they are renewed at least once every 3 months.

It's straightforward to automate this process by setting up a cron job on the server to run cert-bot or certbot-auto's 'renew' command (which will renew any previously-obtained certificates that are due to expire in less than 30 days) and then restart nginx (so that any renewed certificates will be loaded).

For example I have the following commands in the root crontab on our server:

# Check SSL certificate renewal from Let's Encrypt

30 2 * * 1 /usr/local/bin/certbot-auto renew >> /var/log/le-renew
# Restart nginx after SSL certificate renewal
35 2 * * 1 service nginx restart >> /var/log/le-renew

See the cert-bot documentation for more information on certificate renewal.

Update 22nd October 2018: the original crontab lines above didn't work for me - the certificate renewals would fail and have to be performed manually, resulting in downtime for the period when nginx would no longer have valid SSL certificates.

Since then I've replaced the original crontab lines with the following single line:

30 2 * * 1 /usr/local/bin/certbot-auto renew --deploy-hook "/sbin/service nginx reload" >> /var/log/le-renew

which uses certbot-auto's --deploy-hook option to reload the nginx configuration on successful certificate renewal via the service command. Note that the full path to service is required as cron jobs have a minimal PATH which doesn't seem to include /sbin.
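Failed renewals are easy to miss, so it can be worth checking how much time the certificate that nginx is actually serving has left. The following is a minimal Python sketch using the standard ssl module. The helper names are hypothetical, and the 30-day threshold mirrors the renewal window described above.

```python
import socket
import ssl
import time

RENEW_WINDOW_DAYS = 30  # 'renew' acts on certificates with fewer days than this left


def days_until_expiry(not_after, now=None):
    """Days left before a notAfter time in the format returned by
    getpeercert(), e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def needs_renewal(not_after, now=None, window=RENEW_WINDOW_DAYS):
    return days_until_expiry(not_after, now) < window


def served_not_after(host, port=443, timeout=10):
    """Connect to host:port over TLS and return the served certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Suppose needs_renewal(served_not_after("MYDOMAIN")) still returns True after the weekly cron job should have run. That suggests either the renewal or the nginx reload didn't happen.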

Tuesday, 2 June 2015

Exposing Galaxy reports via nginx in a production instance

Galaxy includes a report tool that is separate from the main process but which gives lots of potentially useful information about the usage of a Galaxy instance, for example the numbers of jobs that have been run each month, how much disk space each user is currently consuming and so on.

However there doesn't appear to be much documentation about the report tool on the official Galaxy wiki: the most I could find was a rather sparse page, which gives a very bare-bones overview and doesn't include any information on how the reports might be exposed in a secure manner in a production environment. Therefore in this post I outline how I've done this for our local Galaxy setup, which uses nginx; however I imagine it could be adapted to work with Apache instead.

1. Set up the report tool to run on localhost

The report tool takes its configuration settings from a file called reports_wsgi.ini, which is located in the config subdirectory of the Galaxy distribution.

Configuring the reports for your local setup is a case of:
  • Making a copy of reports_wsgi.ini.sample called reports_wsgi.ini
  • Editing the database_connection and file_path (if not the default) parameters to match those in your galaxy.ini (or universe_wsgi.ini) file
  • Optionally, editing the port parameter (by default the tool uses port 9001)
  • You should also set the 'salt' parameter session_secret if you intend to expose the reports via the web proxy (see below)
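The database_connection and file_path values must match Galaxy's own, so a small consistency check between the two files can help. The following is a minimal Python sketch. It assumes both files keep these settings in an [app:main] section, as reports_wsgi.ini does (see below). The helper names are hypothetical.

```python
import configparser


def read_app_main(path):
    """Return the [app:main] section of a Paste-style .ini file as a dict.

    Interpolation is disabled because these files contain %(here)s-style values."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(path)
    return dict(parser["app:main"]) if parser.has_section("app:main") else {}


def mismatched_settings(galaxy_ini, reports_ini,
                        keys=("database_connection", "file_path")):
    """List (key, galaxy value, reports value) for each setting that differs."""
    galaxy = read_app_main(galaxy_ini)
    reports = read_app_main(reports_ini)
    return [(k, galaxy.get(k), reports.get(k))
            for k in keys if galaxy.get(k) != reports.get(k)]
```

An empty list from mismatched_settings("config/galaxy.ini", "config/reports_wsgi.ini") means the two files agree on those settings.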
Then you can start the report server using


and view the reports by pointing a web browser running on the same server to

If you'd like the report tool to persist between sessions then use

sh --daemon

to run it as a background process. As with Galaxy itself, use --stop-daemon to halt the background process. (The log file is written to reports_webapp.log if you need to try and debug a problem.)

2. Expose the report tool via nginx

If you're running a production Galaxy and want to be able to access the reports from a browser running on a different machine to your Galaxy server, then you could consider using SSH tunnelling, which essentially forwards a port on your local machine to one on the server, i.e. port 9001 where the report tool is serving from (see "SSH Tunneling Made Easy" for more details of how to do this).

Alternatively if you are using a web proxy (as is standard for a production setup) then you could try serving the reports via the same proxy (in this case nginx). In this example I assume that the reports will be viewed via a /reports subdirectory of the URL that Galaxy is served from.

First, make the appropriate edits to reports_wsgi.ini: if you have an older Galaxy instance then you'll need to add some sections to the file, specifically:

[filter:proxy-prefix]
use = egg:PasteDeploy#prefix
prefix = /reports

(before the [app:main] section), and

filter-with = proxy-prefix
cookie_path = /reports

(within the [app:main] section.)

For more recent Galaxy instances it's simply a case of making sure that the existing filter-with and cookie_path lines are uncommented and set to the values above.

Next it's necessary to add upstream and location sections in your nginx.conf file:

(This has many similarities to serving Galaxy itself from a subdirectory via an nginx proxy.)

One important thing to be aware of is that the report tool doesn't include any built-in authentication, so it's recommended that you add some authentication within the web proxy. Otherwise anyone in the world could potentially access the reports for your server and see sensitive information such as user login names.

To do this with nginx, first create an htpasswd file to hold a set of user names and associated passwords, using the htpasswd utility, e.g.:

htpasswd -c /etc/nginx/galaxy-reports.htpasswd admin

-c means create a new file (in this case /etc/nginx/galaxy-reports.htpasswd); admin is the username to add. The program will prompt for a password for that username, and store it in the file. You can use any username, and any filename or location (with the caveat that it must be readable by the nginx process) that you wish.
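The htpasswd utility comes with the Apache tools and may not be installed on the server. According to the nginx auth_basic documentation, the password file also accepts entries in the RFC 2307 "{scheme}data" form from nginx 1.0.3 onwards, including salted SHA-1 ({SSHA}). The following Python sketch generates such a line. It is a swapped-in alternative to htpasswd, not the method described above. Salted SHA-1 is also a weak password hash, so htpasswd is preferable where it is available. ssha_entry is a hypothetical helper.

```python
import base64
import hashlib
import os


def ssha_entry(username, password, salt=None):
    """Return a 'user:{SSHA}...' line for an nginx auth_basic_user_file.

    The {SSHA} value is base64(SHA1(password + salt) + salt)."""
    salt = os.urandom(8) if salt is None else salt
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return "%s:{SSHA}%s" % (username, base64.b64encode(digest + salt).decode("ascii"))
```

The resulting line can be appended to /etc/nginx/galaxy-reports.htpasswd. The same readability caveat applies as for a file created with htpasswd.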

Finally to associate the password file with the reports location update the nginx config file appropriately by adding two more lines:

(I found this article very helpful here; note that it also works for https in spite of the title: "How to set up http authentication with nginx on Ubuntu 12.10".)

Once nginx has been restarted, anyone attempting to view the reports will be prompted to enter a username/password combination matching an entry in the htpasswd file before they are given access. Authorised users can then peruse the reports to their heart's content.

Wednesday, 22 April 2015

Using GALAXY_SLOTS with multithreaded Galaxy tools

GALAXY_SLOTS is a useful but not particularly well-publicised way of controlling the number of threads Galaxy allocates to a tool that supports multithreaded operation. It's relevant to both Galaxy admins (who need to ensure that multithreaded jobs don't try to consume more resources than they have access to) and to tool developers (who need to know how many threads are available to a tool at runtime).

Having seen various references to GALAXY_SLOTS on the developers' mailing list I'd assumed this was some esoteric feature that I would need to set up before using it, but in actual fact it's almost embarrassingly simple for most cases. Essentially it can be thought of as an internal variable that's set by Galaxy when it starts a job, which indicates the number of threads that are available for that job and which can subsequently be accessed by a tool in order to make use of that number of threads.

The official Galaxy documentation covers the essential details, but the executive summary is:
  • Tool developers should use GALAXY_SLOTS when specifying the number of threads a tool should run with;
  • Galaxy admins shouldn't need to configure anything unless they're using the local runner, or (possibly) a novel cluster submission system.
And really, that's it. However the following sections give a bit more detail for those who like to have it spelled out (like me).

For tool developers

All that is required for tool developers is to specify GALAXY_SLOTS in the <command> tag in the tool XML wrapper, when setting the number of threads the tool uses.

The syntax for specifying the variable is:


where N is the default value to use if GALAXY_SLOTS is not set. (See the "Tool XML File syntax" documentation for the <command> tag for more details - you need to scroll down to the section on "Reserved Variables" to find it.)
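The "default if not set" behaviour is the shell's ${VAR:-N} default-value expansion. It uses the variable's value if the variable is set and non-empty, and N otherwise. A tool whose script is written in Python could read the variable the same way. The following is a minimal sketch, and galaxy_slots is a hypothetical helper name.

```python
import os


def galaxy_slots(default, environ=None):
    """Python equivalent of the shell expansion ${GALAXY_SLOTS:-default}:
    use GALAXY_SLOTS if it is set and non-empty, otherwise the default."""
    env = os.environ if environ is None else environ
    value = env.get("GALAXY_SLOTS")
    return int(value) if value else int(default)
```

For example, threads = galaxy_slots(6) gives 6 when the job runner hasn't set GALAXY_SLOTS, matching the Trimmomatic default below.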

For example, here's a code fragment from the XML wrapper from a tool to run the Trimmomatic program:

The number of threads defaults to 6 unless GALAXY_SLOTS is explicitly set.

(Aside: the Trimmomatic tool itself can be obtained from the toolshed.)

For Galaxy Admins

It turns out that generally there is nothing special to do for most cluster systems, although this is not immediately clear from the documentation: in most cases GALAXY_SLOTS is handled automagically and so doesn't require any explicit configuration.

For example for DRMAA (which is what we're using locally), we have job runners defined in our job_conf.xml file like:

In our set up, -pe 4 above requests 4 cores for the job. When using this runner, Galaxy will automagically determine the number of cores from DRMAA (i.e. 4) and set GALAXY_SLOTS to the appropriate value - nothing more to do.

The most obvious exception is the "local" job runner, where you need to explicitly set the number of available slots using the <param id="local_slots"> tag in job_conf.xml; see the Galaxy job configuration documentation for more details.
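You can check which local destinations have local_slots set with a short script that reads job_conf.xml. The following is a minimal Python sketch. It assumes the usual layout: <destination> elements with a runner attribute and nested <param id="..."> elements. local_slots is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_slots(job_conf_xml):
    """Map each destination using the 'local' runner to its local_slots
    value, or to None if no local_slots param is set for it."""
    root = ET.fromstring(job_conf_xml)
    result = {}
    for dest in root.iter("destination"):
        if dest.get("runner") != "local":
            continue
        slots = None
        for param in dest.findall("param"):
            if param.get("id") == "local_slots" and param.text:
                slots = int(param.text.strip())
        result[dest.get("id")] = slots
    return result
```

Usage: local_slots(open("config/job_conf.xml", "rb").read()).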

Finally, for other job submission systems see the documentation on how to verify that the environment is being set correctly.

Wednesday, 18 March 2015

Installing ANNOVAR in Galaxy

ANNOVAR is a popular tool used to functionally annotate genetic variants detected from various genomes. The Galaxy toolshed includes a tool called table_annovar which can be used to run ANNOVAR. Installation of the tool into a local Galaxy instance is not fully automated and requires some manual steps which are sketched in the tool's README; this post expands on those basic instructions to hopefully make the process easier for others.

Note that these instructions are for the '2014-02-12' revision of the table_annovar tool (changeset 6:091154194ce8), installing into the latest_2014.08.11 version of galaxy-dist.

1. Install the table_annovar tool from the toolshed

This is the devteam-owned tool on the main toolshed, and can be installed via the usual admin interface within Galaxy.

2. Install the ANNOVAR software

In addition to the Galaxy tool you also need the actual ANNOVAR software. To download a copy you first need to register on the ANNOVAR website.

Once registered you should receive a link to download the latest version (e.g. annovar-2014nov12.tar.gz). Note that ANNOVAR's licensing conditions prohibit commercial use without a specific agreement, and that users are not permitted to redistribute ANNOVAR to others, including lab members.

Unpack the tar.gz file into a directory where it can be executed by your Galaxy user. For example:

# Make a location for ANNOVAR
$ mkdir -p /home/galaxy/apps/annovar/
# Move into this directory
$ cd /home/galaxy/apps/annovar/
# Unpack the ANNOVAR software
$ tar zxf /path/to/annovar-2014nov12.tar.gz
# Rename the unpacked directory to '2014nov12'
$ mv annovar 2014nov12

This puts the ANNOVAR programs into the directory /home/galaxy/apps/annovar/2014nov12. The actual location isn't so important as long as you know where it is so you can reference it in the next section.

3. Set up the Galaxy environment to make ANNOVAR available to the tool

Essentially we need to manually create the files and directories that Galaxy will use to set the environment appropriately when the ANNOVAR tool is run.

This needs to be done in the directory pointed to by the tool_dependency_dir variable in your Galaxy configuration file (either galaxy.ini or universe_wsgi.ini, depending on the age of your Galaxy distribution) - by default this is ../tool_dependencies (which is relative to your galaxy-dist directory).

Under this directory make a subdirectory for ANNOVAR, for example:

$ cd tool_dependencies/
$ mkdir -p annovar/2014nov12

In the 'annovar' directory make a symbolic link called 'default' which points to this version:

$ cd annovar/
$ ln -s 2014nov12 default

Then in the '2014nov12' dir make a file called file which looks like:

$ cd 2014nov12/
$ cat <
export PATH=/home/galaxy/apps/annovar/2014nov12:$PATH

(Substitute the directory that you unpacked the ANNOVAR software into in the previous step.) Galaxy will source this file when running the ANNOVAR tool in order to make the underlying programs available.

4. Add the 'annovar_index' data table to the master list of data tables

The ANNOVAR tool gets information about the installed databases from a file called annovar_index.loc. For the version of Galaxy that I'm using there is already a copy of this file in the galaxy-dist/tool-data directory, but the tool won't pick up any databases referenced there until we add the following to the end of the tool_data_table_conf.xml:

    <!-- Location of ANNOVAR databases -->
    <table comment_char="#" name="annovar_indexes">
        <columns>value, dbkey, type, path</columns>
        <file path="tool-data/annovar_index.loc"> </file>
    </table>

Important: this must appear before the closing </tables> tag in the file!

(Note that you will need to restart Galaxy after this step to get it to pick up the data table.)
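You can check that the table has been registered with a short script. The check also confirms that the file is still well-formed XML, which it won't be if the new block ends up after </tables>. The following is a minimal Python sketch. data_table_files is a hypothetical helper, and it assumes the usual root <tables> element with <table> children.

```python
import xml.etree.ElementTree as ET


def data_table_files(conf_path, table_name="annovar_indexes"):
    """Return the .loc paths registered for a named data table in
    tool_data_table_conf.xml (an empty list if the table isn't there).

    ET.parse raises xml.etree.ElementTree.ParseError if, for example, the
    table was added after the closing </tables> tag."""
    root = ET.parse(conf_path).getroot()
    for table in root.findall("table"):
        if table.get("name") == table_name:
            return [f.get("path") for f in table.findall("file")]
    return []
```

Usage (from galaxy-dist): data_table_files("tool_data_table_conf.xml") should return ['tool-data/annovar_index.loc'].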

It's possible that newer versions of Galaxy might not include annovar_index.loc, in which case you'll need to locate the copy that's supplied in the tool itself and copy that to the tool-data directory. The following Linux command (executed from galaxy-dist) should do the trick:

$ find tool-data -name "annovar_index.loc"

5. Install ANNOVAR databases and update the .loc file

At this point the tool is almost set up; it's just missing any actual databases to work with.

The list of available databases is given in the ANNOVAR documentation, and they can be downloaded using the script (which is part of ANNOVAR).

It is important to note that the ANNOVAR tool expects all the database files for a specific genome build to be in the same directory.

As an example: say we want to make the hg19 refGene and ensGene databases available in the Galaxy tool. In this case we first download the data:

$ cd /home/galaxy/data/annovar/
$ /home/galaxy/apps/annovar/2014nov12/ -downdb -buildver hg19 -webfrom annovar refGene hg19
$ /home/galaxy/apps/annovar/2014nov12/ -downdb -buildver hg19 -webfrom annovar ensGene hg19

This will download the files for both databases to a subdirectory called 'hg19' under /home/galaxy/data/annovar/ (you should choose or make your own location as appropriate).

Then update annovar_index.loc to point to these data. The header of the .loc file specifies the format of each line, but essentially each database should be described by a line with four tab-separated fields:

  • Database name: the text that appears for the database within the ANNOVAR tool
  • Genome build
  • Database type: the ANNOVAR databases are divided into three types: "gene annotations" ('gene_ann'), "annotation regions" ('region') and "annotation databases" ('filter') -  empirically, if a database download contains '.fa.' files it appears to be 'gene_ann', if it contains '.idx' files then it's 'filter'
  • Path to the directory holding the downloaded data files
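The rule of thumb for the type field can be applied to a download directory with a small script. The following is a minimal Python sketch of that empirical heuristic only: '.fa' files suggest 'gene_ann', and '.idx' files suggest 'filter'. guess_annovar_type is a hypothetical helper. It also assumes ANNOVAR's <build>_<name>... file naming (e.g. hg19_refGene.txt). If neither pattern matches, it returns None and the type has to be checked by hand.

```python
import os


def guess_annovar_type(db_dir, build, name):
    """Guess the annovar_index.loc type for one database from its files.

    Heuristic only: files containing '.fa' -> 'gene_ann', files ending
    '.idx' -> 'filter', otherwise None (undetermined)."""
    prefix = "%s_%s" % (build, name)
    files = [f for f in os.listdir(db_dir) if f.startswith(prefix)]
    if any(".fa" in f for f in files):
        return "gene_ann"
    if any(f.endswith(".idx") for f in files):
        return "filter"
    return None
```

For example, guess_annovar_type("/home/galaxy/data/annovar/hg19", "hg19", "refGene") would be expected to return 'gene_ann'.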

For the refGene and ensGene example above, in both cases the genome build is 'hg19', the data are of type 'gene_ann', and the directory holding the files is /home/galaxy/data/annovar/hg19/. So the .loc file entries will look like this (with tabs between the fields):

refGene	hg19	gene_ann	/home/galaxy/data/annovar/hg19
ensGene	hg19	gene_ann	/home/galaxy/data/annovar/hg19
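A common mistake is to separate the fields with spaces instead of tabs, and a script can catch it along with other errors in the .loc file. The following is a minimal Python sketch that checks each non-comment line for four tab-separated fields, a known type and an existing directory. check_loc_file is a hypothetical helper.

```python
import os

VALID_TYPES = {"gene_ann", "region", "filter"}


def check_loc_file(path):
    """Return a list of problems found in an annovar_index.loc file."""
    problems = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            line = line.rstrip("\r\n")
            if not line.strip() or line.startswith("#"):
                continue  # blank lines and comments are ignored
            fields = line.split("\t")
            if len(fields) != 4:
                problems.append("line %d: expected 4 tab-separated fields, got %d"
                                % (lineno, len(fields)))
                continue
            name, build, db_type, db_dir = fields
            if db_type not in VALID_TYPES:
                problems.append("line %d: unknown type %r" % (lineno, db_type))
            if not os.path.isdir(db_dir):
                problems.append("line %d: no such directory %s" % (lineno, db_dir))
    return problems
```

An empty list from check_loc_file("tool-data/annovar_index.loc") means every entry is at least well-formed.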

Finally, you will need to restart Galaxy to refresh the available databases for the ANNOVAR tool (or if you only have a single Galaxy server running then you can use the option under "Manage local data (beta)" in the "Admin" interface to reload the data).

6. Troubleshooting

It's recommended to run a few example ANNOVAR jobs to check that everything is set up correctly. Some problems that I've encountered in the past include:

#1 The expected databases don't appear as options in the tool

Only databases which match the genome build assigned to the input dataset will be presented as options. Check that the input dataset has been assigned to the correct genome build.

#2 The job produces an empty output file when annotating against a single database

Check the log for a line ending in "command not found",

which suggests that the file created in step #3 above is not correct.

#3 The job produces an empty output file when annotating against multiple databases

First check for the previous error; if this isn't the case then check the stderr output for a message like:

Error: the required database file ... does not exist.

which suggests a problem with your annovar_index.loc file. Check that the database file does indeed exist, and that all the data files for the genome build are in the same directory (see step #5 above).

Tuesday, 17 March 2015

Custom Google web searches for Galaxy help

Google can be a great tool when searching for help with deploying or developing software. However in the specific case of Galaxy, searching the whole of the web can also throw up a lot of unrelated hits (such as astrophysics, or tablet products, to take two examples). I learned recently though that there are now a few custom Google searches available which can help narrow the results.

The full set of searches can be accessed via

with the most useful for me being the "Galaxy Admin & Development" subsearch at

I haven't used them extensively yet but the couple of searches I've tried have been encouragingly relevant. So hopefully these will help me avoid stellar aggregates and mobile phones in future.