Galactic Engineer: Exposing Galaxy reports via nginx in a production instance

Galaxy includes a report tool that is separate from the main process but which gives lots of potentially useful information about the usage of a Galaxy instance, for example the numbers of jobs that have been run each month, how much disk space each user is currently consuming and so on.

However there doesn't appear to be much documentation about the report tool on the official Galaxy wiki: the most I could find was a rather sparse page at https://wiki.galaxyproject.org/Admin/UsageReports, which gives a very bare bones overview, and doesn't include any information on how it might be exposed in a secure manner in a production environment. Therefore in this post I outline how I've done this for our local Galaxy set up, which uses nginx; however I imagine it could be adapted to work with Apache instead.

1. Set up up the report tool to run on localhost

The report tool takes its configuration settings from a file called reports_wsgi.ini, which is located in the config subdirectory of the Galaxy distribution.

Configuring the reports for your local setup is a case of:

Making a copy of reports_wsgi.ini.sample called reports_wsgi.ini
Editing the database_connection and file_path (if not the default) parameters to match those in your galaxy.ini (or universe_wsgi.ini) file
Optionally, editing the port parameter (by default the tool uses port 9001)
You should also set the 'salt' parameter session_secret if you intend to expose the reports via the web proxy (see below)

Then you can start the report server using

sh run_reports.sh

and view the reports by pointing a web browser running on the same server to http://127.0.0.1:9001.

If you'd like the report tool to persist between sessions then use

sh run_reports.sh --daemon

to run it as a background process. As with Galaxy itself, use --stop-daemon to halt the background process. (The log file is written to reports_webapp.log if you need to try and debug a problem.)

2. Expose the report tool via nginx

If you're running a production Galaxy and want to be able access the reports from a browser running on a different machine to your Galaxy server then you can could consider using SSH tunnelling, which essentially forwards a port on your local machine to one on the server i.e. port 9001 where the report tool is serving from (see "SSH Tunneling Made Easy" at http://www.revsys.com/writings/quicktips/ssh-tunnel.html for more details of how to do this).

Alternatively if you are using a web proxy (as is standard for a production setup) then you could try serving the reports also via the proxy (in this case nginx). In this example I assume that if Galaxy is being served from e.g. http://galaxy.example.org/ then the reports will be viewed via http://galaxy.example.org/reports/.

First, make the appropriate edits to reports_wsgi.ini: if you have an older Galaxy instance then you'll need to add some sections to the file, specifically:

[filter:proxy-prefix]

use = egg:PasteDeploy#prefix

prefix = /reports

(before the [app:main] section), and

filter-with = proxy-prefix

cookie_path = /reports

(within the [app:main] section.)

For more recent latest Galaxy instances it's simply a case of making sure that the existing filter-with and cookie_path lines are uncommented and set to the values above.

Next it's necessary to add upstream and location sections in your nginx.conf file:

(This has many similarities to serving Galaxy from a subdirectory via nginx proxy at a subdirectory, see https://wiki.galaxyproject.org/Admin/Config/nginxProxy).

One important thing to be aware of is that the report tool doesn't include any built-in authentication, so it's recommended that you add some authentication within the web proxy. Otherwise anyone in the world could potentially access the reports for your server and see sensitive information such as user login names.

To do this with nginx, first create a htpasswd file to hold a set of user names and associated passwords, using the htpassword utility, e.g.:

htpasswd -c /etc/nginx/galaxy-reports.htpasswd admin

-c means create a new file (in this case /etc/nginx/galaxy-reports.htpasswd); admin is the username to add. The program will prompt for a password for that username, and store it in the file. You can use any username, and any filename or location (with the caveat that it must be readable by the nginx process) that you wish.

Finally to associate the password file with the reports location update the nginx config file appropriately by adding two more lines:

(I found this article very helpful here; note that it also works for https in spite of the title: "How to set up http authentication with nginx on Ubuntu 12.10" https://www.digitalocean.com/community/tutorials/how-to-set-up-http-authentication-with-nginx-on-ubuntu-12-10).

Once nginx has been restarted then anyone attempting to view the reports at http://galaxy.example.org/reports/ will be prompted to enter a username/password combination matching an entry in the htpasswd file before they are given access. Authorised users can then peruse the reports to their heart's content.

Galactic Engineer

Tuesday, 2 June 2015

Exposing Galaxy reports via nginx in a production instance

2 comments:

Search This Blog