Kentaro Kuribayashi's blog

Software Engineering, Management, Books, and Daily Journal.

How to Handle Multiple `X-Reproxy-URL`s in Nginx

Serving large files through a proxy server by reproxying is efficient in terms of the application server's resource usage. Perlbal has supported this feature for a long time and has been used for that purpose. Nginx has also been able to do it out of the box in recent years, so you can use it for reproxying instead of Perlbal.

Problem

There is, however, one big problem to solve before replacing Perlbal with Nginx. Perlbal can handle multiple X-Reproxy-URLs, but Nginx can't without some extra configuration. The Perlbal documentation says:

This support also extends to URLs that can be located anywhere Perlbal has access to. It's the same syntax, nearly:

X-REPROXY-URL: http://foo.com:80/resource.html

You can also specify multiple URLs:

X-REPROXY-URL: http://foo.com:80/resource.html http://baz.com:8080/res.htm

Just specify any number of space separated URLs. Perlbal will request them one by one until one returns a response code of 200. At that point Perlbal will proxy the response back to the user just like normal.

We use Perlbal as a frontend proxy server for MogileFS, and it actually handles such multiple URLs in our production environment today. If we want to use Nginx instead, we have to solve the problem in Nginx.
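The fallback behaviour described in the quoted Perlbal documentation can be modelled in a few lines. The following Python sketch is only an illustration, not Perlbal's actual code: the names `first_ok` and `fetch_status` are hypothetical, and the fetcher is passed in as a parameter so that the logic can be exercised without a network.

```python
from typing import Callable, Optional


def first_ok(header_value: str, fetch_status: Callable[[str], int]) -> Optional[str]:
    """Return the first URL in a space-separated header value that answers 200.

    `fetch_status` is a hypothetical callable that requests a URL and returns
    its HTTP status code. None is returned when no URL answers 200.
    """
    for url in header_value.split():
        if fetch_status(url) == 200:
            return url
    return None


# Usage with a fake fetcher standing in for real HTTP requests.
statuses = {
    "http://foo.com:80/resource.html": 404,
    "http://baz.com:8080/res.htm": 200,
}
chosen = first_ok(
    "http://foo.com:80/resource.html http://baz.com:8080/res.htm",
    lambda url: statuses.get(url, 502),
)
print(chosen)
```

In this model the first URL answers 404, so the second one is the one that would be proxied back to the user.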

Reproxy Basics in Nginx

First of all, Nginx doesn't handle X-Reproxy-URL directly. Instead, it provides an "internal redirection" feature called X-Accel, which is a more general mechanism than Perlbal's reproxying feature.

In addition to X-Reproxy-URL, you have to set another HTTP header, X-Accel-Redirect. Below is an example in PHP:

header('X-Reproxy-URL: http://example.com/large_file');
header('X-Accel-Redirect: /reproxy');

Nginx then redirects internally to the path set by the X-Accel-Redirect header and fetches the resource located at the URL given in X-Reproxy-URL.

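location /reproxy {
    # Only reachable through internal redirects such as X-Accel-Redirect
    internal;
    # Take the target URL from the upstream response's X-Reproxy-URL header
    set $reproxy $upstream_http_x_reproxy_url;
    proxy_pass $reproxy;
}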

Solution

Assume that the upstream server passes you two URLs per file, as a single string of space-separated URLs like the one described in the Perlbal documentation quoted above. You now have to split the URLs out of that string, which can be done with a configuration like the one below:

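location /reproxy {
  internal;

  # Split the space-separated header value into two variables
  if ($upstream_http_x_reproxy_url ~ "^([^ ]+)\s+([^ ]+)") {
    set $reproxy1 $1;
    set $reproxy2 $2;
  }

  # Let error_page handle error responses from the proxied server
  proxy_intercept_errors on;
  # On these statuses, retry with the second URL
  error_page 404 500 502 503 504 = @reproxy2;

  proxy_pass $reproxy1;
}

# Fallback location that fetches the second URL
location @reproxy2 {
  proxy_pass $reproxy2;
}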

Let's look into the configuration in detail.

  1. The upstream server responds with an X-Reproxy-URL header like "http://example.com/file1 http://example.com/file2"
  2. Nginx splits the URLs out of the header and assigns them to $reproxy1 and $reproxy2
  3. It first tries to fetch the file located at the URL in $reproxy1
  4. If that fails, it tries again with the URL in $reproxy2

You must set proxy_intercept_errors to on; otherwise the error response from the first URL is passed straight to the client and the second attempt never happens.
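The steps above can also be traced outside of Nginx. The Python sketch below is a rough model of the configuration, not something Nginx itself runs: it reuses the same regular expression and the same list of intercepted status codes, while `split_reproxy_urls`, `reproxy`, and the `fetch` callable are hypothetical names introduced for illustration.

```python
import re
from typing import Callable, Optional, Tuple

# Same pattern as in the `if` block: two space-separated URLs.
REPROXY_RE = re.compile(r"^([^ ]+)\s+([^ ]+)")
# Same status codes as listed in the error_page directive.
FALLBACK_STATUSES = {404, 500, 502, 503, 504}


def split_reproxy_urls(header_value: str) -> Optional[Tuple[str, str]]:
    """Return the ($reproxy1, $reproxy2) pair, or None if two URLs are not found."""
    match = REPROXY_RE.match(header_value)
    if match is None:
        return None
    return match.group(1), match.group(2)


def reproxy(header_value: str, fetch: Callable[[str], int]) -> Optional[Tuple[str, int]]:
    """Fetch the first URL and fall back to the second on an intercepted status.

    `fetch` is a hypothetical callable returning the HTTP status for a URL.
    Returns the URL that was finally served together with its status.
    """
    urls = split_reproxy_urls(header_value)
    if urls is None:
        # Modelling choice: give up when the header does not hold two URLs.
        return None
    first, second = urls
    status = fetch(first)
    if status in FALLBACK_STATUSES:
        # Corresponds to the jump to the @reproxy2 location.
        return second, fetch(second)
    return first, status


header = "http://example.com/file1 http://example.com/file2"
fake_statuses = {"http://example.com/file1": 404, "http://example.com/file2": 200}
result = reproxy(header, fake_statuses.__getitem__)
print(result)
```

Note one difference from Perlbal that the model makes visible: this configuration falls back only on the listed error codes and only to a second URL, whereas the Perlbal documentation describes trying any number of URLs until one returns 200.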

Conclusion

Now that we can handle multiple X-Reproxy-URLs with Nginx as well, we're ready to switch from Perlbal to Nginx, at least as far as the reproxying feature is concerned. If you know a better way, please let me know ;)