NGINX reverse proxy to self

proxy_pass to the same server for caching and more when using OpenResty/NGINX as your application server

NGINX is known for its reverse proxy functionality: it acts as a gateway server that forwards requests to a backend while managing a large number of connections and ensuring clients are behaving correctly. Typically the server you proxy to is an entirely different process, often written in a different language.

With OpenResty, your application server is NGINX. In my projects I've typically used a single NGINX instance that both handles internet traffic and runs the application logic.

NGINX’s reverse proxy facilities are powerful, though, so in this guide we'll point them back at the same instance of NGINX, then show how to use NGINX caching, SSI, and gzip compression on top of that setup.

Initial Configuration

Before going into any of the detailed examples, we'll set up a configuration that uses proxy_pass to pass the request to the same instance of NGINX.

This configuration example isn’t completely standalone, so expect to adapt it for your setup. If you have any questions on how to do that, leave a comment below.

http {
  server {
    server_name mywebsite.com;
    listen 80;
    listen 443 ssl;

    location / {
      proxy_pass http://127.0.0.1:80;
      proxy_set_header Host mywebsite.local;

      # include details about the original request
      proxy_set_header X-Original-Host $http_host;
      proxy_set_header X-Original-Scheme $scheme;
      proxy_set_header X-Forwarded-For $remote_addr;
    }
  }

  server {
    # must match host header & port from above
    server_name mywebsite.local;
    listen 80;

    # can be used to prevent double logging requests
    access_log off;

    # only allow requests from the same machine
    allow 127.0.0.1;
    deny all;

    location / {
      # render your application
      content_by_lua_block {
        -- ...
      }
    }
  }
}
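Inside the application server block, the details the proxy added arrive as ordinary request headers, and NGINX exposes each request header as an $http_* variable. A minimal sketch of what the location / block of the mywebsite.local server might look like when reading them from Lua (the output is only for illustration):

```nginx
location / {
  content_by_lua_block {
    -- headers set by the front server block with proxy_set_header;
    -- header X-Original-Host becomes $http_x_original_host, and so on
    local host = ngx.var.http_x_original_host
    local scheme = ngx.var.http_x_original_scheme
    local client_ip = ngx.var.http_x_forwarded_for

    -- ngx.var.remote_addr is 127.0.0.1 here, since the proxy made the request,
    -- so the forwarded header is the way to see the real client address
    ngx.say("original URL: ", scheme, "://", host, ngx.var.request_uri)
    ngx.say("client address: ", client_ip)
  }
}
```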

Enhanced Configuration

The following examples focus on changes to the reverse proxy server block, so they only contain that part of the configuration. Refer to the initial configuration above for the rest.

Enable gzip Compression

Adding gzip compression to your HTML responses is a good way to boost client performance. If you're using OpenResty to write the response for a web application, the gzip configuration option does not work. You can, however, have the reverse proxy server gzip the response before returning it to the client. Make the following change:

location / {
  proxy_pass http://127.0.0.1:80;
  proxy_set_header Host mywebsite.local;

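  gzip on;

  # requests that arrive through another proxy (with a Via header) are
  # not compressed by default; 'any' compresses them as well
  gzip_proxied any;

  # text/html is always compressed; to compress other types, list them:
  # gzip_types application/json text/css;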

  # ...
}
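A few other gzip directives are worth knowing about. This is a sketch; the compression level and MIME types below are illustrative assumptions, not tuned recommendations:

```nginx
location / {
  proxy_pass http://127.0.0.1:80;
  proxy_set_header Host mywebsite.local;

  gzip on;

  # add "Vary: Accept-Encoding" so downstream caches keep compressed
  # and uncompressed copies of a response apart
  gzip_vary on;

  # 1 (fastest) to 9 (smallest output); 5 is an arbitrary middle ground
  gzip_comp_level 5;

  # text/html is always compressed; these types are added on top of it
  gzip_types application/json text/css application/javascript;
}
```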

Using The NGINX Caching Module

The NGINX proxy module contains a powerful caching system. It’s a great alternative to using separate software like Varnish since it’s already built in.

The cache utilizes the file system to store cached objects, so it survives a server reboot and cached files can be purged by deleting the respective file.

The cache has a rich set of configuration options, and caching requirements vary significantly between applications, so adapt the basic example below to fit your needs.

A common use case is caching logged out pages while letting users who are logged in see content generated by the application server. To accomplish this, the application server must be able to control the cacheability of a response, and the proxy server must know when to skip the cache.

Here’s a quick overview:

  • An incoming request should skip the cache if the session cookie is set.
  • A rendered response should not be cached if it was rendered for a logged in account.

It’s important to get both of these right. Mistakes with caching can leak private account information or break your site.

http {
  # create a cache named 'pagecache' (proxy_cache_path is only valid in http)
  # up to 1g on disk, with a 100m shared memory zone for the cache keys
  proxy_cache_path ../pagecache levels=1:2 keys_zone=pagecache:100m max_size=1g inactive=2h use_temp_path=off;

  server {
    location / {
      proxy_pass http://127.0.0.1:80;
      proxy_set_header Host mywebsite.local;

      # use our cache named 'pagecache'
      proxy_cache pagecache;

      # cache status code 200 responses for 10 minutes
      proxy_cache_valid 200 10m;

      # serve a stale cached copy if the app server errors or times out,
      # or while another request is refreshing the entry
      proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;

      # add a header to debug cache status
      add_header X-Cache-Status $upstream_cache_status;

      # don't let two requests try to populate the cache at the same time
      proxy_cache_lock on;

      # bypass the cache if the session cookie is set
      proxy_cache_bypass $cookie_session;

      # ...
    }
  }
}

I recommend going through each of these config directives and reading how they work in the NGINX documentation. The example above is not something you can copy and paste; instead, it's a starting point for researching the different caching options.
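One option worth considering early is proxy_cache_key. Its default value is $scheme$proxy_host$request_uri, and in this setup $proxy_host always refers to the loopback address from proxy_pass. If more than one site proxies to itself through the same cache zone, their entries can collide. A sketch that keys on the host the client requested instead, assuming each site should get its own entries:

```nginx
location / {
  proxy_pass http://127.0.0.1:80;
  proxy_set_header Host mywebsite.local;
  proxy_cache pagecache;

  # key entries on the client-facing scheme and host rather than the
  # upstream address, so sites sharing the 'pagecache' zone stay separate
  proxy_cache_key $scheme$host$request_uri;
}
```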

In the above example, the $cookie_session variable is used to toggle skipping the cache. The cache should be skipped for sessions that require dynamic content, typically users who are logged in. When using the Lua NGINX module, it’s easy to insert some code to set this variable:

set_by_lua $cookie_session '
  -- pseudo code example
  local parse_session = require("my_session_library")

  -- if a session is available, return 1 to trigger cache bypass
  if parse_session() then
    return "1"
  else
    return ""
  end
';

proxy_cache_bypass $cookie_session;
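Note that proxy_cache_bypass only controls whether a response is read from the cache; the response fetched for a bypassed request can still be saved into it. The proxy_no_cache directive covers the other direction. Using the same variable for both is a reasonable safety net alongside the headers the application sends, sketched here:

```nginx
location / {
  proxy_pass http://127.0.0.1:80;
  proxy_set_header Host mywebsite.local;
  proxy_cache pagecache;

  # don't serve a cached copy when a session is present...
  proxy_cache_bypass $cookie_session;

  # ...and don't store the response rendered for that session either
  proxy_no_cache $cookie_session;
}
```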

Controlling what pages get cached

The response from the proxied request can control whether it can be cached by using HTTP headers like Cache-Control. By default, the NGINX cache honors a handful of response headers, such as Cache-Control, Expires, and Set-Cookie, but you can disable processing of them using proxy_ignore_headers.

The simplest way to prevent a request from being stored in the cache is to use Cache-Control: no-store.

It’s important to send this header for any session-specific request, like a logged in user’s request. If this isn’t done, a logged-in view may be cached and presented to everyone who visits your site. For some sites, this may even leak sensitive data.
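On the application side, OpenResty lets you set response headers through the ngx.header table before any output is sent. A sketch for the application server block, where my_session_library is the same hypothetical module as in the set_by_lua example and ngx.say("...") stands in for your real rendering code:

```nginx
location / {
  content_by_lua_block {
    -- hypothetical session helper, as in the set_by_lua example
    local parse_session = require("my_session_library")

    if parse_session() then
      -- rendered for a logged in account: tell NGINX not to store it
      ngx.header["Cache-Control"] = "no-store"
    end

    -- headers must be set before the first ngx.say/ngx.print call
    ngx.say("...")
  }
}
```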

Using Server Side Includes

Server Side Includes, or SSI, is an NGINX module that lets you modify a response by placing special tags in it that NGINX understands and processes. It can be used to compose the final response out of multiple HTTP requests.

Combined with some of the techniques above, SSI and the NGINX cache enable some interesting performance optimizations.

SSI can be enabled in an http, server, or location block. I recommend being as specific as possible and only enabling SSI on the locations where it is needed. SSI can be a security vulnerability if untrusted SSI tags get evaluated.

If your website displays user-submitted data, you need to be careful about sanitization. If a user is able to insert an SSI directive, your server may be compromised. Since SSI tags use < and >, a standard HTML sanitizer will generally work, but verify this before deploying.

Enabling SSI takes only a single config line:

server {
  location / {
    ssi on;
    # .. other configuration
  }
}

Including cached content

By creating an internal NGINX location that returns fragments of HTML served through the NGINX cache, you can insert cached content into parts of a page. Good candidates for this might be a comment system or a recommender system. With this approach, the main page can be rendered with dynamic content, and the cached content is inserted through an SSI.

This works best when pages generally render fast, but there are some sections that can be slow to render.

Because the NGINX cache is being used, you get all of its features for free, such as expiration and serving stale content while an entry is refreshed.
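A minimal sketch of the fragment approach, assuming the 'pagecache' zone from the caching example is defined in the http block. The /_fragments/ prefix and the comments fragment are hypothetical names that the application server would need to render. The page produced by the application would contain a tag such as <!--# include virtual="/_fragments/comments" -->, and NGINX fills it in with a subrequest to the internal location:

```nginx
server {
  server_name mywebsite.com;
  # ... listen directives as in the initial configuration

  location / {
    proxy_pass http://127.0.0.1:80;
    proxy_set_header Host mywebsite.local;

    # the main page is dynamic; SSI tags in it are processed on the way out
    ssi on;
  }

  # hypothetical fragment location; 'internal' means clients can't request
  # it directly, but SSI include subrequests can
  location /_fragments/ {
    internal;
    proxy_pass http://127.0.0.1:80;
    proxy_set_header Host mywebsite.local;

    # the slow fragment is served through the cache
    proxy_cache pagecache;
    proxy_cache_valid 200 5m;
    proxy_cache_use_stale error timeout updating;
  }
}
```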

Inserting dynamic content in cached content

SSI tags are processed after the response is pulled from the NGINX cache, so the tags are stored in the cache as is. When a cached result is served with SSI processing enabled, it is scanned for SSI tags and any replacements are made.

By using this approach you can cache an entire page, while still having dynamic sections in the page. A good candidate for this might be inserting data about the user’s session, or adding CSRF tokens.
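Flipping the previous setup around, here is a sketch of caching a whole page while leaving a per-visitor hole in it. Again the 'pagecache' zone is assumed from earlier, and /_session/csrf is a hypothetical endpoint on the application server that the cached page pulls in with <!--# include virtual="/_session/csrf" -->:

```nginx
server {
  server_name mywebsite.com;
  # ... listen directives as in the initial configuration

  location / {
    proxy_pass http://127.0.0.1:80;
    proxy_set_header Host mywebsite.local;

    # the whole page, with its SSI tags unprocessed, goes into the cache
    proxy_cache pagecache;
    proxy_cache_valid 200 10m;

    # SSI tags in the (possibly cached) page are evaluated on every response
    ssi on;
  }

  # hypothetical per-session endpoint, reachable only through SSI subrequests
  location /_session/ {
    internal;
    proxy_pass http://127.0.0.1:80;
    proxy_set_header Host mywebsite.local;

    # explicitly uncached: this output is different for each visitor
    proxy_cache off;
  }
}
```

The subrequest carries the visitor's original request headers, including cookies, so the application can identify the session. Only the body of the subrequest is inserted into the page, though; its response headers, such as Set-Cookie, are not sent to the client.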