Heroku Optimisations - Unicorn, S3 and CloudFront

As part of my continued learning as a Rails developer, I had some free time during the week and looked into optimising Heroku to make my Rails apps far more efficient and effective. When I tested Unicorn it showed itself to be an awesomely efficient application server, giving you far more bang for your buck on Heroku. Here's a quick guide to how I set up Unicorn, along with some additional optimisations using S3 and CloudFront to serve my files, which will now become my default setup for getting the most out of my dynos.

Setting up Unicorn

Unicorn is an HTTP server available for Rails and other Ruby frameworks that increases concurrency and the performance of your apps without having to increase the number of dynos (for a better explanation of the fundamentals of concurrency, see this excellent post). It does this by spawning extra worker processes to deal with incoming requests. It rocks, and in my testing it outperformed the other popular Ruby HTTP servers by a significant margin.

1) Add the Unicorn gem to your Gemfile

Gemfile
...
gem 'unicorn'
...

2) Replace the current web: line in your Procfile with

Procfile
web: bundle exec unicorn_rails -p $PORT -c ./config/unicorn.rb

3) Create a unicorn.rb file in the config directory with the following contents

config/unicorn.rb
# If you have a very small app you may be able to
# increase this, but in general 3 workers seems to
# work best

worker_processes 3

# Load your app into the master before forking
# workers for super-fast worker spawn times

preload_app true

# Immediately restart any workers that
# haven't responded within 30 seconds

timeout 30

before_fork do |server, worker|

  # Replace with MongoDB or whatever

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end

  # If you are using Redis but not Resque, change this

  if defined?(Resque)
    Resque.redis.quit
    Rails.logger.info('Disconnected from Redis')
  end

  sleep 1
end

after_fork do |server, worker|

  # Replace with MongoDB or whatever

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  # If you are using Redis but not Resque, change this

  if defined?(Resque)
    Resque.redis = ENV['REDIS_URI']
    Rails.logger.info('Connected to Redis')
  end
end

4) Finally, set up logging to use STDOUT in the application class of config/application.rb

config/application.rb
config.logger = Logger.new(STDOUT)

The config file and the logger change fix a few issues that can occur when workers are being spawned or terminated, or when adding certain gems like New Relic or Redis, and make sure we get logging information reported correctly.

That's all there is to setting up Unicorn on Heroku. On the next deployment, Heroku should start the application running on Unicorn with however many workers you specified. To check it's working, you can watch it launch the Unicorn server and spawn workers by running heroku logs --tail whilst deploying.

I've found that 3 workers works best, as any higher and you risk breaching Heroku's 512MB memory limit per dyno, but you can experiment depending on your application's size.
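If you want to experiment with the worker count without editing the config each time, you can read it from an environment variable instead. A minimal sketch, assuming a WEB_CONCURRENCY variable that you set yourself (nothing above sets it for you):

config/unicorn.rb

# WEB_CONCURRENCY is a variable we choose to set ourselves,
# falling back to 3 workers when it isn't present
worker_processes Integer(ENV['WEB_CONCURRENCY'] || 3)

Then heroku config:add WEB_CONCURRENCY=2 changes the worker count on the next restart, with no code change needed.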

Setting up S3 assets

The easiest way to serve your assets from the cloud is to use the [asset_sync gem](https://github.com/rumblelabs/asset_sync). It simplifies pushing your assets to the cloud: when you run rake assets:precompile (done by default when pushing to Heroku), your compiled assets will automatically be sent over to the cloud, whether that's Amazon S3, Google Storage or Rackspace. The good news is it's also very easy to set up. For this setup I am using S3 and assume you have already set up a bucket and added the correct group policy (see the example below).
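For reference, a minimal public-read bucket policy might look something like the following sketch (my-bucket-name is a placeholder for your own bucket name; adapt it to your own access requirements):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket-name/*"
    }
  ]
}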

1) Add the asset_sync gem to your Gemfile

Gemfile
group :assets do
  ...
  gem 'asset_sync'
  ...
end

2) In production.rb, add our asset host

config/environments/production.rb
config.action_controller.asset_host = "//#{ENV['FOG_DIRECTORY']}.s3.amazonaws.com"

We chose this URL over the other style, //s3.amazonaws.com/#{ENV['FOG_DIRECTORY']}, as we can then have our bucket in any region – using the second style results in a 301 Moved Permanently for reasons explained here. There is one caveat, however: the bucket name cannot contain any periods, as this will break the SSL certificates over HTTPS.

3) Make sure the following are set in production.rb

config/environments/production.rb
config.assets.digest = true
config.assets.enabled = true

4) One thing that we do is serve staging assets and production assets from the same bucket. To do this we can add an asset prefix to each environment's config file that declares where the files should be stored within the bucket (the staging counterpart is shown below).

config/environments/production.rb
config.assets.prefix = "/production/assets"
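
The staging environment gets its own prefix in the same way (a sketch, assuming your app has a config/environments/staging.rb):

config/environments/staging.rb

config.assets.prefix = "/staging/assets"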

5) Add the following environment variables on Heroku

heroku config:add AWS_ACCESS_KEY_ID=xxxx
heroku config:add AWS_SECRET_ACCESS_KEY=xxxx
heroku config:add FOG_DIRECTORY=xxxx  # (this is the bucket name)
heroku config:add FOG_PROVIDER=AWS

# and optionally:

heroku config:add FOG_REGION=eu-west-1
heroku config:add ASSET_SYNC_GZIP_COMPRESSION=true
heroku config:add ASSET_SYNC_MANIFEST=true
heroku config:add ASSET_SYNC_EXISTING_REMOTE_FILES=keep
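
If you prefer keeping this configuration in the app rather than in environment variables, asset_sync can also be configured through an initializer. A rough sketch mirroring the variables above (the gem reads the same settings either way):

config/initializers/asset_sync.rb

AssetSync.configure do |config|
  config.fog_provider = 'AWS'
  config.fog_directory = ENV['FOG_DIRECTORY']
  config.aws_access_key_id = ENV['AWS_ACCESS_KEY_ID']
  config.aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']

  # Optional settings, mirroring the optional env vars above
  config.fog_region = ENV['FOG_REGION'] if ENV['FOG_REGION']
  config.gzip_compression = true
  config.manifest = true
  config.existing_remote_files = 'keep'
end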

6) Finally, we need to make sure these variables are available to Heroku at asset compilation time, so run this from the console in your app's root directory

heroku labs:enable user-env-compile -a myapp

Setting up CloudFront to serve our files even faster

Using a CDN to serve our assets instead of S3 or even the app itself will help improve load times and decrease requests to our app, as assets get delivered from a data centre closer to the end user's location. Setting up a CloudFront distribution for use with our Rails app requires no special configuration options, so selecting our bucket as the origin and using all the default options will be fine. If you've used a CDN before, you may have had problems expiring files once you've made changes to them. Thanks to the asset pipeline that comes with Rails, we don't need to worry about this, due to the unique fingerprint hash that gets appended onto the end of the file name before its extension.
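To make that concrete, a compiled asset name looks something like this (the hash below is made up for illustration):

application.css  =>  application-1a2b3c4d5e6f7890abcdef1234567890.css

Change one byte of the source and the hash, and therefore the URL, changes too, so the CDN fetches the new file instead of serving a stale cached copy.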

Once our distribution has finished deploying, we will be given a CloudFront URL. All we need to do once we have this URL is change our asset host to point to it (protocol-relative again, so assets work over both HTTP and HTTPS)

config/environments/production.rb
config.action_controller.asset_host = "d1aa907b1q1s7qd.cloudfront.net"


Now when you make any changes to files and push them to Heroku, the new files with their unique hashes will get pushed over to S3 and be different from the previous versions. When the first user requests a new file from the CDN it won't be present, so Amazon is clever enough to fetch it from the bucket and cache it on the CDN, ready for the next person.

Setting up Paperclip to use CloudFront

Paperclip is great for dealing with file attachments and automatically saving them to S3, but we can optimise this to serve them through CloudFront instead. It's very easy to implement. Once your bucket is set up and Paperclip is saving files to it, all that's needed is to change the following settings

1) Add in an :s3_host_alias

:s3_host_alias => 'd1aa907b1q1s7qd.cloudfront.net',

2) Change the :url property to use ":s3_alias_url"

:url => ":s3_alias_url"

Notice the symbol is wrapped in a string.
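
Put together, a model using these options might look something like the following sketch (the User model, avatar attachment and path pattern are just examples, and I'm reusing the same env vars as the asset_sync setup above):

app/models/user.rb

class User < ActiveRecord::Base
  has_attached_file :avatar,
    # Store uploads on S3 with the same credentials as before
    :storage => :s3,
    :s3_credentials => {
      :access_key_id => ENV['AWS_ACCESS_KEY_ID'],
      :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
    },
    :bucket => ENV['FOG_DIRECTORY'],
    :path => "/:class/:attachment/:id/:style/:filename",
    # Serve the files back through CloudFront rather than S3
    :s3_host_alias => 'd1aa907b1q1s7qd.cloudfront.net',
    :url => ":s3_alias_url"
end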

And that's it, your uploaded content will now be served over CloudFront and not directly from S3.