With flat-file CMSs such as Grav gaining popularity, it is useful to keep a single source of truth on GitHub. Once the site is on GitHub, multiple contributors can easily modify it, and with GitHub webhooks the hosted website can be updated automatically whenever changes are pushed.

With a short PHP script, this update is usually simple. With private GitHub repositories, however, GitHub's authentication requirements and the way Apache runs scripts on Ubuntu mean that some additional configuration is needed for the process to work seamlessly.

Automating Git Pull

Andy Miller has provided an excellent write-up on configuring Grav with GitHub Webhooks and Apache: Grav Development with GitHub - Part 2.

Note: the process is not specific to Grav; it can be used to update any site automatically via git pull.

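The PHP script is reproduced here for easy access, with comments added and two small changes: the branch name is shell-escaped, and git's error output is also written to the log.

<?php
// Webhook target: GitHub requests this script on every push, and it pulls
// the latest commits into the site's working copy.
date_default_timezone_set('America/Los_Angeles');  // use your own timezone
ignore_user_abort(true);  // keep running if GitHub closes the connection early
set_time_limit(0);        // don't let PHP's time limit cut off a slow pull

$repo          = '~/public_html/grav-blog';  // better: an absolute path
$branch        = 'master';
$output        = array();

// update github Repo; "2>&1" sends git's error messages (written to stderr)
// into the log too, since shell_exec() only returns standard output
$output[] = date('Y-m-d, H:i:s', time()) . "\n";
$output[] = "GitHub Pull\n============================\n" . shell_exec('cd '.$repo.' && git pull origin '.escapeshellarg($branch).' 2>&1');

// redirect output to logs
file_put_contents(rtrim(getcwd(), '/').'/___github-log.txt', implode("\n", $output) . "\n----------------------------\n", FILE_APPEND);
?>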

A few suggested edits:

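  1. Change the timezone to your own.
  2. Set $repo to an absolute path such as /home/ubuntu/public_html/grav-blog, because ~ does not refer to your own home directory when Apache runs the script.
  3. Put the script inside the repository directory from step 2 to avoid permission issues.
  4. Ensure that the entire directory is owned by www-data:www-data (for example, sudo chown -R www-data:www-data /home/ubuntu/public_html/grav-blog).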

This all works beautifully with public GitHub repositories, because they can be pulled over https without logging in to GitHub or using ssh keys.

Problem with Private Repositories

With private GitHub repositories, the guide above works until the last step, in which a GitHub webhook calls the PHP script to git pull the latest changes onto the Apache server.

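Because of GitHub's authentication requirements and the way Apache runs scripts on Linux, this step fails silently. The script itself is called successfully, so GitHub shows an HTTP 200 (OK) response, and there is no indication that the remote site has in fact not been updated. Git's error message does not show up in the log either unless stderr is redirected, because shell_exec() returns only standard output.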
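Because GitHub reports success either way, one way to catch a silent failure is to compare the commit checked out on the server with the tip of the branch on GitHub. The sketch below uses only standard git commands (rev-parse and ls-remote); the function name site_is_current is hypothetical, and it assumes it runs as a user that can both read the working copy and authenticate to the remote.

```shell
# Hypothetical helper: report whether the working copy in REPO is at the
# same commit as BRANCH on its "origin" remote.
# Prints "up-to-date" (exit 0), "stale" (exit 1) or "unknown" (exit 2).
site_is_current() {
    repo=$1
    branch=${2:-master}
    local_sha=$(git -C "$repo" rev-parse HEAD 2>/dev/null) || { echo unknown; return 2; }
    # ls-remote contacts the remote, so it needs the same credentials as git pull
    remote_sha=$(git -C "$repo" ls-remote origin "refs/heads/$branch" 2>/dev/null | cut -f1)
    if [ -z "$remote_sha" ]; then
        echo unknown
        return 2
    fi
    if [ "$local_sha" = "$remote_sha" ]; then
        echo up-to-date
    else
        # the server is at a different commit than GitHub
        echo stale
        return 1
    fi
}
```

For example:

    $ site_is_current /home/ubuntu/public_html/grav-blog master

A result of stale shortly after a push suggests that the webhook's pull did not go through.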

There are two ways to git pull a private repository from GitHub:

  1. If the remote is configured with an https:// URL, git needs the username and password of an authorized user (GitHub now requires a personal access token in place of the account password). The credentials can be cached for a while (15 minutes by default with git's cache helper) so that they are not requested on every pull.
  2. If the remote is configured with an ssh:// URL (or the equivalent git@github.com:owner/repo.git form), the authorized user's ssh public key has to be added to GitHub to allow cloning and pulling.
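For the https:// option, git's built-in cache credential helper keeps credentials in memory for a given number of seconds, 900 (15 minutes) by default. A minimal sketch of enabling it for one repository; the function name is hypothetical. Note that the cache belongs to whichever user runs git, so credentials typed in your own session do not help the www-data user.

```shell
# Hypothetical helper: enable git's in-memory credential cache for one
# repository. TIMEOUT is in seconds; git's own default is 900 (15 minutes).
enable_credential_cache() {
    repo=$1
    timeout=${2:-900}
    git -C "$repo" config --local credential.helper "cache --timeout=$timeout"
}
```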

Note: the current remote origin of the repository can be checked with git config --local -l (look for remote.origin.url) or with git remote -v.
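The protocol can also be read straight from the configured URL. A minimal sketch with a hypothetical function name; it treats the scp-like shorthand git@github.com:owner/repo.git as ssh, because that is how git interprets it.

```shell
# Hypothetical helper: classify a git remote URL as "ssh", "https" or "other".
remote_protocol() {
    case $1 in
        ssh://*) echo ssh ;;
        https://*) echo https ;;
        # scp-like syntax user@host:path, as in GitHub's SSH clone URLs
        *@*:*) echo ssh ;;
        *) echo other ;;
    esac
}
```

For example, inside the repository:

    $ remote_protocol "$(git config --get remote.origin.url)"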

The best way to clone and pull private repositories is to use an ssh:// URL as the remote origin. This avoids entering a username and password and makes automated workflows possible, since different ssh keys and identities can be configured in ~/.ssh/config.
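As an illustration of per-identity configuration, the helper below appends a Host block for github.com to an ssh config file. The function name and the key name used later are hypothetical; Host, HostName, User, IdentityFile and IdentitiesOnly are standard ssh_config keywords.

```shell
# Hypothetical helper: append a github.com entry to SSH_DIR/config so that
# ssh uses the key SSH_DIR/KEY_NAME when talking to GitHub.
add_github_identity() {
    ssh_dir=$1
    key_name=$2
    {
        printf 'Host github.com\n'
        printf '    HostName github.com\n'
        printf '    User git\n'
        printf '    IdentityFile %s/%s\n' "$ssh_dir" "$key_name"
        # offer only this key, not every key an ssh-agent might hold
        printf '    IdentitiesOnly yes\n'
    } >> "$ssh_dir/config"
    # ssh rejects a config file that other users can write to
    chmod 600 "$ssh_dir/config"
}
```

For the www-data setup described below, SSH_DIR would be /var/www/.ssh, and the resulting file must end up owned by www-data.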

The git pull step fails because, on Ubuntu, Apache executes the script as the user www-data. Git (through ssh) therefore looks for the ssh keys belonging to www-data and, failing to find them, cannot complete the pull.
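To see which home directory (and therefore which .ssh directory) applies to www-data, the user's entry in the password database can be read directly; OpenSSH takes the home directory from this entry rather than from $HOME. A minimal sketch with a hypothetical function name that reads a passwd-format file; on systems with LDAP or similar user sources, getent passwd www-data is the more complete lookup.

```shell
# Hypothetical helper: print the home directory and login shell of USER,
# read from a passwd-format file (default /etc/passwd).
# Fields are name:password:UID:GID:comment:home:shell.
user_home_and_shell() {
    user=$1
    file=${2:-/etc/passwd}
    awk -F: -v u="$user" '$1 == u { print $6, $7; found = 1 } END { exit !found }' "$file"
}
```

For example:

    $ user_home_and_shell www-data

On the Ubuntu 16.04 setup described here, it should report /var/www and /usr/sbin/nologin.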

The Solution to Pulling Private Repositories with PHP

On Ubuntu 16.04, the user www-data has the home directory /var/www, so ssh looks for its keys in /var/www/.ssh when negotiating the transfer. The solution is therefore to make GitHub treat www-data like a real user by giving it a valid set of keys. To break down the steps:

Note: This assumes that you have sudo access.

  1. Create a directory /var/www/.ssh owned by www-data:www-data

    $ sudo mkdir -p /var/www/.ssh
    $ sudo chown -R www-data:www-data /var/www/.ssh
  2. Create ssh keys in the directory

    $ cd /var/www/.ssh
    $ sudo ssh-keygen -t rsa -b 2048

    When ssh-keygen asks for the file in which to save the key, enter /var/www/.ssh/id_rsa. Do not set a passphrase, since the webhook runs unattended and nobody is there to type it.

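  3. Ensure that the permissions and ownership of the keys are correct. Because the keys were generated with sudo, they are initially owned by root, so chown them to www-data:www-data (for example, sudo chown -R www-data:www-data /var/www/.ssh). The result should look like this: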

    $ ls -la /var/www/.ssh/
    total 24K
    drwxr-xr-x 2 www-data www-data 4.0K Apr 29 23:58 ./
    drwxr-xr-x 5 root     root     4.0K Apr 30 00:06 ../
    -rw------- 1 www-data www-data 1.7K Apr 29 23:33 id_rsa
    -rw-r--r-- 1 www-data www-data  394 Apr 29 23:33 id_rsa.pub
  4. Add the contents of id_rsa.pub as a deploy key in the GitHub repository settings (Settings, then Deploy keys). Read access is enough for pulling.

  5. It is important to verify that git pull works when run as the user www-data. The first ssh connection also has to add GitHub's host key to the known_hosts file of www-data. However, www-data has no login shell by default, so we use a simple trick:

    $ sudo vi /etc/passwd

    Find the line for www-data, change /usr/sbin/nologin to /bin/bash, and save the file. The entry for www-data should then look similar to:

    www-data:x:33:33:www-data:/var/www:/bin/bash
  6. Change to the user www-data

    $ sudo su
    # su - www-data
  7. Once you are logged in as www-data, go to the git repository and perform a git pull manually.

  8. ssh will ask you to confirm GitHub's host key (compare the fingerprint with the ones GitHub publishes in its documentation) and add it to the known_hosts file, then use the key pair under /var/www/.ssh to complete the git pull.

  9. If it succeeds, you should be set. Push a commit to GitHub from another computer and verify that the PHP script performs the pull.

  10. Restore /etc/passwd to its original state by setting the login shell of www-data back to /usr/sbin/nologin.
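Steps 1 to 3 can be double-checked with a small script. The sketch below (hypothetical function name, GNU coreutils stat assumed, as on Ubuntu) checks that the private key exists, is owned by the expected user, and grants no access to group or others; ssh refuses to use a private key that other users can read.

```shell
# Hypothetical helper: check that KEY (e.g. /var/www/.ssh/id_rsa) exists,
# belongs to OWNER, and grants no permissions to group or others.
# Prints "ok" or a reason, and returns non-zero on failure.
check_private_key() {
    key=$1
    owner=$2
    if [ ! -f "$key" ]; then
        echo "missing: $key"
        return 1
    fi
    actual_owner=$(stat -c %U "$key")   # GNU coreutils stat
    if [ "$actual_owner" != "$owner" ]; then
        echo "owned by $actual_owner, expected $owner"
        return 1
    fi
    mode=$(stat -c %a "$key")
    case $mode in
        *00) echo ok ;;   # e.g. 600 or 400: nothing for group or others
        *) echo "mode $mode is too open; run chmod 600 $key"; return 1 ;;
    esac
}
```

For example:

    $ check_private_key /var/www/.ssh/id_rsa www-data

As a swapped-in alternative to the /etc/passwd trick (not the method described above), sudo can run a single command as www-data without giving it a login shell. Assuming the absolute repository path from the suggested edits:

    $ sudo -H -u www-data git -C /home/ubuntu/public_html/grav-blog pull origin master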

Comments

  • Avoid reusing the regular Ubuntu user's ssh keys for the www-data user. It is better to create separate keys for different users and different applications.

  • If the key file is named something other than id_rsa, create a file /var/www/.ssh/config that specifies the IdentityFile to use when connecting to github.com.

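  • Another way to specify the ssh command is the git config setting core.sshCommand (available since git 2.10). For example, if the key is id_rsa_apache:

    $ git config core.sshCommand "ssh -i /var/www/.ssh/id_rsa_apache -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"

    Be aware that the two -o options turn off host key verification entirely; if GitHub's key is already in known_hosts, they can be dropped.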
  • To reconfigure an existing repo to use ssh rather than https, change the remote URL:

    $ git remote set-url origin <new git@github.com url>
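To work out the ssh form of an existing https remote, a simple string conversion is enough for GitHub URLs. A sketch assuming the https://github.com/OWNER/REPO(.git) shape; the function name is hypothetical.

```shell
# Hypothetical helper: turn https://github.com/OWNER/REPO[.git] into the
# ssh form git@github.com:OWNER/REPO.git. Other URLs are printed unchanged.
github_https_to_ssh() {
    case $1 in
        https://github.com/*)
            path=${1#https://github.com/}   # OWNER/REPO or OWNER/REPO.git
            path=${path%/}                   # drop a trailing slash
            path=${path%.git}                # normalise the .git suffix
            printf 'git@github.com:%s.git\n' "$path"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

Combined with git remote set-url, inside the repository:

    $ git remote set-url origin "$(github_https_to_ssh "$(git config --get remote.origin.url)")"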