Configuring a Local, Scalable, High-Availability Kubernetes Postgres Service with Kubegres

In the last post we configured a high-availability vault server in a local k8s cluster. In this fourth post we are going to set up a local, k8s-managed, high-availability postgres database. Developers differ on whether to containerize their databases. The prevailing practice is to containerize the app but leave database management to cloud providers, or run the database on a VM. This is for good reason: containerization works most seamlessly for stateless components such as web apps, because containers are by definition ephemeral. Additionally, scaling production-grade databases and keeping replicas in sync is notoriously difficult, so most developers choose to leave that chore to cloud providers who offer databases-as-a-service.

Yet managing databases in k8s is alluring, because we’d like to manage our entire stack (stateless and stateful) with a unified, declarative workflow. In addition, keeping our development and production environments as similar as possible gives us more confidence when testing or debugging.

In recent years managing databases in k8s has become more realistic as production-grade operators, such as the CrunchyData postgres operator, have matured. Operators are k8s extensions that use custom resource definitions (CRDs) to manage application components. CRDs are developer-defined object types, declared in yaml just like built-in resources such as deployments and jobs.

My goal for this post was to see for myself how easily I could get one of these operators up and running locally. I tried to set up a database with two different production-grade operators: CrunchyData’s postgres-operator and Oracle’s mysql-operator. I was unable to install either one successfully: I hit various errors, and asking for help on the two projects’ GitHub pages didn’t resolve them.

I was about to give up and settle on a different strategy: keep separate configurations for development and production, using kustomize to factor out the common elements of the yaml manifests. The development environment would run one postgres instance and mount the data directory as a volume; for high availability and fault tolerance, production would use a database-as-a-service. But something still bothered me about this. I didn't want to give up on the ideal of making my dev and prod environments as similar as possible.

Then I found kubegres, a very new postgres operator. It worked out-of-the-box for me. I had one issue, which was a misunderstanding on my part, and the kubegres folks quickly clarified it via their Issues queue.

With that background out of the way, let’s discuss exactly how I got this setup working. I borrowed heavily from kubegres' Getting Started guide, but added extra detail and addressed some gotchas.

Clean Up the Client Code

Before we go on, let's clean up server.js a bit. First we'll factor out the JavaScript that interacts with vault.

Create a file, vault.js, at the root of the project, then cut the vault code from server.js and paste it in.

vault.js:

const fs = require("fs");
const axios = require("axios").default;

function Vault() {
  const axiosInst = axios.create({ baseURL: `${process.env.VAULT_ADDR}/v1` });
  const getHealth = async () => {
    const resp = await axiosInst.get(`/sys/health?standbyok=true`);
    return resp.data;
  };

  const getAPIToken = () => {
    return fs.readFileSync(process.env.JWT_PATH, { encoding: "utf-8" });
  };
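  // Log in to vault with the kubernetes auth method, using the pod's service-account JWT.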
  const getVaultAuth = async (role) => {
    const resp = await axiosInst.post("/auth/kubernetes/login", {
      jwt: getAPIToken(),
      role,
    });
    return resp.data;
  };
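  // Fetch our app's secrets; the KV v2 engine nests the payload under data.data, hence the chained .data below.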
  const getSecrets = async (vaultToken) => {
    const resp = await axiosInst("/secret/data/webapp/config", {
      headers: { "X-Vault-Token": vaultToken },
    });
    return resp.data.data.data;
  };
  return {
    getAPIToken,
    getHealth,
    getSecrets,
    getVaultAuth,
  };
}
module.exports = Vault;

The top of server.js should now look like this:

const process = require("process");
const express = require("express");
const app = express();

// Import my fns that interact with Hashicorp Vault.
const Vault = require("./vault");
...

Let's also modify our server to expose two routes, /config and /films: one to show config data from the ConfigMap along with the secret vault data, and the other to serve data from the database we are about to install (we'll sketch the /films route at the end of this post, once the database exists). We're serving the config data for no real purpose other than to prove to ourselves that we haven't broken our previous code; obviously we'd never serve secret data in real life. We'll also serve JSON instead of HTML.

First, the /config endpoint. Replace the / endpoint with this:

const process = require("process");
const express = require("express");
const app = express();

// Import my fns that interact with Hashicorp Vault.
const Vault = require("./vault");
app.set("json spaces", 2);

// config endpoint showing configs, secrets etc.
app.get("/config", async (req, res) => {
  const vault = Vault();
  const vaultAuth = await vault.getVaultAuth("webapp");
  const secrets = await vault.getSecrets(vaultAuth.auth.client_token);
  res.json({
    MY_NON_SECRET: process.env.MY_NON_SECRET,
    MY_OTHER_NON_SECRET: process.env.MY_OTHER_NON_SECRET,
    username: secrets.username,
    password: secrets.password,
  });
});
...

Now point your browser to /config (make sure skaffold is up and running and that you exposed the app via $ minikube service web-service). You should see JSON containing the two non-secret config values along with the username and password pulled from vault.
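If you'd rather check from the terminal, something like this should work too (web-service being whatever you named the service in the earlier posts):

$ curl "$(minikube service web-service --url)/config"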

Install the kubegres Operator

I assume you're picking up where the previous post left off.

Download the operator manifest into the manifests directory. Make sure you have skaffold dev running; once the file is downloaded, your project should rebuild on the fly:

$ wget \
   https://raw.githubusercontent.com/reactive-tech/kubegres/v1.9/kubegres.yaml \
   -P manifests/

kubegres.yaml should now be in your manifests directory, and skaffold should pick up the change. Once skaffold does its thing, we should be able to see the controller kubegres installed. Controllers are k8s objects that implement control loops: they watch the state of your cluster, then make or request changes where needed:

$ kubectl get all -n kubegres-system
NAME                                               READY   STATUS    RESTARTS   AGE
pod/kubegres-controller-manager-798885c897-pzdgh   2/2     Running   6          11d

NAME                                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kubegres-controller-manager-metrics-service   ClusterIP   10.106.22.161   <none>        8443/TCP   11d

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kubegres-controller-manager   1/1     1            1           11d

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/kubegres-controller-manager-798885c897   1         1         1       11d

Create a Secret Resource

Now we are going to create a secret resource that supplies the postgres superuser and replication-user passwords. In the manifests directory, create this yaml file:

manifests/postgres-secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: default
type: Opaque
stringData:
  superUserPassword: postgresSuperUserPsw
  replicationUserPassword: postgresReplicaPsw
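Once skaffold applies this, you can sanity-check the secret; k8s stores the values base64-encoded:

$ kubectl get secret postgres-secret \
   -o jsonpath='{.data.superUserPassword}' | base64 -d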

A few things to note: in real life we'd use strong superuser passwords, and since we have a vault server running it would be much more secure to store these passwords there than in a plain-text secret. We'll fix this in the next post when we talk about injecting secrets.

Create postgres Pods

Now we'll create our set of postgres pods using the Kubegres CRD:

Create manifests/postgres.yaml:

apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: postgres
  namespace: default
spec:
   replicas: 3
   image: postgres:13.2

   database:
      size: 200Mi
   customConfig: postgres-conf
   env:
      - name: POSTGRES_PASSWORD
        valueFrom:
           secretKeyRef:
              name: postgres-secret
              key: superUserPassword
      - name: POSTGRES_REPLICATION_PASSWORD
        valueFrom:
           secretKeyRef:
              name: postgres-secret
              key: replicationUserPassword

Once skaffold applies this, we'll have three postgres pods: one primary and two replicas. The pods get the superuser password from env vars set via postgres-secret.yaml. Again, we'll address the security implications of this in the next post; for now we want to focus on getting the db service up and running.
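To see what the operator created, run the following. You should see one StatefulSet and pod per instance (named something like postgres-1-0, postgres-2-0 and postgres-3-0), plus a postgres service that always points at the primary and a postgres-replica service for read-only traffic:

$ kubectl get statefulsets,pods,services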

The postgres Data Directory

You may wonder where postgres' data directory lives in this configuration. After all, we haven't explicitly configured it within a manifest, and this directory is key to how postgres actually persists data. kubegres uses persistent volume claims (PVCs) for the data directory. By default it uses whatever the cluster's default storage class is. In the case of minikube a default storage class is already provisioned and is backed by a hostpath inside minikube's internals:

$ kubectl get sc
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  12d

When we configure the cluster for production, we'll have to customize this and point the PVCs at a storage class that's appropriate for our cloud provider, perhaps Google Persistent Disk. We'll customize the storageClassName property in postgres.yaml when we get to that. Documentation for how to customize this is here: https://www.kubegres.io/doc/properties-explained.html.
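For reference, the override slots into the database block of postgres.yaml; the class name below is just a placeholder for whatever your provider offers:

   database:
      size: 200Mi
      storageClassName: some-cloud-ssd-class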

Create postgres User and Database

A client app should connect to its own database, not the default postgres database that exists when postgres is first installed. It's also a security problem to connect and interact with postgres as the superuser, so we need to figure out the best place to create a new, non-root user and a database. In kubegres, the way to hook into initialization is to override the primary init script. Create a new manifest, manifests/postgres-conf.yaml. In it we embed a shell script that runs in the primary postgres pod the first time it is created: it calls psql, logging in as the superuser, and creates a new non-superuser along with a new database:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-conf
  namespace: default

data:
  primary_init_script.sh: |
    #!/bin/bash
    set -e

    # This script assumes that the env-var $POSTGRES_MY_DB_PASSWORD contains the password of the custom user to create.
    # You can add any env-var in your Kubegres resource config YAML.

    dt=$(date '+%d/%m/%Y %H:%M:%S');
    echo "$dt - Running init script the 1st time Primary PostgreSql container is created...";

    customDatabaseName="web"
    customUserName="web"

    echo "$dt - Running: psql -v ON_ERROR_STOP=1 --username $POSTGRES_USER --dbname $POSTGRES_DB ...";

    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
    CREATE DATABASE $customDatabaseName;
    CREATE USER $customUserName WITH PASSWORD 'akhd5';
    GRANT ALL PRIVILEGES ON DATABASE "$customDatabaseName" to $customUserName;
    EOSQL

    echo "$dt - Init script is completed";

Again, in a later post we'll move the database name and user name to configs, and the password to a secret in vault. You should also note that the init script only runs the first time the PVCs are provisioned; it won't execute again for an existing data directory. So when debugging the script you'll have to delete the PVCs to force it to run again (run kubectl get pvc to see the exact names; they follow the pattern postgres-db-<kubegres-name>-<index>-0):

$ kubectl delete pvc postgres-db-postgres-3-0 && \
   kubectl delete pvc postgres-db-postgres-2-0 && \
   kubectl delete pvc postgres-db-postgres-1-0
   ...
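
Finally, to close the loop on the /films route we promised earlier, here is a minimal sketch using the node-postgres (pg) client. It assumes you've added pg to package.json, that a films table has been created separately, and that the password comes from a hypothetical DB_PASSWORD env var; the postgres host is the read/write service kubegres creates for our resource, and the web database and user come from the init script above:

const { Pool } = require("pg");

// "postgres" is the primary (read/write) service kubegres creates for our
// Kubegres resource; "web" is the database and user from the init script.
// DB_PASSWORD is a hypothetical env var; we'll wire this to vault in a later post.
const pool = new Pool({
  host: "postgres",
  database: "web",
  user: "web",
  password: process.env.DB_PASSWORD,
});

// Serve rows from a (yet-to-be-created) films table as JSON.
app.get("/films", async (req, res) => {
  const { rows } = await pool.query("SELECT * FROM films");
  res.json(rows);
});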
