Quick disclaimer before we start. This is well-trod ground. Nothing here is new or revolutionary, nothing about this implementation hasn’t already been done by smarter folks than I :) This is just my blog post about it.
"How hard could it be to run my own infrastructure?"
Famous last words, right? But I was genuinely curious. Every tutorial and course talks about cloud platforms - deploy to Vercel, use Railway, spin up on Render. What about the fundamentals? What happens behind those abstractions?
Three servers, dozens of Docker containers, and one Ansible playbook later, I have my answer. And a setup I actually understand from the ground up.
The Architecture
I run a three-server setup. Yes, this is probably overkill for personal projects. No, I don't regret it.
```
┌─────────────────────────────────────────────────────┐
│                  Tailscale Network                  │
│     (Private mesh - everything talks securely)      │
└─────────────────────────────────────────────────────┘
                           │
     ┌─────────────────────┼─────────────────────┐
     │                     │                     │
┌────▼─────┐          ┌────▼─────┐          ┌────▼─────┐
│Analytics │          │    DB    │          │   App    │
│  Server  │          │  Server  │          │  Server  │
├──────────┤          ├──────────┤          ├──────────┤
│ Grafana  │          │ Postgres │          │  Nomad   │
│Prometheus│◄─────────┤  Redis   │◄─────────┤  Client  │
│   Loki   │          │Pgbouncer │          │ Traefik  │
│  Umami   │          └──────────┘          │   Apps   │
│  Uptime  │                                └────┬─────┘
│   Kuma   │                                     │
│  Nomad   │                                     │
│  Server  ├─────────────────────────────────────┘
└──────────┘
```
Analytics Server - This is my "control tower." Grafana shows me what's happening, Prometheus and Loki collect metrics and logs, Uptime Kuma yells at me when things break. I also run the Nomad server here to orchestrate everything.
DB Server - Postgres for data, Redis for caching, Pgbouncer for connection pooling. Separated so my databases and apps aren't fighting for CPU.
App Server(s) - Where the actual applications live. Currently just one, but the whole point of this setup is I can spin up more whenever I need to.
Everything runs inside a Tailscale network, which means most ports are locked down to the outside world. Only the servers can talk to each other, plus my personal laptop when I need to check dashboards.
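The "most ports are locked down" part is enforced with Tailscale's ACL policy. A minimal sketch of what that looks like — the `tag:server` tag and the port numbers here are illustrative, not my actual policy:

```jsonc
// Tailscale ACL policy (HuJSON, edited in the admin console)
{
  "tagOwners": {
    // admins are allowed to assign tag:server to machines
    "tag:server": ["autogroup:admin"]
  },
  "acls": [
    // servers can reach each other on any port
    {"action": "accept", "src": ["tag:server"], "dst": ["tag:server:*"]},
    // my laptop can reach Grafana (3000) and the Nomad UI (4646)
    {"action": "accept", "src": ["autogroup:member"], "dst": ["tag:server:3000,4646"]}
  ]
}
```

Everything else is simply unreachable from outside the tailnet, so the public firewall only needs to expose 80/443 on the app server.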
Why I moved away from Coolify
Coolify was great until I realized every new app required the same manual dance: navigate through UI flows, copy environment variables, paste configurations, cross fingers. It wasn't bad, but it was repetitive and error-prone.
More importantly: the configuration lived nowhere. If that server died, I'd have to recreate everything from memory. Backups existed but were unreliable, and I found myself avoiding them by just... manually recreating things.
That's not sustainable when you're planning to run 20+ apps.
Ansible changed everything. Now my entire infrastructure lives in code:
```yaml
# This is all it takes to spin up a new app server
- name: Configure app server
  hosts: app_servers
  roles:
    - base_hardening
    - docker_setup
    - tailscale_connect
    - nomad_client
    - monitoring_agent
```
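For context, that play runs against an inventory along these lines (the hostnames are placeholders; Tailscale MagicDNS names work nicely here):

```ini
# inventory.ini (illustrative hostnames)
[analytics_servers]
analytics-1

[db_servers]
db-1

[app_servers]
app-1
```

Spinning up a new app server is then one command: `ansible-playbook -i inventory.ini site.yml --limit app_servers`.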
I've tested this repeatedly: I can provision a completely fresh server into my full setup in under 10 minutes. That's not theoretical; I've done it multiple times.
The Ansible workflow
I deploy in three phases:
1. Server Foundations (All servers)
- OS hardening and security setup
- Auto-updates configuration
- Tailscale network connection
- Docker installation
- Monitoring agents (node exporters, log shippers)
Every server gets this base layer. No exceptions.
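To give a flavor of what the base layer does, a hardening role contains tasks along these lines — a minimal illustrative subset, not my exact role:

```yaml
# roles/base_hardening/tasks/main.yml (illustrative subset)
- name: Disable SSH password authentication
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PasswordAuthentication'
    line: 'PasswordAuthentication no'
  notify: Restart sshd  # handler defined elsewhere in the role

- name: Install unattended upgrades
  ansible.builtin.apt:
    name: unattended-upgrades
    state: present
```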
2. Core Services (Analytics + DB)
These are the "special" servers that need to be up before anything else works.
Analytics setup:
- grafana (metrics visualization)
- prometheus (metrics collection)
- loki (log aggregation)
- umami (web analytics)
- uptime_kuma (uptime monitoring)
- nomad_server (orchestration controller)
Database setup:
- postgresql (primary database)
- redis (caching layer)
- pgbouncer (connection pooling)
3. App Servers (The workers)
These connect to the Nomad server as clients. They're completely stateless - they just run whatever containers Nomad tells them to run. Each one has Traefik running to handle routing.
The monitoring setup
I'll be honest: Grafana + Prometheus + Loki is probably overkill for my scale. But if I'm going to monitor things, I might as well do it properly.
```
┌──────────────────────────────────────────────────────┐
│                  Grafana Dashboard                   │
├──────────────────────────────────────────────────────┤
│  CPU Usage   │   Memory   │  Disk I/O   │  Network   │
│ [████░░] 80% │ [███░░] 60%│ [██░░░] 40% │  2.4 MB/s  │
├──────────────────────────────────────────────────────┤
│ Active Apps: 10 │ Uptime: 99.8% │ Alerts: 0          │
└──────────────────────────────────────────────────────┘
```
Each server runs exporters that feed data into Prometheus. Loki collects all the logs. Grafana ties it together into dashboards I can actually understand.
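The Prometheus side of this is just a scrape config pointing at each server's exporter; the hostnames below are illustrative (I point at Tailscale hostnames):

```yaml
# prometheus.yml (targets are placeholders)
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - analytics-1:9100   # node_exporter's default port on each server
          - db-1:9100
          - app-1:9100
```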
Uptime Kuma was added as a follow-up after an incident. It pings all my apps every minute and sends alerts to my personal Discord server when something goes down. It was super simple to set up (literally just another Docker container in my Ansible playbook), and it's already caught issues before users noticed them.
Why Postgres + Pgbouncer?
Simple: Postgres is the GOAT. It handles relational data, JSON documents, full-text search, time series, geospatial queries... it just does everything well.
But here's the thing about Postgres: it doesn't love having hundreds of connections open. Each connection is a process, and processes are expensive.
That's where Pgbouncer comes in. It sits between my apps and Postgres, managing a pool of connections. Apps think they have dedicated connections, but Pgbouncer is actually queueing and multiplexing everything through a much smaller pool.
```
┌─────────┐    ┌─────────┐    ┌─────────┐
│  App 1  │    │  App 2  │    │  App 3  │
│ (50 req)│    │ (30 req)│    │ (40 req)│
└────┬────┘    └────┬────┘    └────┬────┘
     │              │              │
     └──────────────┼──────────────┘
                    ▼
            ┌───────────────┐
            │   Pgbouncer   │
            │ (pools to 20) │
            └───────┬───────┘
                    ▼
            ┌───────────────┐
            │   Postgres    │
            │ (20 conns max)│
            └───────────────┘
```
Do I need this for my current scale? No. Will I need it when I'm running 50+ apps? Absolutely.
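The 20-connection pool in that diagram maps to a config roughly like this (database name and paths are illustrative):

```ini
; pgbouncer.ini (sketch)
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; connections returned to the pool after each transaction
default_pool_size = 20    ; server connections per database/user pair
max_client_conn = 200     ; clients can exceed the server pool; they queue
```

Apps connect to port 6432 instead of 5432, and Pgbouncer does the rest.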
Why Nomad over Kubernetes?
I researched this heavily. Here's what I concluded:
Kubernetes - Can do everything, requires understanding everything. Every article I read said "consider all other options first."
Docker Swarm / Kamal - Great for simple setups. Single app + database? Perfect. Multiple apps sharing infrastructure? Gets messy fast.
Coolify / Komodo - Excellent PaaS solutions. But I wanted infrastructure-as-code, not infrastructure-as-clicking.
Nomad - The Goldilocks option. Simpler than k8s, more powerful than Docker Compose. It handles service discovery, orchestration, health checks, blue-green deploys, and resource management without making me learn an entirely new paradigm.
I got a working Nomad setup in a few hours. That felt like a good sign.
Deploying an app with Nomad
Here's what a typical app deployment looks like:
```hcl
job "my-app" {
  datacenters = ["dc1"]

  group "app" {
    count = 1

    network {
      port "http" {
        to = 3000
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "myapp:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }

      service {
        name = "my-app"
        port = "http"

        check {
          type     = "http"
          path     = "/health"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```
I push this to Nomad, and it handles the rest: finds available resources, spins up the container, monitors health, and integrates with Traefik for routing.
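The Traefik integration works through service tags: Traefik discovers registered services and builds routes from tags it finds on them. With a tag-based setup, the `service` block grows something like this (the domain and resolver name are placeholders):

```hcl
service {
  name = "my-app"
  port = "http"

  tags = [
    "traefik.enable=true",
    "traefik.http.routers.my-app.rule=Host(`my-app.example.com`)",
    "traefik.http.routers.my-app.tls.certresolver=letsencrypt",
  ]
}
```

No per-app Traefik config to maintain; routing lives in the same job file as the app itself.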
What I learned
Infrastructure-as-code is non-negotiable. Once you have more than 3-4 apps, clicking through UIs becomes unsustainable. Code lets you version, test, and replicate your entire setup.
Monitoring is worth the effort. The peace of mind from seeing green dashboards and getting alerts before things break is invaluable.
Self-hosting saves money. I did the math while building a price comparison tool - self-hosting can save you 10-100x over cloud providers for the same resources.
Start simple, add complexity as needed. I didn't build this all at once. I started with Coolify, moved to Ansible + Docker Compose, then added Nomad. Each step taught me what I actually needed.
The trade-offs
Time investment: This took weeks to set up properly. Cloud hosting would have been faster.
Maintenance: I'm responsible when things break. No support tickets, just me and the logs.
Complexity: I now maintain 3 servers, a Tailscale network, a Nomad cluster, and a monitoring stack. That's a lot of moving parts.
But here's the thing: I understand every piece of this system now. I know exactly how my apps are deployed, where my data lives, and what happens when things fail. That knowledge is worth the complexity.
FAQ
Why not just use Vercel/Railway/Render?
Cloud platforms are great! But expensive at scale, and you're locked into their ecosystem. I wanted to learn the fundamentals and have full control.
Is this actually cheaper?
For my use case? Massively. I'm running 10+ apps on three servers for about $30/month. That would cost $200-500/month on managed platforms.
Would you recommend this for beginners?
No. Start with Coolify or similar PaaS. Build this level of infrastructure when you actually need it.
How do you handle SSL certificates?
Traefik handles it automatically with Let's Encrypt. One of the many nice things about this setup.
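For the curious, the relevant Traefik static config is small — a sketch with placeholder values, not my exact file:

```yaml
# traefik.yml (email, paths, and resolver name are placeholders)
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:
      email: me@example.com
      storage: /letsencrypt/acme.json
      httpChallenge:
        entryPoint: web
```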
What happens if the analytics server goes down?
Apps keep running, but I lose monitoring and can't deploy new apps. It's a single point of failure I'm okay with for personal projects.
This setup has been running smoothly for months now, and I'm genuinely excited about what I can build on top of it. The best infrastructure is the one you understand completely.