Building a Worker Pool in Go

The naive way to run N tasks concurrently in Go is to launch N goroutines:

for _, url := range urls {
    go fetch(url)
}

That works fine for 10 URLs. It starts to hurt at 10,000. Every goroutine consumes memory (a stack starting at roughly 2KB, growing as needed), and if each goroutine is making a network request, you can exhaust your file descriptor limit or overwhelm the downstream service before the first result comes back.

A worker pool fixes this. Instead of one goroutine per task, you create a fixed pool of workers and feed them work through a channel. The channel acts as a queue, and backpressure comes for free: if all workers are busy, the producer blocks until one is available.

The Shape of the Problem

The pattern fits any workload where:

  • You have more tasks than you want to run simultaneously
  • Each task is independent (no shared mutable state between tasks)
  • You want all results before moving on

In this project, the tasks are HTTP fetches. Each job carries an ID and a URL. Each result carries the ID and either an HTTP status or an error.

type Job struct {
    ID  int
    URL string
}

type Result struct {
    ID    int
    Value string
    Err   error
}

The ID field threads through so results can be matched to their jobs even when they complete out of order.

The Pipeline

The full implementation has three stages connected by channels:

producer → jobs channel → workers → results channel → collector

All stages run concurrently. The channels between them regulate flow.

The Channels

jobs    := make(chan task.Job, NUM_JOBS)
results := make(chan task.Result, NUM_JOBS)

Both channels are buffered. The buffer size here equals the total number of jobs, which means the producer can enqueue everything without blocking. For a large or infinite job stream you would use a smaller buffer, and the producer would block naturally when the pool is saturated.

The Workers

func worker(ctx context.Context, id int, jobs <-chan task.Job, results chan<- task.Result, wg *sync.WaitGroup) {
    defer wg.Done()
    for job := range jobs {
        results <- task.Run(ctx, &job)
    }
}

Each worker ranges over the jobs channel. When the channel is closed and drained, the range loop exits and the worker calls wg.Done(). No explicit stop signal needed. Channel close is the shutdown mechanism.

Starting the pool looks like this:

var wg sync.WaitGroup
for i := range NUM_WORKERS {
    wg.Add(1)
    go worker(ctx, i, jobs, results, &wg)
}

The Producer

go func() {
    for i, url := range urls {
        jobs <- task.Job{ID: i, URL: url}
    }
    close(jobs)
}()

The producer runs in its own goroutine so it doesn’t block the workers from starting. It closes the jobs channel when all jobs are enqueued, which signals the workers to stop once they drain the remaining work.

The Collector

Results need to be consumed as workers produce them. If the results channel fills up and workers can’t send, they block, which stalls the entire pool. A separate goroutine handles collection:

done := make(chan struct{})
go func() {
    for result := range results {
        if result.Err != nil {
            fmt.Printf("job %d failed: %v\n", result.ID, result.Err)
        } else {
            fmt.Printf("job %d done: %s\n", result.ID, result.Value)
        }
    }
    close(done)
}()

The done channel carries no data; closing it is the signal. When the results channel closes, the range exits and done gets closed.

Shutdown

The shutdown sequence in main coordinates all of this:

wg.Wait()          // Wait for all workers to finish
close(results)     // Signal collector there are no more results
<-done             // Wait for collector to drain and exit

Order matters here. Closing results before wg.Wait() would be a bug: a worker that’s still running could try to send on a closed channel and panic. Calling wg.Wait() first guarantees every worker has exited, so nothing can ever send on results again, and closing it is safe.

Context and Cancellation

Each task receives the context passed to Run:

func Run(ctx context.Context, job *Job) Result {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, job.URL, nil)
    if err != nil {
        return Result{ID: job.ID, Err: fmt.Errorf("request error: %w", err)}
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return Result{ID: job.ID, Err: fmt.Errorf("fetch failed: %w", err)}
    }
    defer resp.Body.Close()

    return Result{ID: job.ID, Value: resp.Status}
}

Using http.NewRequestWithContext means that if the context is cancelled (say, via a timeout or an explicit cancel call), in-flight requests abort cleanly. Workers surface the cancellation error as a Result rather than silently dropping it.

Why This Structure Works

Backpressure is automatic. If workers are slow, the jobs channel fills up, and the producer blocks. You never accumulate more in-flight work than the channel can hold.

Shutdown is deterministic. Channel close propagates through the pipeline in order. Closing jobs stops workers. Workers finishing lets you close results. Closing results stops the collector. There are no goroutine leaks.

The stages are decoupled. The producer doesn’t know how many workers there are. Workers don’t know about the collector. You can change NUM_WORKERS without touching anything else.

Testing is straightforward. task.Run takes a context and a job pointer and returns a result. There’s nothing to mock. You can test it directly:

func TestRun(t *testing.T) {
    tests := []struct {
        name    string
        job     task.Job
        wantErr bool
    }{
        {"valid URL", task.Job{ID: 1, URL: "https://google.com"}, false},
        {"invalid URL", task.Job{ID: 2, URL: "not-a-url"}, true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
            defer cancel()

            result := task.Run(ctx, &tt.job)
            if tt.wantErr {
                assert.Error(t, result.Err)
            } else {
                assert.NoError(t, result.Err)
                assert.Equal(t, "200 OK", result.Value)
            }
        })
    }
}

When to Use This Pattern

A worker pool adds real complexity: channels, goroutines, WaitGroups, and a specific shutdown sequence. It pays off when:

  • You have many tasks and can’t run them all at once (resource limits, rate limits, downstream capacity)
  • Tasks are IO-bound, so goroutines spend most of their time waiting rather than computing
  • You need all results before proceeding

It’s overkill when:

  • You have a small, bounded number of tasks (just spawn the goroutines directly)
  • Tasks are CPU-bound and you’re already at GOMAXPROCS concurrency
  • You don’t need results at all (fire-and-forget has simpler patterns)

When tasks need to cross service boundaries, the same pattern applies at a larger scale. Amazon SQS plays the role of the jobs channel: a durable queue that independent workers pull from, with queue depth standing in for channel buffer length as the backpressure signal. Stripe uses this pattern for webhook delivery: the queue absorbs spikes, workers drain it at a controlled rate, and a slow or down merchant endpoint causes queue depth to grow rather than the caller blocking.

The full source for this project is at github.com/ikristina/go-worker-pool.

What’s Next

This is the foundation. Adding Full-Stack Observability to a Go Worker Pool picks up from here, adding the full LGTM stack: Prometheus metrics for queue depth and job latency, structured logs shipped to Loki, distributed traces in Tempo, and continuous flame graphs via Pyroscope.

What’s still on the list: a proper Pool struct with explicit Shutdown and ShutdownNow modes, retry with exponential backoff and jitter, and a benchmark suite across different worker counts.

If you need jobs to survive process restarts, River is worth a look. It’s a Go job queue backed by PostgreSQL that ships with persistence, retry, scheduling, and a proper pool abstraction - everything this implementation would grow into if you kept adding features. Sidekiq (Ruby) and Celery (Python) solve the same problem in their respective ecosystems.

Knowledge Check

What is the primary danger of spawning an unbounded number of goroutines for concurrent network tasks?
Answer: You can easily exhaust memory or hit your OS's file descriptor limit before any responses come back. The OS caps how many open sockets a process can hold, so unbounded concurrency hits that limit almost instantly under load.

How is backpressure achieved automatically in a Go worker pool using channels?
Answer: The producer blocks when trying to send on a full buffered channel, naturally preventing it from overwhelming the system. No extra logic is required.

In the shutdown sequence, why is it critical to call wg.Wait() before closing the results channel?
Answer: If a worker is still running and tries to send a result to a closed channel, the program will panic. You must ensure all writers are finished before closing the channel they write to.

In this pattern, what is the exact mechanism used to signal the workers to stop?
Answer: Closing the jobs channel that the workers are ranging over. A for ... range loop on a channel exits automatically once the channel is closed and drained.

In which of these scenarios would a worker pool be completely unnecessary overkill?
Answer: A small, known number of independent fire-and-forget background tasks. If you don't need the results, simply spawning a few goroutines is far easier.
