Project: Job Queue

After not finding a developer-friendly Node.js job queue that fit my team's needs, I developed a custom one. It was simple at first, but it eventually grew to process ~200,000 jobs per day with multiple queues, automatic retries, and demand-based worker autoscaling. The autoscaling used Kubernetes Horizontal Pod Autoscaling with a custom metric representing the current queue depth.

The job queue is built on PostgreSQL's FOR UPDATE SKIP LOCKED functionality. This handles the complexity of running each job at most once: a worker claiming a job skips any row another worker has already locked, so two workers never pick up the same job. A major benefit of running in PostgreSQL is that developers can query job history, extract historical data, enqueue batch jobs by inserting rows directly, and more.
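A minimal sketch of what the claim step can look like, not the original implementation. The `jobs` table, its columns, and the names `Queryable`, `ClaimedJob`, `CLAIM_SQL`, and `claimNextJob` are all assumptions made for illustration. `Queryable` is shaped so that a PostgreSQL client exposing `query(text, values)` could be passed in.

```typescript
// Hypothetical schema, assumed for this sketch:
//   jobs(id, queue, type, payload, status, attempts, created_at, started_at)

// Minimal database interface: anything with a query(text, values) method
// that resolves to an object containing the returned rows.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rows: any[] }>;
}

interface ClaimedJob {
  id: number;
  type: string;
  payload: unknown;
  attempts: number;
}

// The inner SELECT locks one pending row and skips rows that other
// transactions have already locked; the outer UPDATE marks it as running.
const CLAIM_SQL = `
  UPDATE jobs
  SET status = 'running', started_at = now(), attempts = attempts + 1
  WHERE id = (
    SELECT id
    FROM jobs
    WHERE queue = $1 AND status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, type, payload, attempts
`;

// Claims the oldest pending job in the given queue, or returns null
// when nothing is available to claim.
async function claimNextJob(db: Queryable, queue: string): Promise<ClaimedJob | null> {
  const { rows } = await db.query(CLAIM_SQL, [queue]);
  return rows.length > 0 ? (rows[0] as ClaimedJob) : null;
}
```

Because the claim is a single statement, a worker that finds every candidate row locked gets no row back and can poll again later instead of blocking on another worker's lock.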

Along with the queue itself, I developed a code generation tool to create all the boilerplate code needed for a new worker type. The workers themselves were implemented as a string-keyed map of functions; any data returned from a function was persisted to the queue table. Any thrown error marked the job as failed, with optional automatic retries.
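The worker model described above can be sketched roughly as follows. This is an illustration under assumptions, not the original code: the handler names, the `Job` and `Outcome` shapes, the `runJob` function, and the `maxAttempts` default are all hypothetical. The function returns the outcome that would be written back to the job's row rather than writing to a database itself.

```typescript
// A handler receives the job payload and may return data synchronously
// or asynchronously. Whatever it returns is treated as the job result.
type Handler = (payload: unknown) => Promise<unknown> | unknown;

// Hypothetical registry: job type string -> handler function.
const handlers: Record<string, Handler> = {
  "send-email": async (payload) => {
    const { to } = payload as { to: string };
    return { delivered: true, to };
  },
  "always-fails": () => {
    throw new Error("boom");
  },
};

interface Job {
  id: number;
  type: string;
  payload: unknown;
  attempts: number; // attempts made so far, including the current one
}

// What would be persisted to the job's row after a run.
type Outcome =
  | { status: "completed"; result: unknown }
  | { status: "pending"; error: string } // put back in the queue for a retry
  | { status: "failed"; error: string };

async function runJob(
  job: Job,
  registry: Record<string, Handler>,
  maxAttempts = 3, // assumed default; retries are optional in the original
): Promise<Outcome> {
  try {
    const handler = registry[job.type];
    if (!handler) {
      throw new Error(`No handler registered for job type "${job.type}"`);
    }
    const result = await handler(job.payload);
    return { status: "completed", result };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    // Retry while attempts remain; otherwise mark the job as failed.
    return job.attempts < maxAttempts
      ? { status: "pending", error }
      : { status: "failed", error };
  }
}
```

Keeping handlers as plain functions means a new worker type is mostly a new entry in the map, which is the kind of boilerplate a code generator can produce.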

Takeaways

This approach has drawbacks and a bit of a scale ceiling due to database table size and concurrent access, but it worked smoothly for many years and was very interesting to maintain and improve during that time. I would likely not take the same approach in the year 2025!

Recently, I've really enjoyed using Inngest to manage scheduling, concurrency, retries, etc.