Feb 3, 2025

Why We Built Our Own Support Tooling


To deliver the support experience we wanted at scale, we had to create leverage. So we built Central Station, our Support & Community platform, entirely from scratch. Here’s why.

Foreword

Cloud providers typically restrict support access through tiered pricing models, reserving direct technical assistance for their highest-revenue customers or packaging it as a premium add-on service with prohibitive costs.

Customers who purchase these support tiers often end up navigating multiple escalation layers and handoffs before reaching engineers with the technical expertise to resolve their specific issues.

Railway takes a fundamentally different approach.

When you reach out to Railway, your conversation gets routed to someone who can solve your issue directly, whether that’s one of our product engineers, the creator of Railpack, the platform engineer building our edge network, or our founder/CEO.

The Journey So Far

Before 2024, we primarily provided support over email and Discord. Any Railway user, regardless of spend or plan, could receive support from us directly. This stopped working once we surpassed 100 conversations a day because:

We were answering the same questions repeatedly, which frustrated us
We became slower in responding to users, leading to longer ticket resolution times across the board
We lacked tractable escalation or hand-off processes, leading to longer turnaround times on bug reports or feedback on product improvement areas
Our context was badly fragmented: multiple reports of the same issue across different channels led to lost context and, consequently, missed follow-ups, because we had no easy way to aggregate them into a single view
We rarely groomed our feature requests board on Canny, because there was no forcing function to do so when we were spending all our time in Discord and email
We were prone to dropping user feedback, because the volume forced us to focus on speed instead of quality

This led us to make some difficult but necessary decisions to improve our support workflow.

We also think walled gardens are bad, and Discord and email are both closed loops: email conversations are private, and Discord conversations never make it out of Discord.

Scaling Support

We're not trying to build a 100-person support army. We're building systems where 5 exceptional people can support 50,000 customers better than most companies' 50-person teams. This means being ruthless about automation, obsessive about root causes, and creative about self-service.

Our requirements for what we wanted to offer to our users were simple:

Users should feel support is a natural extension of the product
Users should receive fast responses and fast resolution when they have to reach out
Users should be able to search for answers and solutions easily

Internally, we wanted a “single pane of glass” where every conversation we have with users, regardless of platform, is visible and actionable in one place.

When we evaluated existing solutions, nothing quite fit. Some tools offered fragments of what we needed (like public forums), others had the right primitives (SLOs, metrics, etc.), but none matched our specific requirements.

We had two options: cobble together multiple tools and maintain integrations that would become brittle over time, or pay enterprise vendors to customize their platforms for us. The first path meant integration hell; the second meant becoming vendor managers instead of engineers.

Neither option aligned with our DNA. We're builders, not managers. So we chose to build the leverage we needed rather than negotiate our way to it.

That’s why we built Central Station.

todo sections:

It’s a public forum

Something about community here
Something about bounties

It’s a private ticketing platform


It contains a trove of admin tooling for us


Where Are We At Today?

[metrics, metrics, metrics]
We recently crossed roughly 1.6M total users and 40k threads on Central Station.
Mean time to first response (MTTFR) remains at an average of 10 hours across all ticket classes, including tricky technical support tickets.

Building Railway on Railway

[ray: copy is ok, but needs to be weaved into the right section]

Central Station is fully deployed on Railway. This gave us a golden opportunity to dogfood the product and platform we’re building. We prioritized some Railway features and bug fixes as a direct result of Central Station’s needs, such as fixing reliability issues with Cron Jobs and adding Pre-Deploy Commands.

Because we fully own it (the data, the domain knowledge, and the responsibility that comes with ownership), it acts as a forcing function for us to get better at supporting our users as they scale, and to improve the product as we scale.

The Railway Way

[ray: copy is ok, but needs to be weaved into the right section though it feels reductive since it’s already in intro. Probably want an alternative conclusion]

Most platforms treat support as a cost center to be minimized. We see it differently. Support is how we learn what to build next, how we ensure customer success, and how we build trust that Railway will be here for the next century.

By building Central Station ourselves, we're not constrained by what vendor tools allow. We can create exactly the support experience our users deserve: one that scales with them, not against them.

Because at the end of the day, the best infrastructure in the world means nothing if you can't get help when you need it.

Psst! Like what you see? We’re hiring!

Scratchpad

(random notes and ramblings in no particular order/flow)

Example of ticketing:

This lets us share and index knowledge without having to groom a knowledge base manually, and it gives us the ability to loop in community members to answer support tickets, which in turn lets us scale the community organically through our Conductor Program.

We needed a single pane of glass that lets us interact with every communication channel in a single platform.

Quick resolution requires helpful context

Support over email led to longer resolution times because in 90% of cases, the sender did not provide enough information for us to resolve their issue in our first response. Most of our initial email conversations looked like:

User: “Hey I’m having trouble with X”
Us: “Can you share a link to your deployment, and what errors you’re seeing?”

There is no way to front-run context with email; the nature of an email conversation leads to many rounds of back-and-forth before we can identify and resolve an issue. If we had that information upfront, we could very likely solve the issue in our first response.

Context also cuts both ways. The lifecycle of a support request can be messy when other teams need to get involved, so we need tractable handling of user requests and bug reports: a way to easily aggregate every report of an issue into a single view.

“What can we do now that we have this?” / What does this unlock?
Community - “we’ve paid X in template kickbacks and we aim to match that with community contributions”

It’s the number one medium for “XY problems,” where users reach out with their attempted solution and ask why it isn’t working, while we don’t know (and subsequently have to dig into) what they’re actually trying to solve.

When you fire off a support email, you tend to forget about it until you run into the issue again or when we respond.

This is bad for our users and for us because it ultimately leads to longer resolution times.

Following the above, we settled on a list of must-have requirements:

Faster resolution times for all users, regardless of spend or plan.
Ability to grab as much context upfront as we can from when a user reaches out, instead of having to follow up for it
Faster workflow for us. It shouldn’t take us more than a few minutes to answer a single ticket, and if we’ve already answered a question, users should be able to reuse the previous answer
Tractable handling of user requests and bug reports, since we sometimes have to hand things off to other teams
Better management of feature requests (roadmap view)
The ability to loop in community members to answer support tickets, and to scale the community organically through our Conductor Program
A public knowledge base that we don’t have to groom by hand
A first-class, native support experience. Instead of having our users go through a different platform, support should feel like a natural extension of the product

Beyond all that, we needed a single pane of glass where we can interact with every communication channel in a single platform. Hopping between many different platforms cost us precious cycles in context switching. Why go to a separate feature board to log a feature request, or to subscribe the user to it, if you received it over email? They should all live in the same platform.

When we looked at market offerings, we were unimpressed. Some of them fit a small piece of what we needed (e.g. a public forum), others had the primitives we wanted (e.g. SLOs, metrics), but none of them fit the odd shape we needed for scaling support.

This was a process/workflow issue as much as a tooling issue. We want to drive meaningful resolutions. A meaningful resolution to us looks like:

Fast response
Fast resolution (escalations)
Posterity

Scratchpad (v0)