Building
ai
agents
architecture

If I Were Starting Today: The Stack I Wish I'd Started With

Dr. Ben Soffer, DO · May 23, 2026 · 15 min read

Research and drafting assistance from Claude (Anthropic). All clinical, technical, and strategic decisions are mine.


Quick context before the rest, since this is the first post in the series where the writing is forward-looking rather than reconstructive.

I'm Dr. Ben Soffer. I'm a board-certified internist running a concierge primary care practice in Boca Raton, Florida. I have no computer science background. I took one HTML class in college and built a few personal websites and a Square store in the years after. That was the full extent of my technical training before any of this.

Claude changed what was possible. Over the past six months I've built and now operate three separate telehealth practices on a roughly parallel stack: drbensoffer.com (concierge primary care, the practice this blog lives under), discreetketamine.com (ketamine-assisted psychiatric care), and tovanihealth.com (a second ketamine-assisted practice with a more polished medical-facing presentation and a broader access model). Each has its own brand, its own clinical workflows, its own patient population, and its own marketing positioning. None of this would have been buildable by me a year ago. Six months in, it's three live practices and a series of posts about what I learned along the way.

I write that not as a flex. I write it because the lessons in this post are filtered through someone who is not an engineer by training. If you are a clinician thinking about building something on top of modern tools, the right starting question is not "can I learn this." The right question is "what should I do differently than the engineer-author of the reference architecture would do, given that I'm running a regulated medical practice instead of a startup." That's most of what this post is about.

The first two posts in this series were about things going wrong: a database wipe in early April, and the month-long agent-driven cascade that led up to it. This post is about what I'd build instead. Same practice, same patient base, same set of compliance and clinical requirements, but the version where I'd known on day one what I learned by month six.

I want to be honest about the scope of the rewrite I'd do, because it isn't a tear-down. Most of the stack is fine. The original choices that bothered me most in retrospect turn out to be a small number of vendor swaps and a handful of architectural decisions that I should have committed to on day one rather than arrived at after a near-miss. I'm going to walk through what I'd keep, what I'd swap, and what I'd defer. The point isn't to relitigate six months of decisions. The point is that if you're a solo doctor looking at the blank repo today, there are maybe six things to get right early that compound for a long time.

What I'd keep

Some choices have aged well, and I'd make them the same way today.

Next.js was the right framework. The combination of file-based routing, server components for the marketing surface, server actions for form mutations, and a single deployment artifact for both the public site and the patient portal turned out to be exactly the shape I needed. I get audited as a HIPAA-covered entity. Having the patient-facing portal and the marketing site share an auth surface, a database, and a deploy pipeline simplifies the audit story in ways that a microservices split would not. I'd take Next.js again.

Prisma was the right ORM. Strict typing of database rows, generated client, migration files committed to the repo. The reason post #2 exists, in part, is that the agent was bypassing the migration workflow and going around Prisma to mutate the schema directly. That isn't a Prisma problem. That's a process problem. Used as designed, Prisma is the cleanest type-safe ORM I've used in any language. I'd take it again, and on day one I'd lock down which database role is allowed to run schema-altering commands so the bypass isn't physically possible.
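Here is a minimal sketch of that lockdown, assuming two hypothetical Postgres roles: app_migrator, which owns the tables and is the only role the migration step connects as, and app_runtime, which the running application uses (and which must not be a member of app_migrator). In Postgres, ALTER and DROP require table ownership, so a role that holds only row-level privileges cannot change the schema. It's written as a small TypeScript helper that emits the SQL; you'd run the output once as an administrator.

```typescript
// Only plain lowercase identifiers are accepted, since the names are interpolated into SQL.
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Hypothetical roles: `migrator` owns the schema objects; `runtime` is what the app connects as.
function roleSeparationSql(schema: string, migrator: string, runtime: string): string[] {
  for (const name of [schema, migrator, runtime]) {
    if (!IDENTIFIER.test(name)) throw new Error(`unsafe identifier: ${name}`);
  }
  return [
    // Only the owning role may create objects in the schema.
    `REVOKE CREATE ON SCHEMA ${schema} FROM PUBLIC;`,
    `GRANT USAGE ON SCHEMA ${schema} TO ${runtime};`,
    // Row-level privileges only: ALTER and DROP need ownership, and TRUNCATE is never granted.
    `GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ${schema} TO ${runtime};`,
    `GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA ${schema} TO ${runtime};`,
    // Objects created by future migrations (run as the migrator) get the same grants automatically.
    `ALTER DEFAULT PRIVILEGES FOR ROLE ${migrator} IN SCHEMA ${schema} GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO ${runtime};`,
    `ALTER DEFAULT PRIVILEGES FOR ROLE ${migrator} IN SCHEMA ${schema} GRANT USAGE, SELECT ON SEQUENCES TO ${runtime};`,
  ];
}
```

The application's connection string then uses app_runtime, and only the migration step ever sees app_migrator's credentials, so a schema change attempted from the application fails at the database rather than relying on anyone remembering the workflow.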

Postgres was the right database. I'm sticking with it. There's nothing about a primary care practice that needs a non-relational store. Aurora Serverless v2 is what I'd run it on now. I started on Neon, which I'll get to below.

Stripe was the right payments system. Two separate accounts (one for the practice, one for the consulting income) connected to the same application through a dual-webhook pattern. Subscription billing, invoice retries, and webhook signature verification are all well-handled out of the box. I never want to write payment-processing code from scratch and Stripe is the closest thing to a default in this space. I'd take it again on day one.
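Stripe's official SDK does this verification for you (stripe.webhooks.constructEvent), and that's what I'd call in production. The sketch below only shows what the check involves and one way a dual-account setup might tell the accounts apart. It assumes both accounts' webhook endpoints land in the same handler with one signing secret per account; the account labels and function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical labels for the two Stripe accounts.
type Account = "practice" | "consulting";
type WebhookSecrets = Record<Account, string>;

// Stripe-Signature looks like "t=1700000000,v1=<hex>[,v1=<hex>...]".
function parseSignatureHeader(header: string): { t: number; v1: string[] } {
  let t = Number.NaN;
  const v1: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t") t = Number(value);
    else if (key === "v1" && value) v1.push(value);
  }
  return { t, v1 };
}

// Stripe signs `${timestamp}.${rawBody}` with HMAC-SHA256 using the endpoint's signing secret.
function signatureMatches(secret: string, rawBody: string, t: number, candidates: string[]): boolean {
  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`).digest();
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}

// Returns the account whose secret produced the signature; throws if neither did.
function verifyDualWebhook(
  rawBody: string,
  header: string,
  secrets: WebhookSecrets,
  nowSec: number,
  toleranceSec = 300, // reject events signed more than five minutes ago (replay protection)
): Account {
  const { t, v1 } = parseSignatureHeader(header);
  if (!Number.isFinite(t) || v1.length === 0) throw new Error("malformed Stripe-Signature header");
  if (Math.abs(nowSec - t) > toleranceSec) throw new Error("timestamp outside tolerance");
  for (const account of ["practice", "consulting"] as const) {
    if (signatureMatches(secrets[account], rawBody, t, v1)) return account;
  }
  throw new Error("signature matched neither account");
}
```

The raw request body has to be passed through untouched; parsing and re-serializing the JSON before verification changes the bytes and breaks the signature.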

A monorepo was the right structure. One repository, one deploy, one set of shared types between the public site, the patient portal, the admin tools, and the cron Lambdas. I see solo doctors split this into three or four repos because that's what their reference architectures show, and then they spend the next year fighting cross-repo type drift. Don't. A single repository at the scale of one to two engineers is the correct shape.

Those five decisions are most of the stack and I'd repeat them.

What I'd swap

The vendor choices I made in the first six months of the practice came from a particular mental model: pick whatever the fastest path to a working integration was, ship it, move on. That model is correct for a startup that has not yet hit production. It is wrong for a healthcare practice that will accept its first patient three weeks after the integration ships. The choices that bit me later were the ones where "fast path" diverged from "production-grade default."

Neon → Aurora. I started on Neon because it had a great onboarding experience and a generous free tier. Neon's defaults were the wrong ones for a regulated workload: a one-day point-in-time recovery window, a backup strategy that lived in their account rather than mine, and a connection pooler that didn't behave correctly under the load patterns my application generated. On day one of a healthcare practice, you want Aurora Serverless v2 with PITR set to thirty-five days at cluster creation, a backup retention policy specified in the same Terraform module that creates the cluster, and a cross-account snapshot copy job running before you sign your first patient. Aurora costs more. The cost difference is irrelevant compared to the recovery story you can tell after the fact when something goes wrong.

Twilio → AWS End User Messaging. Twilio was the obvious choice for SMS in 2024 and is still the obvious choice if you don't have a TFV (toll-free verification) approved and need to ship in days. By 2026, with TFV approved, AWS End User Messaging is the better choice for a HIPAA-covered entity already operating in AWS. The SMS-sending IAM role can be scoped exactly the same way as every other AWS service the application talks to. Costs are lower per message at any meaningful volume. And the operational story (delivery receipts, opt-out handling, sender-ID provisioning) is integrated with the rest of the AWS surface rather than living in a separate vendor dashboard you have to remember to check. I'd start on AWS End User Messaging from day one if I had the TFV in hand at the time, and I'd start the TFV application on day one regardless.

Resend → SES. I used Resend for transactional email in the early months because Resend has the best developer experience in the email category. The problem is that Resend is not the right vendor for clinical-adjacent email at scale. The signed BAA exists and is fine for ops alerts, but the deliverability story for patient-facing transactional mail (appointment confirmations, prescription notifications, billing) is better on SES once you've earned a warmed-up sending domain. SES is also cheaper, integrates with the same IAM identity as everything else, and supports the inbound-routing pattern I use for the patient-reply pipeline. My current setup uses SES as primary with Resend as a fallback for ops-only mail, which is the right inversion of what I started with. On day one I'd be on SES.

Doxy.me → Whereby Embedded. Doxy.me was the default video-visit choice for telehealth in 2024 because it had been compliant and frictionless during the pandemic ramp. By 2026 it's a separate surface the patient has to learn, a separate URL they have to remember, a separate scheduling story to maintain. Whereby Embedded puts the video session inside my own patient portal, on the same domain, with the auth context already established. HIPAA defaults are correct out of the box. Patients no longer click into a different brand to talk to me.

Password/2FA → magic-link auth. The original drbensoffer.com (DBS) auth flow had passwords with an optional 2FA layer that I built up over the first six months. The right answer is to skip passwords entirely. Magic-link authentication (one-time signed link, short expiry, single-use) is correct for a patient population that logs in once a month at most. The threat model is dominated by phishing and credential stuffing, both of which are mitigated more cleanly by removing the password from the surface area than by adding a second factor on top of it. I'd start magic-link on day one and never add the password.
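A minimal sketch of the token side of magic-link auth, assuming an HMAC-signed token that carries the user ID, an expiry, and a random ID used to enforce single use. The token format, the 15-minute TTL, the in-memory set of consumed IDs, and all names are illustrative assumptions; a real deployment would record consumed IDs in the database so single use survives restarts and holds across instances.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

interface MagicLinkClaims {
  sub: string; // patient/user ID
  exp: number; // expiry, seconds since epoch
  jti: string; // random token ID, used to enforce single use
}

function mac(secret: string, data: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Illustrative format: base64url(JSON claims) + "." + base64url(HMAC of that string).
function issueMagicLinkToken(secret: string, userId: string, nowSec: number, ttlSec = 15 * 60): string {
  const claims: MagicLinkClaims = { sub: userId, exp: nowSec + ttlSec, jti: randomBytes(16).toString("hex") };
  const body = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${body}.${mac(secret, body).toString("base64url")}`;
}

// Stand-in for a database table with a unique constraint on jti.
const consumedTokenIds = new Set<string>();

// Returns the user ID if the token is authentic, unexpired, and unused; throws otherwise.
function redeemMagicLinkToken(secret: string, token: string, nowSec: number): string {
  const [body, sig] = token.split(".");
  if (!body || !sig) throw new Error("malformed token");
  const expected = mac(secret, body);
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) throw new Error("bad signature");
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as MagicLinkClaims;
  if (nowSec >= claims.exp) throw new Error("link expired");
  if (consumedTokenIds.has(claims.jti)) throw new Error("link already used");
  consumedTokenIds.add(claims.jti); // mark as used only after every other check passes
  return claims.sub;
}
```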

The architecture I'd put in place before the first patient

A specific list. These are the things that take an afternoon to set up before any clinical workflow exists, and that would have prevented most of what went wrong over the last six months.

Role-only AWS access from the application. No hardcoded access keys in the code, ever. The application gets an IAM role attached to its compute (Amplify SSR Lambda compute role, in my case), and the SDKs pick up credentials automatically. The fallback ladder of process.env.AWS_ACCESS_KEY_ID || hardcoded is the pattern I'd refuse to ever write again. It looks like a small convenience and it's the source of three credential-leak incidents I've had to clean up since launch.
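One way to make the fallback ladder impossible rather than merely discouraged is a boot-time check, sketched below. It can't simply reject any AWS_ACCESS_KEY_ID, because Lambda-style runtimes expose the attached role's temporary credentials through those same environment variables. Instead it rejects long-term IAM user keys, which AWS prefixes with AKIA (temporary STS keys use ASIA), and any key that arrives without a session token. The function name is hypothetical; the SDK clients themselves are then constructed with no credentials argument at all.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical boot-time guard. Role credentials are temporary: their key IDs start with "ASIA"
// and always come with a session token. Long-term IAM user keys start with "AKIA".
function assertNoLongTermAwsKeys(env: Env): void {
  const keyId = env["AWS_ACCESS_KEY_ID"];
  if (keyId === undefined || keyId === "") return; // the SDK's default chain falls through to the role
  if (keyId.startsWith("AKIA")) {
    throw new Error("Long-term IAM user key in environment; remove it and rely on the compute role.");
  }
  if (!env["AWS_SESSION_TOKEN"]) {
    throw new Error("AWS access key without a session token; static credentials are not allowed.");
  }
}
```

Calling assertNoLongTermAwsKeys(process.env) at the top of the server entry point turns a leaked static key into a failed deploy instead of a working one.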

PITR set to thirty-five days at cluster creation. Not at the first scheduled review, not after the first patient, not after the first scary moment. At cluster creation, in the same Terraform module that brings up the database. Thirty-five is the maximum AWS allows for Aurora. It is also the right number. Reducing it later is reversible; missing the data because you didn't enable it is not.
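To keep that number from drifting, a CI step can read the Terraform plan and fail if any Aurora cluster is planned with less. A sketch, assuming the plan is rendered with terraform show -json on a saved plan file and the cluster is an aws_rds_cluster resource; only the few plan fields the check reads are typed, and the function name is hypothetical.

```typescript
// Minimal slice of the `terraform show -json <planfile>` output that this check reads.
interface PlanResourceChange {
  address: string;
  type: string;
  change: { actions: string[]; after: Record<string, unknown> | null };
}
interface TerraformPlan {
  resource_changes?: PlanResourceChange[];
}

const REQUIRED_RETENTION_DAYS = 35; // Aurora's maximum backup retention period

// Returns one message per Aurora cluster whose planned retention is not 35 days.
function findRetentionViolations(plan: TerraformPlan): string[] {
  const problems: string[] = [];
  for (const rc of plan.resource_changes ?? []) {
    // `after` is null when the resource is being destroyed; nothing to check then.
    if (rc.type !== "aws_rds_cluster" || rc.change.after === null) continue;
    const days = rc.change.after["backup_retention_period"];
    if (days !== REQUIRED_RETENTION_DAYS) {
      problems.push(`${rc.address}: backup_retention_period is ${String(days)}, expected ${REQUIRED_RETENTION_DAYS}`);
    }
  }
  return problems;
}
```

The CI job exits non-zero whenever the list is non-empty. Because the provider's default retention is a single day, an omitted setting shows up in the plan as 1 and gets caught.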

Cross-account backup wired before patient one. A separate AWS account whose only job is to hold immutable copies of the production database snapshots, written by a Lambda that runs in the production account and assumes a role in the backup account to make the copies. The point is not redundancy in the abstract. The point is that an attacker (or an agent) with full administrative access to the production account cannot delete the backups. If your backups live in the same account as your application, an over-permissioned agent can wipe both in the same session. I have learned this the hard way, and it should be the first item on any healthcare practice's infrastructure checklist.
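A sketch of the two policy documents on the backup-account side, written as TypeScript objects so they can be type-checked and serialized into whatever deploys them. The account ID and role name are placeholders, and the KMS permissions that encrypted snapshots also need are left out. The explicit Deny only binds this role; an administrator of the backup account could still delete snapshots, which is why that account's credentials stay entirely separate from production's.

```typescript
// Placeholders: substitute your production account ID and the copier Lambda's role name.
const PROD_ACCOUNT_ID = "111111111111";
const PROD_LAMBDA_ROLE = `arn:aws:iam::${PROD_ACCOUNT_ID}:role/snapshot-copier`;

interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal?: { AWS: string };
  Action: string | string[];
  Resource?: string;
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Trust policy on the backup-account role: only the production copier role may assume it.
const backupRoleTrustPolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [{ Sid: "AllowProdCopierOnly", Effect: "Allow", Principal: { AWS: PROD_LAMBDA_ROLE }, Action: "sts:AssumeRole" }],
};

// Permissions on that role: copy and list snapshots, with deletion explicitly denied.
const backupRolePermissions: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "CopyIntoBackupAccount",
      Effect: "Allow",
      Action: ["rds:CopyDBClusterSnapshot", "rds:DescribeDBClusterSnapshots", "rds:AddTagsToResource"],
      Resource: "*",
    },
    { Sid: "NeverDelete", Effect: "Deny", Action: ["rds:DeleteDBClusterSnapshot", "rds:DeleteDBSnapshot"], Resource: "*" },
  ],
};

// True if some Deny statement names the action exactly (wildcards are not expanded here).
function explicitlyDenies(doc: PolicyDocument, action: string): boolean {
  return doc.Statement.some(
    (s) => s.Effect === "Deny" && (Array.isArray(s.Action) ? s.Action : [s.Action]).includes(action),
  );
}
```

JSON.stringify of either object is the document you'd attach to the role.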

Append-only event log. Every meaningful state change in the application (patient created, appointment booked, payment captured, prescription written) writes a row to an audit table that is INSERT-only by database grant: the application's database role has no UPDATE or DELETE privilege on this table. The audit log becomes the disaster-recovery story for cases where the primary tables get corrupted; you can replay the events forward to reconstruct state. It also becomes the HIPAA audit story for cases where you need to demonstrate who did what when. Schema is simple: timestamp, actor, action, target_id, payload as JSONB. Cost is rounding error.
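A minimal sketch of the replay idea, assuming the schema above and three illustrative action names. Append-only enforcement lives in the database, not in this code; replay just folds events forward in timestamp order into current state. The state shape is a hypothetical stand-in for the real tables.

```typescript
// Mirrors the schema above. Append-only is enforced in Postgres: grant the runtime role INSERT
// and SELECT on the audit table only (revoke UPDATE, DELETE, TRUNCATE if a blanket grant gave them).
interface AuditEvent {
  timestamp: string; // ISO-8601 UTC, e.g. "2026-05-01T09:00:00Z"
  actor: string;
  action: "patient.created" | "appointment.booked" | "payment.captured"; // illustrative subset
  target_id: string;
  payload: Record<string, unknown>;
}

interface PracticeState {
  patients: Map<string, { name: string }>;
  appointments: Map<string, { patientId: string; startsAt: string; paid: boolean }>;
}

// Rebuild current state by applying events in timestamp order.
function replay(events: readonly AuditEvent[]): PracticeState {
  const state: PracticeState = { patients: new Map(), appointments: new Map() };
  // Same-format ISO-8601 UTC strings sort chronologically; Array sort is stable for ties.
  const ordered = [...events].sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));
  for (const e of ordered) {
    switch (e.action) {
      case "patient.created":
        state.patients.set(e.target_id, { name: String(e.payload["name"]) });
        break;
      case "appointment.booked":
        state.appointments.set(e.target_id, {
          patientId: String(e.payload["patientId"]),
          startsAt: String(e.payload["startsAt"]),
          paid: false,
        });
        break;
      case "payment.captured": {
        const appt = state.appointments.get(e.target_id);
        if (appt) appt.paid = true;
        break;
      }
    }
  }
  return state;
}
```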

Hardcoded fallback pattern for outbound services. Every external service the application talks to (SES, AWS End User Messaging, Stripe webhooks, DrChrono API) has a fallback path that is configured statically in the code rather than dynamically from a database row or admin setting. If the configuration database is unavailable, or if the agent has misconfigured the active path, the fallback still works. The pattern is unglamorous and has saved me from extended outages multiple times. I'd start it on day one.
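A sketch of the pattern for a single channel, assuming a hypothetical loader that reads the admin-configured channel from the database and a fallback channel constructed in code. Any failure on the primary path, including the configuration lookup itself, drops to the static path; the names are illustrative.

```typescript
type Sender = (to: string, body: string) => Promise<void>;

interface ChannelConfig {
  name: string;
  send: Sender;
}

// `loadActiveChannel` is hypothetical: it reads the currently configured channel from the database.
// `staticFallback` is constructed in code, so it works even when that database is unavailable.
async function sendWithStaticFallback(
  loadActiveChannel: () => Promise<ChannelConfig>,
  staticFallback: ChannelConfig,
  to: string,
  body: string,
): Promise<string> {
  try {
    const active = await loadActiveChannel();
    await active.send(to, body);
    return active.name;
  } catch (err) {
    // Config lookup failed, the row is misconfigured, or the primary vendor errored.
    console.warn("primary channel failed; using static fallback", err);
    await staticFallback.send(to, body); // if this also fails, the error propagates to the caller
    return staticFallback.name;
  }
}
```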

That's the day-one architecture. None of it is novel. All of it is what any team with an on-call rotation would build by default. The reason solo doctors miss it is that "team with on-call rotation" isn't the mental model they bring to their own work. The agent forced me to upgrade my model. You can upgrade yours before the forcing function arrives.

What I would NOT do on day one

A symmetric list. These are things I built up early because they felt sophisticated, and that I'd defer or skip entirely if I were starting over.

Multi-touch attribution. Before you have a hundred patients, you don't need to know which of three marketing channels contributed to each acquisition. You need to know whether the practice exists and whether anyone is finding it. Last-click attribution is enough. A single referrerUrl column on the eligibility submission and a single landingPage column on the user record will tell you almost everything you can act on. I built a full UTM-tagged multi-touch model in month four. I have never made a different decision because of it.
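The two columns can be filled at first touch with something as small as the sketch below. The names are illustrative, and it assumes you pass in the full request URL, the Referer header, and your own hostname.

```typescript
// Hypothetical shape matching the two columns described above.
interface LastClickAttribution {
  referrerUrl: string | null;
  landingPage: string;
}

function captureLastClick(requestUrl: string, referrerHeader: string | null, ownHost: string): LastClickAttribution {
  const landing = new URL(requestUrl);
  let referrerUrl: string | null = null;
  if (referrerHeader) {
    try {
      const ref = new URL(referrerHeader);
      // Navigation within your own site is not an acquisition source.
      if (ref.hostname !== ownHost) referrerUrl = ref.origin + ref.pathname;
    } catch {
      referrerUrl = null; // malformed Referer header
    }
  }
  // Keep path and query (UTM tags live in the query); drop the fragment.
  return { referrerUrl, landingPage: landing.pathname + landing.search };
}
```

Because browsers default to the strict-origin-when-cross-origin referrer policy, cross-site referrers usually arrive as a bare origin such as https://www.google.com/, which is still enough for last-click.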

Cohort analytics. Same logic. You don't need to know retention curves segmented by acquisition month for a practice of fifty patients. You need to know whether patients are showing up to their second visit. A flat dashboard with a single "patients seen this month" number is enough for the first year.

Multi-agent coordination protocols. I have a multi-agent coordination protocol running across my three practice domains (this practice and the two ketamine practices). It is genuinely useful and I am writing the post about it as part of this series. It would have been wildly premature on day one. The right number of agents for a solo doctor in week one is zero, in month one is one (the Claude Code CLI with you closing the loop), and the question of whether to introduce a second agent should not come up until the first agent is producing more value than it consumes.

Aggressive integrations before they're needed. I have integrations to Google Search Console, Google Analytics 4, Microsoft Clarity, PostHog, Sentry, Stripe webhooks, AWS CloudWatch, EventBridge, DrChrono, DrFirst, AWS End User Messaging, SES, Whereby, and a few others. Each of them is useful now. Most of them I added in the first month because I'd seen another developer use them. At least four of them sat unconfigured for the first six months because there was no signal coming through them. Add integrations when you have a question they would answer. Otherwise the integration is just a cron job you forgot about that breaks silently when its API changes.

The phased build order

Time-ordered, rather than feature-ordered.

Phase 0 (weeks zero through two): Single physician (you), single state of licensure, single payment method, single appointment type, single auth flow (magic link), Aurora with PITR enabled at thirty-five days, cross-account backup wired, role-only AWS access, append-only event log, the four hardcoded outbound fallbacks. The application boots, you can sign in, you can book an appointment with yourself as a test, the appointment writes to the event log, the test payment captures and its Stripe webhook is recorded in the event log, an email goes out via SES, and you receive it. That's Phase 0. Patient zero is allowed to enter after Phase 0 is complete.

Phase 1 (weeks two through twelve): Returning-patient recognition (a patient who exists doesn't go through new-patient onboarding again), document storage on S3 with signed-URL access, the patient portal with chart access, DrChrono wired as the system-of-record for clinical encounters, basic SMS reminders through AWS End User Messaging, the marketing site with one or two SEO pages targeted at your specific city, last-click attribution captured.
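Returning-patient recognition can start as a single lookup before intake routing. The sketch assumes a hypothetical matching rule (case-insensitive email plus date of birth must both match) and uses an in-memory list as a stand-in for a database query against an indexed normalized-email column.

```typescript
interface PatientRecord {
  id: string;
  email: string;
  dateOfBirth: string; // YYYY-MM-DD
}

type IntakeRoute = { kind: "returning"; patientId: string } | { kind: "new" };

// Assumption: addresses are compared case-insensitively after trimming whitespace.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Send a known patient to the returning-patient path instead of new-patient onboarding.
function routeIntake(existing: readonly PatientRecord[], email: string, dateOfBirth: string): IntakeRoute {
  const key = normalizeEmail(email);
  const match = existing.find((p) => normalizeEmail(p.email) === key && p.dateOfBirth === dateOfBirth);
  return match ? { kind: "returning", patientId: match.id } : { kind: "new" };
}
```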

Phase 2 (months three through nine): Second payment method (HSA/FSA), additional state licensure, video visits through Whereby Embedded, prescription writing via DrChrono and your e-prescribing vendor, billing-complexity handling, automated cron-driven workflows (appointment reminders, refill reminders, post-visit follow-ups), real preventive-care tracking infrastructure for chronic-disease patients.

Phase 3 (month nine onward): Agent tools for non-critical workflows (marketing copy generation, internal admin utilities, content drafting), multi-touch attribution if and only if the volume justifies it, cohort analytics if and only if you have a specific retention question that the simpler dashboard can't answer, additional brand surfaces or specialty lines under the same legal entity.

Each phase is defined by what's allowed in it, not by what's possible in it. The agent in phase three is allowed to write internal admin utilities and marketing copy. It is not allowed to write to the production database, deploy to production without a human-approved squash, or rotate secrets. The phases are governance phases, not capability phases.
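Because the phases are governance phases, they can be written down as an allowlist that agent tool wrappers consult before acting. A sketch with hypothetical action names; anything not listed is denied, and the three actions ruled out above appear in no phase's list.

```typescript
type Phase = 0 | 1 | 2 | 3;

// Hypothetical action names for what an agent tool might attempt.
type AgentAction =
  | "draft_marketing_copy"
  | "write_admin_utility"
  | "draft_content"
  | "write_production_db"
  | "deploy_production_unapproved"
  | "rotate_secrets";

// Default deny: an action is allowed only if its phase lists it explicitly.
const ALLOWED: Record<Phase, ReadonlySet<AgentAction>> = {
  0: new Set<AgentAction>(),
  1: new Set<AgentAction>(),
  2: new Set<AgentAction>(),
  3: new Set<AgentAction>(["draft_marketing_copy", "write_admin_utility", "draft_content"]),
};

function agentMayPerform(phase: Phase, action: AgentAction): boolean {
  return ALLOWED[phase].has(action);
}
```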

I would not skip phases. I would not let Phase 2 features sneak into Phase 0 because they're easy to add. Every Phase 0 day spent building Phase 1 features is a day not spent making the foundation correct. Phase 0 took me much longer in real life than it should have, and Phase 0 is the only one whose mistakes are expensive to recover from. The other phases mostly correct themselves with iteration.

Next time

That's the architectural reset. Next time I want to start walking through the parts in detail, beginning with what I now think of as the patient pipeline: eligibility, intake, magic-link onboarding, the moment a returning patient is recognized as one rather than starting from scratch, and the small set of metrics that actually matter to track at each step. Most of the rest of the series will be component-by-component in this style, from the patient pipeline through the clinical core, communications, payments, compliance, and the admin tooling that holds the whole thing together. The next post is the first of seven on that.

Frequently Asked Questions

Why Aurora instead of Neon?
Neon's defaults are wrong for a regulated workload. Aurora Serverless v2 gives you PITR of 35 days configurable at cluster creation, a backup retention policy you control, cross-account snapshot copying you can wire before patient zero, and connection pooling (through RDS Proxy) that behaves well under unpredictable load patterns. Cost is meaningfully higher than Neon but irrelevant compared to the recovery story when something goes wrong.
What about HIPAA compliance for the stack choices in this post?
Every vendor named in the 'what I'd swap' section signs a Business Associate Agreement: AWS (covering Aurora, AWS End User Messaging, and SES) and Whereby (for Whereby Embedded). Stripe, which stays on the keep list, is explicitly not a Business Associate under §1179 of HIPAA (the payment-processing exemption); the BAA-equivalent there is minimizing the data you send Stripe. The auth flow (magic-link) doesn't transmit PHI either way.
Why a single monorepo instead of multiple repos?
At the scale of one to two engineers, a monorepo eliminates cross-repo type drift, simplifies the deploy pipeline, and lets the marketing site and patient portal share the same auth surface (which simplifies the HIPAA audit story). Multi-repo splits make sense once you have a team of five or more engineers with clear ownership boundaries. Solo doctors should not be in that mode for a long time.
What's the cost difference between the day-one architecture and the cheaper version?
Order of magnitude: roughly $200-400 per month for the Aurora + cross-account backup + role-separated infrastructure versus roughly $50-100 per month for the Neon + single-account version. The difference funds the recovery story you tell after an incident. If you can't afford the higher tier, run on the cheaper tier with a documented plan for what you'd do if the database were lost, and accept that the answer might be 'rebuild from scratch.'
Why magic-link auth instead of passwords with 2FA?
For a patient population that logs in once a month at most, the threat surface of password storage and credential stuffing dominates the threat surface that 2FA addresses. Removing the password entirely eliminates the larger threat. Magic-link via SES, with a short expiry and single-use enforcement, is simpler to implement, simpler for patients, and simpler to audit. The downside is total dependence on email delivery; the mitigation is the hardcoded fallback pattern (SES primary, Resend secondary for the auth path specifically).
When does an agent get write access to production in your build order?
Phase 3 (month nine onward), and only for non-critical workflows: marketing copy generation, internal admin utilities, content drafting. Agents do not get database write access, secret rotation rights, or production deploy approval at any phase. The six tests in the last build post define the boundary.
ai
agents
architecture
practice infrastructure
build log
vendor-selection

If you're a doctor thinking about building (or fixing) your own practice tech and want to talk through your specific situation, I do a small amount of consulting at drbensoffer.com/consulting. I work with a handful of doctor-builders at a time, so the calendar is intentionally narrow.

Get the next post by email

One short email a week, only when there's a new post in this series.
