

How to Design Microservice Boundaries That Do Not Come Back to Haunt You

Nben M. 03 Dec, 2025 10 mins

The appeal of microservices is real: independent deployability, isolated failure domains, and the ability to scale specific parts of a system without scaling everything. Teams adopt the architecture for good reasons. The problems appear later, when what looked like a clean boundary at design time turns out to be a seam that two services have to cross on every single request, or when a change to one service requires coordinated deployments across four others, or when a transaction that needs to be atomic is now distributed across three databases with no clear owner.

I have seen this pattern more than once. At Standard Chartered Ireland, we inherited a system that had already gone through one microservices migration before we arrived. Some of the boundaries were well drawn. Others were clearly the result of decomposing the monolith by technical layer rather than by business capability, and those boundaries had been causing pain for years. We spent the first three months understanding where the seams were wrong before we touched anything.

The decisions that produce bad boundaries are not made carelessly. They are made with incomplete information, under pressure, by teams that have not yet lived with the system long enough to understand what it actually does. That is not an excuse for not thinking carefully. It is a reason to understand the failure modes before you start drawing lines.

Decompose by Business Capability, Not by Technical Layer

The most common source of bad microservice boundaries is decomposing a system the way it looks in the codebase rather than the way the business actually works. A monolith often has a controller layer, a service layer, a repository layer. Splitting those into separate services produces a distributed monolith: three services that cannot function independently and that must communicate on every operation.

The right unit of decomposition is a business capability: a cohesive set of behavior that the business performs, owns end to end, and can reason about independently. Payments is a capability. Fee calculation is a capability. Customer onboarding is a capability. Each of these has a clear owner in the business, a clear input and output, and a clear definition of what it means to do its job correctly.

When you decompose by capability, the service boundary maps to something a non-engineer can describe. That is a useful test. If you cannot explain what a service does to a product manager in one sentence without using technical terms, the boundary is probably wrong.

The Bounded Context Test

Domain-driven design's concept of a bounded context is the most practical tool I have found for validating service boundaries before committing to them. A bounded context is a part of the domain within which a specific model applies consistently: the same terms mean the same things, the same rules govern the same behaviors, and the same team owns the decisions.

Before drawing a service boundary, ask whether the domain you are enclosing has a consistent model. Does "account" mean the same thing throughout this service? Does "transaction" have one definition or several depending on context? If the same term means different things in different parts of the proposed service, the boundary is probably enclosing too much. If terms from outside the proposed service keep appearing in its internal model, the boundary is probably drawn in the wrong place.

At Standard Chartered, the original migration had put customer data and account data into the same service because they were stored in related tables. In the business model, customer and account are distinct bounded contexts with different owners, different change rates and different regulatory requirements. Separating them was one of the first things we did, and it simplified both services significantly.
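A minimal sketch of what that separation can look like in code. Every name here (`Customer`, `Account`, `CustomerDirectory`, `OpenAccount`, the KYC flag) is hypothetical and chosen for illustration; it is not the model used in the migration described above. The point is that the account context holds only a customer identifier and asks a narrow question through an interface, rather than sharing the customer model.

```go
package main

import "fmt"

// CustomerID is the only concept the two contexts share.
type CustomerID string

// Customer belongs to the customer context: identity and verification status.
type Customer struct {
	ID        CustomerID
	LegalName string
	KYCPassed bool
}

// Account belongs to the account context. It references the customer by ID
// and never copies customer fields into its own model.
type Account struct {
	ID           string
	Owner        CustomerID
	BalanceMinor int64 // minor units, e.g. cents
	Currency     string
}

// CustomerDirectory is the narrow question the account context is allowed
// to ask of the customer context.
type CustomerDirectory interface {
	IsVerified(id CustomerID) (bool, error)
}

// inMemoryDirectory is a stand-in for a call to the customer service.
type inMemoryDirectory map[CustomerID]Customer

func (d inMemoryDirectory) IsVerified(id CustomerID) (bool, error) {
	c, ok := d[id]
	if !ok {
		return false, fmt.Errorf("customer %q not found", id)
	}
	return c.KYCPassed, nil
}

// OpenAccount creates an account only for a verified customer.
func OpenAccount(dir CustomerDirectory, owner CustomerID, currency string) (Account, error) {
	verified, err := dir.IsVerified(owner)
	if err != nil {
		return Account{}, fmt.Errorf("lookup owner: %w", err)
	}
	if !verified {
		return Account{}, fmt.Errorf("customer %q is not verified", owner)
	}
	return Account{ID: "acc-" + string(owner), Owner: owner, Currency: currency}, nil
}

func main() {
	dir := inMemoryDirectory{"c1": {ID: "c1", LegalName: "Ada Example", KYCPassed: true}}
	acc, err := OpenAccount(dir, "c1", "EUR")
	fmt.Println(acc.ID, acc.Currency, err)
}
```

If fields such as `LegalName` start appearing inside `Account`, that is the signal described above: terms from outside the context are leaking into its model.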

Choreography vs. Orchestration: Choose Before You Regret It

One of the most consequential decisions in microservice design is how services coordinate. The two primary models are orchestration, where a central service directs the flow, and choreography, where services react to events and coordinate implicitly.

Both models work. Both have failure modes that only become obvious at scale. Choosing the wrong one for a given workflow is the kind of decision that is expensive to reverse once services are in production.

Orchestration is easier to reason about and easier to debug. When something goes wrong, the orchestrator has the full context of what was attempted and where it failed. The cost is coupling: the orchestrator knows about every service it coordinates, and adding a new step to the workflow requires changing the orchestrator.

Choreography scales better and produces looser coupling. Services publish events and react to events without knowing about each other. The cost is observability: when a workflow fails, the failure is distributed across multiple services and event logs, and reconstructing what happened requires correlating events across systems.

// Orchestration: the payment service drives the full flow
func (s *PaymentService) ProcessPayment(ctx context.Context, req PaymentRequest) error {
    if err := s.fraudClient.Check(ctx, req); err != nil {
        return fmt.Errorf("fraud check: %w", err)
    }
    if err := s.ledgerClient.Reserve(ctx, req.Amount, req.AccountID); err != nil {
        return fmt.Errorf("ledger reserve: %w", err)
    }
    if err := s.notificationClient.Send(ctx, req.CustomerID, "payment initiated"); err != nil {
        // non-fatal: log and continue
        s.logger.Warn("notification failed", "err", err)
    }
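    // Wrap the settle error like the other steps so the failing stage is visible to callers.
    if err := s.ledgerClient.Settle(ctx, req.Amount, req.AccountID); err != nil {
        return fmt.Errorf("ledger settle: %w", err)
    }
    return nil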
}

// Choreography: each service reacts to events independently
func (s *FraudService) HandlePaymentInitiated(ctx context.Context, event PaymentInitiatedEvent) {
    result := s.checker.Evaluate(ctx, event)
    if result.Blocked {
        s.publisher.Publish(ctx, PaymentBlockedEvent{PaymentID: event.PaymentID, Reason: result.Reason})
        return
    }
    s.publisher.Publish(ctx, FraudCheckPassedEvent{PaymentID: event.PaymentID})
}

For financial workflows where atomicity and auditability matter, orchestration is usually the safer default. The visibility into the full transaction flow is worth the coupling cost. Choreography works well for non-critical side effects: sending notifications, updating derived data, triggering analytics.
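As a rough illustration of choreography for side effects, the sketch below fans a settled-payment event out to independent subscribers. The `Bus` type is a hypothetical in-process stand-in for a message broker, and `PaymentSettled`, `CorrelationID` and `ErrUnavailable` are assumed names, not part of any real library. Two things are worth noticing: a failing subscriber does not affect the others, and every log line carries a correlation ID, which is what makes the workflow reconstructable afterwards.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable simulates a downstream dependency being down.
var ErrUnavailable = errors.New("downstream unavailable")

// PaymentSettled carries a correlation ID so that logs from every
// subscriber can be tied back to a single payment.
type PaymentSettled struct {
	CorrelationID string
	PaymentID     string
}

// Handler is a subscriber's reaction to the event.
type Handler func(PaymentSettled) error

// Bus is a hypothetical in-process stand-in for a message broker.
type Bus struct {
	handlers map[string]Handler
	order    []string // preserves subscription order for deterministic delivery
}

func NewBus() *Bus { return &Bus{handlers: map[string]Handler{}} }

// Subscribe registers a named handler; re-subscribing a name replaces it.
func (b *Bus) Subscribe(name string, h Handler) {
	if _, exists := b.handlers[name]; !exists {
		b.order = append(b.order, name)
	}
	b.handlers[name] = h
}

// Publish delivers the event to every subscriber and returns the names of
// those that failed. One failing subscriber does not stop the others.
func (b *Bus) Publish(ev PaymentSettled) []string {
	var failed []string
	for _, name := range b.order {
		if err := b.handlers[name](ev); err != nil {
			fmt.Printf("correlation_id=%s subscriber=%s err=%v\n", ev.CorrelationID, name, err)
			failed = append(failed, name)
		}
	}
	return failed
}

func main() {
	bus := NewBus()
	bus.Subscribe("notifications", func(PaymentSettled) error { return ErrUnavailable })
	bus.Subscribe("analytics", func(PaymentSettled) error { return nil })

	failed := bus.Publish(PaymentSettled{CorrelationID: "corr-1", PaymentID: "pay-1"})
	fmt.Println("failed subscribers:", failed)
}
```

With a real broker the failed deliveries would be retried or dead-lettered rather than returned to the publisher; the return value here only exists to make the behavior visible in a single process.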

Data Ownership Is the Boundary

A microservice that reads another service's database is not a microservice. It is a distributed monolith with extra network latency. The principle sounds obvious. It breaks down consistently in practice because sharing a database is easier than defining an API, and easy wins in the short term.

Each service must own its data. That means one service, one schema, one team with the authority to change the schema without coordinating with other teams. If two services need the same data, there are three ways to resolve it: merge them into one service, have one call the other's API for the data it needs, or replicate the data via events, with each service maintaining its own read model.

The third option is the most powerful and the most underused. In the Standard Chartered migration, the reporting service originally read directly from the transaction processing database. The schema coupling meant that any change to the transaction schema required a coordinated release with the reporting team. We replaced the direct read with an event stream: the transaction service published a TransactionSettled event, and the reporting service maintained its own projection optimised for reporting queries.

// Transaction service publishes a clean event, not a raw schema object
type TransactionSettledEvent struct {
    TransactionID string          `json:"transaction_id"`
    AccountID     string          `json:"account_id"`
    Amount        decimal.Decimal `json:"amount"`
    Currency      string          `json:"currency"`
    SettledAt     time.Time       `json:"settled_at"`
    FeeApplied    decimal.Decimal `json:"fee_applied"`
}

The reporting service can now evolve its read model independently. The transaction service can change its internal schema without affecting reporting. The event contract is the boundary, and it is explicit and versioned.
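On the consuming side, a read model of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the projection we built: amounts are `int64` minor units instead of `decimal.Decimal` to keep the example dependency-free, the per-account daily totals are an invented reporting shape, it assumes one currency per account, and `Projection`, `DailyTotals` and `settledOn` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// TransactionSettled mirrors the event above, with amounts simplified to
// int64 minor units (an assumption made for this sketch).
type TransactionSettled struct {
	TransactionID string
	AccountID     string
	AmountMinor   int64
	FeeMinor      int64
	Currency      string
	SettledAt     time.Time
}

type dayKey struct {
	AccountID string
	Day       string // YYYY-MM-DD in UTC
}

// DailyTotals is the shape reporting queries want, not the shape the
// transaction service stores.
type DailyTotals struct {
	Count       int
	AmountMinor int64
	FeeMinor    int64
}

// Projection is the reporting service's own read model.
type Projection struct {
	seen   map[string]bool
	totals map[dayKey]DailyTotals
}

func NewProjection() *Projection {
	return &Projection{seen: map[string]bool{}, totals: map[dayKey]DailyTotals{}}
}

// Apply folds one event into the read model. Events are deduplicated by
// transaction ID because brokers commonly deliver at least once.
func (p *Projection) Apply(ev TransactionSettled) {
	if p.seen[ev.TransactionID] {
		return
	}
	p.seen[ev.TransactionID] = true

	k := dayKey{AccountID: ev.AccountID, Day: ev.SettledAt.UTC().Format("2006-01-02")}
	t := p.totals[k]
	t.Count++
	t.AmountMinor += ev.AmountMinor
	t.FeeMinor += ev.FeeMinor
	p.totals[k] = t
}

// Totals returns the aggregate for one account and day (zero value if none).
func (p *Projection) Totals(accountID, day string) DailyTotals {
	return p.totals[dayKey{AccountID: accountID, Day: day}]
}

// settledOn is a small helper for building example timestamps.
func settledOn(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 12, 0, 0, 0, time.UTC)
}

func main() {
	p := NewProjection()
	p.Apply(TransactionSettled{TransactionID: "t1", AccountID: "acc-1", AmountMinor: 1000, FeeMinor: 25, Currency: "EUR", SettledAt: settledOn(2025, 12, 3)})
	p.Apply(TransactionSettled{TransactionID: "t2", AccountID: "acc-1", AmountMinor: 500, FeeMinor: 10, Currency: "EUR", SettledAt: settledOn(2025, 12, 3)})
	fmt.Printf("%+v\n", p.Totals("acc-1", "2025-12-03"))
}
```

Nothing in the projection depends on how the transaction service stores its rows; the only shared artifact is the event.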

Version Your Contracts from Day One

The most avoidable pain in microservice maintenance is breaking contract changes. A service changes its API or its event schema, a downstream service breaks, and the on-call engineer spends Saturday morning tracing a nil pointer dereference back to a field that was renamed three days ago.

Version your contracts before you need to. A v1 endpoint that never changes is easier to maintain than an unversioned endpoint that breaks consumers every time it evolves. An event schema with an explicit version field allows consumers to handle multiple versions gracefully during transitions.

// Event with explicit version allows consumers to handle transitions
type PaymentEvent struct {
    Version   string          `json:"version"`    // "2.0"
    Type      string          `json:"type"`       // "payment.settled"
    PaymentID string          `json:"payment_id"`
    // v2 fields
    FeeBreakdown []FeeItem    `json:"fee_breakdown,omitempty"`
}

This costs almost nothing to implement at the start. It costs significant engineering time to retrofit after three services are in production and all consuming the same unversioned schema.
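To make the consumer side concrete, here is one way a downstream service could branch on the version field during a transition. Treat it as a sketch under assumptions: the fields of `FeeItem`, the `TotalFee` function and the choice to dispatch on the major version only are illustrative and do not come from the original system.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// FeeItem's fields are assumed for this example.
type FeeItem struct {
	Kind        string `json:"kind"`
	AmountMinor int64  `json:"amount_minor"`
}

type PaymentEvent struct {
	Version      string    `json:"version"`
	Type         string    `json:"type"`
	PaymentID    string    `json:"payment_id"`
	FeeBreakdown []FeeItem `json:"fee_breakdown,omitempty"` // v2 only
}

// majorVersion returns the part of the version before the first dot.
func majorVersion(v string) string {
	major, _, _ := strings.Cut(v, ".")
	return major
}

// TotalFee decodes an event and returns the total fee in minor units.
// v1 events carry no breakdown and yield zero; unknown major versions are
// rejected instead of being silently misread.
func TotalFee(payload []byte) (int64, error) {
	var ev PaymentEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return 0, fmt.Errorf("decode event: %w", err)
	}
	switch majorVersion(ev.Version) {
	case "1":
		return 0, nil
	case "2":
		var total int64
		for _, item := range ev.FeeBreakdown {
			total += item.AmountMinor
		}
		return total, nil
	default:
		return 0, fmt.Errorf("unsupported event version %q", ev.Version)
	}
}

func main() {
	v2 := []byte(`{"version":"2.0","type":"payment.settled","payment_id":"p2",
		"fee_breakdown":[{"kind":"fx","amount_minor":30},{"kind":"wire","amount_minor":120}]}`)
	total, err := TotalFee(v2)
	fmt.Println(total, err)
}
```

Rejecting unknown major versions is a deliberate choice in this sketch: a loud failure on a v3 event is easier to diagnose than a consumer that quietly reports a zero fee.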

Distributed Transactions Are a Boundary Smell

If a business operation requires a transaction that spans two services, that is a signal that the boundary is wrong, not a problem to solve with a distributed transaction protocol.

The instinct to reach for sagas or two-phase commit when a workflow spans multiple services is understandable. Those patterns exist and they work. They also add significant complexity: compensating transactions, idempotency requirements, partial failure handling, state machines that have to survive process crashes. Every time I have seen a team reach for a saga pattern, I have asked first whether the services involved should simply be one service.

At Standard Chartered, we had a workflow where initiating a payment required debiting a source account and crediting a destination account atomically. The original design put the debit and the credit in separate services. Every approach to making the operation atomic across two services was more complex than the alternative: treating the double-entry ledger as a single service with a single transaction boundary.
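The shape of that alternative can be sketched in a few lines. This is an in-memory illustration only: the mutex stands in for what would be a single database transaction in a real ledger service, and `Ledger`, `Transfer` and the error values are hypothetical names. What matters is that the debit and the credit sit behind one boundary, so there is no state in which only one of them has happened.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrInsufficientFunds = errors.New("insufficient funds")
	ErrUnknownAccount    = errors.New("unknown account")
)

// Ledger owns both sides of every transfer.
type Ledger struct {
	mu       sync.Mutex
	balances map[string]int64 // minor units
}

// NewLedger copies the opening balances so callers cannot mutate them later.
func NewLedger(opening map[string]int64) *Ledger {
	b := make(map[string]int64, len(opening))
	for id, amount := range opening {
		b[id] = amount
	}
	return &Ledger{balances: b}
}

// Transfer debits and credits under a single lock. All validation happens
// before any balance changes, so either both entries are applied or neither.
func (l *Ledger) Transfer(from, to string, amountMinor int64) error {
	if amountMinor <= 0 {
		return fmt.Errorf("amount must be positive, got %d", amountMinor)
	}
	l.mu.Lock()
	defer l.mu.Unlock()

	src, ok := l.balances[from]
	if !ok {
		return fmt.Errorf("source %q: %w", from, ErrUnknownAccount)
	}
	if _, ok := l.balances[to]; !ok {
		return fmt.Errorf("destination %q: %w", to, ErrUnknownAccount)
	}
	if src < amountMinor {
		return ErrInsufficientFunds
	}
	l.balances[from] -= amountMinor
	l.balances[to] += amountMinor
	return nil
}

// Balance returns the current balance of an account (zero if unknown).
func (l *Ledger) Balance(id string) int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.balances[id]
}

func main() {
	l := NewLedger(map[string]int64{"source": 10000, "destination": 0})
	err := l.Transfer("source", "destination", 2500)
	fmt.Println(l.Balance("source"), l.Balance("destination"), err)
}
```

Compare this with the two-service version, where the same guarantee needs a compensating credit, idempotency keys and a persisted state machine.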

Not every multi-step workflow is a boundary smell. Read operations that span services, workflows where eventual consistency is acceptable, side effects that can be retried safely: these do not require atomic transactions. The test is whether the business considers partial completion an acceptable outcome. If it does not, the services probably should not be separate.

Boundaries Should Reflect Team Structure

Conway's Law is not a suggestion. Systems reflect the communication structure of the teams that build them. A microservice boundary that does not map to a team boundary will drift toward the team boundary over time, or it will require constant cross-team coordination to maintain, which is operationally equivalent to having no boundary at all.

Before finalising a service boundary, ask who owns it. The answer should be one team, with clear responsibility for the service's reliability, API contracts, deployment and on-call. If the answer is "both teams share ownership," the boundary will erode. Shared ownership of a service means no one feels the full cost of the decisions made inside it.

At Standard Chartered, the boundaries that held up best over the migration were the ones that mapped cleanly to existing team structures. The boundaries that required the most rework were the ones that had been drawn by architecture without reference to who would actually maintain them.

Conclusion

Bad microservice boundaries do not announce themselves at design time. They announce themselves eighteen months later, when a simple feature requires four deployments, when an incident spans three services and nobody owns the investigation, when the distributed transaction saga that was supposed to be temporary has become load-bearing infrastructure that no one wants to touch.

The decisions that prevent this are not technically complex. They require discipline about data ownership, honesty about where the business actually draws its lines, and the willingness to make boundaries explicit and versioned from the first day rather than cleaning them up later.

Draw boundaries around things the business owns, not around things the codebase contains. Make data ownership non-negotiable. Version contracts before you break them. And when an operation wants to be a distributed transaction, ask first whether two services should be one. The answer will be yes more often than the architecture diagram suggests.
