In the recent past there has been a flood of requests from major retailers using premium ecommerce packages such as Oracle ATG, IBM WCS, SAP Hybris and Magento to decouple functionality from the monolithic deployment by moving to a microservices architecture. The decision to separate has to be made with a lot of care and consideration, as many factors are at play: hosting (traditional + cloud), scaling, data modeling, the replication to be used, session and cache management, etc. Apart from these factors, the custom implementation of the solution also comes into play as we design the approach for the enterprise's incremental microservices transformation journey.
As we are all going through the learning curve of developing and deploying distributed applications, this journey provides a lot of insight into design & development approaches for various scenarios & problems. I have tried to summarize some key notes that can be useful during MSA (microservices architecture) implementations. Please share your learnings and experiences to make the list more comprehensive.
Below are the focus areas when designing the incremental microservices journey; the list can be improved further based on the experience we gain on the ground:
- Domain Architecture
- CQRS (Command Query Responsibility Segregation) : Separates reads & writes to the data
- Event Sourcing : Event based architecture
- Hexagonal Architecture : Logic Kernel & Adaptor based
- Resilience & Stability
- Technical Architecture
While focusing on the above areas, some of the topics that come to light during team discussions are listed below:
- How to manage the service versions?
- Designing for performance
- Designing for integrity
- Designing for failures
- Service Boundaries
- Designing for adapting to third party integrations/Products
- Operational Effectiveness
- Health checks
- Performance Metrics & Alerts
- Resources utilization & effectiveness
- Logging & Error, Exception Handling
- Event & Transaction Tracking
- CICD for packaging & deployment
- Hybrid Infra model support ( cloud + physical )
Domain Architecture
- Defines how microservices will implement core domain based functionalities
- No rigid approach (allows teams to decide the design & structure enabling them to work independently)
- Easy to understand, Simple to maintain & replace
- Defines how the entire domain is split into various areas (leading to mapping one team for the changes)
- Efficient domain architecture will require low co-ordination and communication among teams
Cohesion
- Domain architecture of overall system influences the domain architecture of each microservice
- Microservices should be loosely coupled with each other and have high internal cohesion
- With respect to domain it should follow SRP (Single Responsibility Principle)
- Loose coupling & high cohesion
Encapsulation
- Internal details are hidden
- Accessed via defined interfaces
- Helps in easy modification
- In the microservices context, one microservice never allows another microservice to access its internal data
- Each microservice understands the interfaces of the other microservices it uses
Domain Driven Design
- Functionally structures the overall design of microservices
Transactions
- Bundle of actions (Execute Everything or Nothing)
- One transaction one microservice
- Transaction across microservices (by leveraging messaging based arch or design)
- Very hard to achieve the one transaction one microservice level with domain design
CQRS (Command Query Responsibility Segregation)
CQRS is an architectural pattern in which the system is segregated into operations that read data and operations that write data (very different from the old monolithic CRUD architecture, where a single component owns both reads & writes)
- Each system has a state that can be saved, which requires reading (queries) & writing (commands) the data
- A single operation should not both read and change the data; this principle is known as CQS (Command Query Separation)
- Caches support the query operations
| Principles of CQRS Arch | Benefits of CQRS Arch | Challenges of CQRS Arch |
| --- | --- | --- |
| Event Sourcing | Domain/Business Focus | Data consistency |
| Event Driven Architecture | Scalability | Expensive (development & infrastructure costs go up) |
| Loose Coupling | Simplification | Transactions with read & write steps are difficult to implement |
| High Cohesion | Flexibility | |
| Domain Driven Design | Command & Query can use different technologies | |
| SoC (Separation of Concerns) | Read & Write separation | |
Microservices with CQRS
- The communication infrastructure can implement the Command Queue when a messaging solution is used. In case of approaches like REST a Microservice has to forward the commands to all interested Command Handlers and implement the Command Queue that way.
- Each Command Handler can be a separate Microservice. It can handle the commands with its own logic. Thereby logic can very easily be distributed to multiple Microservices.
- Likewise, a Query Handler can be a separate Microservice. The changes to the data which the Query Handler uses can be introduced by a Command Handler in the same Microservice. However, the Command Handler can also be a separate Microservice. In that case the Query Handler has to offer a suitable interface for accessing the database so that the Command Handler can change the data.
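The split between Command Queue, Command Handler and Query Handler described above can be sketched roughly as follows. This is a minimal in-process illustration, not a reference implementation: all class and method names (`CqrsSketch`, `AddItemCommand`, `CommandQueue`, `CartCommandHandler`, `CartQueryHandler`) are hypothetical, and commands are delivered synchronously, whereas a messaging solution would deliver them asynchronously and the read side would only be eventually consistent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical names throughout; none of these types come from a real library.
final class CqrsSketch {

    /** Command: an intent to change state. It returns nothing to the caller. */
    static final class AddItemCommand {
        final String cartId;
        final String sku;
        final int quantity;

        AddItemCommand(String cartId, String sku, int quantity) {
            this.cartId = cartId;
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    /** Command queue: keeps every accepted command and forwards it to all subscribers. */
    static final class CommandQueue {
        private final List<AddItemCommand> accepted = new ArrayList<>();
        private final List<Consumer<AddItemCommand>> subscribers = new ArrayList<>();

        void subscribe(Consumer<AddItemCommand> handler) {
            subscribers.add(handler);
        }

        void send(AddItemCommand command) {
            if (command.quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
            accepted.add(command);
            // Synchronous for brevity; messaging middleware would deliver asynchronously.
            for (Consumer<AddItemCommand> subscriber : subscribers) {
                subscriber.accept(command);
            }
        }

        int size() {
            return accepted.size();
        }
    }

    /** Query handler: answers reads from a view that is optimised for queries. */
    static final class CartQueryHandler {
        private final Map<String, Integer> totalItemsByCart = new HashMap<>();

        // Narrow update interface offered to command handlers.
        void recordItemsAdded(String cartId, int quantity) {
            totalItemsByCart.merge(cartId, quantity, Integer::sum);
        }

        int totalItems(String cartId) {
            return totalItemsByCart.getOrDefault(cartId, 0);
        }
    }

    /** Command handler: applies the change through the interface of the query side. */
    static final class CartCommandHandler {
        private final CartQueryHandler view;

        CartCommandHandler(CartQueryHandler view) {
            this.view = view;
        }

        void handle(AddItemCommand command) {
            view.recordItemsAdded(command.cartId, command.quantity);
        }
    }

    public static void main(String[] args) {
        CartQueryHandler queries = new CartQueryHandler();
        CartCommandHandler commands = new CartCommandHandler(queries);
        CommandQueue queue = new CommandQueue();
        queue.subscribe(commands::handle);

        queue.send(new AddItemCommand("cart-1", "sku-42", 2));
        queue.send(new AddItemCommand("cart-1", "sku-7", 1));
        System.out.println(queries.totalItems("cart-1")); // prints 3
    }
}
```

In a microservices setup each of the three roles could live in its own service; the sketch keeps them in one process only to show who is allowed to write and who only reads.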
When to choose CQRS?
- When event based integration has to be developed & maintained in the long run
- When there is a heavy imbalance between the number of reads & writes in the system
- When building multi-view systems (Web, Mobile, API etc.)
- When the existing system has complex CRUD operations
- When the business process has a lot of events and records are required for further analysis
- When the system can live with eventual consistency
Event Sourcing
Event Sourcing is an architectural pattern closely related to CQRS: state is recorded as the sequence of events resulting from the actions/steps of the domain. Events help in rebuilding the state of the actors within the domain at any point in time.
CRUD, by contrast, stores only the current state of the entity, not the sequence or history of events.
- Events are messages
- Can be modelled in separate classes
- Event history is maintained (persists events)
- Event stores provide read and write capabilities
- Used for interaction between components
- Events are immutable ( no modification or deletion)
| Principles of Event Sourcing | Benefits of Event Sourcing | Challenges of Event Sourcing |
| --- | --- | --- |
| Persisting state of the events | Simplicity | Consistency (business drives the level of consistency, not the technology) |
| Event Driven Architecture | Easy Integration | Parallel updates |
| Loose Coupling | Event state traceability (audit trail) | Validation |
| High Cohesion | Performance | |
| Domain Driven Design | Flexibility | |
| SoC (Separation of Concerns) | | |
- Event Queue : Sends all events to their recipients (mostly via messaging middleware)
- Event Store : Saves all events so that state can be reconstructed from them
- Event Handler : Business logic that responds to particular event(s)
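How an event store lets state be rebuilt "at any point of time" can be illustrated with a small sketch. The names used here (`EventSourcingSketch`, `StockChanged`, `EventStore`, `stockLevel`) are assumptions made for illustration, and the store is an in-memory list rather than a real persistent event store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names; an in-memory stand-in for a persistent event store.
final class EventSourcingSketch {

    /** Immutable event: a fact that happened in the domain. */
    static final class StockChanged {
        final String sku;
        final int delta; // positive = goods received, negative = goods sold

        StockChanged(String sku, int delta) {
            this.sku = sku;
            this.delta = delta;
        }
    }

    /** Append-only store: events are never modified or deleted. */
    static final class EventStore {
        private final List<StockChanged> events = new ArrayList<>();

        void append(StockChanged event) {
            events.add(event);
        }

        List<StockChanged> readAll() {
            return Collections.unmodifiableList(events);
        }
    }

    /** Rebuilds the stock level of one SKU by replaying the first upTo events. */
    static int stockLevel(EventStore store, String sku, int upTo) {
        List<StockChanged> events = store.readAll();
        int limit = Math.min(upTo, events.size());
        int level = 0;
        for (int i = 0; i < limit; i++) {
            StockChanged event = events.get(i);
            if (event.sku.equals(sku)) {
                level += event.delta;
            }
        }
        return level;
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        store.append(new StockChanged("sku-42", 10));
        store.append(new StockChanged("sku-42", -3));
        store.append(new StockChanged("sku-7", 5));
        store.append(new StockChanged("sku-42", -2));

        System.out.println(stockLevel(store, "sku-42", 4)); // prints 5 (current state)
        System.out.println(stockLevel(store, "sku-42", 2)); // prints 7 (state after two events)
    }
}
```

A CRUD table would only hold the value 5; the event history additionally answers what the stock was earlier and why it changed.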
Hexagonal Architecture
“Focuses on the core logic of the application i.e. business functionalities with well-defined interfaces which are available for the users & administrators”
- Alternative to the layered architecture which has UI, Business Logic, Persistence
- Has adaptors which interact with logic via ports
- Inbound & Outbound communication
- Some of the adaptors available for users, data, events & administrators are:
- UI Adaptor
- REST Adaptor
- Test Adaptor (Enables isolated testing harness)
- Database Adaptor
- Custom Adaptor (Resilience & Stability)
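The relationship between logic kernel, ports and adaptors can be sketched as below. Every name in it (`HexagonalSketch`, `PlaceOrder`, `OrderStore`, `OrderService`, `InMemoryOrderStore`, `TextRequestAdaptor`) is hypothetical; the text adaptor merely stands in for a REST adaptor, and the in-memory store stands in for a database adaptor (it could equally serve as a test adaptor).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; stand-ins for real REST and database adaptors.
final class HexagonalSketch {

    /** Inbound port: what the outside world may ask the core to do. */
    interface PlaceOrder {
        String place(String customerId, int amountInCents);
    }

    /** Outbound port: what the core needs from the outside world. */
    interface OrderStore {
        void save(String orderId, int amountInCents);

        Integer find(String orderId);
    }

    /** Logic kernel: depends only on ports, never on a concrete adaptor. */
    static final class OrderService implements PlaceOrder {
        private final OrderStore store;
        private int sequence = 0;

        OrderService(OrderStore store) {
            this.store = store;
        }

        @Override
        public String place(String customerId, int amountInCents) {
            if (amountInCents <= 0) {
                throw new IllegalArgumentException("amount must be positive");
            }
            String orderId = customerId + "-" + (++sequence);
            store.save(orderId, amountInCents);
            return orderId;
        }
    }

    /** Outbound adaptor: keeps orders in memory instead of a database. */
    static final class InMemoryOrderStore implements OrderStore {
        private final Map<String, Integer> rows = new HashMap<>();

        @Override
        public void save(String orderId, int amountInCents) {
            rows.put(orderId, amountInCents);
        }

        @Override
        public Integer find(String orderId) {
            return rows.get(orderId);
        }
    }

    /** Inbound adaptor: translates a "customerId,amountInCents" request into a port call. */
    static final class TextRequestAdaptor {
        private final PlaceOrder port;

        TextRequestAdaptor(PlaceOrder port) {
            this.port = port;
        }

        String handle(String request) {
            String[] parts = request.split(",");
            return port.place(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
    }

    public static void main(String[] args) {
        InMemoryOrderStore store = new InMemoryOrderStore();
        TextRequestAdaptor adaptor = new TextRequestAdaptor(new OrderService(store));
        String orderId = adaptor.handle("alice, 1999");
        System.out.println(orderId + " -> " + store.find(orderId)); // prints alice-1 -> 1999
    }
}
```

Swapping the in-memory store for a real database adaptor would not touch `OrderService`, which is the point of routing all communication through ports.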
Resilience & Stability
- Due to the distributed nature of a microservices architecture, the system always runs the risk of cascading failures, which can hold hostage the applications that access the services
- A distributed architecture relies on the network & cloud servers, which can be highly unreliable. Considering this risk, the system should be designed so that failures are masked, i.e. resilience is built into the system/application to ensure availability
- Patterns that can ensure the availability of the services are:
- Bulkhead
- Bulkheads have been used in shipbuilding for several centuries: they create watertight compartments that contain the water in case of a leak, so that the ship does not sink
- Applied to software architecture, the concept allows building sub-systems that are well shielded from cascading problems such as overloads, response failures etc.
- Test Harness
- An approach for inducing problem situations in order to study the behavior of the application in certain contexts (TCP/IP, HTTP headers etc., i.e. OS or network level issues). Most of the time these situations are not considered, but addressing them can help the overall architecture
- Steady State
- The system should stay simple and manageable at any point in time. Systems that are not designed for a steady state may run into problems at some point: heavy database storage, log & caching operations can bring down the system by creating storage & operational challenges. Hence it is recommended to have automated control over storage, caching & log operations
- Uncoupling Via Messaging
- Asynchronous communication helps: the caller proceeds with other activities without waiting for the response
- Timeout
- An individual thread is started which can be terminated after defined timeout
- Easy to implement compared to other patterns
- Can be plugged into monitoring systems as well
- Stability based patterns
- Use Circuit Breaker, Timeouts & Bulkhead patterns to safeguard from the failures and tight coupling
- Apply fail fast mechanism to all the microservices to ease the communication wait times
- Steady State, Fail Fast, Test Harness, Handshaking etc have to be implemented by each microservice
- Decoupling via shared communication is the way to design
- Handshaking
- A protocol-level feature that helps to detect an overloaded system & prevent cascading failures. Socket based protocols implement this very well, but HTTP doesn’t support the mechanism, so it is the responsibility of the application to introduce some type of health check or signaling
- Circuit Breaker
- Circuit breakers are not complex to implement in microservices code, but special attention is required during design to make them work in high load scenarios & meet operational needs such as monitoring
- Fail Fast
- This approach lets the consumer system detect the failed state as quickly as possible and receive a defined error state instead of waiting on the response. Validating an incoming request up front makes it possible to reject it with an error before any further processing, similar to the timeout model
- Resilience in Reactive Context
- The Reactive Manifesto considers “resilience” one of the core properties of a reactive application
- Asynchronous processing with error handling & monitoring
- Hystrix
- Very good support for resilience
- It is a Java library from Netflix, shared under the Apache license
- Internally it uses Reactive Extensions for Java (RxJava)
- Implements timeout & Circuit Breaker
- Hystrix dashboard shares info about state of the thread pools & circuit breaker
- Can be embedded in commands implementation
- Library provides rich annotations
- Leverages thread pools (for each microservice) – ensuring the support for bulkhead
- Non-Java implementations can use Hystrix via a sidecar
- Can be configured for monitoring single hosts as well as clusters
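To make the Timeout, Circuit Breaker and Fail Fast patterns from this section more concrete, here is a hand-rolled sketch using only the JDK. It is deliberately not the Hystrix API: `ResilienceSketch`, `CircuitBreaker` and all parameters are hypothetical names, the thresholds are arbitrary, and the dedicated thread pool is only a rough approximation of a bulkhead (its queue is unbounded in this sketch).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hand-rolled illustration with hypothetical names; this is not the Hystrix API.
final class ResilienceSketch {

    static final class CircuitBreaker {
        private final ExecutorService pool; // dedicated pool per downstream service
        private final int failureThreshold;
        private final long timeoutMillis;
        private final long openMillis;
        private int consecutiveFailures = 0;
        private long openedAt = -1L; // -1 means the circuit is closed

        CircuitBreaker(int poolSize, int failureThreshold, long timeoutMillis, long openMillis) {
            this.pool = Executors.newFixedThreadPool(poolSize);
            this.failureThreshold = failureThreshold;
            this.timeoutMillis = timeoutMillis;
            this.openMillis = openMillis;
        }

        /** Runs the call with a timeout; returns the fallback on failure or when open. */
        <T> T call(Callable<T> remoteCall, T fallback) {
            if (isOpen()) {
                return fallback; // fail fast: do not even try the remote call
            }
            Future<T> future = pool.submit(remoteCall);
            try {
                T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                recordSuccess();
                return result;
            } catch (Exception e) { // timeout, failed call or interruption
                future.cancel(true); // interrupt the worker so the thread is freed
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                }
                recordFailure();
                return fallback;
            }
        }

        synchronized boolean isOpen() {
            return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
        }

        private synchronized void recordSuccess() {
            consecutiveFailures = 0;
            openedAt = -1L;
        }

        private synchronized void recordFailure() {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis();
            }
        }

        void shutdown() {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(2, 2, 100, 60_000);
        Callable<String> slowService = () -> {
            Thread.sleep(1_000);
            return "late";
        };
        System.out.println(breaker.call(slowService, "fallback")); // prints fallback (timeout)
        System.out.println(breaker.call(slowService, "fallback")); // prints fallback (timeout)
        System.out.println(breaker.isOpen());                      // prints true
        System.out.println(breaker.call(() -> "ok", "fallback"));  // prints fallback (fail fast)
        breaker.shutdown();
    }
}
```

Once the open interval has elapsed, the next call is attempted again; a success closes the circuit, a failure reopens it immediately. A library such as Hystrix adds the monitoring and configuration that this sketch leaves out.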
Technical Architecture
- Process Engines
- SOA based orchestrated process engines can be leveraged for microservices as well
- Microservices are designed for one domain i.e. one Bounded Context
- Microservices should not be designed just for integration or orchestration; doing so violates the SRP, and changes to the microservices become complex to make and maintain over time
- For a collective business context, multiple microservices should be implemented, where each microservice supports one business functionality and they are used in conjunction to support multiple functionalities
- Integration only microservices should be strictly avoided
- Statelessness
- In a distributed architecture stateless microservices are very useful
- Microservices should not save any state in their business logic (State can be maintained in database or client side)
- Storing the state externally helps to recover from failures by replicating/recreating the state from the database
- Helps in creating multiple instances
- Easy replication
- Load distribution to the instances
- Versioning : An old version can be shut down or replaced without migrating state
- Reactive
- Reactive properties fit very well with microservices approaches/needs
- A reactive application/system should have some defined properties like:
- Resilience : Ensure the availability & failure tolerance
- Elastic : Dynamically scalable at runtime
- Message Driven : Message based communication
- Responsive : Fast response times with fail fast setup
- Actors/Processes can send messages to each other
- Non-blocking I/O communication(Asynchronous)
Reactive Technologies
Reactive is just one approach available for microservices; it is not the only option. Microservices can also be implemented without “reactive” approaches, using traditional programming models: resilience can be achieved with various libraries, elasticity is implemented using VMs or containers, and messaging is already used in traditional models. Reactive systems have a real advantage when it comes to “responsive” implementations.
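The non-blocking, asynchronous style with error handling mentioned in this section can be sketched with the JDK's own `CompletableFuture`, without any dedicated reactive framework. The class and method names (`ReactiveSketch`, `priceInCents`, `unitsInStock`, `offer`) and the returned values are made up for illustration; in a real system the two lookups would be calls to other microservices.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical names and values; the lookups stand in for remote service calls.
final class ReactiveSketch {

    /** Asynchronous price lookup; fails for SKUs that start with "unknown". */
    static CompletableFuture<Integer> priceInCents(String sku) {
        return CompletableFuture.supplyAsync(() -> {
            if (sku.startsWith("unknown")) {
                throw new IllegalArgumentException("no price for " + sku);
            }
            return 1000;
        });
    }

    /** Asynchronous stock lookup. */
    static CompletableFuture<Integer> unitsInStock(String sku) {
        return CompletableFuture.supplyAsync(() -> 3);
    }

    /** Combines both lookups without blocking and falls back on any error. */
    static CompletableFuture<String> offer(String sku) {
        return priceInCents(sku)
                .thenCombine(unitsInStock(sku),
                        (price, units) -> sku + ": " + price + " cents, " + units + " in stock")
                .exceptionally(error -> sku + ": unavailable");
    }

    public static void main(String[] args) {
        // join() blocks only here, at the edge of the program, to print the results.
        System.out.println(offer("sku-42").join());    // prints sku-42: 1000 cents, 3 in stock
        System.out.println(offer("unknown-1").join()); // prints unknown-1: unavailable
    }
}
```

Both lookups run concurrently and the caller thread is free until the combined result is needed, which is where the responsiveness benefit comes from.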
References: