Setting Up A Couchbase Cluster In 10 Minutes With Docker And Docker Compose

Today, a tutorial on how to set up a Couchbase cluster on your local machine using Docker.

We often describe Docker as a lightweight solution for isolating your production processes. But sometimes we forget to mention how easy it is to use Docker to set up your development environment.

Recently I developed a Couchbase adapter and I had to test failure scenarios, including a node failover and a disaster recovery scenario.
In our production environment, we installed 2 Couchbase clusters of 3 nodes each (3 nodes is the minimum required to enable the auto-failover mode).

I wanted to test my adapter in my development environment, but you can imagine how heavy it would be to install 6 nodes manually, especially since Couchbase uses many ports for administration and clustering. I would have had to configure each server node manually to prevent port collisions.

Because Docker provides environment isolation, let’s see how easy it is to create and configure the following Couchbase cluster with 3 nodes and one port mapped to your local machine to expose the admin console:

[Diagram: a 3-node Couchbase cluster running in Docker containers, with the admin console port mapped to the local machine]

To instantiate the three Docker nodes, run the following commands:

$docker run -d -v ~/couchbase/node1:/opt/couchbase/var couchbase:3.1.0
$docker run -d -v ~/couchbase/node2:/opt/couchbase/var couchbase:3.1.0
$docker run -d -v ~/couchbase/node3:/opt/couchbase/var -p 8091:8091 couchbase:3.1.0

Each command starts a new Docker container (the -v option creates a volume for persisting the Couchbase node data).

The last command maps the admin console port on your local machine.

The next step is to retrieve the internal container IP address of node3.
Run the command below, replacing <node3_docker_id> with the node3 container ID:

$docker inspect --format '{{ .NetworkSettings.IPAddress }}' <node3_docker_id>

Connect to your local admin console (http://localhost:8091), follow the setup steps and enter the previously retrieved IP in the hostname field.

The last step is to add node1 and node2 to the cluster. Retrieve their internal IPs with:

$docker inspect --format '{{ .NetworkSettings.IPAddress }}' <node1_docker_id>
$docker inspect --format '{{ .NetworkSettings.IPAddress }}' <node2_docker_id>
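As a side note, if you do not have the container IDs at hand, here is a quick way (assuming a Docker version recent enough to support the ancestor filter) to print the internal IP of every Couchbase container in one pass:

$docker ps -q --filter ancestor=couchbase:3.1.0 | xargs docker inspect --format '{{ .Name }}: {{ .NetworkSettings.IPAddress }}'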

In the admin console, go to the Server Nodes menu and click the Add Servers button. Add the node1 and node2 internal IP addresses.

That’s it! Your Couchbase cluster is configured. If you want to create another cluster, for example to test the Couchbase XDCR mechanism, you just have to repeat these operations (don’t forget though to map the admin console to another local port).
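For instance, the third node of a second cluster could look like this, with the admin console mapped to local port 8092 instead of 8091 (the path and port are just examples):

$docker run -d -v ~/couchbase/cluster2/node3:/opt/couchbase/var -p 8092:8091 couchbase:3.1.0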

Now, what if you want to have the capability to instantiate this cluster on the fly, during an automated testing process for instance?
There is nothing easier! Let’s use Docker Compose, a simple but powerful tool to define and run multi-container applications.

Create a docker-compose.yml file with the following content:

version: '2'
services:
  node1:
    image: couchbase:3.1.0
    volumes:
      - "~/couchbase/node1:/opt/couchbase/var"
  node2:
    image: couchbase:3.1.0
    volumes:
      - "~/couchbase/node2:/opt/couchbase/var"
  node3:
    image: couchbase:3.1.0
    volumes:
      - "~/couchbase/node3:/opt/couchbase/var"
    ports:
      - "8091:8091"

In this file, we simply express the same Couchbase cluster configuration in the Docker Compose file format.

Start your Couchbase cluster with:

$docker-compose up

This command automatically starts each node and lets Couchbase set up the cluster based on the configuration you previously made in the admin console.
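If you prefer to run the cluster in the background, for instance during an automated test run, the usual Compose commands apply:

$docker-compose up -d     # start the three nodes detached
$docker-compose ps        # check that node1, node2 and node3 are up
$docker-compose down      # stop and remove the containers once the tests are finished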

<3 Couchbase

<3 Docker

MDM: A Disruptive Approach

Today I wanted to share a very insightful and disruptive article on Master Data Management (MDM).
It was written by Michele Goetz, an analyst at Forrester, in 2012: Master Data Management Does Not Equal The Single Source Of Truth.

MDM is not about creating a golden record or a single source of truth.

Unlike many MDM tools on the market, which claim to be the right choice for managing golden records (basically one unique representation of a master entity), the author explains that this is simply not what MDM is made for.

The number one reason I hear from IT organizations for why they want to embark on MDM is for consolidation or integration of systems. Then, the first question I get, how do they get buy-in from the business to pay for it?

It makes sense. If the business sees the solution as just another integration solution (already sold to them with EAI, then with ESB), it is not the right approach.

IT missed the point that the business wants data to support a system of engagement. The value of MDM is to be able to model and render a domain to fit a system of engagement […] Context is a key value of MDM.

This part is really key. Put yourself in the place of the business.
The IT guy in front of you is trying to sell a solution for having a single source of truth to manage the master entities you work with every day.
It is very likely that you would not be interested in such a solution because many organizations remain silo-based. You would rather be interested in a solution that supports, as well as possible, your own system of engagement, your own use of these master entities. In a nutshell, your own context.

A person is not one-dimensional; they can be a parent, a friend, or a colleague, and each has different motivations and requirements depending on the environment.

When organizations have implemented MDM to create a golden record and single source of truth, domain models are extremely rigid and defined only within a single engagement model for a process or reporting. Model the data to allow for flexibility in different situations

It reminds me of a good old debate:

  • Should we model canonical data for cross-business domain usage?
  • Or should we rather model a core and lightweight representation but with flexible extensions for domain-specific information with hooks to navigate from one domain to another?

After having worked in the manufacturing industry on modeling its main master entity, I could not agree more that one unique view is very often a mistake (again, for master data). Master entity representations are not unique, but business domains are.

Make sure the reason you do this is to align master entities to a system of engagement defined by the business […] Doing so means your MDM initiative is about supporting a business need

This is the key argument. MDM is not made for having all business domains speak the very same language. It is a solution to align master entities with a specific context defined by the business.

We must not forget that in Business-IT alignment, it is up to IT to align with the business, not the other way around.
Hence, an MDM solution has to support a business need, not an IT one.

SSL/TLS Main Concepts

In this post I will describe the main concepts involved in an HTTPS connection. Hopefully it will help those who want to understand or review SSL/TLS principles, or those who are struggling with SSL handshake failures.

Encryption

Encryption is the process of translating messages (based on a predefined algorithm) in such a way that only authorized parties can read them.

There are two types of encryption:

  • Symmetric encryption: a key is shared between the communicating parties and is used during both steps, encryption and decryption.
  • Asymmetric encryption: two keys are involved, a public and a private one. The public key is available to anyone but the private one is known only by its owner. Encryption is done with the public key and decryption can only be achieved with the corresponding private key. (Both types are sketched with openssl just after this list.)
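To make the difference concrete, here is a minimal sketch of both types using the openssl CLI (the file names and the passphrase are only examples):

# Symmetric encryption: the same shared secret encrypts and decrypts
$openssl enc -aes-256-cbc -in message.txt -out message.enc -pass pass:sharedsecret
$openssl enc -d -aes-256-cbc -in message.enc -pass pass:sharedsecret

# Asymmetric encryption: encrypt with the public key, decrypt with the private key
$openssl genrsa -out private.pem 2048
$openssl rsa -in private.pem -pubout -out public.pem
$openssl rsautl -encrypt -pubin -inkey public.pem -in message.txt -out message.rsa
$openssl rsautl -decrypt -inkey private.pem -in message.rsa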

X.509 certificate

The main objective of an X.509 certificate is to bind a public key to the identity contained in the certificate itself.

It includes:

  • A certificate version
  • A serial number to uniquely identify the certificate
  • The certificate public key
  • A subject distinguished name to identify the certificate owner
  • An issuer distinguished name to identify the certificate issuer (we will detail this notion hereafter)
  • A validity period (From, To)
  • Some optional extensions such as the extended key usage to identify the certificate role (client or server role) etc…
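As a side note, all of these fields can be displayed with the openssl CLI (assuming a local certificate file; server.crt is just an example name):

$openssl x509 -in server.crt -noout -text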

Let’s come back to the issuer notion. A certificate can be self-signed (self-trusted), for example: “I swear I am the server certificate of www.mywebsite.com, you have to trust me”.
In that case, the client must blindly trust the certificate.

Or it can be also trusted by a certificate authority (CA): “I am the server certificate of www.mywebsite.com and this authority (CA) guarantees it”.

An issuer certifies the ownership of a public key by the subject distinguished name. It can be either a private or a public third-party authority such as GoDaddy, VeriSign etc…

In the end, if a certificate is trusted by a chain of CAs, we get this type of chain:

[Screenshot: the certificate chain of *.google.com]

In this example, *.google.com is trusted by Google Internet Authority G2 (Google’s private CA), which is itself trusted by GeoTrust Global CA (a public CA).

Last but not least, concerning the certificate generation process, everything starts from a private key.

This private key is then used to generate a public certificate (either a .CRT or a .CER file).
At this stage, the certificate is still self-signed. The next step is to generate a CSR (Certificate Signing Request) from the certificate and send it to a certificate authority. This authority will certify that you really are the owner of the public key (there are different levels of verification to obtain a certification; this is what we call the certificate class). In the end, you will receive another public certificate, this time trusted by the certificate authority.

Of course during this process the private key remains private and is not shared with the certificate authority.
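Here is a minimal sketch of this generation process with openssl (the file names are examples and the subject is reduced to a CN for brevity):

# 1. Everything starts from a private key
$openssl genrsa -out mywebsite.key 2048
# 2. Generate a self-signed public certificate from it
$openssl req -new -x509 -key mywebsite.key -out mywebsite.crt -days 365 -subj "/CN=www.mywebsite.com"
# 3. Generate a CSR to send to the certificate authority; the private key never leaves your machine
$openssl req -new -key mywebsite.key -out mywebsite.csr -subj "/CN=www.mywebsite.com"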

One-way vs two-way SSL

There is something really important to understand: the difference between one-way SSL and two-way SSL (also called mutual SSL).

In both cases, the server has to present its SSL server certificate to let the client verify the server identity.

Indeed, if you browse https://www.facebook.com with a web browser, the Facebook server will send you its own server certificate. Nevertheless, you are using one-way SSL.
The reason is simple and concerns the authentication part. In this very example, you are authenticated to the Facebook service using credentials (basically a user/password).

With two-way SSL, the authentication is not managed by credentials (such as an HTTP Basic authentication or a WS-Security UsernameToken, for instance) but directly by an SSL client certificate (certificate authentication).

This is the key difference between the two SSL approaches. In day-to-day Internet navigation we all use one-way SSL, but in B2B exchanges mutual SSL is far more frequent.
This is not a question of encryption security though; both approaches are equally secure in terms of data encryption. It is simply about the authentication part, where we consider that a private key is better protected/hidden than simple credentials.
With certificate authentication, there are also simple ways to manage revocation if a certificate should no longer be trusted for instance.
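From a client perspective (with curl, for instance), the difference boils down to whether a client certificate and private key are presented. The URLs and file names below are only illustrative:

# One-way SSL: the client only verifies the server certificate
$curl https://www.facebook.com
# Two-way (mutual) SSL: the client also presents its own certificate and private key
$curl --cert client.crt --key client.key https://b2b.partner.example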

SSL handshake: deep dive

In this section, we are going to dig into a two-way SSL connection.

Following the standard 3-way TCP handshake, if a client wants to send data to a server over HTTPS, it has to complete another handshake: the SSL one.

This handshake can be split into 8 different steps:

[Diagram: the 8 steps of the SSL handshake]

1. The client sends to the server a Client Hello to propose the SSL options: protocol, protocol version, cipher (encryption algorithm), session-specific data, and other information that the server will need.

2. The server sends to the client a Server Hello to confirm the SSL options.
It is worth saying that an SSL handshake failure may occur directly during this step. If, for instance, the SSL (or TLS) version proposed by the client is not supported by the server, it will return an error. Likewise, if the server supports none of the ciphers proposed by the client, it will also return an error.
If a handshake failure occurs during this step, you must check whether the client and the server configurations are compatible.

3. The server sends its SSL server certificate.

4. The client checks the server identity based on its SSL server certificate.
The client relies on a truststore containing all the trusted certificates (by contrast, the client keystore contains its own client certificate and private key). If an error occurs, you have to check whether the certificate has actually been added to the truststore.
This verification step may vary depending on the client implementation and its configuration. The verification can, for instance, be done only at the CA level: the client will not check whether the SSL server certificate itself is in the truststore, but whether the CAs above it in the chain are present.
This can be a good practice because an SSL certificate often has a shorter validity period than a CA certificate (it may simplify the operational part).
Other errors may occur at this stage, if, for instance, the SSL certificate is no longer valid or if the client requires certificates whose extended key usage is set to server authentication (it is normally part of the standard verification but some implementations disable this very check).
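A quick way to reproduce this check by hand is the openssl verify command (truststore.pem and server.crt are example file names, the truststore containing the trusted CA certificates in PEM format):

$openssl verify -CAfile truststore.pem server.crt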

5. The client sends its SSL client certificate.

6. The server checks the client identity based on its SSL client certificate. This step is similar to the fourth one: the server also verifies whether the SSL client certificate is present in its truststore.

7. If the server acknowledges the client identity, the client generates a session key, encrypts it with the SSL server certificate (the public key, of course) and sends it to the server. The session key exchange is therefore based on asymmetric encryption.
The server receives the encrypted session key and decrypts it using its own private key.
Other actions may take place during this step, such as the Change Cipher Spec messages, which confirm the cipher to use during the data transmission.

8. The client starts to send data to the server. Each packet is encrypted using the session key. This time though the encryption is symmetric because it is based on the same secret session key, shared between the client and the server.
Moreover, it is worth noting that symmetric encryption is faster than asymmetric encryption. This is the reason why we use it for data exchange.
The server receives a packet and then decrypts it with the session key.

Conclusion

For the one-way SSL, the steps are almost the same except for the fifth and sixth. Because the authentication is not based on a client certificate, these steps will not be executed during the handshake.
The session key though is still asymmetrically encrypted using the server public key, and each packet is still symmetrically encrypted.

In this post, I used both the SSL and TLS acronyms. Please note that, even if TLS is now the de facto standard, both protocols are structurally very similar and intend to fulfill the same security requirements:

  • Server authentication based on the server certificate
  • Client authentication in case of a two-way SSL/TLS connection based on the client certificate
  • Confidentiality of the exchanged data
  • Integrity of the exchanged data

Last but not least, if you are facing SSL handshake failures, it is often really useful to capture the network traffic (with tcpdump or Wireshark, for instance). This way you may get some clues to identify where the problem comes from. Good luck!
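For instance, a capture and a manual replay of the handshake could look like this (the interface name and hostname are examples):

# Capture the SSL/TLS traffic for later analysis in Wireshark
$tcpdump -i eth0 -w handshake.pcap 'tcp port 443'
# Replay the handshake and print the certificate chain and the negotiated protocol/cipher
$openssl s_client -connect www.mywebsite.com:443 -showcerts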

N.B. I am not a security expert but I spent literally days on HTTPS connection issues with partners so I wanted to share my knowledge 🙂

reactiveWM: A Reactive Framework For webMethods

For the very first post, I am going to introduce a framework of mine based on webMethods Integration Server: reactiveWM.

In a nutshell, this framework extends the standard webMethods capabilities in terms of multithreading. It allows you to create native webMethods threads within the Integration Server thread pool (just like Service.doThreadInvoke()), but the main difference is the capability to create your own logical pools to prevent potential side effects.

Let’s imagine a service creating 10 threads per execution. If this very service is called 20 times at once, it creates 200 threads. The only limit is the IS service thread pool and, if this limit is reached, the IS simply freezes and impacts the other services.

With reactiveWM, a developer can simply create his own logical thread pool and set a limit. The parallelization is then executed within this pool, ensuring it will not introduce any side effects on the rest of the Integration Server and providing a safety guarantee.

Why is this framework reactive, by the way? Simply because reactiveWM provides non-blocking services. You can submit tasks (either Flow or Java services) to a thread pool and get control back immediately. The execution is then managed in the background by the framework (basically outside the main execution thread).
The developer can also create asynchronous execution chains by implementing dynamic callbacks depending on the status of the previous task.

Several other capabilities have been developed, such as timeout management (not handled by the webMethods API), a fail-fast mechanism (if at least one thread fails, the parallelization is stopped), pool hierarchies to ease governance, an atomicity mechanism to guarantee at most one execution chain, etc.

If you need more information, see the code or give reactiveWM a try, please visit the Github project.

By the way, if you want more information about what is reactive programming, you should read this great post: The introduction to Reactive Programming you’ve been missing.