Hyperledger Fabric stress test using low-cost AWS EC2 instances

GoLedger
Jan 30, 2021 · 5 min read


Many articles have discussed the maximum speed of a Blockchain network.

When it comes to public Blockchains, like Bitcoin or Ethereum, the number of transactions per second is well known. But when you develop your own permissioned Blockchain using technologies like Corda, Quorum or Hyperledger Fabric, there are lots of variables that can swing the results enormously.

We have seen permissioned Blockchains that can scale to more than 5000 transactions per second, deployed by companies that have lots of processing power available and can afford expensive CPUs to architect their business network.

Well, that is not our case.

Before we can deliver a production permissioned Blockchain network, it is necessary to test for possible failures and prevent them before the final release. And the instances we use are among the cheapest available on AWS.

Prior to releasing the network, the stress test is a very important DevOps phase. The Blockchain operator must be aware of the capacity of the network architecture, scaling it if necessary and preventing denial-of-service attacks.

In this article, I will show the results of a stress test on a Hyperledger Fabric network using low-cost cloud instances, the problems that appeared and the solutions we provided.

First, let me show the architecture used for this scenario:

Fig. 1 — Network architecture

Some basic information about this network:

· Hyperledger Fabric 1.4.1

· Channel: loadchannel

· Chaincode: template-cc, using Golang

· AWS EC2 Instance: t2.medium (2 vCPUs, 4 GiB RAM, 12 GiB SSD)

· 4 organizations (org1, org2, org3, org4)

· Initially 3 peers per organization

· All peers are endorsing peers

· 2 clients per organization (REST APIs in Node.js deployed on the same instances as the peers)

We used our platform GoFabric as the HLF orchestrator, to instantiate and upgrade the chaincode, upgrade the REST APIs, and add or remove peers from the organizations.

No CouchDB index was defined for this test; in a further experiment, indexes could be added to increase query speed.

The first instantiation of the chaincode used a 4-of-4 endorsing policy, so each of the four orgs must endorse every transaction.
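The policy itself was applied through GoFabric, but to make the idea concrete, this is roughly what a 4-of-4 policy looks like in the Node.js SDK (fabric-client 1.4) format; the MSP IDs below are placeholders, not necessarily our exact ones:

```javascript
// Illustrative 4-of-4 endorsement policy in the fabric-client 1.4 format.
// MSP IDs are placeholders; in our case the policy was set through GoFabric.
const fourOfFourPolicy = {
  identities: [
    { role: { name: 'member', mspId: 'org1MSP' } },
    { role: { name: 'member', mspId: 'org2MSP' } },
    { role: { name: 'member', mspId: 'org3MSP' } },
    { role: { name: 'member', mspId: 'org4MSP' } }
  ],
  // All four listed identities must sign every endorsement.
  policy: {
    '4-of': [
      { 'signed-by': 0 },
      { 'signed-by': 1 },
      { 'signed-by': 2 },
      { 'signed-by': 3 }
    ]
  }
};

module.exports = { fourOfFourPolicy };
```

An object like this is passed as the 'endorsement-policy' field of the instantiate proposal when the chaincode is deployed on the channel.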

To proceed with the stress test, we launched 8 separate EC2 instances (t2.medium), each one running a server developed in Node.js that is triggered to fire several hundred POST requests almost simultaneously, directly at an API connected to an HLF peer. We called these instances ‘load servers’.
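Each load server is nothing sophisticated. A minimal sketch of the idea, using only Node.js built-ins (the /invoke path, payload and target address are illustrative, not our exact API):

```javascript
// load-server.js — fires N POST requests almost simultaneously at a REST API
// and reports the success rate and average throughput.
const http = require('http');

const TARGET_HOST = '10.0.0.10'; // REST API instance (placeholder)
const TARGET_PORT = 80;
const TOTAL_REQUESTS = 2000;

function invoke(i) {
  return new Promise((resolve) => {
    const body = JSON.stringify({ fcn: 'createAsset', args: [`asset-${i}`] });
    const req = http.request(
      {
        host: TARGET_HOST,
        port: TARGET_PORT,
        path: '/invoke',
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(body)
        }
      },
      (res) => {
        res.resume(); // drain the response body
        resolve(res.statusCode === 200 ? 'success' : 'error');
      }
    );
    req.on('error', () => resolve('error'));
    req.write(body);
    req.end();
  });
}

async function run() {
  const start = Date.now();
  const results = await Promise.all(
    Array.from({ length: TOTAL_REQUESTS }, (_, i) => invoke(i))
  );
  const ok = results.filter((r) => r === 'success').length;
  const seconds = (Date.now() - start) / 1000;
  console.log(`success rate: ${((ok / TOTAL_REQUESTS) * 100).toFixed(1)}%`);
  console.log(`tx (success/error) per second: ${(TOTAL_REQUESTS / seconds).toFixed(1)}`);
}

run();
```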

Our main objective was to hit the network’s transaction throughput limit, measure latency and look for reliability enhancements.

For our first test, we used the following schema:

· Number of load servers: 8

· Requests per load server: 2000 (16000 total)

· Chaincode endorsing policy: 4-of-4

· Peers per org: 3

Fig 2 — First test schema

This is as tough as it gets, especially because 4 endorsing signatures are required to commit a single transaction.

After the test, we obtained the following results:

· Success rate: 9.1%

· Average transactions (success/error) per second: 26.1

This was a great disappointment. Fewer than 1 in 10 transactions were committed successfully.

After analysing the results, we could see that the transaction flow degraded from total success to almost total failure after a certain point. Something was making the network turn from green to red very fast.

In order to proceed with the tests, we changed the network so we could figure out some of the possible reasons for this failure.

First, we updated the endorsing policy of the chaincode from 4-of-4 to 3-of-4, reducing the dependency on all orgs being online.
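In the same illustrative fabric-client format shown earlier, the relaxed policy is simply a 3-of rule over the same four identities:

```javascript
// Illustrative 3-of-4 endorsement policy (same placeholder MSP IDs as before):
// any three of the four organizations are enough to endorse a transaction.
const threeOfFourPolicy = {
  identities: [
    { role: { name: 'member', mspId: 'org1MSP' } },
    { role: { name: 'member', mspId: 'org2MSP' } },
    { role: { name: 'member', mspId: 'org3MSP' } },
    { role: { name: 'member', mspId: 'org4MSP' } }
  ],
  policy: {
    '3-of': [
      { 'signed-by': 0 },
      { 'signed-by': 1 },
      { 'signed-by': 2 },
      { 'signed-by': 3 }
    ]
  }
};
```

The new policy takes effect through a chaincode upgrade, which in our case was also done through GoFabric.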

In the next test, we reduced the number of requests per load server, using the following schema:

· Number of load servers: 8

· Requests per load server: 500

· Chaincode endorsing policy: 3 of 4

· Peers per org: 3

Results:

· Success rate: 82.5%

· Average transactions (success/error) per second: 12.7

Then, we removed some peers (by revoking their certificates and updating the system channel), leaving only 1 peer per org. This would slow the transaction flow, as the discovery service would offer only one peer per org to endorse the transactions. Although the network was reduced to 4 peers, all 8 REST APIs continued to be used in the next stress test.

This led us to the following schema and results:

· Number of load servers: 8

· Requests per load server: 1000

· Chaincode endorsing policy: 3 of 4

· Peers per org: 1

Fig 3 — Debug test schema

Results:

· Success rate: 13.2%

· Average transactions (success/error) per second: 17.4

Analysing the results, we could see that the transactions were committed gracefully for some time, and after a certain point almost all requests failed. Checking the client logs, we saw Channel Event Hub timeouts, and everything started to make sense. Even though the transactions were being committed successfully, the excess of requests caused the REST API client to time out before it received the commit event.

We changed the Channel Event Hub timeout configuration from 30 seconds to 120 seconds and upgraded all 8 REST APIs.
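We will not reproduce our client code here, but as an illustration: if the REST API were built on the fabric-network Gateway API (1.4), the equivalent knob would be the commit timeout in the event handler options (connection profile path, wallet and identity names below are placeholders):

```javascript
// Sketch: raising the commit-wait timeout in a fabric-network 1.4 client.
// Paths and identity names are placeholders.
const { Gateway, FileSystemWallet } = require('fabric-network');
const fs = require('fs');

async function connectGateway() {
  const ccp = JSON.parse(fs.readFileSync('./connection-org1.json', 'utf8'));
  const wallet = new FileSystemWallet('./wallet');

  const gateway = new Gateway();
  await gateway.connect(ccp, {
    wallet,
    identity: 'appUser',
    discovery: { enabled: true, asLocalhost: false },
    // Wait up to 120 s for the commit event instead of timing out earlier.
    eventHandlerOptions: { commitTimeout: 120 }
  });
  return gateway;
}

module.exports = { connectGateway };
```

With the longer timeout in place, we ran the test again with the following schema: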

· Number of load servers: 8

· Requests per load server: 1500 (12000 total)

· Chaincode endorsing policy: 3 of 4

· Peers per org: 1

Results:

· Success rate: 99.8%

· Average transactions (success/error) per second: 14.4

And that’s it. With a simple change in the API and the endorsing policy, we could increase the network’s stability tremendously, making it possible to use low-cost instances (AWS EC2 t2.medium) to architect a much more reliable Hyperledger Fabric Blockchain network.

Fig 4 — Final results of HLF stress test
