Azure Event Hub – all my thoughts

One of the technologies I really love in Azure is Event Hub (EH). I use EH for a lot of different purposes; I like to use technologies in different scenarios, and EH is a really versatile framework.

Event Hub is able to receive millions of messages per second, depending on the settings used in the Azure portal and on the message size. Because of this, EH is a perfect solution wherever we need to integrate, coordinate or receive data from millions of things (IoT, the Internet of Things): for example, receiving messages from millions of sensors, coordinating messages between thousands of trains or taxis and, why not, sending millions of logging messages to the cloud (on-premises tracking, monitoring, advertising centralization) and more.

Under the Hood recap.

[Image: Azure Event Hubs overview]

EH lives inside the Service Bus namespace. This doesn't mean that it is the Service Bus, only that it is one of its parts: inside the Service Bus we now have Queues, Topics and Event Hub as well. In some respects they behave the same way, but they also have some important differences.

First, some quick technical information we need in order to understand EH and its position in Azure. I like to map out where each technology sits inside Azure; this is very important when you have to decide which one is best for your needs.

In the end EH is a mix between Queues and Topics, with some differences. With EH you can manage messages in a FIFO pattern, as with Queues, and you can use a publish/subscribe pattern, as with Topics. The differences are that EH can persist a message for no more than 7 days, the maximum size of a message is 256 KB, and we have three ways to manage a publish/subscribe scenario: one is using different Event Hubs, another is using partitions, and the last one is using different Consumer Groups per EH. Now some other important information we need to know.

  • The number of partitions can be between 8 and 32, but you can extend it up to 1024 by contacting Microsoft support.
  • Each partition receives messages as an ordered sequence of events.
  • An event is a message represented by an object of the EventData class.
  • EH supports batch messaging, so content bigger than 256 KB can be sent as a batch of messages. Logically we have to manage the mechanism ourselves; EH provides the SequenceNumber property of the EventData class to help with that.
  • An EventData object is a message formed by a body and other context properties:
    • Offset: the position of the message in the partition; it could be a timestamp or any other unique value you need. Afterwards you use the Checkpoint() method to inform EH about your last read.
    • Sequence Number: the sequence number of the event inside the partition. This can be very useful for managing a batch of messages: yes, the limit is 256 KB per message, but we can imagine a batch mechanism for larger messages built on this property.
    • Body: we can write anything we want inside the body, using three different approaches: EventData(Byte[]) for a byte array, EventData(Stream) for a stream and EventData(Object, XmlObjectSerializer) to take the content and serialize it.
    • User Properties: we can define all the context properties we want and need.
    • System Properties: used to set some EventData system properties such as Offset, PartitionKey and others; check here (http://msdn.microsoft.com/en-us/library/azure/microsoft.servicebus.messaging.eventdatasystempropertynames.aspx). The important thing to know is that we can set these properties in two different ways: using the context or using the object properties.
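
To make the batching idea from the list above more concrete, here is a minimal Python sketch (not the .NET API) that splits a large payload into chunks that each fit under the 256 KB limit and rebuilds it on the receiving side. The dictionary layout and the names group_id, chunk_index, chunk_count and HEADER_RESERVE are hypothetical, chosen only for illustration. The sketch carries its own chunk index in user properties instead of relying on SequenceNumber, on the assumption that the service, not the sender, assigns that value.

```python
import uuid

MAX_EVENT_BYTES = 256 * 1024  # per-message limit quoted in this post
HEADER_RESERVE = 1024         # assumed headroom for properties (hypothetical value)


def split_payload(payload, max_body=MAX_EVENT_BYTES - HEADER_RESERVE):
    """Split a bytes payload into event-like dicts whose bodies fit under the limit."""
    group_id = str(uuid.uuid4())  # ties all chunks of one payload together
    chunks = [payload[i:i + max_body] for i in range(0, len(payload), max_body)] or [b""]
    return [
        {
            "body": chunk,
            "properties": {
                "group_id": group_id,
                "chunk_index": index,
                "chunk_count": len(chunks),
            },
        }
        for index, chunk in enumerate(chunks)
    ]


def reassemble(events):
    """Rebuild the original payload from chunk events received in any order."""
    if not events:
        raise ValueError("no chunks received")
    ordered = sorted(events, key=lambda e: e["properties"]["chunk_index"])
    expected = ordered[0]["properties"]["chunk_count"]
    if len(ordered) != expected:
        raise ValueError("missing chunks: got %d of %d" % (len(ordered), expected))
    return b"".join(e["body"] for e in ordered)
```

On the receiving side the chunks of one payload would be grouped by group_id before calling reassemble; sending them with the same partition key keeps them in one partition, where the order is preserved.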

 

[Image: system properties]

 

  • EH needs a storage account; for development purposes we can also configure a local storage account using the Azure Storage Emulator.

 

[Image: storage emulator]

 

  • The capacity of EH is measured in Throughput Units; by default you have, for ingress, 1 MB per second or 1000 events per second and, for egress, 2 MB per second.
  • We can modify the throughput in the Azure portal: under Service Bus, select the namespace and then Scale.
  • Publish/subscribe
    • As I said before, we can organize publishers and subscribers in different ways: we can send messages to a particular partition and receive messages from that same partition, or we can create different Consumer Groups following a group URI naming convention, for example:
      • <my namespace>.servicebus.windows.net/<event hub name>/<Consumer Group #1>
      • <my namespace>.servicebus.windows.net/<event hub name>/<Consumer Group #2>
      • <my namespace>.servicebus.windows.net/<event hub name>/<Consumer Group #3>
  • EH receives messages over HTTPS and AMQP 1.0, and we can push a message using any language or client technology we want: .NET, Java, Apache Qpid and so on.
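
The relationship between partitions and Consumer Groups is easier to see with a toy model. The Python sketch below is an in-memory illustration only, not the Event Hub client API: the class EventHubModel and its methods are hypothetical names, and the SHA-256 hash is just a stand-in for whatever the service really uses to map a partition key to a partition. It shows two things from the list above: events with the same key land in the same partition in order, and each Consumer Group keeps its own reading position.

```python
import hashlib
from collections import defaultdict


class EventHubModel:
    """Toy model: partitions are append-only lists, groups keep their own offsets."""

    def __init__(self, partition_count=8):
        self.partitions = [[] for _ in range(partition_count)]
        # One offset per partition for every consumer group, created on first use.
        self.offsets = defaultdict(lambda: [0] * partition_count)

    def partition_for(self, partition_key):
        # Stand-in hash (assumption): the same key always maps to the same partition.
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def send(self, partition_key, body):
        partition_id = self.partition_for(partition_key)
        self.partitions[partition_id].append(body)
        return partition_id

    def receive(self, group, partition_id, max_events=10):
        start = self.offsets[group][partition_id]
        events = self.partitions[partition_id][start:start + max_events]
        # Record the new position, similar in spirit to a checkpoint.
        self.offsets[group][partition_id] = start + len(events)
        return events
```

With this model, a "monitoring" group and a "billing" group can both read every event from the same partition, each at its own pace, which is the publish/subscribe behaviour that Consumer Groups give us.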

 

Development and Challenges

The development side is pretty simple and you can find a lot of samples. The challenge is organizing the messaging pattern and the hubs, partitions and groups; this is the most important thing, so my best advice is to plan the hub architecture and the message exchange pattern carefully before you start writing code.
I created a lab to test the performance and I was totally amazed: with the default configuration in the Azure portal, I sent 40,000 messages in less than 30 seconds from my laptop, and the network was pretty bad.
Logically I used a multi-threading pattern, all the cores worked perfectly and I sent all the messages in one batch; this means that it is really simple for EH to work in multicast scenarios.
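
This is not the code of my lab, only a minimal Python sketch of the same pattern: group the messages into batches and send the batches from several threads. The function send_batch is a hypothetical callback standing in for the real client call, and the sketch assumes that a whole batch must also stay within the 256 KB limit.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_BATCH_BYTES = 256 * 1024  # assumed limit for a whole batch


def make_batches(messages, max_bytes=MAX_BATCH_BYTES):
    """Group byte messages into batches whose total size stays within max_bytes."""
    batches, current, size = [], [], 0
    for msg in messages:
        if len(msg) > max_bytes:
            raise ValueError("a single message exceeds the batch limit")
        if current and size + len(msg) > max_bytes:
            batches.append(current)  # current batch is full, start a new one
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        batches.append(current)
    return batches


def send_all(messages, send_batch, workers=4):
    """Send batches concurrently; send_batch(batch) must return the number of messages sent."""
    batches = make_batches(messages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent_counts = list(pool.map(send_batch, batches))
    return sum(sent_counts)
```

For a quick dry run without any network, len can play the role of send_batch, for example send_all([b"x" * 100] * 40000, len). Threads suit this job because sending is I/O bound, so most of the time is spent waiting on the network.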


 

 

The most interesting things are the correlations and the different combinations we get by using Event Hub with other Azure technologies such as Azure Stream Analytics, Azure Batch, Power BI, Machine Learning and more, but those are thoughts for another time.

Below are the most important resources we need to start using it:
