Keisan

Keisan Knowledge Base

Sharing the secrets of our success

Knowledge Base Article Listing

  Filtering By Tag - Avro

Real-Time Machine Learning Pipeline with Apache Spark

In a previous article entitled 'Real-Time Data Pipeline with Apache Kafka and Spark', I described how we can build a high-throughput, scalable, reliable and fault-tolerant data pipeline capable of fetching event-based data and eventually streaming those events to Apache Spark, where we processed them. I ended that article by simply using Apache Spark to consume the event-based data and print it to the console. In my last article entitled…

Apache Kafka Producer with Avro Bijection

In my last article entitled 'Real-Time Data Pipeline with Apache Kafka and Spark', I used Apache Flume to fetch tweets from the Twitter Stream using the demo Flume Twitter Source that is bundled with Flume out of the box. The demo Twitter Source connects to the Twitter Stream and continuously downloads a sample of tweets. The tweets were then published to a Topic in the Kafka Channel that we set up. In this article, we will be writing a custom Kafka Pr…

Real-Time Data Pipeline with Apache Kafka and Spark

It was in 2012 that I first heard the terms 'Hadoop' and 'Big Data'. At the time, the two terms were almost synonymous: I would frequently attend meetings where clients wanted a 'Big Data' solution simply because it had become the latest buzzword, with little or no consideration as to whether their requirements and data actually warranted one. 'Big Data' is of course more than just Hadoop, and as scalable technologies for both batch and real-time processing matured, so did our knowledge of them. However, for those of you…
