Keisan

Keisan Knowledge Base

Sharing the secrets of our success

Knowledge Base Article Listing

Machine Learning with Apache Spark


In September 2018, I was fortunate enough to be approached, and soon thereafter commissioned, by Packt Publishing to write a book on Machine Learning with Apache Spark. After 3 frantic months of juggling client and project commitments with putting together the contents and case studies for this book, I am delighted to announce that, as of 28th December 2018, the book is now published and available via the following retailers, bookstores and online learning platforms…

Real-Time Machine Learning Pipeline with Apache Spark


In a previous article entitled 'Real-Time Data Pipeline with Apache Kafka and Spark', I described how we can build a high-throughput, scalable, reliable and fault-tolerant data pipeline capable of fetching event-based data and eventually streaming those events to Apache Spark, where we processed them. I ended the last article by simply using Apache Spark to consume the event-based data and print it to the console. In my last article entitled…

Apache Kafka Producer with Avro Bijection


In my last article entitled 'Real-Time Data Pipeline with Apache Kafka and Spark', I used Apache Flume to fetch tweets from the Twitter Stream using the demo Flume Twitter Source that is bundled with Flume out of the box. The demo Twitter Source connects to the Twitter Stream and continuously downloads a sample of tweets. The tweets were then published to a Topic in the Kafka Channel that we set up. In this article, we will be writing a custom Kafka Pr…

Real-Time Data Pipeline with Apache Kafka and Spark


It was in 2012 that I first heard the terms 'Hadoop' and 'Big Data'. At the time, the two terms were almost synonymous: I would frequently attend meetings where clients wanted a 'Big Data' solution simply because it had become the latest buzzword, with little or no consideration as to whether their requirements and data actually warranted one. 'Big Data' is of course more than just Hadoop, and as scalable technologies in both batch and real-time processing matured, so did our knowledge of them. However, for those of you…

CentOS 7 Open Street Map Tile Server


One of the challenges that I encountered when designing and developing Ramus® was how to render entities (graph vertices) onto a map when the Ramus instance was installed in an environment where there was no internet access or the customer did not have a third-party commercial map provider. We are all familiar with Google Maps, which is free for apps or websites that are free for anyone to use. However, for commercial applications, access to Google…
