Machine Learning with Apache Spark

Machine Learning with Apache Spark Quick Start Guide

In September 2018, I was fortunate enough to be approached, and soon thereafter commissioned, by Packt Publishing to write a book on Machine Learning with Apache Spark. After three frantic months of juggling client and project commitments with putting together the contents and case studies for this book, I am delighted to announce that, as of 28th December 2018, the book is now published and available via retailers, bookstores and online learning platforms.

What the book is about

Short Answer
A hands-on theoretical and applied introduction to machine learning and deep learning using Apache Spark.

Long Answer
Every person and every organization in the world manages data, whether they realize it or not. Data is used to describe the world around us and can be used for almost any purpose, from analyzing consumer habits in order to recommend the latest products and services to fighting disease, climate change, and serious organized crime. Ultimately, we manage data in order to derive value from it, whether personal or business value, and many organizations around the world have traditionally invested in tools and technologies to help them process their data faster and more efficiently in order to deliver actionable insights.

But we now live in a highly interconnected world driven by mass data creation and consumption, where data is no longer rows and columns restricted to a spreadsheet but an organic and evolving asset in its own right. With this realization come major challenges for organizations as we enter the intelligence-driven fourth industrial revolution: how do we manage the sheer amount of data being created every second in all of its various formats (think not only spreadsheets and databases, but also social media posts, images, videos, music, online forums and articles, computer log files, and more)? And once we know how to manage all of this data, how do we know what questions to ask of it in order to derive real personal or business value?

The focus of this book is to help us answer those questions in a hands-on manner starting from first principles. It introduces the latest cutting-edge technologies (the big data ecosystem, including Apache Spark) that can be used to manage and process big data. It then explores advanced classes of algorithms (machine learning, deep learning, natural language processing, and cognitive computing) that can be applied to the big data ecosystem to help us uncover previously hidden relationships in order to understand what the data is telling us so that we may ultimately solve real-world challenges.

Who the book is for

Short Answer
Anyone interested in making a hands-on start in the world of machine learning and deep learning, with no prior mathematical or software engineering experience required.

Long Answer
This book is aimed at business analysts, data analysts, data scientists, data engineers, and software engineers for whom a typical day may currently involve analyzing data using spreadsheets or relational databases, perhaps using VBA, Structured Query Language (SQL), or even Python to compute statistical aggregations (such as averages) and to generate graphs, charts, pivot tables and other reporting mediums.

With the explosion of data in all of its various formats and frequencies, perhaps you are now challenged with not only managing all of that data, but also understanding what it is telling you. You have most likely heard buzzwords like big data, artificial intelligence and machine learning, but now wish to understand where to start in order to take advantage of these new technologies and frameworks, not just in theory but in practice as well, to solve your business challenges. If this sounds familiar, then this book is for you!

What the book covers

  • Chapter 1 - The Big Data Ecosystem
  • Chapter 2 - Setting Up a Local Development Environment
  • Chapter 3 - Artificial Intelligence and Machine Learning
  • Chapter 4 - Supervised Learning using Apache Spark
  • Chapter 5 - Unsupervised Learning using Apache Spark
  • Chapter 6 - Natural Language Processing using Apache Spark
  • Chapter 7 - Deep Learning using Apache Spark
  • Chapter 8 - Real-Time Machine Learning using Apache Spark

What software and technologies does the book cover

Where can I buy the book


Thank you to the Packt Publishing team for providing me with this tremendous opportunity, with special thanks to:

  • Siddharth Mandal - Acquisition Editor
  • Mohammed Yusuf Imaratwale - Content Development Editor
  • Diksha Wakode - Technical Editor
  • Emmanuel Asimadi - Reviewer

What's next

Another book aimed at more advanced readers...but more importantly, given that it has been 18 months since my last Knowledge Base post (sorry!), I plan to spend a lot more time in 2019 writing useful guides and tutorials predominantly covering topics in distributed systems, machine learning, deep learning and cognitive computing. So watch this space! Additionally, if you have something specific in these subject areas that you would like to learn more about, please don't hesitate to get in touch and I would be happy to explore writing a Knowledge Base article for it.