Pipelines
Serverless Data Pipelines: AWS Glue could be the poster child for them someday; maybe not just yet!
Data engineering plays a pivotal role in the success of organizations, especially data-driven ones; it can make or break an organization’s growth. Data pipelines form the foundation of data engineering. However, the term “pipeline” has become somewhat clichéd in software engineering, as it can mean different things depending on the context and the significance of the tasks involved.
In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next. The elements of a pipeline are often executed in parallel or in a time-sliced fashion, and some amount of buffer storage is often inserted between elements.
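As a rough illustration of that series-of-elements idea (not any particular framework's API), a chain of Python generators works well: each stage consumes the previous stage's output, and the stage names below are made up for the example.

```python
# A minimal sketch of a pipeline: processing elements connected in
# series, where each stage consumes the output of the previous one.

def extract(records):
    # First element: yield raw records one at a time.
    for record in records:
        yield record

def transform(records):
    # Second element: clean each record produced by extract().
    for record in records:
        yield record.strip().lower()

def load(records):
    # Final element: collect the transformed output.
    return list(records)

raw = ["  Alpha", "BETA ", " gamma "]
result = load(transform(extract(raw)))
print(result)  # ['alpha', 'beta', 'gamma']
```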
We’re talking about big data, Hadoop, and data pipelines, and that conversation naturally leads to Spark. Spark is a one-stop solution for a variety of big data problems: a unified platform for batch processing, (near) real-time streaming, machine learning, deep learning, and graph processing.
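To make that concrete, here is a minimal PySpark sketch, assuming pyspark is installed locally and using a hypothetical events.csv as input. The same SparkSession that drives this batch job is also the entry point for Structured Streaming, MLlib, and graph workloads.

```python
# A minimal batch job on Spark (illustrative only; file path and
# column names are made up for the example).

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("unified-demo").getOrCreate()

# Batch: read a CSV and aggregate events per day.
events = spark.read.csv("events.csv", header=True, inferSchema=True)
daily_counts = (
    events
    .groupBy("event_date")
    .agg(F.count("*").alias("events"))
)
daily_counts.show()

spark.stop()
```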