What should you build?
Given these significant variables, it may seem daunting to try to construct a single infrastructure pipeline capable of handling all big data analytics use cases across the enterprise with performance that fulfills end-user expectations. And it probably is.
Between the extreme of building a single monolithic pipeline for every use case and the extreme of having every use case consume infrastructure independently, however, are a number of reasonable approaches. These approaches typically involve some limited number of pipelines that are aligned with certain categories of workload.
For example, some IT organizations have found it practical to build two primary pipelines. One is reserved for experimental workloads that require lots of data sources and highly exploratory analytic operations. The other is dedicated to more clearly defined analytic operations.
This approach has two advantages. The first is that each pipeline can be better tailored to its respective workload profile. The second is that it allows dataops teams to focus on different pipeline goals. With experimental workloads, those goals tend to revolve more around appropriately filtering data inputs and improving accuracy of results. With more established workloads, end users typically look for snappier performance and richer visualization. Segmenting analytics along the experimental-versus-established dimension thus allows for more efficient use of infrastructure and staff time.
There are other ways to taxonomize analytic pipelines as well. Some IT organizations find that segregating real-time from batch is very practical, because they can schedule batch jobs in ways that minimize their overall infrastructure requirements. Others find it necessary to segregate pipelines that leverage data subject to certain compliance constraints from those that don't.
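To make the batch-scheduling point concrete, here is a minimal sketch of how staggering jobs can lower peak infrastructure demand. The job list, the node counts, and the `peak_nodes` helper are all hypothetical, invented purely for illustration; a real scheduler would also have to account for job dependencies, deadlines, and data availability.

```python
from collections import Counter

def peak_nodes(jobs):
    """Return the largest number of nodes needed in any single hour.

    Each job is a tuple of (start_hour, duration_hours, nodes).
    Hours wrap around a 24-hour day.
    """
    demand = Counter()
    for start, duration, nodes in jobs:
        for hour in range(start, start + duration):
            demand[hour % 24] += nodes
    return max(demand.values(), default=0)

# Hypothetical batch jobs: three 2-hour jobs needing 10 nodes each.
all_at_midnight = [(0, 2, 10), (0, 2, 10), (0, 2, 10)]
staggered = [(0, 2, 10), (2, 2, 10), (4, 2, 10)]

print(peak_nodes(all_at_midnight))  # 30
print(peak_nodes(staggered))        # 10
```

In this toy case, the same work needs a third of the peak capacity once the jobs no longer overlap. A real-time pipeline generally cannot be shifted this way, which is one reason to keep the two apart.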
Whichever approach -- or combination of approaches -- makes sense for your organization, the key is to both improve service levels and drive down costs by achieving economies of scale across as many different analytic use cases as possible.
(Image: Nostal6ie/iStockphoto)