
Creating Data Infrastructure for AI and BI At Scale

by Bernard Marr

In business today, the difference between success and failure often comes down to an organization’s ability to leverage data. Data lets us understand our customers, our markets, our competition, and our processes and operations. By applying analytics to this data – from basic business intelligence (BI) up to cutting-edge artificial intelligence (AI) technologies like machine learning – we extract insights that help us drive growth, innovation, and efficiency.

Any business can work with data, but as with anything in life, the better prepared we are, and the more in-depth our understanding is of the platforms and processes involved, the better our results are likely to be.

As they become more proficient at extracting insights and turning them into business growth, businesses move along what is often referred to as an “analytics journey,” becoming more mature in their ability to deploy the technological infrastructure needed to make the magic happen.

The aim is to reach a level of maturity where an organization can consider itself to be data-driven. I’ve worked with businesses across many industries to help them along this path, and it’s my experience that while a lot of them like to say they are data-driven, or follow data-driven business practices, far fewer are actually at the stage where they can apply data, at scale, throughout all of their organization.

This means using data for all of the objectives listed above: understanding customers, understanding markets and competition, understanding internal operations and processes, and ultimately, using it to create better products and services.

For every business, the journey will be unique, and the way it’s completed depends on the desired outcomes, the strategic objectives of the business, and the resources – including skills – that are available. However, there are certainly some core principles that apply to any business setting out on this journey. In this article, I’m going to cover some of the most important ones, so let’s dive in!

The semantic layer – giving data meaning

Firstly, it’s important to understand that while data may be the fuel of the information age, when it comes to working with it at scale to drive organization-wide growth, it’s not that useful on its own.

For a data strategy to be effective, a business needs to implement a “semantic layer” – a level of process that sits between the data and the people whose job it is to make decisions and helps them understand what the data is telling them.

Say you have a business that sells 100 different products, and you’re able to measure how many of each item is sold. The person in charge of buying products might be able to use that information to make basic decisions about what stock is needed.

But there’s very little there that will give a marketing person a clue about what customers the business should be targeting with its advertising, and even less that will tell an HR person what employees the company should be hiring.

In traditional BI, the solution can be as simple as creating charts and visualizations that put the data into context and highlight the key findings, along with the recommended course of action.

In more advanced cases, such as when we are looking towards using data at scale, organization-wide, to enable machine learning, the semantic layer needs to be tailored towards the specific user that the insights are intended for.

These people – sales staff, marketing staff, HR staff – may very well not be data professionals themselves, but they can benefit from having better access to data or, more precisely, better access to the insights it contains. An intelligent semantic layer imparts meaning to the data end-user in a way that is specifically helpful to them.
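As a rough illustration of this idea, the sketch below (with hypothetical transaction records and field names of my own invention, not from any particular platform) shows how the same raw sales data can be served up as different, role-specific views – one for a buyer, one for a marketer:

```python
from collections import Counter, defaultdict

# Hypothetical raw transaction records, as a semantic layer might receive them.
transactions = [
    {"sku": "A100", "qty": 2, "channel": "web", "region": "EU"},
    {"sku": "A100", "qty": 1, "channel": "store", "region": "US"},
    {"sku": "B200", "qty": 5, "channel": "web", "region": "EU"},
]

def view_for_buyer(rows):
    """Units sold per SKU -- what a purchasing manager needs for restocking."""
    units = Counter()
    for r in rows:
        units[r["sku"]] += r["qty"]
    return dict(units)

def view_for_marketing(rows):
    """Units sold per channel and region -- where advertising might be targeted."""
    reach = defaultdict(int)
    for r in rows:
        reach[(r["channel"], r["region"])] += r["qty"]
    return dict(reach)

print(view_for_buyer(transactions))      # {'A100': 3, 'B200': 5}
print(view_for_marketing(transactions))  # {('web', 'EU'): 7, ('store', 'US'): 1}
```

The point is not the code itself but the pattern: one shared dataset, several purpose-built views, each speaking the language of the person who will act on it.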

Data for everyone

When planning data infrastructure, a guiding principle should be that all information, regardless of where it originates from in the business, needs to be accessible to the whole business.

Traditionally, businesses have often fallen into the trap of keeping data “siloed” within the department or operation where it’s generated. Without a unified structure – such as a data warehouse or data lake strategy – for storing information, it can end up stuck in databases or data formats that others who can benefit from it might not be able to use or access – or even know that it exists!

To illustrate this, data scientists – including Kirk Borne, chief science officer at DataPrime – sometimes use an analogy involving an elephant in a room full of people wearing blindfolds. With only their hands to work out what is in the room with them, one might feel the trunk and say, “It’s a snake,” another might feel the legs and say, “It’s a tree trunk,” and another might feel the tusks and say “It’s a spear.”

Until they start putting together what they know, it’s very difficult for any of them to tell what they are dealing with.

In business, we often have marketing datasets, financial datasets, manufacturing datasets – all valuable within their departments, but putting them together – breaking down silos and taking a unified approach to data strategy – can potentially make them much more valuable to the business as a whole.

Two approaches to achieving this are known as the data warehouse and the data lake. To put it simply, a data warehouse is a unified repository for processed data that conforms to a standardized structure and labeling, ready for use in BI. The concept was first defined by Bill Inmon – known as the father of the data warehouse – and it is often the foundation of enterprise BI and analytics strategies.

However, as a model, it isn’t always flexible enough when it comes to handling the new and exotic types of unstructured data that businesses need to work with today (more on this in the next section!).

A data lake, on the other hand, is a unified repository for raw, generally unstructured data that data scientists might find any number of ongoing uses for. Today, Inmon likes to talk about an approach termed the “data lakehouse,” which attempts to build some of the architecture of the data warehouse model onto the data lake – thereby preventing it from becoming a “data swamp”!
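The key difference between the two models is when the schema gets applied. A minimal sketch, using an invented order event and invented field names purely for illustration: the lake appends the raw record as-is (schema-on-read), while the warehouse validates and conforms it to a fixed structure at load time (schema-on-write).

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def land_in_lake(lake_dir, event):
    """Data lake: append the raw event untouched; structure is applied later, on read."""
    path = Path(lake_dir) / "events.jsonl"
    with path.open("a") as f:
        f.write(json.dumps(event) + "\n")

def load_to_warehouse(table, event):
    """Data warehouse: validate and conform the record to a fixed schema on write."""
    row = {
        "order_id": int(event["id"]),
        "amount_usd": round(float(event["amt"]), 2),
        "country": event.get("country", "UNKNOWN").upper(),
    }
    table.append(row)
    return row

with TemporaryDirectory() as lake:
    event = {"id": "42", "amt": "19.99", "country": "de", "free_text": "gift wrap pls"}
    land_in_lake(lake, event)  # raw copy keeps even fields no schema covers yet
    warehouse_table = []
    load_to_warehouse(warehouse_table, event)
    print(warehouse_table[0])  # {'order_id': 42, 'amount_usd': 19.99, 'country': 'DE'}
```

Notice that the `free_text` field survives only in the lake – the warehouse schema discards it, which is exactly the trade-off between the two models: curated and query-ready versus raw and flexible.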

Use new types of data

Most businesses have some proficiency at getting insights from very straightforward, simple data, such as structured transactional data. But for the really valuable insights – the sort that can be a differentiator between an innovation leader and an also-ran in a competitive marketplace – we have to be a bit more adventurous these days!

Unstructured data is the sort of data that doesn’t fit neatly into rows and columns of a traditional computer spreadsheet – it includes picture and video data, audio data such as recordings of conversations and telephone calls, and written text, like emails, customer comment slips, and even handwritten doctors’ notes.

Structuring this data to analyze it at scale involves working with advanced, AI-based technologies like computer vision and natural language processing. But considering that this type of data is by far the most abundant – accounting for over 80% of the data generated by businesses – ignoring it means overlooking what is potentially your most valuable source of insights.
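In practice, “structuring” unstructured data means turning free text (or pixels, or audio) into rows a BI tool or model can work with. Real pipelines would use trained NLP models for this; the toy sketch below, using invented customer comments and a crude keyword rule as a stand-in, just shows the shape of that step:

```python
import re

# Hypothetical unstructured customer comments; a real pipeline would use NLP
# models, but a simple keyword rule illustrates the "structure it first" step.
comments = [
    "Delivery was late again, very disappointed.",
    "Love the new app, checkout is so fast!",
    "Package arrived damaged. Late delivery too.",
]

NEGATIVE = re.compile(r"\b(late|damaged|disappointed|broken)\b", re.IGNORECASE)

def structure(comment):
    """Turn free text into a row with a crude sentiment flag and matched issues."""
    issues = sorted({m.lower() for m in NEGATIVE.findall(comment)})
    return {"text": comment, "negative": bool(issues), "issues": issues}

rows = [structure(c) for c in comments]
print(sum(r["negative"] for r in rows), "of", len(rows), "comments flag an issue")
# 2 of 3 comments flag an issue
```

Once the text has been converted into rows like these, it can flow into the same warehouse, dashboards, and models as the structured transactional data.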

Building a data culture

In these days of cloud platforms and services that can quickly be configured to meet just about any data requirement a business may have, getting the technology right is the easy part when it comes to leveraging data at scale.

Trickier is getting the human elements right – and this is where culture comes in. Building a data culture means creating an environment where everyone is a stakeholder in moving towards data-driven decision-making, innovation, and growth. Many well-intentioned data initiatives have been grounded because of an insufficient level of buy-in – both at the executive leadership level and among the wider workforce that has to put them into action – or a lack of belief in the value they will bring.

Good ways to start building this culture include making data available to everyone and ensuring it has meaning for everyone (as discussed above), as well as focusing on “quick win” initiatives that demonstrate value with a minimum of invested time and resources. Data infrastructure should be designed to facilitate a culture of experimentation and innovation at all levels, so employees can quickly test ideas and measure results, regardless of their role.
