
To get an overview of the state of play in big data, Tech Pro Research recently talked to Sumit Nijhawan, CEO and president at data integrity and data governance solutions specialist Infogix. Founded over 30 years ago as Unitech Systems, the company morphed into Infogix in 2005, at the beginning of the big data era. Its most recent corporate activity was the February 2017 acquisition of Data3Sixty, a leading cloud-based data governance provider.

Back in November 2016, Infogix offered its Top Ten Transformative Data Trends for 2017, issuing a mid-year update in August. We began with Infogix's crystal-ball gazing...

Infogix recently updated its big data predictions for 2017; was that because of unexpected developments, or some other reason?

"In the first six months of the year we've been engaging quite a bit with our customers, and that shaped our go-to-market strategy and our product strategy. Based on what we were observing in the market with our customers, it just made sense to update the trends discussion -- especially around the big data governance space."

"Almost every customer I go to has a big data initiative, and many projects start with a lot of momentum, investment and 'buzz'. But the progress they've made, the value they're getting out of their investment, often does not meet initial expectations."

"Some things we are working on with our customers, which we think can be transformative, are a combination of data governance, data preparation, self-service and smaller data lake deployments."

Would you say that the main bottleneck in extracting insights from big data is actually in discovering the valuable data that companies have, and making it available for analysis?

"Yes, most of the focus has been to provide the storage environment -- Hadoop -- and let everyone dump whatever data they can into it. Two things are missing here: first, what's really the end goal and objective of what they're dumping into Hadoop? And second, even if the data is there, it's not governed, it's not searchable, it's not findable, and it's not there in a way that draws consumers to the data and helps them get value. It's very IT-dependent, still requiring very technical people to work on it -- that's not how you'll get value out of these investments."

You mentioned Hadoop. Is that still the key product in the big data space?

"Big data is not just about Hadoop, but I would also predict that Hadoop will remain a component of big data initiatives for a while because enough investment has gone into it. And quite honestly, as a technology, it meets the need. But there's also a whole ecosystem around it: you have Spark, you have big data databases like Cassandra, MongoDB and HBase -- all Hadoop is providing is storage and a framework on top of that. That's how these technologies evolve."

Is it fair to say that most companies are still struggling to extract business value from their investments in big data technology?

"That's right, because it all started as an experiment, and the experiments have gone somewhat OK. But how do you operationalise these experiments into something that truly provides business insights in a sustainable way? That's what everybody is struggling with."

"It's almost like saying: 'Here's a bunch of products that I have, which I've put in this giant warehouse in a remote location. Now tell me what you want and I'll ship it to you'. But people don't know what's in your warehouse, or how to get at it. You almost need that layer of e-commerce on top of it -- that Amazon-style website. Amazon has many warehouses, but what makes it work is the e-commerce paradigm they've put together: anyone can visit, find what they want, drag it to their shopping cart, assign a value to it -- Amazon even recommends to the consumer what they should try and get. And behind the scenes there's a supply chain that works, that ships the product to the consumer."

"What needs to happen with big data, what's missing, is that 'Amazon' website on top of it, and the automation of the supply chain behind it."

So is there a disconnect in many organisations between 'the business' and IT? Do organisations need to foster a 'data culture', in which business units know how to ask the right questions of the data, and generate insights themselves?

"We certainly need more of a business-driven data culture. It's not that the IT guys don't want to share: it's just that they have these tools and they feel like they're doing a good job, but they don't really know what the end goal is. That's why, unless it's a business-driven initiative, it's hard for it to materialise into anything meaningful."

Is there perhaps a missing link in many organisations -- a Chief Data Officer (CDO), who can link between the C-suite and business units, and the IT department?

"There's absolutely a missing link, but I wouldn't say it's just about one person. The 'data culture' just mentioned is about people, processes and technologies, along with the data itself. It's really about the end-to-end process: here's how I'm going to source my data; this is what I'm going to do with my data; and this is how I'm going to deliver my data. That end-to-end process needs to be initiated by a business sponsor, which certainly could be the CDO. The problem with the Chief Data Officer paradigm today is, it's almost a bureaucratic position in many organisations: the CDO supposedly has influence, but has ended up becoming the person that vendors go to to pitch their technologies, rather than someone who's there to meet business objectives."

"Many Chief Data Officers today have come from IT -- that's the mindset that they bring. But you really need to have somebody with a business background."

When you talk to customers, which data-related skills are currently most in demand? Some analysts have detected a softening in the demand for data scientists, for example...

"I think demand is softening, but it's not because there's a plethora of data scientists out there: it's more because existing data scientists haven't been able to deliver the value that businesses want. So the question becomes: 'What's the point in recruiting more data scientists if I'm not getting value? Why can't I have my operational folks, my day-to-day data analysts, take on more of this work?' And quite honestly, they can, because 80 percent of the problems that data scientists address can be solved by maybe 20 percent of the algorithms -- and those algorithms can be exposed in easy-to-use ways that data analysts and business analysts can incorporate into operational and business processes. I think more of that is happening, and the result is less demand for data scientists."

We hear a lot about 'self-service' analytics, allowing even less expert people to get involved. Where do you think we are along that road?

"Along those lines, what we're doing with our customers is, we're going and seeing where they've had data lake initiatives -- big data with Hadoop, Cloudera and all of that -- and saying: 'Maybe you don't need any of those open-source technologies that you're spending months and millions of dollars integrating. We're going to give you an end-to-end appliance for big data that's completely self-service enabled -- everything comes integrated, and all you have to do is consume data and unleash your business folks, data scientists, whoever.' That's getting a lot of traction in the market. I don't know of another competitor who is actually providing a single end-to-end environment with Hadoop embedded, so that it becomes a 'black box' to the customer."

Everyone is talking about machine learning and AI: how do you think its impact will play out in the big data space?

"It's been around for a while, but there's currently a lot of buzz associated with it. But it's like I said earlier: 80 percent of the problems can be solved by 20 percent of machine learning algorithms such as segmentation, recommendation, classification, regression and forecasting. One area where we're seeing a lot of traction is big data quality, where traditionally data quality has been about specifying exact matching rules and duplicate rules, and all of that stuff. Now the data volumes are so high, and people are throwing more data into the data lake, they don't necessarily know what the exact rules are. Instead, we're using machine-learning algorithms, such as segmentation and classification, to find outliers for instance. That's where machine learning is already adding a lot of value -- but again, you don't need very sophisticated data scientists to do that."
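
To make the idea concrete, here is a minimal sketch of a segmentation-based outlier check of the kind Nijhawan describes. It is an illustration only, not Infogix's method: the function names (`kmeans_1d`, `find_outliers`), the sample amounts and the threshold are all assumptions. It groups values into segments with a tiny one-dimensional k-means, then flags any value that sits unusually far from the median of its own segment, so no exact matching rule has to be written in advance.

```python
from statistics import mean, median


def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means (hypothetical helper): returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels


def find_outliers(values, k=2, threshold=5.0):
    """Flag values far from their segment's median, measured in units of
    the segment's median absolute deviation (MAD). The threshold is an
    assumed, illustrative setting."""
    _, labels = kmeans_1d(values, k)
    outliers = []
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        if not members:
            continue
        center = median(members)
        mad = median([abs(v - center) for v in members])
        if mad == 0:
            continue  # no spread in this segment, so nothing to judge against
        outliers.extend(v for v in members if abs(v - center) > threshold * mad)
    return outliers


# Made-up amounts: two natural segments (around 10 and around 100) plus one stray value.
amounts = [10, 11, 9, 10, 12, 11, 10, 100, 102, 98, 101, 99, 100, 60]
print(find_outliers(amounts))
```

With this sample data, the stray value 60 is the only one flagged. A production data-quality pipeline would work on many columns at once and use a tested library rather than a hand-rolled clustering loop.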

One of your top-ten trends for 2017 was 'The Move toward Informatics and the Ability to Identify the Value of Data'. How do you define the differences between 'informatics' and data science/engineering?

"The key difference here is, you have to add the monetisation of data -- the value of data -- to informatics, as traditionally defined. Once you add that, I feel it's truly informatics. Again, while everyone talks about the value of data, they talk about it very subjectively. Some research we've initiated is really thinking of data almost as a balance-sheet component, where data can be an asset but also a liability. So the different data sets that a company has should be looked at from the lens of: what would make it an asset and what would make it a liability, and assign value to it. That's an area of research for us, and that really completes the informatics picture for me."

How do you think the proliferation of IoT devices feeding ever more data into companies' systems will transform the big data market?

"Right now they're a driver for investment into big data, but at the end of the day they're another source. I think the technology is already there to interpret this data, although you do need a bit more business acumen to make sense of it and understand the implications of what's in it. That's where some of the evolution will happen: can the data tell you what you don't know, rather than requiring someone who knows what should happen, or what could happen? The auto-discovery of conclusions and interpretations is where I think more work needs to be done."

Do you think any incumbent big data technologies will decline during 2017 -- is this a watershed year in any way?

"No, I think 2017 will continue the way it is. More projects are likely to fail, and then people will realise they need to have the right culture in place. Only then will they see success."

Finally, do you think that, with the advent of self-service tools and the increasing involvement of non-specialists and even 'citizen data scientists', there's a democratisation process going on in big data?

"I do think that will happen: it's the only way that investment in 'big data' can be sustained, and value realised -- there is no other option. And there are enough people, both in the IT and vendor world, who will force the issue and find ways to do that. It might be three to five years away, but I don't think much beyond that. In three to five years, people won't talk much about 'big data': instead they'll talk about the outcomes of the big data that's being delivered in a self-service kind of way."
