Big Data London, the annual data and analytics event, this year hosted leading industry experts eager to share their knowledge and arm attendees with the tools to drive any organisation’s data-driven strategy and approach. As always, a huge breadth of topics was covered, but this year there was a strong focus on Data Fabric Architecture and Azure Synapse Analytics. Other highlights included fabulous sessions from the Cloud Data Platform providers Snowflake and Cloudera, and much fanfare around the leading Business Intelligence (BI) providers.

Several key themes came out of this year’s event, many of which were highlighted in a panel discussion at the end of day one, including:

  • The management of Machine Learning (ML), Cognitive Artificial Intelligence (AI) and tooling to support these advanced analytics workloads, especially with high volume streaming data.
  • Cloud, multi-cloud and the push for data centralisation in the cloud vs data virtualisation.
  • Data platforming and data governance with the emergence of robust tooling to support the implementation of a data strategy.
  • Containerised computing and Kubernetes, and the impact this architecture is having on compute resources and analytics.
James Mandikos

Head of Data & Analytics

Implementing a data strategy

As the Data & Analytics lead at Keytree, and given my passion for all things related, much of my focus over the two days was on wider data architectures and the tooling built specifically for this area. Many speakers talked at length about the importance of organisations having a data strategy in place, with the express aim of building trust in data. Discussions also looked at the tooling on offer to support the implementation of a data strategy: data engineering, data governance, metadata management and data cataloguing (data profiling and semantic tagging). These topics were covered in a variety of sessions. One such session featured Sue Laine from ASG Technologies, who spoke about the importance of providing trusted insights across the ‘distributed enterprise’ through a combination of data cataloguing, data lineage and governed access through multiple reporting environments.

The theme of data trust continued throughout the first day. Horia Selegean from BT spoke about how trust is linked to financial return, arguing that BI alone has only realised so much financial gain and that further EBIT realisation can only come from bridging the ‘Trust Gap’.

Bridging the ‘Trust Gap’ allows advanced analytics capability to be unleashed, empowering expert data scientists and resulting in growth along the analytics maturity curve towards cognitive ML. This approach allows firms to realise a competitive advantage and the resultant financial returns. BT has successfully implemented the Collibra Platform to unlock the value in its data and to build trust in that data. Similarly, Tibco showcased its Unify platform, aimed at achieving similar data management successes through best-of-breed data management tooling, while Cloudera spoke about the Data Hub component of its platform.

Keytree have worked closely with SAP’s Data Hub, a tool which provides many of the features I’ve outlined across SAP’s tooling estate. SAP’s Data Intelligence additionally provides management of complex ML and AI workflows by integrating Python and Jupyter notebooks. To learn more about Keytree’s position, take a look at Richard Benson’s interview at SAP TechEd India on Data Intelligence. I for one fully expect Data Hub and Data Intelligence to become major components of SAP’s data and analytics strategy.

The Great Data Debate rounds off day one

The keynote session at the end of day one, dubbed the Great Data Debate, featured speakers from Microsoft, ThoughtSpot, Tibco and IBM. It covered hot topics such as the challenges and complexity which come from data and analytics in the cloud and multi-cloud, the importance of data architecture and strategy, cloud data warehousing, data ops and continuous deployment, and automation and AI.

Much of the debate centred around how cloud and multi-cloud have led to more fragmented or disparate data landscapes for organisations, and the panellists spoke about how this has led to a tendency towards data centralisation in the cloud. However, Dan Streetman, CEO at Tibco, shared details of the Tibco approach to organisational data complexity and challenges, one that advocates data virtualisation, i.e. keeping data at source and at rest, and throwing compute power to where the data resides. This avoids the data issues that result from moving data and drives high performance analytics. Dan also spoke about the importance of getting data governance, metadata management and the right data strategy in place before applying compute and augmented intelligence to data in a virtualised data landscape.

The panel spoke about how ML workflows are ‘bursty’ and require large storage and compute resources to complete complex ML tasks, and how cloud meets this variable and elastic demand. The recent trend of separating compute from storage has led to greater flexibility in addressing this ‘bursty’ demand. Other trends discussed by the panel included the importance of containerised computing and Kubernetes as the fabric to tie multi-cloud together, how ML/AI workflows and streaming multi-function analytics will support the data-driven enterprise, and the importance of analytics tooling providing a richer user experience.

Amit Prakash, co-founder and CEO of ThoughtSpot, covered this last point, highlighting how easy it is to blame culture. People often want to do the right thing and provide value in an organisation; however, tooling and its lack of usability have often been the cause of data problems and an impediment to organisational aspirations to become more data-driven. The proliferation of Shadow IT, which Gartner estimates at 40% of IT spend, has challenged IT departments to provide a higher standard of service and usable tooling to ward off this threat.

One interesting comment, which concluded the event, noted how “kids are coming out of college” armed with the latest open source tooling and able to use these tools to great effect. CVs are now augmented with Git repositories, and the tendency towards free and open source tooling will significantly challenge current thinking in IT departments and analytics architectures.