A data-driven organisation seeks to collect data across all of its activities in order to make better-informed decisions. In some cases, however, that data does not originate in its own systems. It is valuable to an organisation, for example, to have visibility of the movement of goods or materials along its supply chain and, in turn, its own goods or services may be part of somebody else’s supply chain. It is also becoming commonplace to incorporate industry data from a third-party provider into the decision-making process.
Being able to provide data relevant to somebody else’s business has a value, whether monetary or in terms of cementing a relationship.
Sharing data with third parties is not new. There are a number of mechanisms available, from simply exporting it to Excel and emailing it (still an incredibly common mechanism) through to purpose-built messaging systems, such as EDI, or business-to-business (B2B) information exchanges.
In a world where data is regarded as an asset and where GDPR brings considerable responsibilities, there is an enhanced focus on security and governance when it comes to data sharing. Emailing Excel spreadsheets just doesn’t cut it anymore. Transferring files via FTP or similar is a point-to-point solution that doesn’t scale well and is difficult to govern. EDI and B2B information exchanges are fine for repeatable, structured, machine-to-machine messaging, provided the cost of implementation is acceptable. However, none of these approaches provides the flexibility and low latency required to support data-driven decision-making.
One approach that is gaining traction to enable more flexible information sharing is the concept of Data Services. This involves the information provider publishing an API that allows an authenticated connection to gain access to data, which is governed and protected with robust security and access control mechanisms. This is a significant improvement on the methods mentioned above but it does have limitations in that coding is required to handle the data (which is usually returned in a semi-structured file format such as JSON) and it is not ideal for high volumes of data.
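To give a feel for the coding overhead mentioned above, here is a minimal Python sketch of what a consumer typically has to write before the data is usable: it takes a nested JSON document and flattens it into rows. The payload, its field names and the `flatten_shipments` helper are all invented for illustration; a real Data Service would define its own structure and would also require an authenticated HTTP call, which is omitted here.

```python
import json

# Hypothetical payload standing in for the body of an authenticated API response.
SAMPLE_RESPONSE = """
{
  "shipments": [
    {"id": "S-1", "status": "in_transit",
     "items": [{"sku": "A", "qty": 10}, {"sku": "B", "qty": 5}]},
    {"id": "S-2", "status": "delivered",
     "items": [{"sku": "A", "qty": 3}]}
  ]
}
"""


def flatten_shipments(payload: str) -> list[dict]:
    """Turn the nested JSON document into one flat row per shipment item."""
    document = json.loads(payload)
    rows = []
    for shipment in document["shipments"]:
        for item in shipment["items"]:
            rows.append({
                "shipment_id": shipment["id"],
                "status": shipment["status"],
                "sku": item["sku"],
                "qty": item["qty"],
            })
    return rows


rows = flatten_shipments(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

Every consumer has to maintain code like this for every feed, and keep it in step whenever the provider changes the structure of the response.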
Snowflake is a cloud-native data warehouse platform that has taken an innovative approach to the problem of sharing data. It has introduced a mechanism (which it calls “Data Sharing”) that allows a Snowflake account owner to publish a read-only view of any of its tables, secure views or user-defined functions (UDFs) to other Snowflake accounts, whether in the same organisation or at a third party. Perhaps more interestingly, it also allows the publishing (“Provider”) account to create “Reader” accounts, which permit organisations that are not Snowflake customers to log into the Snowflake platform and access the data that the Provider has shared with them.
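As a rough sketch of what publishing looks like from the Provider side, the Python helper below assembles the kind of SQL statements a Provider might run to share a single table. The statements follow Snowflake's share syntax as we understand it and should be checked against the current documentation; the helper name and all object names (`sales_share`, `sales_db`, `partner_acct` and so on) are hypothetical. The code only builds the statements as strings, and running them would require a Snowflake session with suitable privileges.

```python
def provider_share_statements(share, database, schema, table, consumer_account):
    """Return the SQL a Provider might run to publish one table through a share.

    All object names are supplied by the caller and are purely illustrative.
    """
    return [
        # Create the share object itself.
        f"CREATE SHARE {share}",
        # The share needs usage on the containing database and schema...
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
        # ...and read access to the table being published.
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share}",
        # Finally, make the share visible to the consuming account.
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


statements = provider_share_statements(
    "sales_share", "sales_db", "public", "orders", "partner_acct"
)
for statement in statements:
    print(statement + ";")
```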
An existing Snowflake account holder with whom data has been shared creates a read-only database in their own instance of Snowflake using that share and thereafter treats it as just another Snowflake database.
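On the consuming side this can be sketched in a similar way. The hypothetical Python helper below builds the statements a consumer might run to create a database from a share, let one of its own roles use it, and then query it like any other database. The account, share, role and table names are assumptions made up for the example, and the SQL should be verified against Snowflake's documentation before use.

```python
def consumer_statements(local_database, provider_account, share, role, table):
    """Return the SQL a consumer might run to mount and query a share.

    `table` is a schema-qualified name inside the shared database, e.g. "public.orders".
    """
    return [
        # Create a read-only database backed by the Provider's share.
        f"CREATE DATABASE {local_database} FROM SHARE {provider_account}.{share}",
        # Allow a local role to query the shared objects.
        f"GRANT IMPORTED PRIVILEGES ON DATABASE {local_database} TO ROLE {role}",
        # From here on it is queried like any other database.
        f"SELECT * FROM {local_database}.{table} LIMIT 10",
    ]


consumer_sql = consumer_statements(
    "partner_sales", "provider_acct", "sales_share", "analyst", "public.orders"
)
for statement in consumer_sql:
    print(statement + ";")
```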
Whether the consumer connects via a full or a Reader account, no data is copied; the consumer is effectively granted a remote view of the Provider’s data. This has several major advantages:
Updates in the provider database are immediately available to the consumer
Access permissions can be amended or revoked at any time by the Provider
Consumers do not incur storage charges
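Because the Provider keeps control of the data, withdrawing access is a matter of a statement or two rather than chasing down copies. The sketch below, again in Python with hypothetical names, builds the statements a Provider might use either to stop sharing one table or to remove a consumer account from the share altogether; treat the SQL as an assumption to be checked against Snowflake's documentation.

```python
def revoke_statements(share, qualified_table, consumer_account):
    """Return SQL a Provider might run to withdraw access (names are illustrative)."""
    return [
        # Stop exposing one table through the share.
        f"REVOKE SELECT ON TABLE {qualified_table} FROM SHARE {share}",
        # Or cut off a consumer account from the share entirely.
        f"ALTER SHARE {share} REMOVE ACCOUNTS = {consumer_account}",
    ]


revocations = revoke_statements("sales_share", "sales_db.public.orders", "partner_acct")
for statement in revocations:
    print(statement + ";")
```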
Sharing without copying is possible because of Snowflake’s unique architecture, which separates storage and compute from each other and from the metadata and services layer that orchestrates them.
Taking this a step further, Snowflake is looking to commoditise the concept of data sharing via the Snowflake Data Exchange. This is intended to be the App Store equivalent for data broking with potential consumers able to subscribe to data sets made available via the portal by Providers.
A fascinating initiative which we will watch with interest.