Technical Level: Intermediate

Realising the Potential of Data Whilst Preserving Privacy with EyA and Conclave from R3

There are vast gaps in data spanning industries. Gathering data into a high-quality data lake for analysis is extremely costly, and mistakes often lead to incorrect data intelligence.

Data is the new gold, actively sought after by many organisations and data scientists for analytics.

People and companies are more aware than ever of the issues surrounding privacy and sensitive data, including the leaking of personal information, intellectual property and cross-border data.

EyA has developed an enterprise-grade platform that embraces rich data relationships using the template engine and various levels of asset association and hierarchy. With this in mind, EyA is now developing a full data analytics platform to enable organisations and people to “loan out” suitable data for research and development purposes. This is labelled Data as a Commodity (DaaC). This paper discusses the use case and benefits of the platform, along with how privacy is kept intact by a federated machine learning framework using Conclave from R3, which is built on Intel SGX.

References from other EyA technologies utilised within this paper are as follows

As a prerequisite to this paper, readers are advised to read the above references in order to understand the fundamentals of this discussion.

Dynamic Relationships in Data

Due to the hierarchical nature of EyA’s templating and of parent and child assets, along with the horizontal relationships between assets, relationships between assets of many different types form naturally. The result is something that would otherwise seem impossible: a dynamic relational database that grows relationships organically, spans nearly all industries and scientific disciplines, and is injected into a distributed ledger.

Template Derived Relationships

The diagram above depicts a very simple extraction of data from templates within EyA. The actual template data is vast and would be far too much to display for this case study. One can clearly see the flow of inherited properties from lower level templates, through to those templates used in a production environment, with the properties in green being inherited from parent templates and the properties in black being unique to the current template.

Very quickly, it is possible to derive relationships within a template block. For instance, the life form block shares properties, so a cow and a human both contain many identical properties (e.g. type, gender etc.). It is also possible to see that the life form block and the vehicle block both contain properties derived from the lower-level templates. Thus, we can immediately see that during data analysis it would be extremely simple to use relational database methods to link up the data sets.
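The inheritance described above can be illustrated with a minimal sketch. The Template class, the hierarchy and every property name other than type and gender are assumptions made for illustration only; they are not the actual EyA template schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    """Hypothetical stand-in for an EyA template, not the platform's real schema."""
    name: str
    own_properties: set[str] = field(default_factory=set)
    parent: Optional["Template"] = None

    def inherited_properties(self) -> set[str]:
        # Properties collected from every ancestor (the "green" ones in the diagram).
        if self.parent is None:
            return set()
        return self.parent.all_properties()

    def all_properties(self) -> set[str]:
        # Inherited properties plus those unique to this template (the "black" ones).
        return self.inherited_properties() | self.own_properties


def shared_properties(a: Template, b: Template) -> set[str]:
    # Properties present in both templates: candidate join keys for analysis.
    return a.all_properties() & b.all_properties()


# Illustrative hierarchy; the property names are invented for this example.
base = Template("base", {"id", "owner", "created"})
life_form = Template("life_form", {"type", "gender", "date_of_birth"}, base)
vehicle = Template("vehicle", {"make", "model"}, base)
cow = Template("cow", {"breed", "milk_yield"}, life_form)
human = Template("human", {"nationality"}, life_form)
lorry = Template("lorry", {"axle_count"}, vehicle)

within_block = shared_properties(cow, human)    # same block: life form
across_blocks = shared_properties(cow, lorry)   # different blocks: lower-level only
print(sorted(within_block))
print(sorted(across_blocks))
```

In this sketch the cow and the human share every life form and base property, while the cow and the lorry share only the base properties. Those shared names are the columns a relational join could use.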

Asset derived and associate derived data relationships

Asset Derived Relationships

In the diagram above, a very simple asset inheritance is depicted for both the plane and the lorry. A common feature here is the child asset, an engine. Within this, there is a new common property, marked in red: the engine emissions. With this in mind, there are now a number of relationships which can be analysed. For example, an emissions analyst could run complex analytics across the entire lifecycle of both engines, using these relationships against other asset information derived from a completely different sector, such as environmental information. Sharing data across two different sectors could provide key information which may reduce emissions, prolong the life of an engine based on another engine type, or show how the environment has affected efficiency.

Rather than trying to source environmental data separately, where it would be difficult to pinpoint the entire history of locations, driver types and so on, the EyA data platform connects the asset associations, which naturally build up complex datasets from the organic associations made during the lifecycle of any asset.
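As a rough sketch of the asset inheritance in the diagram, the example below walks two asset trees and collects the common emissions property from every child engine. The Asset class, the asset names and the emissions figures are all hypothetical and carry no real units or measurements.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical asset record with child assets; not the EyA data model."""
    name: str
    asset_type: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find_descendants(asset, asset_type):
    """Yield every descendant asset of the given type, depth first."""
    for child in asset.children:
        if child.asset_type == asset_type:
            yield child
        yield from find_descendants(child, asset_type)


# Invented example assets: a plane with two engines and a lorry with one.
plane = Asset("plane-1", "plane", children=[
    Asset("engine-p1", "engine", {"emissions": 900.0}),
    Asset("engine-p2", "engine", {"emissions": 910.0}),
])
lorry = Asset("lorry-7", "lorry", children=[
    Asset("engine-l1", "engine", {"emissions": 750.0}),
])

# The shared child type and property let both sectors be analysed together.
emissions_by_parent = {
    parent.name: [e.properties["emissions"] for e in find_descendants(parent, "engine")]
    for parent in (plane, lorry)
}
print(emissions_by_parent)
```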

Asset Association Derived Relationships

The diagram above depicts a simple but typical set of associations for a single driver hauling a cow on their lorry. During a given journey there would be many touch points, with new and, of course, existing associations continuously taking place. A data scientist could derive the data of any given journey and align it against data stored in EyA by a separate entity, on this occasion an organisation storing environmental data. The relationships that occur naturally through the asset associations, together with the common properties within the environmental asset, can be joined for complex analytics.
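The alignment of journey data against a separate entity's environmental data can be sketched as a simple join on common properties. The record layout, the location and hour keys and all values are assumptions chosen for illustration, not data or structures taken from EyA.

```python
# Touch points recorded as associations during one journey (invented data).
journey_touch_points = [
    {"driver": "driver-1", "lorry": "lorry-7", "location": "depot-a", "hour": 8},
    {"driver": "driver-1", "lorry": "lorry-7", "location": "farm-b", "hour": 10},
    {"driver": "driver-1", "lorry": "lorry-7", "location": "market-c", "hour": 13},
]

# Environmental asset data stored by a separate organisation (invented data).
environment_records = [
    {"location": "depot-a", "hour": 8, "temperature_c": 11.0},
    {"location": "farm-b", "hour": 10, "temperature_c": 14.5},
    {"location": "port-d", "hour": 13, "temperature_c": 16.0},
]


def join_on_common_properties(left, right, keys):
    """Inner join of two lists of dicts on the given shared property names."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


enriched = join_on_common_properties(
    journey_touch_points, environment_records, keys=("location", "hour")
)
for row in enriched:
    print(row)
```

Only touch points with a matching environmental record survive the join, so the third touch point is dropped in this example. A production analysis would more likely match on time windows and geographic proximity than on exact keys.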

Of course, this scenario would not be useful for environmental analysis itself, but it could be for many other cases, including driver fatigue, animal welfare and tyre wear in different conditions.

The value of data within the EyA platform

Commoditising Data

The rich sources of data entering EyA via people and companies are what make it possible to derive data for analysis. As the EyA platform is completely industry agnostic, vast amounts of overlapping data are entering the lake and, as previously discussed, common relationships are forming globally on the platform.

In order to commoditise the data, it is first important to have a mechanism whereby the data can be classified for use. As such, a company wishing to commoditise its data within the EyA platform cannot self-elect: it must enrol in the data programme and submit a request for our service to analyse the data for accuracy, cleanliness and worthiness.

As discussed in previous papers, data within EyA is marked as public, private or granularly scaled by a company when subscribing to any given template or templates. The data “in the middle” is the most likely to be of value to organisations wishing to analyse it. In most cases this data is highly sensitive to the “owner”, and the risk of it leaking, even to those hiring it, can make the seller unwilling to disclose it. With EyA, Conclave from R3 is used instead, so that the data can be analysed and categorised without any party ever having to actually “see” it. The data is moved into a secure container, and enclaves operate on it within the Intel SGX architecture. Upon completion, the data is no longer available within the container. As in any market, the value of the data then rises or falls with demand and willingness to pay.

As in other markets, a data purchaser may wish either to purchase the rights to analyse a dataset just once on a spot contract, or to choose a futures contract with a given date of commitment.

Sourcing Data

An organisation wishing to source datasets can do this by analysing the template structure within EyA. At this point, it is not able to see any private information, as the templates are simply empty structures from which assets are created. However, the templates are powerful because they reveal not only the key relationships between templates, but in many cases also how an asset will be created with parent and child relationships and, to a large extent, its associations.
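A minimal sketch of this kind of sourcing is shown below: the search runs over empty template structures only, so no asset data is involved. The catalogue layout, the template names and the find_candidate_templates helper are hypothetical and do not describe the EyA sourcing engine.

```python
# Catalogue of empty template structures: property names and child template
# types only. No asset data is present (all names are invented).
template_catalogue = {
    "lorry": {"properties": {"id", "make", "model", "mileage"},
              "children": {"engine", "tyre"}},
    "plane": {"properties": {"id", "make", "model", "flight_hours"},
              "children": {"engine"}},
    "cow": {"properties": {"id", "type", "gender", "breed"},
            "children": set()},
}


def find_candidate_templates(catalogue, required_properties=(), required_children=()):
    """Return template names whose structure contains everything requested."""
    wanted_properties = set(required_properties)
    wanted_children = set(required_children)
    matches = []
    for name, structure in catalogue.items():
        if (wanted_properties <= structure["properties"]
                and wanted_children <= structure["children"]):
            matches.append(name)
    return sorted(matches)


# An emissions analyst looks for any template that has an engine as a child.
engine_sources = find_candidate_templates(template_catalogue, required_children={"engine"})
mileage_sources = find_candidate_templates(template_catalogue, required_properties={"mileage"})
print(engine_sources)
print(mileage_sources)
```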

EyA will continuously add logic to the sourcing engine based upon the hiring of data and the analytics executed when our scoring engine analyses data for sale.

Data Gamification

Many businesses are still operating in extremely inefficient ways, utilising Excel spreadsheets and other archaic methods for running their daily operations. Many ignore the resulting loss of valuable data because their system “just works”. Once businesses understand that their data can earn extra revenue, they will be incentivised to change their daily operational processes.

Human Data for Sale

As people bond templates to their digital twin through the use of various applications, their own lifecycle becomes enriched with extremely valuable data. However, privacy is of key importance to the EyA platform and our digital civilians. At no point will any organisation or body be able to access a person’s data, other than the data its own application has access to through the bonding of a template.

People can still choose to loan their data, though, and be paid accordingly. EyA with Conclave allows organisations to hire data from the masses without ever being able to view any personal or sensitive data.

Sourcing Human Data

Exactly as with the sourcing of data from organisations in the market, companies can source data from the templates which are actively being bonded to people. This may be medical records, use of a vehicle, ownership and use of assets or activity templates. Again, the templates are empty, so no information can be derived.

When an organisation wishes to source a given dataset, people who have opted in to data loaning will be notified if their template profile matches the organisation’s requirements. The organisation will never learn the identity of the person.
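The matching step can be sketched as follows, assuming a person's template profile is simply the set of templates bonded to them. The Person class, the use of opaque tokens and all names are illustrative assumptions and do not describe how EyA notifies people or hides identities.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical profile record; person_id is known to the platform only."""
    person_id: str
    bonded_templates: frozenset
    opted_in: bool


def match_opted_in_people(people, required_templates):
    """Match opted-in profiles against the templates an organisation requires."""
    required = frozenset(required_templates)
    platform_view = {}  # opaque token -> person_id, kept on the platform side
    for person in people:
        if person.opted_in and required <= person.bonded_templates:
            platform_view[secrets.token_hex(8)] = person.person_id
    # The organisation receives opaque tokens only, never the identities.
    organisation_view = sorted(platform_view)
    return platform_view, organisation_view


people = [
    Person("alice", frozenset({"medical_record", "vehicle_use"}), True),
    Person("bob", frozenset({"medical_record"}), True),
    Person("carol", frozenset({"medical_record", "vehicle_use"}), False),
]

platform_view, organisation_view = match_opted_in_people(
    people, {"medical_record", "vehicle_use"}
)
print(len(organisation_view), "matching profile(s) would be notified")
```

In this sketch only the first person is matched: the second lacks a required template and the third has not opted in.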

The Analysis Process

Once an organisation has completed the hiring of data from a person or company, its pre-agreed algorithms are executed in enclaves within Conclave and the results are returned. The organisation is never permitted to see the hired data at any point in the process, and no private data is included in the returned results.
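The flow above can be mimicked in plain Python to show its shape. This is not the Conclave API and provides no real isolation or attestation: SimulatedEnclave, the algorithm registry and the sample records are all assumptions made purely to illustrate that only pre-agreed algorithms run, only aggregate results leave, and the data is no longer available once the run completes.

```python
from statistics import mean


class SimulatedEnclave:
    """Plain-Python stand-in for an enclave. It offers no security guarantees."""

    def __init__(self, agreed_algorithms):
        # Only algorithms agreed before the hire can ever be executed.
        self._agreed = dict(agreed_algorithms)
        self._records = None

    def load(self, records):
        # Hired data is moved into the container; callers get no handle back.
        self._records = list(records)

    def run(self, algorithm_name):
        if algorithm_name not in self._agreed:
            raise PermissionError(f"algorithm not pre-agreed: {algorithm_name}")
        if self._records is None:
            raise RuntimeError("no data loaded")
        try:
            return self._agreed[algorithm_name](self._records)
        finally:
            # Upon completion the data is no longer available in the container.
            self._records = None


def mean_emissions(records):
    # Returns aggregates only; no individual record or owner is included.
    values = [r["emissions"] for r in records]
    return {"count": len(values), "mean_emissions": mean(values)}


hired_data = [
    {"owner": "company-a", "emissions": 110.0},
    {"owner": "company-b", "emissions": 120.0},
    {"owner": "company-c", "emissions": 130.0},
]

enclave = SimulatedEnclave({"mean_emissions": mean_emissions})
enclave.load(hired_data)
result = enclave.run("mean_emissions")
print(result)
```

A real deployment would rely on the enclave runtime and remote attestation to enforce these properties. Here they are only conventions of the class.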