Tech Architecture
Last updated
Polyzoa’s scope is to create a distributed database and make the data that Polyzoa and other producers generate freely available for web3 actors to use.
Polyzoa’s solution involves several layers.
Producer layer: produces the data to be distributed in the network; data producers may use different sources and techniques to produce the data; data is propagated on the network once a quorum is reached on its validity (more to follow)
Data storage: the overall bulk of the data will be stored in a traditional database, with the possibility of distributed back-ups in different systems; from this database, data is delivered to the distribution layer.
Distribution layer: responsible for making the data available across the network and for keeping it up to date.
Consumer layer: handles access to the distributed system for the various registered consumers
Governance: keeps track of queries and updates, and charges, rewards or disciplines actors, be they producers, operators or consumers.
Oracle: handles all requests coming from smart contracts, allowing the data to flow on-chain; each supported chain will have its own oracle system
Below is a vision of how the overall system will look in its final form.
Each layer is composed of different components:
Each data producer analyses and publishes data about a particular address or family of addresses. Each producer can use any technique or methodology for analysis; the resulting data is then passed to the network for validation.
A data producer can either:
produce data using their own model/heuristic (eagle nodes)
produce data using Polyzoa model (gargoyles nodes)
produce data using and training Polyzoa model (wizards nodes)
Data produced must adhere to a common schema when producing results.
A producer will be rewarded for the data it produces and sanctioned if it produces low-quality data.
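As an illustration, a producer result conforming to such a common schema might look like the following sketch; all field names and types here are hypothetical, not the actual Polyzoa schema:

```python
from dataclasses import dataclass

# Hypothetical producer result schema; field names are illustrative only.
@dataclass
class ProducerResult:
    address: str        # the analysed address
    chain_id: int       # chain the address belongs to
    risk_score: float   # score in [0.0, 1.0]
    model: str          # "eagle", "gargoyle" or "wizard"
    timestamp: int      # unix time of the analysis

    def validate(self) -> bool:
        """Basic sanity checks before submitting to the network."""
        return (
            self.address.startswith("0x")
            and 0.0 <= self.risk_score <= 1.0
            and self.model in {"eagle", "gargoyle", "wizard"}
        )

result = ProducerResult("0xabc123", 1, 0.87, "eagle", 1700000000)
print(result.validate())  # True
```

A shared, versioned schema like this lets the validator compare results from heterogeneous producers field by field.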
It collects the data produced by the producers and validates its correctness. To evaluate correctness, a quorum mechanism is used: a result is considered correct if a majority of producers reach the same value within a given error margin; results outside the error window are discarded.
More regarding the validation mechanism can be found in the appendices.
Once a result is accepted as valid, it is written to the database and propagated by the distributor to the network.
The distributor is responsible for:
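The quorum check described above can be sketched as follows, assuming results agree when they fall within an error window around the median; the window size and majority threshold here are illustrative, not protocol constants:

```python
from statistics import median

def quorum_accept(scores, error=0.05, threshold=0.5):
    """Accept a result if a majority of producer scores agree within
    `error` of the median; return (accepted, agreed_score or None)."""
    if not scores:
        return False, None
    m = median(scores)
    in_window = [s for s in scores if abs(s - m) <= error]
    if len(in_window) / len(scores) > threshold:
        # outliers are discarded; the agreed value averages the window
        return True, sum(in_window) / len(in_window)
    return False, None

print(quorum_accept([0.80, 0.82, 0.81, 0.30]))  # accepted; agreed score ≈ 0.81
```

Here the outlier 0.30 is discarded, and the three agreeing scores form the quorum.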
registering new distribution nodes into the system
removing distribution nodes from the system
creating data patches to be distributed to the various nodes; it is the responsibility of each node to retrieve the patch and update its local database.
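As a sketch of how such patches might work, assuming a simple versioned upsert format (the actual patch format is not specified in this document):

```python
# Hypothetical patch format: each patch carries a version number and a
# list of (address, score) upserts; nodes apply patches strictly in order.
def apply_patch(local_db, local_version, patch):
    if patch["version"] != local_version + 1:
        raise ValueError("out-of-order patch; node must catch up first")
    for address, score in patch["upserts"]:
        local_db[address] = score
    return patch["version"]

db = {"0xaaa": 0.2}
version = apply_patch(db, 0, {"version": 1, "upserts": [("0xbbb", 0.9)]})
print(version, db)  # 1 {'0xaaa': 0.2, '0xbbb': 0.9}
```

Strictly ordered versions let a node detect missed patches and resynchronise before serving queries.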
The nodes are at the core of the distributed database.
Each node must stake an amount of tokens as proof of acceptance of the network protocol rules, and will be rewarded based on the traffic it handles.
Each node has its own database, and it is its responsibility to keep that database up to date and to handle the queries sent to it by the gateway. A node is not responsible for producing data; it is anchored to the content of its database.
The nodes will be organised in shard groups. Each shard group will give access to a particular subset of the data.
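One plausible way to assign addresses to shard groups is a deterministic hash of the address; this is a sketch under that assumption, as the actual assignment scheme is not specified here:

```python
import hashlib

def shard_for(address: str, num_shards: int) -> int:
    """Deterministically map an address to one of `num_shards` groups.
    Lower-casing makes checksummed and plain hex forms land together."""
    digest = hashlib.sha256(address.lower().encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same address always maps to the same shard group.
print(shard_for("0xDEADBEEF", 4))
```

Deterministic assignment means the gateway can compute the target shard group locally, without a lookup.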
The gateway is the interface between the consumers and the nodes.
It contains the routing tables to propagate a query to the relevant node group and to redistribute the results.
When a consumer queries the gateway, the gateway selects a node that will receive the query and provide the response to the consumer.
A node is selected based on its availability and its previous performance history. It is the responsibility of the gateway to select nodes in a way that balances performance and fairness.
The gateway will also collect statistics about the queries submitted to each node, the number of nodes actually online and the availability and performance of each of them. This data will be provided to the governance to decide rewards and sanctions.
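The balance between performance and fairness could be sketched as a weighted random selection, where faster nodes are favoured but slower ones still receive some traffic; the scoring function here is illustrative, not the gateway's actual policy:

```python
import random

def pick_node(nodes):
    """nodes: list of (node_id, avg_latency_ms, available).
    Weight each available node by the inverse of its latency, so fast
    nodes are favoured while slow ones still get some traffic."""
    candidates = [(nid, 1.0 / lat) for nid, lat, up in nodes if up]
    if not candidates:
        raise RuntimeError("no nodes available")
    ids, weights = zip(*candidates)
    return random.choices(ids, weights=weights, k=1)[0]

nodes = [("n1", 20.0, True), ("n2", 200.0, True), ("n3", 50.0, False)]
print(pick_node(nodes))  # "n1" most of the time, "n2" occasionally, never "n3"
```

Pure best-node selection would starve slower nodes and concentrate rewards; randomised weighting keeps the network honest while still favouring performance.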
In general, producers calculate the risk score of an address and submit it for validation before the score ends up in the global database.
For special cases, the score may need to be immediately available.
For this purpose, special producers can be selected to provide such data in real time, without having to submit it for validation before distribution.
The data is sent to the network as-is, but it will be included in the network database only after validation.
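The fast path described above can be sketched as follows: the score reaches the consumer immediately, while inclusion in the database waits for validation. All names here are illustrative:

```python
# Sketch of the real-time fast path: the score is handed to the consumer
# right away, while inclusion in the network database awaits validation.
pending_validation = []
db = {}

def publish_realtime(address, score, consumer):
    consumer(address, score)                      # delivered immediately
    pending_validation.append((address, score))   # validated later

def on_validated(address, score, accepted):
    if accepted:
        db[address] = score                       # only now enters the database

received = []
publish_realtime("0xccc", 0.95, lambda a, s: received.append((a, s)))
on_validated("0xccc", 0.95, accepted=True)
print(received, db)  # [('0xccc', 0.95)] {'0xccc': 0.95}
```

The consumer trades validation for latency; the network database never does.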
Governance will take care of the consumers and the nodes, charging the former for usage and rewarding or sanctioning the latter.
All governance will be on-chain, and the governance smart contract will have access to the logs generated by the distributors (i.e. patches made available) and by the gateway (i.e. number of queries).
Consumers will be charged for the number of queries they perform, whether simple or batch queries.
Nodes will be rewarded for the number of requests they serve, according to different measurements and service-level agreements, such as:
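A minimal sketch of such per-query charging, assuming batch queries are billed per contained query at an illustrative discount (the fee schedule is an assumption, not part of the protocol):

```python
# Hypothetical fee schedule: simple queries cost one unit, batch queries
# are charged per contained query with an illustrative discount.
SIMPLE_FEE = 1.0
BATCH_DISCOUNT = 0.8  # assumption, not a protocol constant

def charge(queries):
    """queries: list of ("simple", 1) or ("batch", n) entries."""
    total = 0.0
    for kind, n in queries:
        if kind == "simple":
            total += SIMPLE_FEE
        else:
            total += n * SIMPLE_FEE * BATCH_DISCOUNT
    return total

print(charge([("simple", 1), ("batch", 10)]))  # 1.0 + 8.0 = 9.0
```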
idle time / availability
performance
number of queries served
Nodes will be sanctioned if they:
fail to be available for a long period of time
fail to update their database, thus providing obsolete data
…
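A sketch of how governance might turn the gateway statistics into rewards and sanctions; the weights and thresholds here are assumptions, not protocol constants:

```python
# Illustrative settlement formula combining the criteria above; the rate,
# availability threshold and sanction amounts are assumptions.
def settle(node_stats, rate=0.01, min_availability=0.9):
    """node_stats: dict with 'availability' (0..1), 'queries_served'
    and 'stale_responses'. Returns a signed token amount."""
    if node_stats["availability"] < min_availability:
        return -10.0  # sanction: unavailable for too long
    if node_stats["stale_responses"] > 0:
        return -5.0   # sanction: served obsolete data
    return node_stats["queries_served"] * rate * node_stats["availability"]

print(settle({"availability": 0.99, "queries_served": 1000,
              "stale_responses": 0}))  # ≈ 9.9
```

Because the inputs come from distributor and gateway logs, the governance contract can settle rewards without trusting the nodes' own reports.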
Oracles can distribute data directly on the network. In such a case, the oracle will be accessed through a special token implementing ERC677 (transferAndCall), so as to collect fees from the consumer contracts directly.
Oracles are deployed for each supported network and will serve a network of special nodes that react to the events they produce.
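The transferAndCall flow can be pictured as a single call that both pays the fee and delivers the query. The following off-chain simulation is a sketch, with illustrative class and method names rather than the real ERC677 ABI:

```python
# Off-chain sketch of the ERC677 transferAndCall flow: paying the oracle
# fee and submitting the query happen in one atomic call.
class Oracle:
    def __init__(self):
        self.fees = 0
        self.requests = []

    def on_token_transfer(self, sender, amount, data):
        """Mirrors ERC677's onTokenTransfer hook on the receiving contract."""
        self.fees += amount
        self.requests.append((sender, data))

class Token:
    def __init__(self, balances):
        self.balances = balances

    def transfer_and_call(self, sender, receiver, amount, data):
        assert self.balances[sender] >= amount, "insufficient balance"
        self.balances[sender] -= amount
        receiver.on_token_transfer(sender, amount, data)

oracle = Oracle()
token = Token({"consumer": 100})
token.transfer_and_call("consumer", oracle, 5, b"risk:0xddd")
print(oracle.fees, token.balances["consumer"])  # 5 95
```

Bundling payment and request into one transfer is what lets the oracle collect fees from consumer contracts without a separate approval step.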
Initially there will be a simple network, with a single producer and a few nodes serving the consumers.
While this network is small, it still maintains the same features described above.
This network will allow us to fine-tune the system in order to achieve the best balance for the distribution network.
We will then introduce the oracles on one selected network, and create a distribution system able to work seamlessly across off-chain and on-chain networks.
The new network will use the existing one to access the data and make it available to smart contracts, in order to reduce the impact of malicious addresses directly within a transaction.
In phase 3 we will open the network to multiple external producers.
In this phase, we will define the protocol that governs the producers' network and that producers have to adhere to. As we plan to work with multiple entities, we plan to make this job a collective one.
To smooth the process, we will need to approve a schema that ensures we have the right data in the right place.
We also need to make sure that each producer is able not only to accept requests for handling addresses, but also to propose unknown addresses to the network.
In the last phase, we will enhance the network with real-time producers.
To become a real-time producer, one must have a record of successful discoveries, and this will be possible only once we have a history of data to draw on and analyse.
A real-time data producer will make its discoveries directly available to the network when the need arises, without prior validation. In such a case, the produced data is not stored but passed directly to the consumer.
The produced data will still be subject to validation, but this happens before it is accepted into the network database.
In order to achieve this, we need to strengthen our trust in the producers.