If I perform DDL operations on one node in the cluster and run query/DML on a different node, will the metadata be consistent? How is that achieved?
Yes, in YSQL, the metadata (system catalogs) should stay consistent under multi-node DDL/DML statements.
Similarly it should stay consistent if one or more nodes die (up to the redundancy allowed by the Replication Factor).
The way it works is:
- The metadata is stored persistently in our DocDB storage layer similar to regular tables (except it lives in a dedicated tablet/shard). For instance, metadata updates generated by DDL statements get replicated using Raft consensus).
- The storage layer will also maintain a metadata version that gets incremented consistently by any metadata update (i.e. DDL statement) regardless of the node where it originated.
- The YSQL query layer on each node will cache the metadata and also maintain its own notion of the metadata version that it has in its cache. Then it will send that version along with the query when executing each statement (i.e. the version against which this query was parsed/analyzed/rewritten, etc).
- The storage layer will check the query layer’s metadata version against its own (accurate) version and if it is too old it will return a special “Catalog Version Mismatch” error code.
- If the query layer receives a “Catalog Version Mismatch” error the query layer will refresh its cache to get the new metadata updates (and update its metadata version number). If possible, it will also retry the original query seamlessly – in some cases this is not possible/safe currently so the error is just returned to the client/app which can just retry the query (now that the metadata is up to date).
Note that there are some current limitations.
Mainly, the conflict detection and catalog refresh can be more fine grained. E.g. in case of version mismatch need to be aware of which relations were used to parse/analyze/prepare a statement and which relation’s metadata was actually updated. Currently we may refresh needlessly to be safe.
Improving this is currently in the works.