When will index AMs such as `hnsw` and `ivfflat` be added to vector indexes for external use to fully enable end-to-end support for vector indexes?

ZhenNan2016 · July 2, 2024, 12:30pm

Hi
I understand from the following commit that ybdb does not yet support distributed vector storage. May I ask what the plan is for this feature?

github.com/yugabyte/yugabyte-db

[#22195] YSQL: Vector index creation/read/write from YSQL side

committed 04:45PM - 21 May 24 UTC

tanujnay112

+1232 -6

Summary: This change implements the YSQL side of vector index creation. This diff adds support for index creation statements with a dummy ANN method called `ybdummyann` for now in the form

```
create extension vector;
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items USING ybdummyann (embedding vector_l2_ops);
```

This creates an inverted index in DocDB with a schema that looks like `BaseYBCTID | embedding |` With only `BaseYBCTID` as the key.

We can do an index ANN scan based on certain query vector such as `SELECT * FROM items ORDER BY embedding <-> '[1.0, 0.4, 0.35]' LIMIT 5;` or an index only scan such as `SELECT embedding FROM items ORDER BY embedding <-> '[1.0, 0.4, 0.35]' LIMIT 5;`

Note that the results from a `ybdummyann` index won't actually be sorted by their distance from the given query vector as the DocDB side of vector indexing has not been implemented. This is made clear by the following client warning when such an index is created. In the future, when we fully have end-to-end support of vector indexing we will add index AM's such as `hnsw` and `ivfflat` meant for external usage.

```
WARNING: ybdummyann is meant for internal-testing only. It does not yield ordered results.
```

When a vector index is created, a message of type `PgVectorIdxOptionsPB` found in `common.proto` is populated into `IndexInfo`. A log message has been inserted into `tablet.cc` to show how this can be accessed. Vector index scans populate a field of type `PgVectorReadOptionsPB` in the `PgsqlReadRequestPB`.

The relcache preloader is adjusted to not load index relations whose user-defined AM handler procs might not be loaded yet.

A new access method handler called `ybdummyannhandler` is created by this diff. Any future vector index AM/AM handler will share functionality very similar to `ybdummyann`. For this reason, this common functionality is all placed in `src/ybvector/ybvector*`.

The main remaining TODOs after this change are:
- Build out DocDB side.
- Add capabilities to mergesort rows from tablets based on their distance from the query vector.
- Add an extra key column to denote future sharding information of each row.
- Allow included values.
- Allow a mix of vector and non-vector key attributes.

**Upgrade/Rollback safety:** This adds vector index protobuf fields that should not be used by anybody production customer right now.

Jira: DB-11118

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressThirdPartyExtensionsPgvector'

Reviewers: timur, jason, mbautin, sergei

Reviewed By: timur, jason

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D34200
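One of the TODOs in the commit message is merging rows from tablets by their distance from the query vector. As a rough illustration of that idea only (this is not YugabyteDB code), here is a minimal Python sketch. It assumes each tablet returns its own candidates already sorted by L2 distance; the helper `merge_tablet_results` and the sample rows are hypothetical.

```python
import heapq
import math
from itertools import islice


def l2_distance(a, b):
    """Euclidean (L2) distance, the metric behind pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_tablet_results(per_tablet_rows, query, limit):
    """Merge per-tablet candidate lists into a global top-`limit` by distance.

    Hypothetical helper: each inner list holds (row_id, vector) pairs and is
    assumed to be sorted by distance to `query` already.
    """
    keyed = (
        [(l2_distance(vec, query), row_id) for row_id, vec in rows]
        for rows in per_tablet_rows
    )
    # heapq.merge performs a lazy k-way merge of already-sorted inputs.
    return list(islice(heapq.merge(*keyed), limit))


# Made-up sample data: two tablets, each sorted by distance to the query.
query = [1.0, 0.4, 0.35]
tablet_a = [(1, [1.0, 0.4, 0.3]), (2, [0.0, 0.0, 0.0])]
tablet_b = [(3, [1.0, 0.5, 0.35]), (4, [5.0, 5.0, 5.0])]
top = merge_tablet_results([tablet_a, tablet_b], query, limit=3)
print([row_id for _, row_id in top])  # prints [1, 3, 2]
```

The merge only yields a correctly ordered result if every per-tablet list is itself ordered, which is exactly what the `ybdummyann` warning says is not yet guaranteed.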

The commit message mentions:
“Note that the results from a ybdummyann index won’t actually be sorted by their distance from the given query vector as the DocDB side of vector indexing has not been implemented. This is made clear by the following client warning when such an index is created. In the future, when we fully have end-to-end support of vector indexing we will add index AM’s such as hnsw and ivfflat meant for external usage.”

dorian_yugabyte · July 2, 2024, 12:45pm

Hi @ZhenNan2016

Why make another issue when you can ask in the github issue?

Issues being worked on in the near term are listed in the roadmap: GitHub - yugabyte/yugabyte-db: YugabyteDB - the cloud native distributed SQL database for mission-critical applications.

dorian_yugabyte · July 3, 2024, 8:43am

Moved to: When will index AMs such as `hnsw` and `ivfflat` be added to vector indexes for external use to fully enable end-to-end support for vector indexes? [yugabyted] Title · Issue #23095 · yugabyte/yugabyte-db · GitHub

ZhenNan2016 · August 29, 2024, 3:43am

Hi, @dorian_yugabyte
Excuse me.
I have an urgent need to use an hnsw index. I asked in the relevant GitHub issue and also mentioned @mbautin, but he didn’t respond.
Can you help me confirm when the hnsw index will be supported?

github.com/yugabyte/yugabyte-db

When will index AMs such as `hnsw` and `ivfflat` be added to vector indexes for external use to fully enable end-to-end support for vector indexes?[yugabyted] Title

opened 03:22PM - 02 Jul 24 UTC

ZhenNan2016

kind/enhancement area/docdb priority/medium

Jira Link: [DB-12030](https://yugabyte.atlassian.net/browse/DB-12030)

### Description

Hi
I understand from the following documentation that our ybdb does not support distributed vector storage. May I ask what is the plan for this feature?

https://github.com/yugabyte/yugabyte-db/commit/5ca67e496d8c40cdef56c71513ef6d3b6d630596

One of them is mentioned:
“Note that the results from a ybdummyann index won’t actually be sorted by their distance from the given query vector as the DocDB side of vector indexing has not been implemented. This is made clear by the following client warning when such an index is created. In the future, when we fully have end-to-end support of vector indexing we will add index AM’s such as hnsw and ivfflat meant for external usage.”

Thanks a lot.

### Warning: Please confirm that this issue does not contain any sensitive information

- [X] I confirm this issue does not contain any sensitive information.

[DB-12030]: https://yugabyte.atlassian.net/browse/DB-12030?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

github.com/yugabyte/yugabyte-db

[DocDB] Create our own experimental HNSW implementation

opened 12:18AM - 03 Aug 24 UTC

mbautin

kind/enhancement area/docdb priority/medium status/awaiting-triage

Jira Link: [DB-12298](https://yugabyte.atlassian.net/browse/DB-12298)

### Description

Create our own experimental HNSW implementation.
- Initial implementation will index vectors in memory.
- DocDB integration can be added gradually.
- Parts of the experimental implementation can be moved to the production implementation.
- Command-line tools and benchmarks, with the ability to tune parameters, can be added on top of the experimental implementation.

### Issue Type

kind/enhancement

### Warning: Please confirm that this issue does not contain any sensitive information

- [X] I confirm this issue does not contain any sensitive information.

[DB-12298]: https://yugabyte.atlassian.net/browse/DB-12298?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Thanks a lot.

FranckPachot · August 29, 2024, 6:17am

Hi @ZhenNan2016, It is being worked on, but I have no dates. Can you explain your use case (here or fpachot@yugabyte.com)? Knowing about use cases may help increase the priority or get a more precise roadmap.

ZhenNan2016 · August 29, 2024, 6:31am

Hi @FranckPachot
I have some business scenarios on my side, including but not limited to the following:
Fields such as finance and e-commerce that handle text, audio, video, images, and other media resources have ANN or KNN search requirements, and the storage and retrieval of these resources is expected to reach the scale of billions.
Thanks a lot.

FranckPachot · August 29, 2024, 7:19am

If some of those customers need support, we can involve pre-sales, and they may help increase the priority. If they are open source users, follow the GitHub issue for updates on the roadmap.
Are they already on PostgreSQL or another database? Or is it a new application?

ZhenNan2016 · August 29, 2024, 7:46am

Most of these users are open source users, including me.
Some of these users are already on PostgreSQL, some are new applications.
Thanks a lot.
