Using custom dataset fails with error None type object #420

nsai1 · 2024-12-10T22:27:43Z

Hello, I am trying to run VectorDBBench with a custom dataset. For simplicity, I reused the OpenAI 50K dataset Parquet files that a previous run had downloaded into /var/vectordb_bench/datasets.
When I run this command:

vectordbbench pgvectorhnsw --config-file custom_config.yml

it throws the following error:

INFO: INIT_SEARCH_RUNNER (task_runner.py:259) (3057207)
WARNING: test_data None (task_runner.py:260) (3057207)
WARNING: Failed to run performance case, reason = 'NoneType' object has no attribute 'columns' (task_runner.py:192) (3057207)
Traceback (most recent call last):
  File "/home/VectorDBBench/vectordb_bench/backend/task_runner.py", line 179, in _run_perf_case
    self._init_search_runner()
  File "/home/VectorDBBench/vectordb_bench/backend/task_runner.py", line 261, in _init_search_runner
    log.info(f"test_data {self.ca.dataset.test_data.columns}")
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'columns'

How should I go about using a custom dataset?
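The traceback above boils down to self.ca.dataset.test_data being None, meaning no query (test) vectors were loaded for the custom dataset. The minimal sketch below shows how a guard could turn that AttributeError into an explicit message. The helper name require_test_data and its error text are hypothetical and are not part of VectorDBBench; only the attribute path dataset.test_data comes from the traceback.

```python
from types import SimpleNamespace


def require_test_data(dataset):
    """Hypothetical guard: fail early with a clear message when test data is missing.

    Mirrors the attribute path from the traceback (dataset.test_data); the name and
    message are illustrative only, not VectorDBBench code.
    """
    test_data = getattr(dataset, "test_data", None)
    if test_data is None:
        raise ValueError(
            "custom dataset has no test data loaded; check that the query "
            "file exists in custom_dataset_dir"
        )
    return test_data


# Usage: a stand-in dataset object whose test data failed to load.
broken = SimpleNamespace(test_data=None)
try:
    require_test_data(broken)
except ValueError as err:
    print(f"caught: {err}")
```

Raising on None up front keeps the failure at the point where the data is missing, instead of surfacing later as an unrelated-looking "'NoneType' object has no attribute 'columns'".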


alwayslove2013 · 2024-12-11T02:49:00Z

@nsai1 It looks like you downloaded the code via git clone. Could you show me the current git log? I couldn't find the line in the latest code.

File "/home/VectorDBBench/vectordb_bench/backend/task_runner.py", line 261, in _init_search_runner
log.info(f"test_data {self.ca.dataset.test_data.columns}")

Maybe fetching the latest code directly will help~

nsai1 · 2024-12-11T04:59:29Z

Here is the git log:

commit 6a634163101396f79272908f7261e69886b0d91e (HEAD -> main, origin/main, origin/HEAD)
Author: Teynar <[email protected]>
Date:   Wed Dec 4 03:09:57 2024 +0100

    Add Milvus auth support through user_name and password fields (#416)

    * tweak(milvus): add auth support through user_name and password fields (like zilliz cloud)

    * tweak(frontend): make username and password for milvus optional fields

commit 4bb299424099190e6e833746bb74a34c9eb9361a
Author: Sheharyar Ahmad <[email protected]>
Date:   Fri Nov 29 18:59:44 2024 +0500

    fix: invalid value for --max-num-levels when using CLI.

This is the error from line 258 of task_runner.py:

2024-12-11 04:50:46,807 | WARNING: Failed to run performance case, reason = 'NoneType' object is not subscriptable (task_runner.py:191) (3188648)
Traceback (most recent call last):
  File "/data1/nsai/vectordbbench2/vectordb_bench/backend/task_runner.py", line 178, in _run_perf_case
    self._init_search_runner()
  File "/data1/nsai/vectordbbench2/vectordb_bench/backend/task_runner.py", line 258, in _init_search_runner
    test_emb = np.stack(self.ca.dataset.test_data["emb"])
                        ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
TypeError: 'NoneType' object is not subscriptable

This is a snippet of my config file:

  case_type: PerformanceCustomDataset
  custom_case_name: test_case
  custom_case_description: this is a customized case
  custom_dataset_name: test_openai
  custom_dataset_dir: /tmp/vectordb_bench/dataset/openai/openai_small_50k/
  custom_dataset_size: 50000
  custom_dataset_dim: 1536
  custom_dataset_metric_type: "COSINE"
  custom_dataset_file_count: 1
  custom_dataset_use_shuffled: False
  custom_dataset_with_gt: False
  load_timeout: 1000000
  optimize_timeout: 1000000

If I try the same thing with a completely different dataset, whose files are prepared according to the documented requirements, I get the same NoneType error.
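Before running the benchmark, it can help to rule out a file-layout problem in custom_dataset_dir. The sketch below assumes the naming described in the VectorDBBench README for custom datasets: train.parquet (base vectors, id and emb columns), test.parquet (query vectors, id and emb columns) and, when ground truth is supplied, neighbors.parquet. The helper check_custom_dataset_dir is hypothetical, and split (custom_dataset_file_count > 1) or shuffled training files are not covered.

```python
from pathlib import Path


def check_custom_dataset_dir(path, with_gt=False):
    """Return the expected Parquet files that are missing from `path`.

    Hypothetical helper. Assumes the README layout: train.parquet and
    test.parquet, plus neighbors.parquet when ground truth is provided.
    Split or shuffled training files are not checked.
    """
    expected = ["train.parquet", "test.parquet"]
    if with_gt:
        expected.append("neighbors.parquet")
    root = Path(path)
    # Keep only the names that do not exist as regular files in the directory.
    return [name for name in expected if not (root / name).is_file()]


# Usage: check the directory from the config snippet above.
missing = check_custom_dataset_dir("/tmp/vectordb_bench/dataset/openai/openai_small_50k/")
if missing:
    print("missing files:", missing)
else:
    print("all expected files present")
```

This only checks file names, not contents; a file that exists but lacks the expected emb column would still need to be inspected separately, for example with pyarrow.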

nsai1 · 2024-12-11T05:14:01Z

Closing the issue; fetching the latest code directly fixed it.

alwayslove2013 self-assigned this Dec 11, 2024