Using custom dataset fails with error None type object #420

Open
nsai1 opened this issue Dec 10, 2024 · 3 comments

nsai1 commented Dec 10, 2024

Hello, I am trying to run vectordbbench with a custom dataset. For ease of use, I am using the openAI50k dataset parquet files that a previous run downloaded into /var/vectordb_bench/datasets.
When I run the command vectordbbench pgvectorhnsw --config-file custom_config.yml, it throws the following error:

INFO: INIT_SEARCH_RUNNER (task_runner.py:259) (3057207)
WARNING: test_data None (task_runner.py:260) (3057207)
WARNING: Failed to run performance case, reason = 'NoneType' object has no attribute 'columns' (task_runner.py:192) (3057207)
Traceback (most recent call last):
  File "/home/VectorDBBench/vectordb_bench/backend/task_runner.py", line 179, in _run_perf_case
    self._init_search_runner()
  File "/home/VectorDBBench/vectordb_bench/backend/task_runner.py", line 261, in _init_search_runner
    log.info(f"test_data {self.ca.dataset.test_data.columns}")
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'columns'

How do I go ahead and use the custom dataset?
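
One thing worth ruling out before digging into the benchmark code is whether the directory named in the config actually contains the files the loader looks for. The sketch below is not part of VectorDBBench: `missing_dataset_files` is a hypothetical helper, and the expected file names are passed in by the caller because the exact names the custom-dataset loader requires are not stated in this thread.

```python
from pathlib import Path
from typing import Iterable, List


def missing_dataset_files(dataset_dir: str, expected_names: Iterable[str]) -> List[str]:
    """Return the expected file names that are absent from dataset_dir.

    Hypothetical helper for illustration only; the names to check are an
    assumption supplied by the caller, not taken from VectorDBBench.
    """
    expected = list(expected_names)
    root = Path(dataset_dir)
    if not root.is_dir():
        # A missing directory means every expected file is missing.
        return expected
    # Collect the names of regular files directly inside the directory.
    present = {p.name for p in root.iterdir() if p.is_file()}
    return [name for name in expected if name not in present]
```

Calling it with the `custom_dataset_dir` value from the config and the file names listed in the project's custom-dataset documentation returns an empty list when nothing is missing. It only checks file presence, not the columns inside each parquet file.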

alwayslove2013 self-assigned this Dec 11, 2024
alwayslove2013 (Collaborator) commented

@nsai1 It looks like you downloaded the code via git clone. Could you show me the current git log? I couldn't find the line in the latest code.

File "/home/VectorDBBench/vectordb_bench/backend/task_runner.py", line 261, in _init_search_runner
log.info(f"test_data {self.ca.dataset.test_data.columns}")

Maybe fetching the latest code directly will help~

nsai1 commented Dec 11, 2024

This is the git log

commit 6a634163101396f79272908f7261e69886b0d91e (HEAD -> main, origin/main, origin/HEAD)
Author: Teynar <[email protected]>
Date:   Wed Dec 4 03:09:57 2024 +0100

    Add Milvus auth support through user_name and password fields (#416)

    * tweak(milvus): add auth support through user_name and password fields (like zilliz cloud)

    * tweak(frontend): make username and password for milvus optional fields

commit 4bb299424099190e6e833746bb74a34c9eb9361a
Author: Sheharyar Ahmad <[email protected]>
Date:   Fri Nov 29 18:59:44 2024 +0500

    fix: invalid value for --max-num-levels when using CLI.

This is the error from line 258 of task_runner.py:

2024-12-11 04:50:46,807 | WARNING: Failed to run performance case, reason = 'NoneType' object is not subscriptable (task_runner.py:191) (3188648)
Traceback (most recent call last):
  File "/data1/nsai/vectordbbench2/vectordb_bench/backend/task_runner.py", line 178, in _run_perf_case
    self._init_search_runner()
  File "/data1/nsai/vectordbbench2/vectordb_bench/backend/task_runner.py", line 258, in _init_search_runner
    test_emb = np.stack(self.ca.dataset.test_data["emb"])
                        ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
TypeError: 'NoneType' object is not subscriptable

This is a snippet of my config file

  case_type: PerformanceCustomDataset
  custom_case_name: test_case
  custom_case_description: this is a customized case
  custom_dataset_name: test_openai
  custom_dataset_dir: /tmp/vectordb_bench/dataset/openai/openai_small_50k/
  custom_dataset_size: 50000
  custom_dataset_dim: 1536
  custom_dataset_metric_type: "COSINE"
  custom_dataset_file_count: 1
  custom_dataset_use_shuffled: False
  custom_dataset_with_gt: False
  load_timeout: 1000000
  optimize_timeout: 1000000

If I try the same with a completely different dataset, with files populated according to the requirements, I get the same NoneType error.
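
Both tracebacks in this thread come from using `test_data` while it is still `None`: once through `.columns`, once through `["emb"]`. A guard along the following lines would surface that condition with a clearer message. This is a minimal sketch, not VectorDBBench's code: `DatasetNotLoadedError` and `require_loaded` are hypothetical names introduced for illustration.

```python
from typing import Any, Optional


class DatasetNotLoadedError(RuntimeError):
    """Raised when a dataset split is used before it has been loaded."""


def require_loaded(test_data: Optional[Any], dataset_dir: str) -> Any:
    """Return test_data unchanged, or fail early if it is None.

    Hypothetical helper: it replaces an opaque AttributeError or TypeError
    on None with an error that names the directory being read.
    """
    if test_data is None:
        raise DatasetNotLoadedError(
            f"test data was not loaded from {dataset_dir!r}; "
            "check that the directory exists and contains the expected files"
        )
    return test_data
```

A caller would wrap the first access, for example `data = require_loaded(dataset.test_data, dataset_dir)`, so that a missing split is reported where it is first needed rather than deeper in the search setup.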

nsai1 commented Dec 11, 2024

Closing the issue; fetching the latest code directly helped.
