Pinecone Import: Multiple matches for FieldRef.Name(__filename) in id: string #70

Open
chriscow opened this issue Mar 29, 2024 · 3 comments


chriscow commented Mar 29, 2024

Attached is one of the Parquet files generated from a Pinecone export. When I try to re-import it, I get this error about duplicate fields:

Multiple matches for FieldRef.Name(__filename) in id: string
vector: list<element: double>
__filename: string
__ingested_at: string
content_id: string
filename: string
ingested_at: string
text: string
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string

i2.parquet.zip
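
The schema in the error message lists `__filename` twice, which is what the "Multiple matches" lookup trips over. As a minimal sketch (not part of vdf-io; the helper names `duplicate_fields` and `first_occurrence_indices` are hypothetical), the duplicated names can be found from the field list alone, and the indices of the columns worth keeping can be computed before rewriting the file:

```python
from collections import Counter


def duplicate_fields(names):
    """Return field names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


def first_occurrence_indices(names):
    """Return the index of the first occurrence of each field name."""
    seen = set()
    keep = []
    for index, name in enumerate(names):
        if name not in seen:
            seen.add(name)
            keep.append(index)
    return keep


# Field names copied from the error message in this issue.
schema_names = [
    "id", "vector", "__filename", "__ingested_at", "content_id",
    "filename", "ingested_at", "text", "__fragment_index",
    "__batch_index", "__last_in_fragment", "__filename",
]

print(duplicate_fields(schema_names))          # ['__filename']
print(first_occurrence_indices(schema_names))  # indices 0..10; the trailing duplicate is skipped
```

How the kept indices are applied to the actual Parquet file depends on the reader in use and is left out here.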

@dhruv-anand-aintech
Member

Thanks for reporting! Will have a look soon

@dhruv-anand-aintech
Member

I have pushed a potential fix to the export script. I can't test it, since I'm not sure how __filename ends up in both the vectors and the metadata.

You can install the latest version of the package (vdf-io==0.1.232) and try exporting your dataset again. Please let me know if that works. Thanks!

@chriscow
Author

Well, after exporting for 2 hours, it failed with the error below. To get around my original problem, I modified the code to write JSON as the output, since it is human-readable and easy to fix. That worked great.

Final Step: Fetching vectors: 196it [00:03, 60.60it/s]        | 0/1 [1:53:47<?, ?it/s]
Fetching namespaces: 100%|████████████████████████| 624/624 [1:53:48<00:00, 10.94s/it]
Fetching indexes: 100%|█████████████████████████████| 1/1 [1:53:49<00:00, 6829.88s/it]
Final Step: Fetching vectors: 100%|███████████████████| 78/78 [00:02<00:00, 33.79it/s]
Error: 1 validation error for VDFMeta
author
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.6/v/string_type
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf_cli.py", line 64, in main
    run_export(span)
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf_cli.py", line 131, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf/pinecone_export.py", line 118, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.11/site-packages/vdf_io/export_vdf/pinecone_export.py", line 448, in get_data
    internal_metadata = VDFMeta(
                        ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for VDFMeta
author
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.6/v/string_type
Final Step: Fetching vectors: 156it [00:02, 58.90it/s]                                
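
The traceback shows `VDFMeta` rejecting `author` because it received `None` where a string is required. Below is a minimal sketch of that failure mode and one possible guard, using only the standard library. `MetaStandIn` and `author_or_default` are hypothetical stand-ins for illustration; they are not the real `VDFMeta` model or any vdf-io function, and the thread does not say where the `None` value comes from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaStandIn:
    """Hypothetical stand-in for VDFMeta; only the author field is modelled."""

    author: str

    def __post_init__(self):
        # Mimic strict string validation: None is rejected, as in the traceback.
        if not isinstance(self.author, str):
            raise TypeError(
                f"author should be a valid string, got {type(self.author).__name__}"
            )


def author_or_default(author: Optional[str], default: str = "unknown") -> str:
    """Replace a missing author with a placeholder before building the metadata."""
    return author if author is not None else default


try:
    MetaStandIn(author=None)
except TypeError as exc:
    print(exc)  # author should be a valid string, got NoneType

meta = MetaStandIn(author=author_or_default(None))
print(meta.author)  # unknown
```

Resolving the value before the long-running fetch starts, rather than at the final metadata step, would surface this kind of problem without a two-hour wait.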
