Hi, I'm trying to create a Hugging Face dataset from a list using Dataset.from_list.
Each sample in the list is a dict with the same keys (which become my features). The value of each feature is a list of dictionaries, and each of those dictionaries has a different set of keys. However, the datasets library standardizes all dictionaries under a feature: every dictionary ends up with the union of all keys seen under that feature, with the missing ones set to None.
How can I keep the original set of keys for each dictionary under a feature?
Steps to reproduce the bug
from datasets import Dataset

# Build one sample whose "feature_1" value is a list of dicts with different keys
def generate_sample():
    sample_data = {
        "text": "Sample text",
        "feature_1": [],
    }
    # Each dict in feature_1 has its own set of keys
    feature_1 = [{"key1": "value1"}, {"key2": "value2"}]
    sample_data["feature_1"].extend(feature_1)
    return sample_data

# Generate multiple samples
num_samples = 10
samples = [generate_sample() for _ in range(num_samples)]

# Create a Hugging Face Dataset and inspect the first row
dataset = Dataset.from_list(samples)
print(dataset[0])
Actual behavior
{'text': 'Sample text', 'feature_1': [{'key1': 'value1', 'key2': None}, {'key1': None, 'key2': 'value2'}]}
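The output above is what you would get if every dictionary under a feature were padded to the union of all keys seen under that feature. The pure-Python sketch below only mimics that observed result; it is not the library's implementation, and `unify_keys` is a hypothetical helper name used for illustration.

```python
def unify_keys(dicts):
    """Pad each dict with None so that all dicts share the union of keys.

    Hypothetical helper: it reproduces the observed output only and makes
    no claim about how the datasets library works internally.
    """
    all_keys = []
    for d in dicts:
        for key in d:
            if key not in all_keys:  # keep first-seen key order
                all_keys.append(key)
    return [{key: d.get(key) for key in all_keys} for d in dicts]


feature_1 = [{"key1": "value1"}, {"key2": "value2"}]
padded = unify_keys(feature_1)
print(padded)
# [{'key1': 'value1', 'key2': None}, {'key1': None, 'key2': 'value2'}]
```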
Expected behavior
{'text': 'Sample text', 'feature_1': [{'key1': 'value1'}, {'key2': 'value2'}]}
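Two possible workarounds, sketched here in plain Python under stated assumptions rather than as a confirmed library feature: either drop the None-valued keys after reading a row, or store each dictionary as a JSON string so the keys are never merged. The helper names (`drop_none_keys`, `encode_feature`, `decode_feature`) are hypothetical. Note that the first approach cannot tell a padded None from a None that was in the original data.

```python
import json


def drop_none_keys(dicts):
    """Remove None-valued keys from each dict (also drops genuine None values)."""
    return [{k: v for k, v in d.items() if v is not None} for d in dicts]


def encode_feature(dicts):
    """Serialize each dict to a JSON string before building the dataset."""
    return [json.dumps(d) for d in dicts]


def decode_feature(strings):
    """Parse the JSON strings back into dicts after reading a row."""
    return [json.loads(s) for s in strings]


original = [{"key1": "value1"}, {"key2": "value2"}]

# Workaround 1: clean up a row shaped like the actual output reported above
row = {
    "text": "Sample text",
    "feature_1": [
        {"key1": "value1", "key2": None},
        {"key1": None, "key2": "value2"},
    ],
}
restored = drop_none_keys(row["feature_1"])

# Workaround 2: round-trip through JSON strings, which keeps each dict's own keys
round_tripped = decode_feature(encode_feature(original))

print(restored)
print(round_tripped)
```

With the second approach, the list of JSON strings would be stored under feature_1 in place of the list of dicts, and each row would be decoded when it is read back.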
Environment info
datasets version: 2.19.1
huggingface_hub version: 0.23.0
fsspec version: 2023.10.0