CSV #79

Open
eadmaster opened this issue May 27, 2020 · 7 comments
Labels
format request Request for a new format good first issue Good for newcomers

Comments

@eadmaster

eadmaster commented May 27, 2020

With separator auto-guessing and optional overrides:
https://en.wikipedia.org/wiki/Comma-separated_values

faq -f json -o csv < file.json

@jzelinskie jzelinskie added the format request Request for a new format label May 27, 2020
@jzelinskie jzelinskie changed the title csv I/O support Format Request: CSV May 27, 2020
@jzelinskie jzelinskie added the good first issue Good for newcomers label May 27, 2020
@jzelinskie
Owner

I think this is a reasonable addition, given the existence of a good Go CSV library. Suggestions welcome!

@eadmaster
Author

eadmaster commented May 28, 2020

Converting csv->json should be a no-brainer, while the opposite is not, because CSV only supports a flat schema.

Using dot notation seems to be the most common solution for nested records.

Btw, since this is rarely used, I think you could just make the conversion fail in this case.
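
To illustrate the dot-notation idea mentioned above, here is a minimal Go sketch, not part of faq. The `flatten` helper and the sample record are hypothetical. It assumes nested JSON objects have been decoded into `map[string]interface{}` and leaves arrays untouched, which a real encoder would still need to handle or reject.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten copies the entries of in into out, joining the keys of nested
// objects with dots (e.g. {"a": {"b": 1}} becomes {"a.b": 1}).
// Hypothetical helper: arrays are stored as-is rather than flattened.
func flatten(prefix string, in map[string]interface{}, out map[string]interface{}) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]interface{}); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = v
	}
}

func main() {
	// Sample record invented for this example.
	record := map[string]interface{}{
		"name": "a",
		"address": map[string]interface{}{
			"city": "b",
			"geo":  map[string]interface{}{"lat": 1.5},
		},
	}
	flat := map[string]interface{}{}
	flatten("", record, flat)

	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%v\n", k, flat[k])
	}
}
```

The flattened keys could then serve as CSV column names, at the cost of being ambiguous when an original key already contains a dot.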

@chancez
Collaborator

chancez commented May 28, 2020

Encoding is indeed a huge pain. It may be best if you just start with decoding only.

When I tried to implement this, I found it to be a pain because you basically have to decide how to handle non-arrays and arrays of non-homogeneous objects. It's probably okay to simply fail in many cases, but that limits the usefulness quite a bit.
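
As a rough sketch of the "simply fail" approach, the Go snippet below only accepts an array of flat objects that all share the same keys and returns an error for anything else. `toRecords` is a hypothetical helper, not faq's encoder. It assumes the input was decoded with `encoding/json` into `interface{}` values, and it sorts the header keys alphabetically because Go maps have no stable order.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"os"
	"sort"
)

// toRecords converts a decoded JSON value into CSV records, header first.
// It fails unless v is an array of objects that share the same keys and
// contain only scalar values. Hypothetical helper for illustration.
func toRecords(v interface{}) ([][]string, error) {
	rows, ok := v.([]interface{})
	if !ok {
		return nil, errors.New("csv: top-level value is not an array")
	}
	var header []string
	var records [][]string
	for i, row := range rows {
		obj, ok := row.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("csv: element %d is not an object", i)
		}
		if i == 0 {
			// The first object defines the columns.
			for k := range obj {
				header = append(header, k)
			}
			sort.Strings(header)
			records = append(records, header)
		}
		if len(obj) != len(header) {
			return nil, fmt.Errorf("csv: element %d has a different set of keys", i)
		}
		rec := make([]string, len(header))
		for j, k := range header {
			val, ok := obj[k]
			if !ok {
				return nil, fmt.Errorf("csv: element %d is missing key %q", i, k)
			}
			switch t := val.(type) {
			case nil:
				rec[j] = "" // JSON null becomes an empty field
			case map[string]interface{}, []interface{}:
				return nil, fmt.Errorf("csv: element %d key %q holds a nested value", i, k)
			default:
				rec[j] = fmt.Sprint(t)
			}
		}
		records = append(records, rec)
	}
	return records, nil
}

func main() {
	// Sample data invented for this example.
	data := []interface{}{
		map[string]interface{}{"name": "a", "count": 1.0},
		map[string]interface{}{"name": "b", "count": 2.0},
	}
	records, err := toRecords(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := csv.NewWriter(os.Stdout).WriteAll(records); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```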

For decoding, you'll also need to think about the following:

  • Do you produce an array of arrays or an array of objects? When I talked to Jimmy about this I brought up the idea that there could be a new flag to control decoder-specific behavior. E.g., --decoder-opts="csv=array-of-arrays".
  • How do you handle missing headers if you're producing an array of objects? E.g., probably just set the objects' fields to something like 0, 1 or field1, field2.
  • If producing an array of arrays: do you include the CSV headers in the first array? Maybe another decoder option.
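
The questions above can be sketched with Go's standard `encoding/csv` package. The `decodeOptions` struct and `decodeCSV` function are hypothetical stand-ins for the decoder options being discussed, not an existing faq API. With the reader's default settings, every row must have as many fields as the first one, so ragged input returns an error here.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// decodeOptions is a hypothetical stand-in for decoder-specific options.
type decodeOptions struct {
	ArrayOfArrays bool // return the raw rows, header row included
	NoHeader      bool // treat the first row as data; name columns field1, field2, ...
}

// decodeCSV returns either [][]string or []map[string]string, depending on opts.
func decodeCSV(input string, opts decodeOptions) (interface{}, error) {
	rows, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if opts.ArrayOfArrays {
		return rows, nil
	}
	if len(rows) == 0 {
		return []map[string]string{}, nil
	}
	var header []string
	if opts.NoHeader {
		for i := range rows[0] {
			header = append(header, fmt.Sprintf("field%d", i+1))
		}
	} else {
		header, rows = rows[0], rows[1:]
	}
	objects := make([]map[string]string, 0, len(rows))
	for _, row := range rows {
		obj := make(map[string]string, len(header))
		for i, name := range header {
			obj[name] = row[i] // safe: ReadAll enforced equal row lengths
		}
		objects = append(objects, obj)
	}
	return objects, nil
}

func main() {
	input := "name,children,thing\n1,2,3\n1,,3\n"
	for _, opts := range []decodeOptions{{}, {ArrayOfArrays: true}, {NoHeader: true}} {
		v, err := decodeCSV(input, opts)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if err := json.NewEncoder(os.Stdout).Encode(v); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```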

@eadmaster
Author

eadmaster commented May 29, 2020

* How do you handle headers missing if you're producing an array of objects? Eg probably just set the objects fields to something like `0`, `1` or `field1`, `field2`.

An array of objects should be nicer.

* How do you handle headers missing if you're producing an array of objects?

You can omit the fields in the objects or leave them as empty strings.

E.g., say you have this CSV:

name,children,thing
1,2,3
,,2,3
1,,3

This is the catmandu convert CSV to JSON output:

[
  {
    "thing": "3",
    "name": "1",
    "children": "2"
  },
  {
    "children": "",
    "name": "",
    "thing": "2"
    // note the 3rd value is omitted here because it has no header
  },
  {
    "children": "",
    "name": "1",
    "thing": "3"
  }
]

Alternative with omitted missing fields:

[
  {
    "thing": "3",
    "name": "1",
    "children": "2"
  },
  {
    "thing": "2"
  },
  {
    "name": "1",
    "thing": "3"
  }
]
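
A minimal Go sketch of the "omitted missing fields" variant, assuming the first row is the header. `decodeOmitEmpty` is a hypothetical name. It sets `FieldsPerRecord = -1` on the standard `encoding/csv` reader so that rows of varying length are accepted, skips empty values, and drops any value that has no header, as in the example above.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// decodeOmitEmpty turns CSV text into objects keyed by the header row.
// Empty values are omitted and values beyond the last header are dropped.
// Hypothetical helper for illustration.
func decodeOmitEmpty(input string) ([]map[string]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.FieldsPerRecord = -1 // allow rows with a varying number of fields
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, nil
	}
	header := rows[0]
	var objects []map[string]string
	for _, row := range rows[1:] {
		obj := map[string]string{}
		for i, val := range row {
			if i >= len(header) {
				break // no header for this column: drop the value
			}
			if val == "" {
				continue // omit missing values
			}
			obj[header[i]] = val
		}
		objects = append(objects, obj)
	}
	return objects, nil
}

func main() {
	input := "name,children,thing\n1,2,3\n,,2,3\n1,,3\n"
	objects, err := decodeOmitEmpty(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	out, err := json.MarshalIndent(objects, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```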

@chancez
Collaborator

chancez commented May 29, 2020

I wasn't talking about missing values, but missing headers. E.g., what does this CSV produce as JSON?

1,2,3
,,2,3
1,,3

@eadmaster
Author

eadmaster commented May 29, 2020

catmandu just assumes the first row is always the header.

$ cat test.csv
1,2,3
,,2,3
1,,3

$ catmandu convert CSV to JSON < test.csv | jq .
[
  {
    "1": "",
    "2": "",
    "3": "2"
  },
  {
    "3": "3",
    "2": "",
    "1": "1"
  }
]

Another command that can convert csv->json is rows (you can install it via the python-rows package):

$ rows convert test.csv test.json
$ cat test.json | jq
[
  // same behavior here
  {
    "field_1": null,
    "field_2": null,
    "field_3": 2
  },
  {
    "field_1": 1,
    "field_2": null,
    "field_3": 3
  }
]

With the headers I get this, which looks a bit confusing to me:

[
  {
    "name": 1,
    "children": 2,
    "thing": 3,
    "field_3": null
  },
  {
    "name": null,
    "children": null,
    "thing": 2,
    "field_3": 3
  },
  {
    "name": 1,
    "children": null,
    "thing": 3,
    "field_3": null
  }
]

@chancez
Collaborator

chancez commented May 29, 2020

I think the 2nd output is more what I was describing. I don't think it's wise to assume every file has headers, hence the point I was trying to make about having decoder-specific options to control this kind of behavior. It's probably fine to start with assuming headers are set. Someone can add headers like this:

faq -fcsv -ojson '.' <(echo col1,col2,col3; cat my-csv-file.csv)

@jzelinskie jzelinskie changed the title Format Request: CSV CSV Oct 26, 2020