Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow quotes to be escaped in JSON path #12033

Merged
merged 4 commits into from
May 21, 2024

Conversation

lnkuiper
Copy link
Contributor

We conveniently support escaping JSON object keys that contain the special [ and . characters by surrounding them with quotes.

{
  "duck": [42, 43]
}

The path $.duck points to [42, 43], and the path $.duck[0] points to 42.

If the key was "du[ck" instead, we wouldn't be able to extract because we would interpret [ as the syntax for extracting from an array. We already implemented escaping these special characters by surrounding them with quotes: $."du[ck".

If the key was "du\"ck" instead, we would also be able to extract, by just specifying the path $.du"ck. However, if it is necessary to escape special characters like [, but keys also contain quotes, you were out of luck.

So, we would not be able to extract if the key was, for example, "du[\"ck". Surrounding the path with quotes does not work because there is a quote in the middle of the key.

This PR solves this problem by allowing quotes within quoted paths to be escaped with a backslash:

select '{"d[u]_\"ck":42}'->'$."d[u]_\"ck"' v;
┌──────┐
│  v   │
│ json │
├──────┤
│ 42   │
└──────┘

Fixes #11997

@lnkuiper lnkuiper added the Needs Documentation Use for issues or PRs that require changes in the documentation label May 13, 2024
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 14, 2024 07:08
@lnkuiper lnkuiper marked this pull request as ready for review May 14, 2024 07:28
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good - some comments:

extension/json/json_common.cpp Show resolved Hide resolved
test/sql/json/scalar/test_json_extract.test Show resolved Hide resolved
extension/json/json_common.cpp Show resolved Hide resolved
ptr++;
key[key_len++] = *ptr++;
}
if (ptr == end || backslash) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be an error if backslash is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we ignore other backslashes as we do for regular (non quoted) JSON paths

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but then returning an empty result here doesn't seem correct to me - shouldn't we match x\ as the string literal 'x\' in that case, instead of returning an empty result? Could we add such a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I understand the confusion. If ptr == end || backslash, we return an empty result, which will cause a descriptive error, pointing to the location where the error occurred. So, if we have $."x\ as a path, you will get an error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems a bit inconsistent, no? We accept a backslash as a regular backslash except when it is the last character? I would say we should turn it into a regular backslash in this scenario perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have the path $."x\, we need to throw an error, because the path is "escaped", i.e., the object key starts with a ", so it needs to end with a " too. If it ends with a backslash, it's wrong. Actually, I just realised that it can be removed from the check, since, ptr == end will be true

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 14, 2024 19:57
@lnkuiper lnkuiper marked this pull request as ready for review May 14, 2024 19:57
@lnkuiper lnkuiper marked this pull request as draft May 15, 2024 07:36
@lnkuiper lnkuiper marked this pull request as ready for review May 15, 2024 07:36
@Mytherin Mytherin merged commit 242d91e into duckdb:main May 21, 2024
81 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 21, 2024
Merge pull request duckdb/duckdb#12086 from gitccl/fix_list_zip
Merge pull request duckdb/duckdb#12158 from lnkuiper/simple_external_join_test
Merge pull request duckdb/duckdb#12155 from Tishj/python_sqllogictest_rm_testdir
Merge pull request duckdb/duckdb#12033 from lnkuiper/json_extract_escape
@lnkuiper lnkuiper deleted the json_extract_escape branch May 29, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changes Requested Needs Documentation Use for issues or PRs that require changes in the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Escaping string double quote in JSON Path Not Working
2 participants