Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/lexbase: remove map access overhead when encoding SQL strings #124294

Merged
merged 2 commits into from
May 20, 2024

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented May 16, 2024

opt: add benchmark for simple INSERT with JSON column

Release note: None

sql/lexbase: remove map access overhead when encoding SQL strings

Map accesses have been replaced with more efficient array accesses for
determining characters to quote in EncodeSQLStringWithFlags. In a CPU
profile from a recent POC these map access accounted for a significant
portion of CPU time spent in query optimization because the optimizer's
interner converts tree.DJSON objects to strings using this encoding
function when hashing and checking equality of JSON constants.

Epic: None

Release note: None

@mgartner mgartner requested a review from a team May 16, 2024 20:00
@mgartner mgartner requested a review from a team as a code owner May 16, 2024 20:00
@mgartner mgartner requested review from rytaft and removed request for a team May 16, 2024 20:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner
Copy link
Collaborator Author

Benchmark results of the new test case:

name                                                  old time/op    new time/op    delta
Phases/json-insert/Simple/Parse                         5.02µs ± 6%    4.79µs ± 2%     ~     (p=0.151 n=5+5)
Phases/json-insert/Simple/OptBuildNoNorm                12.9µs ± 2%    12.8µs ± 9%     ~     (p=0.548 n=5+5)
Phases/json-insert/Simple/OptBuildNorm                  13.5µs ±12%    12.4µs ± 2%   -8.42%  (p=0.008 n=5+5)
Phases/json-insert/Simple/Explore                       14.8µs ± 3%    13.9µs ± 1%   -6.14%  (p=0.008 n=5+5)
Phases/json-insert/Simple/ExecBuild                     16.1µs ± 5%    14.7µs ± 0%   -8.74%  (p=0.008 n=5+5)
Phases/json-insert/Prepared/AssignPlaceholdersNoNorm    4.12µs ± 1%    3.46µs ± 0%  -15.95%  (p=0.008 n=5+5)
Phases/json-insert/Prepared/AssignPlaceholdersNorm      4.34µs ± 6%    3.45µs ± 0%  -20.41%  (p=0.016 n=5+4)
Phases/json-insert/Prepared/Explore                     5.75µs ± 6%    4.89µs ± 1%  -14.89%  (p=0.008 n=5+5)
Phases/json-insert/Prepared/ExecBuild                   6.75µs ±10%    5.64µs ± 2%  -16.37%  (p=0.008 n=5+5)

@mgartner mgartner requested a review from yuzefovich May 16, 2024 20:01
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some nits to consider, but :lgtm:

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/lexbase/encode.go line 117 at r2 (raw file):

const minQuoteChar = ' ' // 0x20
const maxQuoteChar = '~' // 0x7E
var mustQuoteMap = [maxQuoteChar]bool{

nit: array needs to have maxQuoteChar + 1 elements (or maxQuoteChar needs to be 0x7F). (We could also use const minQuoteChar = byte(0x20) to be more explicit.)


pkg/sql/lexbase/encode.go line 157 at r2 (raw file):

		ch := byte(r)
		if r >= minQuoteChar && r <= maxQuoteChar {
			if mustQuoteMap[ch] {

nit: might be worth trying switch with 4 cases to see whether it'd be faster.


pkg/sql/lexbase/encode.go line 161 at r2 (raw file):

				bareStrings = false
			}
			if !stringencoding.NeedEscape(ch) && ch != '\'' {

nit: we can pull this if to be above mustQuoteMap check (we could avoid mustQuoteMap lookup if this condition is true).

@mgartner mgartner force-pushed the json-insert-opt-improvement branch from 0d2bf82 to 1a41d27 Compare May 17, 2024 14:42
@mgartner mgartner requested a review from yuzefovich May 17, 2024 14:42
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)


pkg/sql/lexbase/encode.go line 117 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: array needs to have maxQuoteChar + 1 elements (or maxQuoteChar needs to be 0x7F). (We could also use const minQuoteChar = byte(0x20) to be more explicit.)

Oops, great catch! I switch this to use unicode.MaxASCII, let me know what you think. Using const minQuoteChar = byte(0x20) seems less obvious because you need to know that ' ' == 0x20 to make the connection to the ' ': true entry in the array.


pkg/sql/lexbase/encode.go line 157 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: might be worth trying switch with 4 cases to see whether it'd be faster.

I tested a simple benchmark and the array is the fastest: https://gist.github.com/mgartner/f1b0fbd7f05faf37038c8545dabc7b80


pkg/sql/lexbase/encode.go line 161 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we can pull this if to be above mustQuoteMap check (we could avoid mustQuoteMap lookup if this condition is true).

I think the semantics would change if any chars satisfy both the conditional and the mustQuoteMap—in that case we wouldn't set baseStrings = false. It looks like that is impossible currently, but not guaranteed by anything. So I'm hesitant to change it.

@mgartner
Copy link
Collaborator Author

pkg/sql/lexbase/encode.go line 157 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I tested a simple benchmark and the array is the fastest: https://gist.github.com/mgartner/f1b0fbd7f05faf37038c8545dabc7b80

Hmm, maybe not the most fair test because the array likely sits in the CPU cache for the entire benchmark, which might be less likely in a production workload. 🤷

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/lexbase/encode.go line 157 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Hmm, maybe not the most fair test because the array likely sits in the CPU cache for the entire benchmark, which might be less likely in a production workload. 🤷

Thanks for trying (I thought I remembered reading somewhere that switch in recent Go versions became faster, perhaps not fast enough :) )


pkg/sql/lexbase/encode.go line 161 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think the semantics would change if any chars satisfy both the conditional and the mustQuoteMap—in that case we wouldn't set baseStrings = false. It looks like that is impossible currently, but not guaranteed by anything. So I'm hesitant to change it.

Good point, I missed that.

Map accesses have been replaced with more efficient array accesses for
determining characters to quote in `EncodeSQLStringWithFlags`. In a CPU
profile from a recent POC these map access accounted for a significant
portion of CPU time spent in query optimization because the optimizer's
interner converts `tree.DJSON` objects to strings using this encoding
function when hashing and checking equality of JSON constants.

Release note: None
@mgartner mgartner force-pushed the json-insert-opt-improvement branch from 1a41d27 to 1ea9ba6 Compare May 20, 2024 14:27
@mgartner
Copy link
Collaborator Author

My previous attempts were actually broken in subtle ways. @yuzefovich do you mind taking one more look and letting me know if this still LGTY?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

craig bot pushed a commit that referenced this pull request May 20, 2024
122864: sql: add randomized test for plan-gist logic r=yuzefovich a=yuzefovich

This commit adds a randomized test for plan-gist logic. It utilizes sqlsmith to generate random stmts, then runs EXPLAIN (GIST) for it to retrieve the gist followed by decoding that gist, and ensures that no internal errors occur. All generated stmts are also executed with a short timeout to allow for DB state to evolve. Additionally, all gists are accumulated and decoded against an empty DB.

The test is placed in the ccl package to exercise features gated behind CCL license.

The test quickly discovered bug fixed in 1315dda (when the corresponding fix is reverted) but haven't found anything new for plan-gist logic (but did hit a few issues elsewhere which are currently ignored in the test).

Fixes: #117634.

Release note: None

124284: kvserver: rebalance ranges with one voter using joint configurations  r=nvanbenschoten a=kvoli

The allocator would add a voter, instead of both adding and removing the
existing voter when rebalancing ranges with one replica. Removing the
leaseholder replica was not possible prior to #74077, so the addition
only was necessary.

This restriction is no longer necessary. Allow rebalancing a one voter
range between stores using joint configurations, where the lease will be
transferred to the incoming voter store, from the outgoing demoting
voter.

Scattering ranges with one voter will now leave the range with exactly
one voter, where previously both the leaseholder voter evaluating the
scatter, and the new voter would be left.

Before this patch, scattering 1000 ranges with RF=1 on a 5 store
cluster:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         1 |          1001 | ##########           |         500 | ##########
         5 |           291 | ###                  |         147 | ###
         4 |           275 | ###                  |         137 | ###
         3 |           229 | ###                  |         118 | ###
         2 |           206 | ###                  |          99 | ##
```

After:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         2 |           242 | ##########           |         241 | ##########
         4 |           227 | ##########           |         227 | ##########
         5 |           217 | #########            |         216 | #########
         3 |           209 | #########            |         208 | #########
         1 |           106 | #####                |         109 | #####
```

Fixes: #108420
Fixes: #124171

Release note (bug fix): Scattering a range with replication factor=1, no
longer erroneously up-replicates the range to two replicas. Leases will
also no longer thrash between nodes when perturbed with replication
factor=1.

124294: sql/lexbase: remove map access overhead when encoding SQL strings r=mgartner a=mgartner

#### opt: add benchmark for simple INSERT with JSON column

Release note: None

#### sql/lexbase: remove map access overhead when encoding SQL strings

Map accesses have been replaced with more efficient array accesses for
determining characters to quote in `EncodeSQLStringWithFlags`. In a CPU
profile from a recent POC these map access accounted for a significant
portion of CPU time spent in query optimization because the optimizer's
interner converts `tree.DJSON` objects to strings using this encoding
function when hashing and checking equality of JSON constants.

Epic: None

Release note: None


124444: packer: move `add-apt-repository` call to after `sleep` r=jlinder a=rickystewart

Updates are still going on on the VM while this is happening.

Epic: none
Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented May 20, 2024

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request May 20, 2024
122864: sql: add randomized test for plan-gist logic r=yuzefovich a=yuzefovich

This commit adds a randomized test for plan-gist logic. It utilizes sqlsmith to generate random stmts, then runs EXPLAIN (GIST) for it to retrieve the gist followed by decoding that gist, and ensures that no internal errors occur. All generated stmts are also executed with a short timeout to allow for DB state to evolve. Additionally, all gists are accumulated and decoded against an empty DB.

The test is placed in the ccl package to exercise features gated behind CCL license.

The test quickly discovered bug fixed in 1315dda (when the corresponding fix is reverted) but haven't found anything new for plan-gist logic (but did hit a few issues elsewhere which are currently ignored in the test).

Fixes: #117634.

Release note: None

124294: sql/lexbase: remove map access overhead when encoding SQL strings r=mgartner a=mgartner

#### opt: add benchmark for simple INSERT with JSON column

Release note: None

#### sql/lexbase: remove map access overhead when encoding SQL strings

Map accesses have been replaced with more efficient array accesses for
determining characters to quote in `EncodeSQLStringWithFlags`. In a CPU
profile from a recent POC these map access accounted for a significant
portion of CPU time spent in query optimization because the optimizer's
interner converts `tree.DJSON` objects to strings using this encoding
function when hashing and checking equality of JSON constants.

Epic: None

Release note: None


Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented May 20, 2024

Build failed (retrying...):

@craig craig bot merged commit a896503 into cockroachdb:master May 20, 2024
21 of 22 checks passed
@mgartner mgartner deleted the json-insert-opt-improvement branch May 21, 2024 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants