roachtest: sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed #124279

Closed
cockroach-teamcity opened this issue May 16, 2024 · 5 comments
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-testeng TestEng Team


@cockroach-teamcity
Member

cockroach-teamcity commented May 16, 2024

roachtest.sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 992d4573fb8b71e28717b4d14c4c7ebccb38c412:

(monitor.go:154).Wait: monitor failure: COMMAND_PROBLEM: exit status 132
test artifacts and logs in: /artifacts/sysbench/oltp_read_write/nodes=3/cpu=32/conc=256/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=32
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-38805

@cockroach-teamcity cockroach-teamcity added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-testeng TestEng Team labels May 16, 2024
@cockroach-teamcity
Member Author

roachtest.sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 855b9cc97afa3df4f7e17f928c04ab0834b2630c:

(monitor.go:154).Wait: monitor failure: COMMAND_PROBLEM: exit status 132
test artifacts and logs in: /artifacts/sysbench/oltp_read_write/nodes=3/cpu=32/conc=256/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=32
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 93ad913106b6f0f6ec98bc2cfa788ff6d8085bd4:

(monitor.go:154).Wait: monitor failure: COMMAND_PROBLEM: exit status 132
test artifacts and logs in: /artifacts/sysbench/oltp_read_write/nodes=3/cpu=32/conc=256/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=32
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

This test on roachdash | Improve this report!

@renatolabs renatolabs removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label May 20, 2024
@renatolabs
Collaborator

Seems like sysbench is crashing with an Illegal instruction error while running the benchmark:

bash: line 21: 20650 Illegal instruction     (core dumped) bash -c "sysbench \\

There should be no difference between the sysbench versions used on Azure and GCE, but the test seems to fail only on the former. Given that we don't even collect any metrics in these runs (#123071), we could just skip this test on Azure.
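
For reference, the `exit status 132` in the reports above is consistent with the "Illegal instruction" message: by shell convention, a status above 128 means the process was killed by signal number (status - 128), and signal 4 is SIGILL on Linux. Below is a minimal sketch of that decoding. The function name and the small signal table are illustrative assumptions, not part of roachtest.

```go
package main

import "fmt"

// signalNames lists a few common Linux signal numbers.
// The table is illustrative, not exhaustive.
var signalNames = map[int]string{
	4:  "SIGILL",
	6:  "SIGABRT",
	9:  "SIGKILL",
	11: "SIGSEGV",
}

// DecodeExitStatus interprets a shell-reported exit status. Statuses above
// 128 conventionally mean the process died from signal (status - 128).
func DecodeExitStatus(status int) string {
	if status <= 128 {
		return fmt.Sprintf("exited with code %d", status)
	}
	sig := status - 128
	if name, ok := signalNames[sig]; ok {
		return fmt.Sprintf("killed by signal %d (%s)", sig, name)
	}
	return fmt.Sprintf("killed by signal %d", sig)
}

func main() {
	fmt.Println(DecodeExitStatus(132)) // killed by signal 4 (SIGILL)
	fmt.Println(DecodeExitStatus(139)) // killed by signal 11 (SIGSEGV)
}
```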

@srosenberg
Member

srosenberg commented May 20, 2024

> There should be no difference between the sysbench versions used on Azure and GCE, but the test seems to fail only on the former. Given that we don't even collect any metrics in these runs (#123071), we could just skip this test on Azure.

I am guessing it's because of ROACHTEST_arch=arm64; i.e., the sysbench binary must have been built for amd64; double-checking...

@srosenberg
Member

srosenberg commented May 20, 2024

> the sysbench binary must have been built for amd64; double-checking...

Nope, it is picking up the arm64 build of sysbench. I tried to reproduce this locally:

roachprod create --clouds azure -n1 --local-ssd=false --azure-machine-type Standard_D32pds_v5 stan-test --arch arm64
roachprod stage stan-test release v23.1.4 --arch arm64
roachprod start stan-test

sudo apt-get update;
sudo apt-get install -y sysbench

but it succeeded:

sysbench \
  --db-driver=pgsql \
  --pgsql-host=localhost \
  --pgsql-port=26257 \
  --pgsql-user=roachprod \
  --pgsql-password=cockroachdb \
  --pgsql-db=sysbench \
  --report-interval=1 \
  --time=600 \
  --threads=256 \
  --tables=10 \
  --table_size=10000000 \
  --auto_inc=false \
  oltp_read_write prepare
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Initializing worker threads...

Creating table 'sbtest3'...
Inserting 10000000 records into 'sbtest3'
Creating table 'sbtest2'...
Inserting 10000000 records into 'sbtest2'
Creating table 'sbtest9'...
Inserting 10000000 records into 'sbtest9'
Creating table 'sbtest8'...
Inserting 10000000 records into 'sbtest8'
Creating table 'sbtest5'...
Inserting 10000000 records into 'sbtest5'
Creating table 'sbtest6'...
Inserting 10000000 records into 'sbtest6'
Creating table 'sbtest4'...
Inserting 10000000 records into 'sbtest4'
Creating table 'sbtest10'...
Creating table 'sbtest1'...
Inserting 10000000 records into 'sbtest10'
Inserting 10000000 records into 'sbtest1'
Creating table 'sbtest7'...
Inserting 10000000 records into 'sbtest7'
Creating a secondary index on 'sbtest4'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest9'...
Creating a secondary index on 'sbtest1'...

Evidently, sysbench is known to segfault occasionally [1], so there isn't much actionable here.

[1]

// Sysbench occasionally segfaults. When that happens, don't fail the
// test.
if result.RemoteExitStatus == errors.SegmentationFaultExitCode {
	t.L().Printf("sysbench segfaulted; passing test anyway")
	return nil
}
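
The failures in this issue report exit status 132 (SIGILL under the 128 + signal convention), not a segfault. If `errors.SegmentationFaultExitCode` follows the same convention (128 + 11 = 139, which is an assumption here and not confirmed by the excerpt), a segfault-only check would not match these runs. The following is a hypothetical sketch of a check that tolerates a configurable set of crash statuses. The constant and function names are invented for illustration and are not roachtest's API.

```go
package main

import "fmt"

// Conventional shell exit statuses for deaths by signal (128 + signal number).
// These values are assumptions for this sketch, not roachtest's own constants.
const (
	illegalInstructionExit = 128 + 4  // SIGILL
	segmentationFaultExit  = 128 + 11 // SIGSEGV
)

// tolerated reports whether status is one of the known crash statuses
// that the caller has chosen not to treat as a test failure.
func tolerated(status int, known ...int) bool {
	for _, k := range known {
		if status == k {
			return true
		}
	}
	return false
}

func main() {
	// A segfault-only allowlist does not cover exit status 132.
	fmt.Println(tolerated(132, segmentationFaultExit)) // false
	// Adding the SIGILL status would cover it.
	fmt.Println(tolerated(132, segmentationFaultExit, illegalInstructionExit)) // true
}
```

Whether tolerating SIGILL is preferable to skipping the test on Azure is a separate judgment call that the thread leaves open.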

Test Engineering automation moved this from Triage to Done May 20, 2024