Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-34167 We fail to terminate transaction early with ER_LOCK_TABLE_… #3263

Merged
merged 1 commit into from
May 21, 2024

Conversation

mariadb-DebarunBanerjee
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-34167

Description

This regression is introduced in 10.6 by following commit. commit b6a2472
MDEV-27891: SIGSEGV in InnoDB buffer pool resize

During DML, we check if buffer pool is running out of data pages in buf_pool_t::running_out. Here, if 75% of the buffer pool is occupied by non-data pages, we rollback the current transaction and exit with ER_LOCK_TABLE_FULL.

The integer division (n_chunks_new / 4) becomes zero whenever the total number of chunks are < 4 making the check completely ineffective for such cases. Also the check is inaccurate for larger chunks.

Fix-1: Correct the check in buf_pool_t::running_out.

Fix-2: While waiting for free page, check for
buf_LRU_check_size_of_non_data_objects.

Release Notes

When system is reaching out of memory in BP due to too many locks, it allows DMLs to exit before reaching the critical stage of server exit.

How can this PR be tested?

The test case from MDEV-34166 which is being fixed in 10.5 would also test this patch once merged. So. we would have the automated test running soon. Till that time it is difficult to add an automated test as it requires large BP and records in the order of millions to validate it.

In MDEV, I have updated the way it can be tested manually.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Should we also simplify sel_set_rec_lock(): declare it static instead of UNIV_INLINE, and remove the additional condition?

	if (UT_LIST_GET_LEN(trx->lock.trx_locks) > 10000
	    && buf_pool.running_out()) {
		return DB_LOCK_TABLE_FULL;
	}

All other code paths that return DB_LOCK_TABLE_FULL ignore the number of locks that are being held by the current transaction.

storage/innobase/buf/buf0lru.cc Outdated Show resolved Hide resolved
@mariadb-DebarunBanerjee
Copy link
Contributor Author

Should we also simplify sel_set_rec_lock():
UT_LIST_GET_LEN(trx->lock.trx_locks) > 10000

Interesting observation but I think it is not void of impact and I would be sceptic making such changes part of this patch in older versions. Perhaps we should consider in 11.* versions later.

…FULL when lock memory is growing

This regression is introduced in 10.6 by following commit.
commit b6a2472
MDEV-27891: SIGSEGV in InnoDB buffer pool resize

During DML, we check if buffer pool is running out of data pages in
buf_pool_t::running_out. Here is 75% of the buffer pool is occupied by
non-data pages we rollback the current transaction and exit with
ER_LOCK_TABLE_FULL.

The integer division (n_chunks_new / 4) becomes zero whenever the total
number of chunks are < 4 making the check completely ineffective for
such cases. Also the check is inaccurate for larger chunks.

Fix-1: Correct the check in buf_pool_t::running_out.

Fix-2: While waiting for free page, check for
buf_LRU_check_size_of_non_data_objects.
@mariadb-DebarunBanerjee mariadb-DebarunBanerjee merged commit 2d5cba2 into 10.6 May 21, 2024
18 of 19 checks passed
@mariadb-DebarunBanerjee mariadb-DebarunBanerjee deleted the 10.6-MDEV-34167 branch May 21, 2024 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants