Some checkpoints cannot be opened with kAbsoluteConsistency WAL recovery mode #12670

Open
andlr opened this issue May 16, 2024 · 0 comments · May be fixed by #12671

Expected behavior

The database can be opened from a checkpoint with wal_recovery_mode=kAbsoluteConsistency.

Actual behavior

Due to a few data race issues, the active WAL file sometimes gets copied in an inconsistent state.
Opening the database then fails with one of these errors when wal_recovery_mode=kAbsoluteConsistency:

  • Corruption: truncated record body
  • Corruption: error reading trailing data

Steps to reproduce the behavior

Initially, I wrote this heavy and flaky test, which sometimes reproduces the issue:

```cpp
TEST_F(CheckpointTest, WalCorruption) {
  Options options = CurrentOptions();
  options.wal_recovery_mode = WALRecoveryMode::kAbsoluteConsistency;

  Reopen(options);

  const auto threads_num = 32;
  const auto checkpoints_to_create = 200;
  std::atomic<int> thread_num(0);
  std::vector<port::Thread> threads;
  port::RWMutex mutex;
  bool finished = false;

  std::function<void()> write_func = [&]() {
    int a = thread_num.fetch_add(1);
    bool stop_worker = false;

    while (!stop_worker) {
      for (auto i = 0; i < 10000; ++i) {
        std::string key = "foo" + std::to_string(a) + "_" + std::to_string(i);
        ASSERT_OK(Put(key, "bar"));
      }

      mutex.ReadLock();
      stop_worker = finished;
      mutex.ReadUnlock();
    }
  };

  for (auto i = 0; i < threads_num; ++i) {
    threads.emplace_back(write_func);
  }

  std::vector<std::string> snapshot_names;
  for (auto i = 0; i < checkpoints_to_create; ++i) {
    const auto snapshot_name =
        test::PerThreadDBPath(env_, "snap_" + std::to_string(i));
    std::unique_ptr<Checkpoint> checkpoint;
    Checkpoint* checkpoint_ptr;
    ASSERT_OK(Checkpoint::Create(db_, &checkpoint_ptr));
    checkpoint.reset(checkpoint_ptr);

    ASSERT_OK(checkpoint->CreateCheckpoint(snapshot_name));
    snapshot_names.push_back(snapshot_name);
  }

  mutex.WriteLock();
  finished = true;
  mutex.WriteUnlock();

  for (auto& t : threads) {
    t.join();
  }

  Close();

  options.skip_stats_update_on_db_open = true;
  options.skip_checking_sst_file_sizes_on_db_open = true;
  options.max_open_files = 10;

  for (const auto& snapshot_name : snapshot_names) {
    DB* snapshot_db = nullptr;
    ASSERT_OK(DB::Open(options, snapshot_name, &snapshot_db));
    ASSERT_OK(snapshot_db->Close());
    delete snapshot_db;
  }
}
```

But I've also written more precise unit tests using sync points, and I'll include them in my PR with a suggested fix.

Conditions to reproduce are:

  • wal_size_for_flush is non-zero, so the WAL file gets copied during the checkpoint;
  • while the checkpoint is in progress, write operations are happening in the background;
  • wal_recovery_mode = WALRecoveryMode::kAbsoluteConsistency when opening the DB from the checkpoint.

This happens because the size of the active WAL file is captured at an arbitrary moment:

  • the truncated record body error happens when the WAL file size is captured right after a WritableFileWriter flush, when the in-memory buffer no longer has space for new data;
  • the error reading trailing data error happens when a WAL record is broken into multiple physical records and the WAL file size was captured before the last fragment was written.