Potential memory leakage issues in ConnectionPool #212

Open

wathenjiang opened this issue Apr 20, 2024 · 1 comment
Labels: enhancement (New feature or request)

Comments

wathenjiang commented Apr 20, 2024

Describe the bug

  • Based on the current implementation, we can find that the Lru is encapsulated in RwLock<ThreadLocal<RefCell<T>>>. Please see

        pub struct Lru<K, T>
        where
            K: Send,
            T: Send,
        {
            lru: RwLock<ThreadLocal<RefCell<LruCache<K, Node<T>>>>>,
            size: usize,
            drain: AtomicBool,
        }
  • There may be the following sequence of behaviors:
    • Thread 1 puts the Connection into its thread-local Lru and into the global PoolNode.
    • Thread 2 is responsible for executing the idle_poll async task (such as the release_http_session method in v2.rs). Please see

          rt.spawn(async move {
              pool.idle_poll(locked_stream, &meta, idle_timeout, notify_close, watch_use)
                  .await;
          });

      and

          fn pop_closed(&self, meta: &ConnectionMeta) {
              // NOTE: which of these should be done first?
              self.pop_evicted(meta);
              self.lru.pop(&meta.id);
          }

    • When Thread 2 executes pop_closed, it successfully removes the Connection from the PoolNode, but it cannot remove it from the Lru, because the entry lives in Thread 1's thread-local storage (see the sketch below this list).
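
To make the thread-local behavior concrete, here is a minimal, self-contained sketch (not Pingora code): it uses the thread_local crate's ThreadLocal, the same building block the Lru above is based on, with a plain HashMap standing in for the LruCache. An entry inserted on one thread is invisible to a remove attempted on another thread:

// A sketch only: illustrates the cross-thread miss, not Pingora's actual types.
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;
use thread_local::ThreadLocal;

fn main() {
    let map: Arc<ThreadLocal<RefCell<HashMap<u64, String>>>> = Arc::new(ThreadLocal::new());

    // "Thread 1" (here, the main thread) inserts an entry into its own slot.
    map.get_or(|| RefCell::new(HashMap::new()))
        .borrow_mut()
        .insert(101, "connection".to_string());

    // "Thread 2" tries to remove the same key, but get_or() hands it a fresh,
    // empty map bound to this thread, so the removal misses.
    let map2 = map.clone();
    std::thread::spawn(move || {
        let removed = map2
            .get_or(|| RefCell::new(HashMap::new()))
            .borrow_mut()
            .remove(&101);
        assert!(removed.is_none()); // the entry created on Thread 1 stays behind
    })
    .join()
    .unwrap();
}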

Steps to reproduce

This memory leak can be demonstrated with the following test:

// pingora-pool/src/connection.rs
#[test]
fn test_pop_multi_threads() {
    use env_logger;

    let _ = env_logger::builder()
        .is_test(true)
        .filter_level(log::LevelFilter::Debug)
        .try_init();

    let meta = ConnectionMeta::new(101, 1);
    let value = "v1".to_string();

    let cp: Arc<ConnectionPool<String>> = Arc::new(ConnectionPool::new(3));
    // put meta in main thread
    cp.put(&meta, value);

    {
        let cp = cp.clone();
        let meta = meta.clone();
        std::thread::spawn(move || {
            // pop meta in child thread
            cp.pop_closed(&meta);
        })
        .join()
        .unwrap();
    }
}


// Add a debug log line to the pop_closed method
fn pop_closed(&self, meta: &ConnectionMeta) {
    // NOTE: which of these should be done first?
    self.pop_evicted(meta);
    let r = self.lru.pop(&meta.id);
    debug!("pop from lru res: {}", r.is_some()); // the added log print
}

Console output:

running 1 test
[2024-04-20T02:51:48Z DEBUG pingora_pool::connection] evict fd: 1 from key 101
[2024-04-20T02:51:48Z DEBUG pingora_pool::connection] pop from lru res: false
test connection::test_pop_multi_threads ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 9 filtered out; finished in 0.00s

Currently, the maximum amount of memory that can leak this way is bounded by the size setting of the ConnectionPool.

If I haven't missed any other important information, I guess the RwLock<ThreadLocal<RefCell<T>>> wrapper is mainly there to reduce lock contention. An alternative solution is a sharded-lock design: every connection already has an ID, so sharding by ID should not be difficult to achieve.

There are examples of a sharded cache in the RocksDB project; please refer to: https://github.com/facebook/rocksdb/blob/master/cache/sharded_cache.cc
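
For illustration, a minimal sketch of that sharded-lock idea could look like the following (hypothetical code, not Pingora's implementation): connections are bucketed by their ID, each shard has its own lock, and any thread can pop any entry while contention stays limited to a single shard.

// Sketch of a sharded map keyed by connection ID; names and shard count are
// illustrative assumptions, not taken from Pingora.
use std::collections::HashMap;
use std::sync::Mutex;

const NUM_SHARDS: usize = 16;

struct ShardedPool<T> {
    shards: Vec<Mutex<HashMap<u64, T>>>,
}

impl<T> ShardedPool<T> {
    fn new() -> Self {
        Self {
            shards: (0..NUM_SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    // Pick the shard from the connection ID (a simple modulo here).
    fn shard(&self, id: u64) -> &Mutex<HashMap<u64, T>> {
        &self.shards[(id as usize) % NUM_SHARDS]
    }

    fn put(&self, id: u64, value: T) {
        self.shard(id).lock().unwrap().insert(id, value);
    }

    // Works from any thread: the entry is found by ID, not by the thread
    // that inserted it.
    fn pop(&self, id: u64) -> Option<T> {
        self.shard(id).lock().unwrap().remove(&id)
    }
}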

eaufavor added the bug (Something isn't working) label on Apr 20, 2024
@eaufavor (Member) commented:

Your observation is correct. This is by design: we trade memory for execution efficiency. I would not call it a memory leak, because the memory is still tracked in the LRU and will be reclaimed when the LRU eventually evicts the entry. The code just doesn't free it right away.

At the time we implemented this logic we did not know of ways to do this kind of caching without lock contention. Now we have things like TinyUFO, so we will optimize this in the future.
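
As a small illustration of that point (a sketch assuming the lru crate, which the Lru wrapper above builds on): an entry that misses the explicit pop is still bounded by the cache capacity and gets dropped once newer entries push it out.

// Sketch: a capacity-bounded LruCache eventually evicts the stale entry.
use lru::LruCache;
use std::num::NonZeroUsize;

fn main() {
    let mut cache: LruCache<u64, &str> = LruCache::new(NonZeroUsize::new(3).unwrap());
    cache.put(101, "stale connection"); // the entry that was never popped

    // Newer connections arrive; once capacity is exceeded, the stale entry
    // is evicted and its memory reclaimed.
    for id in 1..=3u64 {
        cache.put(id, "fresh connection");
    }
    assert!(cache.get(&101).is_none());
}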

johnhurt added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Apr 26, 2024