Coroutines suspended for a long time cause a large increase in memory #5327

Closed
nashgao opened this issue May 17, 2024 · 49 comments

Comments

@nashgao

nashgao commented May 17, 2024

I'm using a Hyperf skeleton, and after the worker process has initialized I run:

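// Spawn 1000 coroutines; each one blocks forever on its own empty channel.
for ($i = 0; $i < 1000; $i++) {
    \Swoole\Coroutine::create(
        function () {
            $chan = new \Swoole\Coroutine\Channel();
            while (true) {
                // A timeout of -1 waits indefinitely; nothing is ever pushed.
                $chan->pop(-1);
            }
        }
    );
}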

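This creates 1000 coroutines. Each one creates a channel and pops from it to simulate our real workload, but nothing is ever pushed into the channel. After about 30 minutes, memory grows noticeably and keeps growing until it exceeds memory_limit. This reproduces reliably.

So my question is: is this expected behavior, or does suspending the coroutines cause a memory leak?

Swoole configuration: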

Swoole => enabled
Author => Swoole Team <team@swoole.com>
Version => 5.1.2
Built => May 17 2024 01:17:31
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
openssl => OpenSSL 1.1.1f  31 Mar 2020
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
zlib => 1.2.11
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
async_redis => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => Off => Off
swoole.unixsock_buffer_size => 8388608 => 8388608

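The PHP version is 8.3.

@nashgao
Author

nashgao commented May 17, 2024

This seems to be related to the coroutine stack size. The default PHP stack is 8 KB, which the documentation says can hold 512 variables. My guess is that, because 1000 coroutines were created, the main coroutine ended up holding more than 512 variables, so its stack was expanded and memory increased. Is that understanding correct?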

@NathanFreeman
Member

Is this code actually from your production environment? It will trigger a coroutine deadlock warning.

@nashgao
Author

nashgao commented May 17, 2024

No, it isn't production code; it's a simulation. I wrote my own actor model that uses a channel as each actor's mailbox. Because of how our business works, the timeout in channel->pop($timeout) is very long, so a large number of coroutines end up in this state:

\Swoole\Coroutine::create(
    function () {
        $chan = new \Swoole\Coroutine\Channel();
        while (true) {
            $chan->pop(-1);
        }
    }
);

This has to run inside a Swoole server to reproduce.

@NathanFreeman
Member

Try increasing memory_limit. If requests keep coming in, a large number of coroutines will indeed be created.

@nashgao
Author

nashgao commented May 17, 2024

My memory_limit is 8 GB, and I only started 1000 coroutines without accepting any requests.

@NathanFreeman
Member

NathanFreeman commented May 17, 2024

Inside the server, are you spawning 1000 coroutines for every incoming request, or 1000 coroutines in total?

@nashgao
Author

nashgao commented May 17, 2024

The 1000 coroutines are created when the server starts; no requests are sent or received. Alternatively, you can just create a process.

@NathanFreeman
Member

NathanFreeman commented May 17, 2024

Are you starting the coroutines when the server's onWorkerStart fires?

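@nashgao
Author

nashgao commented May 17, 2024

"Are you starting the coroutines when the server's onWorkerStart fires?"

Yes. Reproducing this may require the process to run for a while before it becomes noticeable.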

@NathanFreeman
Member

If you can, please provide code that reproduces it and I'll take a look.

@nashgao
Author

nashgao commented May 17, 2024

OK, I'll put one together later.

@nashgao
Author

nashgao commented May 17, 2024

@nashgao
Author

nashgao commented May 17, 2024

I'm watching memory directly with htop.

@nashgao
Author

nashgao commented May 18, 2024

Are you able to reproduce the problem on your side? @NathanFreeman

Also, I've observed that memory grows in increments of roughly 16 MB.

@NathanFreeman
Member

The memory leak isn't here. The code has been running for a long time and memory is still stable.

@NathanFreeman
Member

Check whether Hyperf's log output contains any relevant information.

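@nashgao
Author

nashgao commented May 18, 2024

Hyperf has no logs at all, because there are no requests and no scheduled tasks; only the skeleton was started.

I've now set the coroutine count to 350, and the program uses 729 MB, which is about 2 MB per coroutine (resident memory, not the virtual memory of the C stack).

The only difference between this project and the Hyperf skeleton is the 350 suspended coroutines.

In other words, if Swoole isn't leaking memory, then the Hyperf skeleton is.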

@NathanFreeman
Member

Is the ini_set('memory_limit', '1G'); in your bin/hyperf.php still set to 1G?

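@nashgao
Author

nashgao commented May 18, 2024

Yes, I ran the repo above directly.

With the coroutines in the listener commented out and only the skeleton running, memory is stable at around 28 MB.

Are you still unable to reproduce the problem?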
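
The figures reported so far can be checked with a little arithmetic. The sketch below assumes a 2 MB C stack per coroutine (the per-coroutine figure quoted in this thread) and that every stack ends up fully resident; the function name estimateResidentMb is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Rough estimate of resident memory in MB: the skeleton's baseline plus
 * one fully resident C stack per suspended coroutine.
 * The 2 MB default is an assumption taken from the figures in this thread.
 */
function estimateResidentMb(int $baselineMb, int $coroutines, int $stackMb = 2): int
{
    return $baselineMb + $coroutines * $stackMb;
}

echo estimateResidentMb(28, 350), " MB\n";  // 28 + 350 * 2 = 728
echo estimateResidentMb(28, 1000), " MB\n"; // 28 + 1000 * 2 = 2028
```

The 350-coroutine estimate lands within 1 MB of the 729 MB observed, which is consistent with each suspended coroutine eventually holding its whole stack in resident memory.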

@NathanFreeman
Member

I started 80,000 coroutines and memory is stable at 2.4 GB.

@nashgao
Author

nashgao commented May 18, 2024

How long did it run? And is that 2.4 GB virtual memory or resident memory?

@NathanFreeman
Member

Is the xdebug extension loaded?

@nashgao
Author

nashgao commented May 18, 2024

No xdebug.

opcache.enable_cli=1 
opcache.jit_buffer_size=100M 
opcache.jit=1255
extension=curl.so 
extension=swoole.so 
swoole.use_shortname=off
extension=redis.so

These are the only extension settings.

@nashgao
Author

nashgao commented May 18, 2024

I'll try turning JIT off.

@nashgao
Author

nashgao commented May 18, 2024

It's unrelated to JIT; it still reproduces. After the program had run for 15 minutes, memory started growing, 16 MB at a time, many times over.

@nashgao
Author

nashgao commented May 19, 2024

@NathanFreeman Could this be related to the operating system? It seems fine on my local macOS machine, but the problem occurs on the server.

uname -a

5.15.0-1053-azure #61~20.04.1-Ubuntu SMP Tue Nov 21 17:50:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

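I also found something in the logs of another service of mine (which has the same problem): judging by the output of memory_get_usage, each increase is exactly 2 MB, the size of the C stack's virtual memory, yet /proc shows that the process memory really has grown a lot.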
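
One way to see the gap between memory_get_usage() and /proc is to log both side by side. This is a minimal sketch, assuming Linux (it reads /proc/self/status); the helper names are hypothetical. memory_get_usage() only counts memory obtained through PHP's own memory manager, so memory that an extension allocates directly would appear in VmRSS but not in PHP's figure.

```php
<?php
declare(strict_types=1);

/** Extract the VmRSS value (in kB) from the text of /proc/<pid>/status, or null if absent. */
function parseVmRssKb(string $status): ?int
{
    if (preg_match('/^VmRSS:\s+(\d+)\s+kB$/m', $status, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

/** Resident set size of the current process in kB, or null when /proc is unavailable. */
function currentRssKb(): ?int
{
    $status = @file_get_contents('/proc/self/status');
    return $status === false ? null : parseVmRssKb($status);
}

// Print PHP's own accounting next to the kernel's view of the process.
$rssKb = currentRssKb();
printf(
    "php_allocator=%d kB, rss=%s kB\n",
    intdiv(memory_get_usage(true), 1024),
    $rssKb === null ? 'n/a' : (string) $rssKb
);
```

Logging this line periodically from a worker would show whether growth happens inside PHP's allocator or only in the process's resident set.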

@NathanFreeman
Member

I couldn't reproduce the problem on Ubuntu 20.04 either.

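@nashgao
Author

nashgao commented May 19, 2024

Then what should I do? I've disabled all extensions and JIT, and I recompiled Swoole without any extra options. This is a real problem we've hit in both development and production. If it can't be resolved, we'll have to give up on Swoole for this project.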

@NathanFreeman
Member

Does the problem still occur if you downgrade to 5.1.1?

@nashgao
Author

nashgao commented May 19, 2024

I'll give it a try. If I have time, I'll also try Swow later.

@NathanFreeman
Member

Run php -m and show the extension list.

@nashgao
Author

nashgao commented May 19, 2024

It still reproduces.

Extension list:

bcmath
calendar
Core
ctype
curl
date
decimal
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
gmp
hash
iconv
igbinary
json
libxml
mbstring
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
posix
random
readline
redis
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
swoole
sysvmsg
sysvsem
sysvshm
tokenizer
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zip
zlib

@NathanFreeman
Member

Are you using the coroutine-style HTTP server rather than the asynchronous-style one?

@nashgao
Author

nashgao commented May 19, 2024

I'm using Hyperf's default, which is the asynchronous style.

@nashgao
Author

nashgao commented May 20, 2024

It also reproduces with the coroutine-style server.

@NathanFreeman
Member

Try using Swoole\Timer::tick at a suitable interval to call gc_mem_caches(), which forces PHP to return memory to the operating system.
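
A minimal sketch of that suggestion, assuming the Swoole extension is loaded and that registerCacheReleaseTimer() is called once inside a worker, for example from onWorkerStart. The 60-second interval and both function names are assumptions, not values from this thread.

```php
<?php
declare(strict_types=1);

/** Ask PHP's memory manager to release its cached memory; returns the number of bytes freed. */
function releasePhpCaches(): int
{
    return gc_mem_caches();
}

/**
 * Register a repeating Swoole timer that releases PHP's memory caches.
 * The default interval (60000 ms) is an assumed value.
 * Returns the timer id, or false if the timer could not be created.
 */
function registerCacheReleaseTimer(int $intervalMs = 60000): int|false
{
    return \Swoole\Timer::tick($intervalMs, static function (): void {
        $freed = releasePhpCaches();
        echo "gc_mem_caches() freed {$freed} bytes\n";
    });
}

// Inside a worker (e.g. in an onWorkerStart handler):
// registerCacheReleaseTimer();
```

gc_mem_caches() only reclaims memory cached by PHP's own memory manager, so a return value of 0 means there was nothing of that kind to release.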

@nashgao
Author

nashgao commented May 21, 2024

It still reproduces, and gc_mem_caches() returns 0 every time.

@NathanFreeman
Member

Is jemalloc installed on the server?

@nashgao
Author

nashgao commented May 21, 2024

jemalloc is installed, but I didn't add any option for it when compiling Swoole, nor any directive when running the PHP script.

@nashgao
Author

nashgao commented May 21, 2024

When I start the program with

LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libjemalloc.so" php bin/hyperf.php start

the process memory jumps straight to 1.8 GB, still with the same code and 1000 coroutines.

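@nashgao
Author

nashgao commented May 21, 2024

In other words, each coroutine actually occupies close to 2 MB of resident memory. The difference between jemalloc and glibc is essentially one of timing; in the end, both allocate that much memory for every coroutine.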

@NathanFreeman
Member

Set memory_limit to -1 and see whether memory keeps growing over a long run.

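@nashgao
Author

nashgao commented May 21, 2024

I set memory_limit to -1 and started with jemalloc; memory usage is the same.

I have a question here, since I don't know much about how a PHP process allocates memory. The Hyperf skeleton's default memory_limit is 1 GB. When I create 1000 coroutines with jemalloc as the allocator, the process RES in htop reaches 1800 MB, which exceeds memory_limit, yet the program doesn't report that memory is exhausted. Why is that?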

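@nashgao
Author

nashgao commented May 21, 2024

When I change stack_size to 1 MB, suspend 1000 coroutines, and use jemalloc, the program's memory usage becomes about 1 GB.

So can I take it that with jemalloc, each coroutine's resident memory equals the size of its C stack's virtual memory?

And that with malloc, when a coroutine stays suspended for a long time, its resident memory gradually approaches the size of its C stack's virtual memory?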
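
For reference, a sketch of how the C stack size might be lowered. It assumes the `c_stack_size` key accepted by Swoole\Coroutine::set() (the thread refers to it as stack_size) and that it is applied before any coroutine is created; the helper name coroutineOptions is hypothetical.

```php
<?php
declare(strict_types=1);

/** Build coroutine options with the C stack size expressed in MB. */
function coroutineOptions(int $stackMb): array
{
    // 'c_stack_size' is given in bytes.
    return ['c_stack_size' => $stackMb * 1024 * 1024];
}

// Apply a 1 MB C stack, as in the experiment described above.
// Guarded so the snippet is harmless when Swoole is not installed.
if (extension_loaded('swoole')) {
    \Swoole\Coroutine::set(coroutineOptions(1));
}
```

With 1000 suspended coroutines, a 1 MB stack bounds the stacks alone at about 1000 MB, which matches the roughly 1 GB reported in this comment.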

@NathanFreeman
Member

You need to check whether memory keeps growing without bound, or stops growing once it reaches a certain value.

@nashgao
Author

nashgao commented May 21, 2024

It stops growing once it reaches a certain value.

@NathanFreeman
Member

Then it isn't a memory leak. A leak keeps growing until system memory is exhausted.

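@nashgao
Author

nashgao commented May 21, 2024

With malloc, the behavior looks very much like a leak: memory grows 16 MB at a time, and during that time the server is doing nothing. There are no requests and no timers, only suspended coroutines.

So, to summarize: with jemalloc, a Swoole coroutine occupies 2 MB of resident memory, not just virtual memory. With malloc, a suspended coroutine uses very little memory at first, but because of how malloc behaves, each coroutine eventually occupies 2 MB as well.

Is that the right understanding?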

@NathanFreeman
Member

2 MB is allocated in both cases.

@nashgao nashgao closed this as completed May 22, 2024