Hi, we've recently been trying to use the dynamic embedding feature in torchrec contrib, but we have encountered a few challenges: the process may result in a segmentation fault.

Our code follows the README, wrapping the model with `tde.wrap`.
We found two problems that may cause this:
1) `fetch_notifications_` is not thread-safe.
2) Tensor data may be freed while a push operation is still in flight.
For the first issue: the `fetch_notifications_` member, defined at https://github.com/pytorch/torchrec/blob/main/contrib/dynamic_embedding/src/tde/ps.h#L108, is accessed by both the pull and the sync-fetch functions. Since these functions are called from different threads, there is a potential thread-safety issue.
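To illustrate the kind of fix we have in mind, here is a minimal, torch-free sketch: `NotificationQueue` is a hypothetical stand-in for the `fetch_notifications_` container, with all access guarded by a `std::mutex` so that the pull-side enqueue and the sync-fetch-side drain can safely run on different threads. The names `Enqueue`, `Drain`, and `StressQueue` are ours, not torchrec's.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for fetch_notifications_: a deque of
// (fetch_id, payload) pairs shared between the thread that enqueues
// fetches (the pull path) and the thread that drains them (sync fetch).
struct NotificationQueue {
  std::mutex mu_;  // guards notifications_ against concurrent access
  std::deque<std::pair<int64_t, int>> notifications_;

  void Enqueue(int64_t id, int payload) {
    std::lock_guard<std::mutex> lock(mu_);
    notifications_.emplace_back(id, payload);
  }

  // Drains everything currently queued; returns how many were handled.
  size_t Drain() {
    std::lock_guard<std::mutex> lock(mu_);
    size_t n = notifications_.size();
    notifications_.clear();
    return n;
  }
};

// Exercises the queue from two threads, the way pull and sync fetch
// would race in the real code; returns the total number drained.
size_t StressQueue(int per_thread) {
  NotificationQueue q;
  size_t drained = 0;
  std::thread producer([&] {
    for (int i = 0; i < per_thread; ++i) q.Enqueue(i, i);
  });
  std::thread consumer([&] {
    // Keep draining until every produced notification has been seen.
    for (int seen = 0; seen < per_thread;) {
      size_t n = q.Drain();
      seen += static_cast<int>(n);
      drained += n;
    }
  });
  producer.join();
  consumer.join();
  return drained;
}
```

Without the mutex, the same two-thread run can corrupt the deque; with it, every notification is handled exactly once.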
For the second issue: the tensor generated by concat in the push function (https://github.com/pytorch/torchrec/blob/main/contrib/dynamic_embedding/src/tde/ps.cpp#L122) is a temporary. The subsequent call to io.push is asynchronous and does not copy the data (https://github.com/pytorch/torchrec/blob/main/contrib/dynamic_embedding/src/tde/ps.cpp#L139), so by the time the asynchronous work runs, the data may already have been freed, leading to access of invalid memory.
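A minimal sketch of the safe shape, with `std::vector` standing in for the temporary tensor produced by concat: instead of letting the asynchronous task capture a pointer into a buffer that dies when the push function returns, the buffer is moved into the lambda so the task owns it for as long as it runs. `AsyncPushOwned` is a hypothetical name; in the real code the equivalent fix would be cloning or otherwise taking ownership of the tensor before handing it to io.push.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// `data` plays the role of the temporary tensor built by concat in the
// push path. Capturing a raw pointer to it in the async task would
// dangle once the caller's frame is gone; moving it into the lambda
// ties the buffer's lifetime to the task instead.
int64_t AsyncPushOwned(std::vector<int64_t> data) {
  std::future<int64_t> done = std::async(
      std::launch::async,
      [buf = std::move(data)] {  // the task now owns the buffer
        // Stand-in for the asynchronous write: read every element.
        return std::accumulate(buf.begin(), buf.end(), int64_t{0});
      });
  // Blocking here keeps the example self-contained; the real push path
  // would return immediately and let the task complete later.
  return done.get();
}
```

Because the lambda owns `buf`, the read happens on live memory even though the caller's copy of `data` no longer exists when the task runs.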