HanLP semantic similarity: please expose sentence embeddings so they can be stored and reused for efficiency #1792

Open
1 task done
yuxulingche opened this issue Nov 16, 2022 · 2 comments
Labels
feature request Suggest an idea for this project

Comments

@yuxulingche

Describe the feature and the current behavior/state.
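I currently use sts, which takes two sentences as input. When comparing a large number of sentences this is too slow; batching helps, but it is still not efficient enough.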

Will this change the current api? How?
An additional output could be added to sts.

Who will benefit with this feature?
Users of sts.

Are you willing to contribute it (Yes/No):
No

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python version:
  • HanLP version:

Any other info
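HanLP's semantic similarity works well, and many thanks to the author for the contribution. However, I now need to compare a large number of sentences, and I hope HanLP can add a way to output sentence embeddings, so that they can be stored once and compared by cosine distance at query time, making comparison more efficient in practice.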
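
A minimal sketch of the store-then-compare workflow requested here. It assumes some function that maps one sentence to a fixed-size vector; `toy_embed` below is a hypothetical stand-in built from hashed character bigrams, not a HanLP API, and `build_index` and `most_similar` are illustrative names. The point is only that each sentence is encoded once and later comparisons reduce to dot products.

```python
import hashlib
import math

DIM = 64  # size of the toy vectors; a real sentence encoder would fix this


def toy_embed(sentence):
    """Hypothetical placeholder encoder (NOT a HanLP API).

    Counts hashed character bigrams and L2-normalises the result.
    """
    vec = [0.0] * DIM
    padded = " " + sentence + " "
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        digest = hashlib.md5(bigram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def build_index(sentences):
    # Encode every sentence once; this dict could be persisted to disk.
    return {s: toy_embed(s) for s in sentences}


def most_similar(query, index, top_k=3):
    # Only the query is encoded at lookup time; stored vectors are reused.
    q = toy_embed(query)
    scored = [(cosine(q, vec), s) for s, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]
```

With N stored sentences, a query costs one encoding plus N dot products, whereas a pairwise model has to run once per (query, sentence) pair.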

  • I've carefully completed this form.
yuxulingche added the feature request label Nov 16, 2022
@hankcs
Owner

hankcs commented Nov 16, 2022

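Hi, the current STS model takes a pair of sentences as joint input to compute similarity and does not support outputting embeddings. We are developing sentence embeddings for retrieval; please stay tuned for future updates.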

@yfq512

yfq512 commented Apr 4, 2023

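I am also looking forward to an efficient method. For now, simhash or BERT can be used, but simhash's accuracy is mediocre and BERT is computationally expensive.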
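
For reference, a rough sketch of the simhash side of that trade-off. The choice of character bigrams as features and MD5 as the feature hash are assumptions made for illustration, not something specified in this thread. Fingerprints are cheap to store and compare by Hamming distance, but they reflect only surface overlap between strings, which is consistent with the accuracy concern above.

```python
import hashlib


def simhash(text, bits=64):
    """SimHash fingerprint over character bigrams (feature choice is an assumption)."""
    weights = [0] * bits
    features = [text[i:i + 2] for i in range(len(text) - 1)] or [text]
    for feat in features:
        # Hash each feature to a `bits`-wide integer (bits <= 64 with this slice).
        h = int.from_bytes(hashlib.md5(feat.encode("utf-8")).digest()[:8], "big")
        for bit in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(bits):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a, b):
    # Number of differing bits; smaller means more surface overlap.
    return bin(a ^ b).count("1")
```

Two sentences are then compared with `hamming(simhash(s1), simhash(s2))`, and the threshold for "similar" has to be tuned on real data.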
