Square brackets interfere with segmentation of user-defined words that end in an English letter #754

Open
ldwnt opened this issue Jul 28, 2020 · 6 comments

Comments

@ldwnt

ldwnt commented Jul 28, 2020

Version: ansj_seg-5.1.3.jar

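import java.util.ArrayList;
import java.util.List;

import org.ansj.domain.Term;
import org.ansj.library.DicLibrary;
import org.ansj.splitWord.Analysis;
import org.ansj.splitWord.analysis.DicAnalysis;
import org.nlpcn.commons.lang.tire.domain.Forest;
import org.nlpcn.commons.lang.tire.domain.Value;
import org.nlpcn.commons.lang.tire.library.Library;

public class Test {
    public static void main(String[] args) {
        // Register one user-defined word that ends in an English letter.
        List<Value> values = new ArrayList<>();
        values.add(new Value("农银货币B", new String[]{"n", "1000"}));
        Forest forest = Library.makeForest(values);

        // Segment the word on its own and again followed by square brackets.
        Analysis analysis = new DicAnalysis().setForests(forest, DicLibrary.get());
        List<Term> terms = analysis.parseStr("农银货币B 农银货币B[]").getTerms();
        System.out.println(terms);
    }
}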

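Output (the second occurrence, the one followed by [], is split up instead of matching the user-defined word):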
[农银货币B/n, , 农/ng, 银/ng, 货币/n, b/en]

@shi-yuan
Member

Upgrade nlp-lang to the latest version, 1.7.8.

@ldwnt
Author

ldwnt commented Aug 21, 2020

> Upgrade nlp-lang to the latest version, 1.7.8.

Thanks, I tested it and it works. Which nlp-lang bug does this correspond to?

@shi-yuan
Member

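Two issues are mainly involved:
1. The isEnglish check was faulty.
2. In earlier versions, methods of the WordAlert utility class such as alertEnglish and alertNumber modified the original character array passed in; the new version does not.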
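As a general illustration of the second point, here is a minimal sketch of the difference between normalizing a caller's `char[]` in place and normalizing a copy. It is not nlp-lang's code: the class `InPlaceNormalizeSketch` and both helper methods are hypothetical names, and lower-casing ASCII letters is only an assumed stand-in for whatever the WordAlert methods actually do.

```java
import java.util.Arrays;

final class InPlaceNormalizeSketch {

    // Hypothetical helper: lower-cases ASCII letters directly in the caller's array,
    // so the caller's data is changed as a side effect.
    static void lowerCaseInPlace(char[] chars) {
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] >= 'A' && chars[i] <= 'Z') {
                chars[i] = (char) (chars[i] + ('a' - 'A'));
            }
        }
    }

    // Hypothetical helper: applies the same normalization to a copy
    // and leaves the caller's array untouched.
    static char[] lowerCaseCopy(char[] chars) {
        char[] copy = Arrays.copyOf(chars, chars.length);
        lowerCaseInPlace(copy);
        return copy;
    }

    public static void main(String[] args) {
        char[] original = "农银货币B".toCharArray();

        char[] normalized = lowerCaseCopy(original);
        System.out.println(new String(original));   // 农银货币B (unchanged)
        System.out.println(new String(normalized)); // 农银货币b

        lowerCaseInPlace(original);
        System.out.println(new String(original));   // 农银货币b (caller's array mutated)
    }
}
```

Code that later matches the array against a dictionary sees different characters depending on which variant ran first, which is why a change in this behavior between versions can alter segmentation results.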

@ldwnt
Author

ldwnt commented Aug 28, 2020

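> Two issues are mainly involved:
> 1. The isEnglish check was faulty.
> 2. In earlier versions, methods of the WordAlert utility class such as alertEnglish and alertNumber modified the original character array passed in; the new version does not.

Is there a branch with the relevant code that I could look at? Upgrading directly to 1.7.8 causes a case-sensitivity problem with the dictionary, so I would like to cherry-pick the fix onto 1.7.2.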

@shi-yuan
Member

shi-yuan commented Sep 6, 2020

Based on org.nlpcn.commons.lang.util.WordAlert#isEnglish(char) and org.nlpcn.commons.lang.util.WordAlert#isNumber(char), modify:
org.nlpcn.commons.lang.tire.SmartGetWord#isE
org.nlpcn.commons.lang.tire.SmartGetWord#isNum
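One way to read this suggestion is to make the two SmartGetWord checks delegate to a single shared pair of predicates, so the character classes cannot drift apart. The sketch below shows that shape only. Everything in it is hypothetical: `CharClassSketch`, `isEnglishLetter` and `isDigit` are made-up names, and the ASCII-only ranges are an assumption, since the real WordAlert#isEnglish(char) and WordAlert#isNumber(char) may accept other characters.

```java
final class CharClassSketch {

    // Assumption: ASCII letters only. The real WordAlert#isEnglish(char) may differ.
    static boolean isEnglishLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumption: ASCII digits only. The real WordAlert#isNumber(char) may differ.
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Hypothetical stand-ins for SmartGetWord#isE and SmartGetWord#isNum:
    // they delegate instead of re-implementing the range checks.
    static boolean isE(char c) {
        return isEnglishLetter(c);
    }

    static boolean isNum(char c) {
        return isDigit(c);
    }

    public static void main(String[] args) {
        // Classify each character of a short sample containing a CJK character,
        // an upper-case letter, both square brackets, and a digit.
        for (char c : "农B[]7".toCharArray()) {
            System.out.println(c + " letter=" + isE(c) + " digit=" + isNum(c));
        }
    }
}
```

When back-porting to 1.7.2, a small test around whichever predicate is used could cover '[' and ']' explicitly, since '[' immediately follows 'Z' in ASCII and is easy to include by accident in a hand-written range check.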

@ldwnt
Author

ldwnt commented Sep 20, 2020

> Based on org.nlpcn.commons.lang.util.WordAlert#isEnglish(char) and org.nlpcn.commons.lang.util.WordAlert#isNumber(char), modify:
> org.nlpcn.commons.lang.tire.SmartGetWord#isE
> org.nlpcn.commons.lang.tire.SmartGetWord#isNum

Thanks, I'll give it a try.
