Implementation options of task to add weight #36
Replies: 7 comments 4 replies
-
wordfreq looks interesting. Maybe we can use it by default for some languages (with an option to use its coefficients or not). But there is a question about custom coefficients: how can we set them for a specific task, and can we do that with wordfreq? I've noticed that wordfreq uses special files for other languages, maybe we can use
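To make the question about default versus custom coefficients concrete, here is a minimal sketch. It derives a default weight from a word-frequency lookup (rarer words weigh more) and lets a per-task dictionary override it. The names `default_weight` and `token_weights` and the toy `FREQUENCIES` table are hypothetical and not part of any existing API. In practice the table could probably be replaced by a call such as wordfreq's `word_frequency(word, lang)`, which is left out here so the sketch stays self-contained.

```python
import math

# Toy frequency table standing in for a real frequency source
# (assumption: values are relative frequencies between 0 and 1).
FREQUENCIES = {"the": 0.05, "cat": 0.0001, "sat": 0.00005}


def default_weight(word, frequencies=FREQUENCIES, floor=1e-9):
    """Rarer words get a larger weight: -log10 of the relative frequency."""
    # Unknown words fall back to `floor`, so they are treated as very rare.
    freq = max(frequencies.get(word, 0.0), floor)
    return -math.log10(freq)


def token_weights(tokens, custom=None):
    """Return {token: weight}; entries in `custom` override the defaults."""
    custom = custom or {}
    return {t: custom.get(t, default_weight(t)) for t in tokens}


# The task-specific override silences "sat"; other tokens keep default weights.
weights = token_weights(["the", "cat", "sat"], custom={"sat": 0.0})
```

With this shape, the frequency-based default and the task-specific coefficients do not compete: the custom dictionary simply wins for the tokens it names.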
-
Interesting article.
-
Yes, got it. I need time to read it closely.
-
Got it. If a text is very long and all of its words are important, we will end up with many duplicates between the words and the important words. In that case it may be better to set the importance on the TokenText object. But if we have only a few important words and a large text, it will be inconvenient to convert. Maybe we can support both of these options:
???
-
Variant 1:

    text = 'one two three'
    texts = [TokenText('one', important=1.0), TokenText('two', important=0.0), TokenText('three')]
    result = find_similar(text, texts)

Variant 2:

    text = 'one two three'
    texts = ['one', 'two', 'three']
    important = [{'one': 1.0}, {'two': 0.0}]
    result = find_similar(text, texts, important=important)

Is this the variant you were talking about above? Theoretically, we can combine both variants. If you meant something else, please show an example.
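To illustrate how both variants could be combined behind one call, here is a rough, self-contained sketch. Everything in it is an assumption made for illustration: `WeightedText` is a hypothetical stand-in for a TokenText-like object, `find_similar_weighted` is not the library's `find_similar`, unweighted tokens default to 1.0, the `important` list of dictionaries is merged into a single token-to-weight mapping, and the score is simply the weighted share of query tokens that a candidate contains.

```python
from dataclasses import dataclass

DEFAULT_WEIGHT = 1.0  # assumption: tokens without an explicit weight count fully


@dataclass
class WeightedText:
    """Hypothetical stand-in for a TokenText-like object (variant 1)."""
    text: str
    important: float = DEFAULT_WEIGHT


def collect_weights(texts, important=None):
    """Merge weights from both variants into one {token: weight} mapping."""
    weights = {}
    for item in texts:
        if isinstance(item, WeightedText):  # variant 1: weight lives on the object
            for token in item.text.split():
                weights[token] = item.important
    for mapping in important or []:  # variant 2: list of {token: weight} dicts
        weights.update(mapping)
    return weights


def find_similar_weighted(text, texts, important=None):
    """Rank candidates by the weighted share of query tokens they contain."""
    weights = collect_weights(texts, important)
    query = set(text.split())
    total = sum(weights.get(t, DEFAULT_WEIGHT) for t in query)
    ranked = []
    for item in texts:
        raw = item.text if isinstance(item, WeightedText) else item
        shared = query & set(raw.split())
        matched = sum(weights.get(t, DEFAULT_WEIGHT) for t in shared)
        ranked.append((raw, matched / total if total else 0.0))
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Variant 1: weights attached to the objects.
v1 = find_similar_weighted(
    "one two three",
    [WeightedText("one", important=1.0), WeightedText("two", important=0.0), WeightedText("three")],
)

# Variant 2: plain strings plus a separate list of weight dictionaries.
v2 = find_similar_weighted(
    "one two three",
    ["one", "two", "three"],
    important=[{"one": 1.0}, {"two": 0.0}],
)
```

In this sketch both call styles produce the same ranking, which suggests the two variants can share one internal representation and differ only in how the weights are passed in.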
-
My variant was closer to variant 2, something like this:
-
Okay, understood. As I said, we can implement this one or both. Let's start with this one.
-
I have two ideas for how we can improve the algorithm by adding weights to tokens.
@quillcraftsman What do you think about it?