To combat overly long, keyword-stuffed documents (hello 2,000-word articles, my old friend), information retrieval systems use a Pivoted Document Length Normalisation mechanic.
The system still measures how often a word appears (Term Frequency) and the size of the document's vector (Euclidean Length) when scoring query-document similarity, but normalisation counters term and topic bloat.
Think of it like a golf handicap.
‘You sure can drive the ball far, but your short game is shit.’
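A rough sketch of the handicap in code, assuming the standard pivoted form: the normalisation factor is `(1 - slope) * pivot + slope * length`, with the pivot set to the collection's average length. The function names, the `slope` of 0.25, the sample documents, and the use of plain word counts in place of Euclidean length are all illustrative assumptions, not any particular search engine's implementation.

```python
def pivoted_norm(doc_length, pivot, slope=0.25):
    """Pivoted normalisation factor (slope value is an assumed setting).

    At doc_length == pivot the factor equals the length itself; documents
    longer than the pivot get a larger factor, i.e. a bigger handicap.
    """
    return (1.0 - slope) * pivot + slope * doc_length


def score(term_freq, doc_length, pivot, slope=0.25):
    """Term frequency scaled down by the pivoted normalisation factor."""
    return term_freq / pivoted_norm(doc_length, pivot, slope)


# Hypothetical collection: word counts stand in for document length.
doc_lengths = {"short": 200, "medium": 800, "bloated": 2000}
pivot = sum(doc_lengths.values()) / len(doc_lengths)  # average length

# Same raw term count in every document; only the length differs.
for name, length in doc_lengths.items():
    print(name, round(score(10, length, pivot), 5))
```

With the same number of keyword hits, the 2,000-word document ends up with the lowest score, which is the handicap at work. Raising `slope` makes the penalty for length steeper; lowering it makes the system more forgiving.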
Feb 9 at 8:28 AM