I'm in the same predicament. I'm looking into whether I can create adjustable weights, like tags, based on the compressed context.
This would assign a level of importance to each part of memory, depending on what the current chat/thinking context needs.
So retrieving memory would mean ranking entries by the statistical importance of their tokens, instead of looking up static tokens in static .md files.
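
To make the idea a bit more concrete, here's a rough Python sketch of what I have in mind. Everything in it is an assumption for illustration only: the names `context_weights`, `score` and `retrieve`, the example memory entries, their tag weights, and the simple term-frequency weighting are all made up, not taken from any existing tool.

```python
from collections import Counter


def context_weights(compressed_context: str) -> dict[str, float]:
    """Turn the compressed context into normalized term weights.

    Hypothetical scheme: plain term frequency over whitespace-split tokens.
    """
    counts = Counter(compressed_context.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def score(entry_tags: dict[str, float], ctx: dict[str, float]) -> float:
    """Importance of one memory entry: its tag weights times the context weights."""
    return sum(weight * ctx.get(tag, 0.0) for tag, weight in entry_tags.items())


def retrieve(memory: dict[str, dict[str, float]],
             compressed_context: str,
             k: int = 2) -> list[str]:
    """Return the names of the k memory entries that matter most for this context."""
    ctx = context_weights(compressed_context)
    ranked = sorted(memory, key=lambda name: score(memory[name], ctx), reverse=True)
    return ranked[:k]


# Hypothetical memory store: entry name -> adjustable tag weights.
memory = {
    "deploy_notes": {"docker": 0.9, "deploy": 0.8},
    "style_guide": {"tone": 0.7, "markdown": 0.6},
    "db_schema": {"postgres": 0.9, "migration": 0.5},
}

print(retrieve(memory, "deploy docker image then run postgres migration", k=2))
```

The weights are just numbers in a dict, so they can be nudged up or down as the context changes, and the ranking follows without touching any static file.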
Just thinking out loud here☺️