I'm new to Elasticsearch. We use an index to store documents, for example company information about employees, and the index currently holds 600,000 employee records. We need to calculate distances between these employees based on a particular attribute, such as their address. Essentially, we do the following:
- Pull all the documents in the index into a Java program.
- Use lambdas for parallelism: iterate over each document, calculate its distance (Levenshtein, n-gram and TF-IDF) to every other element in the collection, and then average the values.
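The two steps above can be sketched in plain Java roughly as follows. This is a minimal illustration under stated assumptions, not the actual program: the class name `PairwiseDistance`, its helper methods and the sample addresses are hypothetical, a list of strings stands in for the address values pulled from the index, the n-gram distance is assumed to be a Jaccard distance over character bigrams, and TF-IDF is left out for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.IntStream;

public class PairwiseDistance {

    // Levenshtein edit distance via dynamic programming with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Set of character n-grams (n = 2 gives bigrams).
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Assumed n-gram distance: 1 - |intersection| / |union| of the two n-gram sets (Jaccard).
    static double ngramDistance(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(ga);
        inter.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return 1.0 - (double) inter.size() / union.size();
    }

    // For each value, the average distance to every other value.
    // O(n^2) comparisons, and the whole list must be held in memory.
    static double[] averageDistance(List<String> values, ToDoubleBiFunction<String, String> dist) {
        int n = values.size();
        return IntStream.range(0, n)
                .parallel() // outer loop runs on the common fork-join pool
                .mapToDouble(i -> IntStream.range(0, n)
                        .filter(j -> j != i)
                        .mapToDouble(j -> dist.applyAsDouble(values.get(i), values.get(j)))
                        .average()
                        .orElse(0.0))
                .toArray();
    }

    public static void main(String[] args) {
        // Hypothetical addresses standing in for the documents pulled from the index.
        List<String> addresses = List.of("12 Main St", "12 Main Street", "99 Oak Ave");
        double[] lev = averageDistance(addresses, PairwiseDistance::levenshtein);
        double[] bigram = averageDistance(addresses, (a, b) -> ngramDistance(a, b, 2));
        for (int i = 0; i < addresses.size(); i++) {
            System.out.printf("%s: levenshtein=%.2f, bigram=%.2f%n", addresses.get(i), lev[i], bigram[i]);
        }
    }
}
```

Because every element is compared with every other one, the work grows quadratically with the number of documents, and the whole collection has to sit in the JVM heap while it runs.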
The problem with this flow is that we load every document in the index into Java memory before applying the formulas. This takes a lot of time, both to load the data and to apply the formulas, and the JVM's heap size limits how many documents we can load into memory at all.
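To put the time cost in numbers: assuming each distance is symmetric, so every unordered pair of documents is compared once per formula, n documents need n(n - 1)/2 comparisons. The snippet below is only that arithmetic for 600,000 records; the class name `PairCount` is hypothetical.

```java
public class PairCount {

    // Unordered pairs among n items: n * (n - 1) / 2.
    // Uses long arithmetic, since the result overflows an int for n = 600,000.
    static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long n = 600_000L; // employee records currently in the index
        System.out.println(unorderedPairs(n)); // prints 179999700000
    }
}
```

That is roughly 180 billion comparisons for each of the three formulas. If each document is compared with every other one in both directions, as in the per-document averaging step, the count doubles to about 360 billion.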
Forgive my limited knowledge of the subject, but is there a way to run these distance formulas directly in Elasticsearch, rather than loading the whole index into memory?
Thanks for your help.