Elasticsearch Lucene formula calculation using Java

Question:

I am new to Elasticsearch. We use an index to store documents, for example company information about employees, and the index currently holds 600,000 employee records. Across these employees we need to calculate distances based on a particular attribute, such as address. Essentially, we do the following:

  • Pull all the documents in the index into a Java program.
  • Use lambdas for parallelism to iterate over each document, calculate the distance (Levenshtein, n-gram and TF-IDF) to the other elements in the collection, and then average the values.
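
For concreteness, the two steps above can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: it assumes the address strings have already been pulled into a `List<String>`, it shows only the Levenshtein measure, and the class and method names (`AddressDistanceSketch`, `averageDistances`) are hypothetical.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch of the client-side flow described in the question.
final class AddressDistanceSketch {

    // Two-row dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's prefix down to an empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // For each address, the mean distance to every other address,
    // with the outer loop parallelised through a parallel stream.
    static double[] averageDistances(List<String> addresses) {
        int n = addresses.size();
        return IntStream.range(0, n)
                .parallel()
                .mapToDouble(i -> IntStream.range(0, n)
                        .filter(j -> j != i)
                        .map(j -> levenshtein(addresses.get(i), addresses.get(j)))
                        .average()
                        .orElse(0.0))
                .toArray();
    }

    public static void main(String[] args) {
        // Made-up sample addresses, for illustration only.
        List<String> sample = List.of("12 Main Street", "12 Main St", "98 Oak Avenue");
        double[] averages = averageDistances(sample);
        for (int i = 0; i < averages.length; i++) {
            System.out.println(sample.get(i) + " -> " + averages[i]);
        }
    }
}
```

Every document is compared with every other one, so the work grows quadratically: 600,000 documents give about 1.8 × 10^11 distinct pairs, regardless of how the loop is parallelised.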

The problem with this flow is that we load every document in the index into Java memory and then apply the formulas. Both loading the documents and applying the formulas take a lot of time, and the JVM's memory limits how many documents can be loaded at once.

Forgive my limited knowledge of the subject, but is there a way to run these distance formulas directly in Elasticsearch rather than loading the whole index into memory?

Thanks for help...


Answer 1:

There is a data type in Elasticsearch for geo points: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html

If you can convert the addresses to lat/lon coordinates, either because you already have them or by using a service that resolves addresses to geo points, then you can map that field as an Elasticsearch geo_point (in the index template; see the link above for an example). If you don't map the field as geo_point, Elasticsearch will treat it as an array of floats, i.e. the float type.
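
As a rough illustration of that mapping step, the sketch below builds (but does not send) a create-index request with Java's built-in HTTP client. Several things here are assumptions and not part of the answer: the index name `employees`, the field name `location`, the `http://localhost:9200` address, creating the index directly instead of through an index template, and the typeless mapping syntax of Elasticsearch 7 or later (older versions expect a mapping type name in the body).

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: declare a geo_point field when creating an index.
final class GeoPointMappingSketch {

    // Mapping body; "location" is an assumed field name.
    static String mappingBody() {
        return "{\n"
             + "  \"mappings\": {\n"
             + "    \"properties\": {\n"
             + "      \"location\": { \"type\": \"geo_point\" }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    // Builds the PUT request without sending it.
    static HttpRequest createIndexRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingBody()))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local cluster address and index name.
        HttpRequest request = createIndexRequest("http://localhost:9200", "employees");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(mappingBody());
        // Sending it needs a running cluster, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Documents indexed afterwards would then carry the resolved coordinates in that field, for example as an object with `lat` and `lon` keys.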

Once you have a geo point field, you can start running distance aggregations on it. Three aggregations work with fields of type geo_point; see the options here: https://www.elastic.co/guide/en/elasticsearch/guide/current/geo-aggs.html
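
One of those options is the `geo_distance` aggregation, which buckets documents by their distance from an origin point so the counting happens inside Elasticsearch and not in the JVM. The sketch below only builds such a search request. The index name, field name, origin coordinates and ring boundaries are made-up values for illustration, and the other geo aggregations are not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

// Hypothetical sketch: a geo_distance aggregation request body.
final class GeoDistanceAggSketch {

    // Buckets documents into rings (in km) around the given origin.
    static String aggregationBody(String field, double lat, double lon) {
        return String.format(Locale.ROOT,
              "{\n"
            + "  \"size\": 0,\n"
            + "  \"aggs\": {\n"
            + "    \"rings_around_origin\": {\n"
            + "      \"geo_distance\": {\n"
            + "        \"field\": \"%s\",\n"
            + "        \"origin\": { \"lat\": %.4f, \"lon\": %.4f },\n"
            + "        \"unit\": \"km\",\n"
            + "        \"ranges\": [ { \"to\": 10 }, { \"from\": 10, \"to\": 50 }, { \"from\": 50 } ]\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}", field, lat, lon);
    }

    // Builds the POST to the _search endpoint without sending it.
    static HttpRequest searchRequest(String baseUrl, String indexName, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Assumed field name and origin coordinates.
        String body = aggregationBody("location", 52.376, 4.894);
        HttpRequest request = searchRequest("http://localhost:9200", "employees", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

Setting `size` to 0 asks for the bucket counts only, so no documents are returned to the client.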

  • Posted on 2019-01-17 15:16