This is because Found attaches multiple IPs to your 0298347602938ahdf.us-east-1.aws.found.io hostname. So if you use a TransportClient with SSL on port 9343 and only add the first IP you find with client.addTransportAddress(new InetSocketTransportAddress(host, port)), it will eventually stop working because it is stuck with an old, invalid IP. The solution is to look up all the IPs behind the hostname, add them all to the TransportClient, and repeat the lookup every 1-5 minutes (or some interval shorter than the DNS TTL). The TransportClient checks for duplicates and reachability, so you end up with a stable set of addresses. I wrote a gist for doing this with Spring and Groovy: https://gist.github.com/vaughnd/04350e4c5bf51dedabb8
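
If you're not on Spring/Groovy, the same idea in plain Java looks roughly like this. It's a sketch, not the gist verbatim: the class name, hostname, and one-minute interval are placeholders, and error handling is minimal.

```java
import java.net.InetAddress;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class FoundAddressRefresher {

    private final TransportClient client;
    private final String host; // your *.us-east-1.aws.found.io hostname
    private final int port;    // 9343 for the SSL transport on Found

    public FoundAddressRefresher(TransportClient client, String host, int port) {
        this.client = client;
        this.host = host;
        this.port = port;
    }

    // Resolve every A record behind the hostname and register each one.
    // The TransportClient ignores duplicates, so re-adding is safe.
    public void refreshAddresses() {
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                client.addTransportAddress(
                        new InetSocketTransportAddress(address.getHostAddress(), port));
            }
        } catch (Exception e) {
            // DNS hiccup: keep the addresses we already have and try again next run
        }
    }

    // Re-run the lookup every minute, comfortably inside the DNS TTL.
    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refreshAddresses, 0, 1, TimeUnit.MINUTES);
    }
}
```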

I got burnt by this little architectural nuance in Elasticsearch recently. I was batch processing items in a content store, updating their status and then searching for more items, and I kept getting stale data without understanding why. It turned out that Elasticsearch is _near_ realtime, with a default refresh interval of 1s, so if you index and then query within that second you will see old data. The simplest way around this is to refresh the index explicitly just before querying it, so you are guaranteed to see the latest data.
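
With the Java TransportClient that boils down to something like the sketch below. The index name and the "status" field are just illustrative; the point is the explicit refresh before the search. Refreshing on every query is heavy-handed on a busy cluster, so this really only makes sense in batch jobs like the one above.

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class BatchSearcher {

    private final Client client;

    public BatchSearcher(Client client) {
        this.client = client;
    }

    // Force a refresh so documents indexed in the last second become searchable,
    // then run the query against the now-current view of the index.
    public SearchResponse searchPendingItems(String index) {
        client.admin().indices().prepareRefresh(index).execute().actionGet();

        return client.prepareSearch(index)
                .setQuery(QueryBuilders.termQuery("status", "pending")) // hypothetical status field
                .execute()
                .actionGet();
    }
}
```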

Checking logs when you have more than one server is painful. Use Logback and logstash-forwarder to send JSON-formatted logs to a central server running Logstash, Elasticsearch and Kibana, where you can then slice and dice logs to your heart's content with the power of Elasticsearch and Kibana.

Confs and docs available here: https://github.com/vaughnd/centralised-logging
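
The repo above has the real configs; as a taste, the Logback side can be as small as this logback.xml, using the logstash-logback-encoder library to write JSON events to a file. The file path and appender name are placeholders for whatever logstash-forwarder is configured to watch.

```xml
<!-- logback.xml: write JSON log events to a file for logstash-forwarder to ship -->
<configuration>
  <appender name="JSON_FILE" class="ch.qos.logback.core.FileAppender">
    <file>/var/log/myapp/myapp.json</file>
    <!-- LogstashEncoder comes from the logstash-logback-encoder library -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="JSON_FILE"/>
  </root>
</configuration>
```

logstash-forwarder then tails that file and ships each JSON line to the central Logstash, which indexes it into Elasticsearch for Kibana to query.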