DevOps – Vaughn Dickson Consulting

When I index something in Elasticsearch, why doesn’t it show in my search straight after?

2015-10-16 / DevOps, Elasticsearch, Java

I got burnt by this little architectural nuance in Elasticsearch recently. While batch processing items in a content store, updating their status, then searching for more items, I kept getting stale data and didn’t understand why. It turned out that Elasticsearch is _near_ realtime, with a default 1s refresh interval. So if you index and query within […]

When I index something in Elasticsearch, why doesn’t it show in my search straight after? Read More »

Why does Apache Nutch sometimes get stuck using a single thread and crawling slowly?

2015-10-16 / DevOps

Nutch generates a list of urls to fetch from the crawldb. In ./bin/crawl it defaults the size of the fetch list to sizeFetchList=50000. If you use the default setting generate.max.count=-1 which is unrestricted, you can potentially end up with 50000 urls from the same domain in your fetch list. Then the setting fetcher.queue.mode=byHost only creates

Why does Apache Nutch sometimes get stuck using a single thread and crawling slowly? Read More »

Centralising Clojure/Java logging with Logback, LogStash, ElasticSearch, and Kibana

2014-02-21 / Clojure, DevOps, Elasticsearch, Java, Linux

Checking logs when you have more than one servers is painful. Use Logback/Logstash-forwarder to send json-formatted logs to a central server running Logstash/ElasticSearch/Kibana, where you can then slice and dice logs to your heart’s content with the power of ElasticSearch and Kibana. Confs and docs available here: https://github.com/vaughnd/centralised-logging

Centralising Clojure/Java logging with Logback, LogStash, ElasticSearch, and Kibana Read More »

Deploying Datomic free on EC2 or any Ubuntu system

2012-11-20 / Datomic, DevOps, EC2, Linux

Here’s a quick rundown on getting Datomic free running on EC2 or any Ubuntu system. This includes a startup script, and a symlinked runtime to make upgrading Datomic less painful. I highly recommend scripting this and the rest of your cloud with Pallet. Upgrading datomic: Backing up and restoring: Customize this script and

Deploying Datomic free on EC2 or any Ubuntu system Read More »