How to work for an international software company from SA and maximise your earnings

I’m always astounded at how little South African companies pay for the same roles and experience as European companies do. The reality is that EU companies can generally afford to pay more than most SA companies, and the demand for talent is so high that developers are writing their own cheques. And often the EU …

How to communicate between your Chrome extension and your SPA web app

I needed a Chrome extension that could open my single-page application and send any text field to it, and after editing, send the changes back to the field. Sounds simple, but it led me down many dead ends and complex APIs. The first catch was that chrome.windows.create can take a callback which gets a window object, …

I get NoNodeAvailableException on long-running processes using SSL TransportClient with Found.no

This is because Found attaches multiple IPs to your 0298347602938ahdf.us-east-1.aws.found.io hostname. So if you use a TransportClient with SSL on port 9343 and add only the first IP you find with client.addTransportAddress(new InetSocketTransportAddress(host, port)), it’ll eventually stop working because it’s stuck with an old, invalid IP. The solution is to look up all the IPs on the hostname …
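
As a rough illustration of the lookup step only, here is a minimal sketch that uses the JDK's InetAddress.getAllByName to list every IP a hostname currently resolves to. The class name HostAddresses and the method resolveAll are hypothetical names chosen for this example, not part of the original post, and the Elasticsearch client wiring is left as a comment rather than shown.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from the original post.
public class HostAddresses {

    /** Returns every IP address the resolver currently reports for the hostname. */
    public static List<String> resolveAll(String host) throws UnknownHostException {
        List<String> ips = new ArrayList<>();
        // getAllByName returns all addresses, not just the first one.
        for (InetAddress address : InetAddress.getAllByName(host)) {
            ips.add(address.getHostAddress());
        }
        return ips;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        for (String ip : resolveAll(host)) {
            // Assumption: in a real client, each address would be registered
            // separately via client.addTransportAddress(...).
            System.out.println(ip);
        }
    }
}
```

A long-running process would presumably need to repeat this lookup from time to time, since the set of IPs can change. The JVM also caches DNS results (see the networkaddress.cache.ttl security property), which may affect how quickly new addresses show up.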

When I index something in Elasticsearch, why doesn’t it show in my search straight after?

I got burnt by this little architectural nuance in Elasticsearch recently. While batch processing items in a content store, updating their status, then searching for more items, I kept getting stale data and didn’t understand why. It turned out that Elasticsearch is _near_ realtime, with a default 1s refresh interval. So if you index and query within …

Why does Apache Nutch sometimes get stuck using a single thread and crawling slowly?

Nutch generates a list of URLs to fetch from the crawldb. In ./bin/crawl, the size of the fetch list defaults to sizeFetchList=50000. If you use the default setting generate.max.count=-1, which is unrestricted, you can potentially end up with 50000 URLs from the same domain in your fetch list. Then the setting fetcher.queue.mode=byHost only creates …

Centralising Clojure/Java logging with Logback, LogStash, ElasticSearch, and Kibana

Checking logs when you have more than one server is painful. Use Logback/Logstash-forwarder to send JSON-formatted logs to a central server running Logstash/ElasticSearch/Kibana, where you can then slice and dice logs to your heart’s content with the power of ElasticSearch and Kibana. Confs and docs available here: https://github.com/vaughnd/centralised-logging

Keybinding for emacs helm to recursively grep certain file extensions in your src directories

Helm for Emacs is a fantastic Quicksilver-like extension, but it gets quite wordy sometimes. Instead of C-u M-x helm-do-grep *nav to dir* *enter extensions* *enter query* to recursively grep, I defined the following in my init.el. Now hitting F1 will grep actual source across all my projects. (defun project-search () (interactive) (helm-do-grep-1 '("/home/vaughn/src") '(4) nil …

Deploying Datomic free on EC2 or any Ubuntu system

Here’s a quick rundown on getting Datomic free running on EC2 or any Ubuntu system. This includes a startup script and a symlinked runtime to make upgrading Datomic less painful. I highly recommend scripting this and the rest of your cloud with Pallet. Start up an EC2 instance (preferably m1.small, since Datomic wants 1 GB of RAM). …

Integrating Logback with Clojure

I’ve uploaded a sample project showing you how to integrate Logback with Clojure. There are simpler Clojure logging libraries like Timbre, but using a standardised Java logging framework has the handy advantage of letting you control logging from non-Clojure code as well as redirecting to separate log files and appenders.