How will we train junior developers in a world of generative AI?

And how do we get young people into software development when AI is already writing decent code for them? This is one of the most pressing challenges facing our industry today. GenAI is changing the learning curve, the incentives, and even the definition of what […]

The Illusion of Democratized AI — And Why It Should Worry You

It’s become fashionable to talk about “open” and “democratized” AI. But the reality is that training frontier LLMs like GPT-4 is structurally out of reach for all but a few hyperscalers. The reason isn’t just talent or data. It’s mainly infrastructure. Modern AI training requires tens of thousands of GPUs operating in perfect sync across

Your real team as CTO

One of the biggest mistakes I see new CTOs make is thinking their team is still the tech team. It’s not. Once you step into an executive role, your primary team becomes the C-suite. Of course, you still lead engineering. You are still responsible for the systems, the architecture, the platforms. But if you are

Easy async function execution in Python/Django using a queue and consumer Thread

I didn’t want to set up django-q2, Celery, or any of the other heavier background task running options for Django, since I just wanted to do basic API calls and save tracking data to the db, without blocking user requests. My async work also didn’t need to be transactional, so losing the queue due to app

How to work for an international software company from SA and maximise your earnings

I’m always astounded at how little South African companies pay for the same roles and experience as European companies do. The reality is that EU companies can generally afford to pay more than most SA companies, and the demand for talent is so high that developers are writing their own cheques. And often the EU

I get NoNodeAvailableException on long-running processes using SSL TransportClient with Found.no

This is because Found attaches multiple IPs to your 0298347602938ahdf.us-east-1.aws.found.io hostname. So if you use a TransportClient with SSL on port 9343 and add the first IP you find with client.addTransportAddress(new InetSocketTransportAddress(host, port)), it’ll eventually stop working because it’s stuck with an old, invalid IP. The solution is to look up all the IPs on the hostname
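
The post itself uses the Java TransportClient. The sketch below only illustrates the DNS step, in Python, under the assumption that "all the IPs" means every address the resolver currently returns for the hostname. `resolve_all` is a hypothetical helper, and adding each address to the client is left out.

```python
import socket


def resolve_all(host, port):
    """Return every distinct (ip, port) pair the resolver lists for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr is (ip, port) for IPv4 and (ip, port, flow, scope) for IPv6.
        pair = (sockaddr[0], sockaddr[1])
        if pair not in addresses:
            addresses.append(pair)
    return addresses
```

Calling `resolve_all(hostname, 9343)` again at intervals would pick up address changes, which is the failure mode the excerpt describes for a client pinned to a single old IP.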

When I index something in Elasticsearch, why doesn’t it show in my search straight after?

I got burnt by this little architectural nuance in Elasticsearch recently. While batch processing items in a content store, updating their status, then searching for more items, I kept getting stale data and didn’t understand why. It turned out that Elasticsearch is _near_ realtime, with a default 1s refresh interval. So if you index and query within
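
One possible way to avoid reading stale results, not necessarily the fix the post goes on to describe, is to ask Elasticsearch to hold the index response until the next refresh has made the document searchable. The sketch below only builds such a request with the standard library. It assumes a recent Elasticsearch version, where the `refresh=wait_for` parameter and the `_doc` endpoint are available; `build_index_request`, the index name and the document are hypothetical.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, doc):
    """Build a PUT request that indexes doc and waits for the next refresh."""
    # refresh=wait_for: respond only once the change is visible to search.
    url = f"{base_url}/{index}/_doc/{doc_id}?refresh=wait_for"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Waiting on every write slows a batch job down, so calling the `_refresh` endpoint once after a batch may be the better fit for the workflow in the excerpt.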

Why does Apache Nutch sometimes get stuck using a single thread and crawling slowly?

Nutch generates a list of URLs to fetch from the crawldb. In ./bin/crawl, the size of the fetch list defaults to sizeFetchList=50000. If you use the default setting generate.max.count=-1, which is unrestricted, you can potentially end up with 50000 URLs from the same domain in your fetch list. Then the setting fetcher.queue.mode=byHost only creates
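
To see why an uncapped fetch list can starve the fetcher, here is a toy model in Python. It is not Nutch code: `build_queues` and `max_per_host` are hypothetical stand-ins, and it assumes that byHost mode means one fetch queue per hostname and that a per-host cap behaves like a non-negative generate.max.count.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_queues(urls, max_per_host=-1):
    """Group URLs into one queue per host, optionally capping each host."""
    queues = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname
        # A negative cap means unrestricted, like generate.max.count=-1.
        if max_per_host < 0 or len(queues[host]) < max_per_host:
            queues[host].append(url)
    return dict(queues)


# A fetch list dominated by one host.
urls = [f"http://a.example/page{i}" for i in range(5)]
urls.append("http://b.example/")

uncapped = build_queues(urls)
capped = build_queues(urls, max_per_host=2)
```

In this model, a list drawn entirely from one host produces a single queue, so only one worker has anything to do, while a cap leaves room in the list for other hosts and so for more queues.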
