Programming in Paradise

Friday, April 28, 2006

Size of Directories

du -h --max-depth=1

Quite useful for determining which directories on a Linux system are taking up lots of disk space: it prints a human-readable total for each immediate subdirectory of the current directory. Now if I could just figure out *why* the Lighttpd logs are getting up to 4 GB so quickly, I wouldn't have this problem.

Saturday, April 15, 2006

Ruby for ETL

Seth and I are throwing together a little ETL tool in Ruby. Stay tuned to see what comes of it. We've found that while DTS is OK for simple, clean source data, it starts to fall apart when you have a large number of source tables and the source files tend to change with each new data dump. Granted, this could largely be solved with better source data, but that is often not possible to force. Seth has also been using Kettle a bit, but it lacks the performance and line-level error handling that we need. We're also looking at commercial packages, but they tend to be pricey. In the long run it may be worth it to purchase a fifty-thousand-dollar ETL tool, but not at the moment. Ruby is well suited to simple Domain Specific Languages, so we're using it as the basis for our ETL tool and we'll see where it leads us. It's a nice diversion from loading data and building reports anyhow. :-)
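To give a rough idea of what "a simple DSL with line-level error handling" can look like in Ruby, here is a minimal sketch. It is not our tool: the `Pipeline` class, the `transform` keyword, and the demo helpers are all made-up names for illustration. The point is only that `instance_eval` lets a block read like a small declarative language, and that rescuing per row lets one bad line be logged and skipped instead of killing the whole load.

```ruby
# Hypothetical mini-DSL for illustration only; none of these names
# come from an existing library or from the actual tool.
class Pipeline
  attr_reader :loaded, :rejected

  def initialize(&block)
    @transforms = []   # [field, callable] pairs, applied in order
    @loaded = []       # rows that passed every transform
    @rejected = []     # rows that raised, with line number and message
    instance_eval(&block) if block  # run the block as DSL "keywords"
  end

  # DSL keyword: register a transform for a single field.
  def transform(field, &block)
    @transforms << [field, block]
  end

  # Process rows one at a time so a bad row is recorded and skipped
  # rather than aborting the whole run.
  def run(rows)
    rows.each_with_index do |row, index|
      begin
        out = row.dup
        @transforms.each { |field, fn| out[field] = fn.call(out[field]) }
        @loaded << out
      rescue StandardError => e
        @rejected << { line: index + 1, error: e.message, row: row }
      end
    end
    self
  end
end

# Demo pipeline: trim names and require amounts to be integers.
def build_demo_pipeline
  Pipeline.new do
    transform(:name)   { |v| v.strip }
    transform(:amount) { |v| Integer(v) }  # raises on non-numeric input
  end
end

# Demo input: the second row has an amount that cannot be parsed.
def demo_rows
  [
    { name: " Alice ", amount: "10" },
    { name: "Bob",     amount: "n/a" },
    { name: "Carol",   amount: "7" }
  ]
end
```

Running `build_demo_pipeline.run(demo_rows)` loads the first and third rows and leaves the second in `rejected`, tagged with its line number and the error message, which is the sort of per-line reporting we're after.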