Tuesday, May 22, 2007

Testing: Output format statistics

I have just polished up the code and data and run a comparative test; the results are interesting.

The large GPS sample data was first run through the 'NMEA' output format, keeping only GGA sentences. There are 2898111 sentences in total, which amounts to 33 days, 13 hours, 1 minute and 51 seconds of positioning data if captured continuously at 1-second intervals. There are no invalid (i.e. non-lock, no-position) sentences in this set: every one is valid and carries usable data.
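As a quick sanity check on that duration, here is a minimal Python sketch that converts a sentence count into days, hours, minutes and seconds. The function name `duration_from_samples` is made up for illustration, and the 1-second interval is the assumption stated above, not something measured from the data.

```python
def duration_from_samples(count, interval_s=1):
    """Split count * interval_s seconds into (days, hours, minutes, seconds)."""
    total = count * interval_s
    days, rem = divmod(total, 86400)    # 86400 seconds per day
    hours, rem = divmod(rem, 3600)      # 3600 seconds per hour
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds

# One GGA sentence per second, as assumed in the text.
print(duration_from_samples(2898111))  # (33, 13, 1, 51)
```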

I then used the resulting data to test the KML and CSV formats, with the text and binary encoding options, and with compression.

In NMEA mode, with GGA sentences: 190M file (2898111 lines).
In KML mode: 60M file (2643355 lines).
In CSV mode, with text encoding, and lon,lat,alt elements: 60M file.
In CSV mode, with binary encoding (no compress), and lon,lat,alt elements: 25M file.
In CSV mode, with binary encoding (compressed), and lon,lat,alt elements: 7.9M file.

When repeated with 'compress=true', the resulting output files are:

In NMEA mode, as above: 27M.
In KML mode, as above: 11M.
In CSV mode, with text encoding, as above: 11M.
In CSV mode, with binary encoding (no compress), as above: 12M.
In CSV mode, with binary encoding (compressed), as above: 5.8M.
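To make the two lists easier to compare, the sizes can be expressed as a percentage of the 190M NMEA baseline. This is only a rough sketch: the figures are copied from the lists above, the labels and the helper names (`percent_of_baseline`, `report`) are made up for illustration, and the sizes are rounded megabytes, so the percentages are approximate.

```python
# Sizes in megabytes, copied from the two lists above:
# (without compress=true, with compress=true)
SIZES_MB = {
    "NMEA (GGA)": (190, 27),
    "KML": (60, 11),
    "CSV text": (60, 11),
    "CSV binary (no compress)": (25, 12),
    "CSV binary (compressed)": (7.9, 5.8),
}

def percent_of_baseline(size_mb, baseline_mb=190):
    """Return size_mb as a percentage of the baseline (the plain NMEA file)."""
    return 100.0 * size_mb / baseline_mb

def report(sizes=SIZES_MB):
    """Build one formatted line per output mode."""
    lines = []
    for name, (plain, packed) in sizes.items():
        lines.append(
            f"{name:26s} {percent_of_baseline(plain):5.1f}% "
            f"{percent_of_baseline(packed):5.1f}%"
        )
    return lines

for line in report():
    print(line)
```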

I'll discuss this in further detail later, but it indicates the success of the generic and binary compression features. Consider about 1 month of data at 8M (using binary encoding with compression, which happens to give the best bang for the buck): you could fit 64 months, or 5.3 years, on a 512M SD card. What overkill! Remember that it's not about saving card space; it's about reducing card writes, and thus improving power efficiency.
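The card capacity estimate is simple division, sketched here in Python. The helper name `months_on_card` is made up for illustration, and the sketch ignores filesystem overhead and treats "about 1 month" as the 33.5 days of sample data, so take it as a back-of-the-envelope figure only.

```python
def months_on_card(card_mb, month_mb):
    """How many months of data fit, given the per-month size in megabytes."""
    return card_mb / month_mb

months = months_on_card(512, 8)   # 512M card, about 8M per month
years = months / 12
print(f"{months:.0f} months, about {years:.1f} years")  # 64 months, about 5.3 years
```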
