Tuesday, May 22, 2007

Testing: Output format statistics

I have just polished up code and data and run a comparative test, the results are interesting.

The large GPS sample data was first run through 'NMEA' format, selecting only GGA sentences. There are in total 2898111 sentences, which approximates 33 days, 13 hours, 1 minute and 51 seconds of positioning data if captured continuously at 1 second intervals. There are no invalid (i.e. non-lock, no position) sentences in here: they're all valid with usable data.

I then used this resulting data to test KML and CSV formats, using text and binary encoding options, and compression.

In NMEA mode, with GGA sentences: 190M file (2898111 lines).
In KML mode: 60M file (2643355 lines).
In CSV mode, with text encoding, and lon,lat,alt elements: 60M file.
In CSV mode, with binary encoding (no compress), and lon,lat,alt elements: 25M file.
In CSV mode, with binary encoding (compressed), and lon,lat,alt elements: 7.9M file.

When repeated with 'compress=true', the resulting output files are:

In NMEA mode, as above: 27M.
In KML mode, as above: 11M.
In CSV mode, with text encoding, as above: 11M.
In CSV mode, with binary encoding (no compress), as above: 12M.
In CSV mode, with binary encoding (compressed), as above: 5.8M.

I'll discuss this in further detail later, but it indicates the success of the generic and binary compression features. Consider about 1 month of data at 8M (using binary encoding, with compression, which happens to be the best-bang-for-buck), you could get 64 months on a 512M SD card, or 5.3 years. What an overkill! Remember that it's not about preserving card size, it's about preserving card writes, and thus power efficiency.

Thursday, May 17, 2007

Progress: update for 0.94 release

Work for the 0.94 release is coming along. Most of the main work has been completed: (a) porting emulator to Linux, to support both FreeBSD & Linux; (b) porting firmware to GPS Logger V2.4, to support both V1.0 and V2.4; (c) supporting generic and binary compression for output data; (d) PC based application testing for both V1.0 and V2.4. There are also a bunch of smaller scale additions and modifications that I won't go into here right now.

All I have to complete now are some performance cleanups (some code inlining thanks to a gprof analysis), some emulator cleanups (clean termination at end of input files) and a round of testing. The testing can't be understated: I need to (a) test code against GPS sample data, (b) produce compression statistics from same data, (c) test power consumption for this and original firmwares, (d) general code review and functional testing.

I have also decided on the roadmap for future releases: the next one, a 0.96, will focus exclusively on reworking and extending the GPS data handling, by (a) the logger programming the GPS module for specific sentence, powersave and other features; (b) support sparkfun V2.4 firmware features (hold off, etc); (c) customisable filenames with date/timestamps; (d) a separate utility tool for conversion from raw/nmea to kml/csv with binary/text encoding and using compression; (e) power/performance improvements as necessary. Beyond that, a 0.98 release will be entirely dedicated to power/performance optimisations and improvements, and additional debug/inteface features (e.g. I may add in an xmodem download feature (so you can extract logs without having to remove the SD card). After 0.98 comes 1.00, when all should be stable, working and optimal.

I hope to have a 0.94 release ready in the next 3-4 weeks. Let me know now if there's anything you're interested in.

Discussion: to GGA or RMC

I noticed with the GPS Logger V1.0 using Trimble Lassen iQ, that the default output sentences were GGA and VTG. This provided the necessary position and velocity information. It also provide a timestamp (UTC), but not a datestamp. So, for a proper datestamp I either needed an additional RMC or ZDA sentence.

With the GPS Logger V2.4, using US Globalsat EM-406, there are a whole bunch of default sentences, one of which is RMC. Now, RMC sounds great: it has validity field, position, velocity, timestamp, datestamp, but it lacks an altitude. Just that one extra field would have made all the difference.

So, in any case, not in this version of GPS Logger MG, but in a future version, the GPS module (either Lassen iQ or EM-406) will be programmed at start up to only spit out the bare minimum sentences necessary. That's easy with the EM-406 out of the box, because it supports an inbound interface. For the Lassen iQ, it needs to be pre-configured to accepted TSIP on the port.

Tuesday, May 15, 2007

Internals: Supporting binary output compression

I have been working on improving the CSV format’s binary encoding to reduce the volume of data being logged, which will reduce SD card writes and thus improve power performance. The following is an attempt to explain how compressed binary encoding can produce an 8x size improvement over text encoding.

Approach

The logger is designed to capture and output positioning data (i.e. longitude, latitude, altitude, timestamp, etc) at a regular rate. The native encoding of this data is absolute. For example, the absolute encoding of a latitude position requires 28 bits (a 1 bit sign, a 7 bit integral degree, and a 20 bit fractional degree).

However, the logger is typically moving at a velocity. This velocity has practical limits as it is unlikely to be faster than the fastest vehicle known to humans, and will typically be much less (e.g. walking, cycling or driving speeds). Each positioning data that is output will be at some offset to the proceeding data.

This could mean that while large numbers are required to express absolute positions, much lower numbers are required to express the relative offset between them. For example, consider an airplane that is moving at 939km/h in the latitude direction only, with a logger onboard that is generating positioning data every 1 second. With each latitude degree being about 111km, then the absolute latitude changes by 0.00235 at each interval. This relative offset can be encoded in about 12 bits, which is 50% more efficient than the 28 bits required for the absolute encoding.

That is the first decision: to use relative offsets.

If relative offsets are to be used, how should they be encoded? The simple approach is to store the offset value itself. A more complex, but ultimately more adaptive and optimal, approach is to use a statistical bit encoder (e.g. huffman, arithmetic, etc) and store the resulting bitstream. For now, the simple approach will suffice.

That’s the second decision: to store the actual relative offsets.

The next problem is in the handling large relative offsets (e.g. an extended interval with no coverage, say inside a moving vehicle, a road/rail tunnel, or a baggage compartment). In these cases, the relative offset soon becomes a large as an absolute position. There needs to be a limit to the size of the relative offset, which if exceeded, causes an absolute position to be used. These should be infrequent cases.

That’s the third decision: to allow both absolute positions and relative offsets.

The number of bits required to encode an absolute position is fixed, but should this also be the case for relative offsets? A variable size would require some form of length or terminal indicator to be used. This would be mandatory for the complex approach above as the statistical bit encoders produce variable length outputs, but can be chosen either way for the sample approach. For now, the simplest option of a fixed bit size will be used.

That’s the fourth decision: to use fixed (pre-determined) bit sizes for relative offsets.

This begs the next question: what should the fixed bit sizes be? One approach is to calculate the theoretical limits to a relative offset (e.g. the maximum speed of vehicles). Another approach is to use empirical observations, and 700MB of GPS sample data is available to help with that. By analysing the relative offsets, an optimal fixed bit size can be computed.

That’s the fifth decision: to empirically determine the fixed bit sizes for the relative offsets.

There’s one final implementation detail: how are absolute or relative entries distinguished in the output. There could be an absolute/relative indicator specified (a) for the entire entry (i.e. all fields in the entry have the same encoding), or (b) for individual fields (e.g. an absolute altitude, but a relative timestamp). Additionally, the indicator could be specified for (1) each entry/field, or (2) only when the current run changes (e.g. a relative indicator, then 50 altitudes relative encoded, then an absolute indicator, etc). For simplicity, the indicator is chosen to be a single bit (0 for absolute, 1 for relative), and specified on each field (b) at each time (1).

That’s the sixth decision: to use a one bit indicator on every field in every entry to indicate its encoding method.

Analysis

The fixed bit sizes for the relative offset encoding were derived from the 700MB of GPS sample data. The NMEA GPGGA sentences in the data were converted into binary format with absolute encoding, resulting in 2,083,953 entries. The relative offsets were computed and turned into a cumulative histogram.

The histogram showed the number of entries that could be encoded within a given range of values, i.e. a particular number of bits. For example, the altitude field had 1,394,639 fields of offset 0, 1,963,859 of offset 0-1, and 2,082,057 fields of offset 0-127. This means that 99.90902% of offsets can be encoded as relative offsets within 7 bits (0-127), and the remaining would need to be encoded as absolute positions.

From this, it is easy to calculate the average bit size. For instance, the altitude field is 17 bits with absolute encoding (a 1 bit absolute indicator, a 1 bit sign, and a 15 bit absolute height), or (2 + n) bits with relative encoding (a 1 bit relative indicator, a 1 bit sign, and an #bits bit relative height). With this, the total output (encoded) size of all altitudes is (#values[n] * (2 + n)) + ((#total - #values[n]) * 17), where #total is 2,083,953 and #values[n] is obtained from the histogram. Dividing the total output size by the total number of entries gives the average bit per entry. It’s just a matter of looking at what number of bits (n) results in the lowest average bit per entry.

When this analysis is carried out for all positional fields:
  • Longitude: since 99.9762% of integral parts of the degree are at offset 0, specify that relative encoding can only occur if the integral field is unchanged. The fractional part of the degree is optimally encoded in 9 bits (37.55%), rather than 8 bits (45.64%) or 10 bits (40.20%).
  • Latitude: since 99.9968% of integral parts of the degree are at offset 0, specify that relative encoding can only occur if the integral field is unchanged. The fractional part of the degree is optimally encoded in 8 bits (35.68%) rather than 7 bits (45.72%) or 9 bits (38.17%).
  • Altitude: The value is optimally encoded in 1 bit (22.39%) rather than 0 bits (40.95%) or 2 bits (24.50%) or 3 bits (29.77%).
  • Timestamp: The value is optimally encoded in 1 bit (20.46%) rather than 0 bits (99.99%) or 2 bits (20.64%) or 3 bits (24.21%).
Results

With all fields having absolute encoding, an entry requires 94 bits. If all fields have relative encoding, this drops to 25 bits, giving a best case compression factor of 26.60%. However, the sample data shows that the best case is not always achieved, and on average, an entry requires 29.50 bits, giving an average case compression factor of 31.38%. That is a factor of 3 reduction in size and possibly (a tenuous extrapolation) a factor of 3 increase in power performance.

In summary, consider logging at 1 second intervals in CSV format, storing longitude, latitude, altitude and a timestamp. With text encoding, approximately 31 bits are required per entry, which fills the 24K onboard file cache in 13.2 minutes. With the original, uncompressed, binary encoding, 90 bits are used, filling the cache in 36.4 minutes. With the compressed binary encoding, 29.50 bits are used, filling the cache in 111.1 minutes. The compressed binary encoding can give an 8x size improvement over text encoding, possibly resulting in an 8x power improvement.

Saturday, May 12, 2007

Internals: Suppporting GPS Logger V2.4

I just received my GPS Logger V2.4, which is smaller and more compact than the V1.0. Having looked through the software differences between V1.0, I can see more attention to power saving in V2.4 - although V2.4 no longer supports native KML encoding.

For GPS Logger MG to support V2.4, there isn't a lot of work. The hardware is essentially the same: there's only an additional LED, a new GPIO pin used to control power to the GPS module, and a GPIO pin used to monitor the shutdown button (in V1.0, the button was directly wired to reset circuitry). Supporting these is trivial.

The software has a number of differences: (a) no native KML output mode, raw NMEA sentences only, (b) configurable NMEA sentence filtering; (c) configurable hold-off timer, (d) a configurable time between logs, (e) UART capturing under IRQ. I presently support all of these other than (c) and (d). I do support (d), but in a different way.

My current plan is for the next release to support the V2.4 hardware, as well as the V1.0 hardware, but not to support the (c) and (d) features above. I will also do an application test for V2.4 prior to the next release, and a power consumption test (using GPS Logger MG, and the original sparkfun V1.0 and V2.4 firmwares).

UPDATE: That was easy, V2.4 hardware support now completed and tested: rather than poll GPIO P0.16 to test for stop button actuation, I configured the port for EINT0 (triggered: edge, low). The EM-406 spits out a lot of sentences! I also added a new 'nmea' mode (alongside 'raw','kml' and 'csv'): 'raw' is a direct pipe from the GPS module, whereas 'nmea' selects only NMEA sentences.

Wednesday, May 09, 2007

Testing: PC based realtime GPS applications

Update 2007/05/15: now tested with firmware on V2.4, and all applications pass (incl. Earth Bridge).

I tested the GPS logger with a bunch of PC based realtime GPS applications. To use the GPS logger in this mode, you need (a) to have serial output (e.g. using the SparkFun "LPC Serial Port Boot Loader Interface"); (b) enable the GPS logger configuration (e.g. set "GPSCONFG.TXT" options: mode=pass, format=raw, pass_serial_speed=4800, gps_output_sentences=gga,vtg). Use a terminal program to verify that the sentences are being emitted.

Initially, I used a serial port speed of 9600bps, but soon found that a few applications assumed 4800bps (the strict NMEA standard rate) and did not allow for anything else to be specified. I moved to testing everything at 4800bps. Also, I haven't pre-programmed my Lassen iQ, so it currently only spits out the factory default NMEA sentences: GGA and VTG. For some applications, GSA and ZDA would have been necessary or desirable.

Applications were chosen because either (1) I currently use them, (2) they are reasonably popular and many people use them, (3) they make another useful test case. I'd be interested in additions to the list.

The majority work without problem: I'm not surprised because the GPS Logger in pass mode simply spits out the same NMEA sentences it receives from the Lassen iQ. But now there is some clear confirmation, and I know that no work (apart from changing the default serial port output speed to 4800) is required for the next release.

Mapping applications

The purpose of these applications is very clear: they provide some form of real-time mapping, navigation combined with further features, such as saving log files, etc.
  • Microsoft AutoRoute 2007 with GPS Locator: popular road trip planning and real-time directions: succeeded!
  • Google Earth Plus: using real-time GPS in Magellan/Serial mode with NMEA: succeeded!
  • GooPs: for Google Earth, providing GPS tracking and navigation: succeeded!
  • BUNGEE: for Google Earth, providing LIVE GPS tracking: succeeded!
  • Earth Bridge: for Google Earth, as a GPS bridge: failed: the diagnostics showed that the NMEA sentences looked fine, but application stated that it could not get a fix.
  • MeHere: networked GPS tracker for Google Maps and Google Earth: succeeded!
  • TopoFusion: GPS mapping software for Windows: succeeded!
  • GPS TrackMaker: providing real-time navigation: succeeded!
  • Memory-map V5 European Edition: route-planning and mapping software: succeeded!
Wireless network applications

These applications survey local wireless networks and can associate the results with GPS positioning information, typically used when scanning from a moving vehicle.
  • NetStumbler: wireless network analyser tool: succeeded!
  • WiFi Hopper: wireless network discovery and site survey: partially succeeded: NMEA sentences seen in diagnostic window, but not shown in main window because of license expiry.
  • WirelessMon: 802.11 wireless monitoring tool: succeeded!
Diagnostic applications

These are various tools used for diagnostic purposes, to show, interpret, convert or do other presentation and manipulation activities on the raw data itself. Most of these would have preferred more NMEA sentences (other than GGA and VTG) to work with.
  • GPSDiag: performing simple interpretation and presentation of NMEA data: succeeded!
  • GPS Utility: manage, manipulate and map GPS information: succeeded!
  • GPS Express: variety of reception, interpretion and mapping functions: succeeded!
  • VisualGPS: command monitor and graphical viewer of NMEA data: succeeded!
  • NMEA Sentence Logger: file and network logging of NMEA data: succeeded!
Timekeeping applications

These use GPS timing information to display clock, provide local or network time.

  • GPS Time and Test: provides clock synchronisation: failed: needs to see ZDA sentences which I do not currently have configured.

Sunday, May 06, 2007

Hintsandtips: Using the 'gpsconv' tool

I've been using, on occasion as various data manipulation needs require, the gpsconv tool from Raymond Choc. It's Perl based, so you'll need a Unix environment (Linux, BSD, etc) or say cygwin or other for Windows. Converts a whole bunch of stuff (hope that I express these properly): Magellan-NMEA-Format, NMEA 1803, TOPS50, KOMPASS, AlpenVerein, Fugawi, PCX5, g7t, Garmin, Google Earth KML, and GPX. There's also a GUI.

It's here (in German): http://velaa.ve.funpic.de/gpsconv.htm.

Thanks Raymond!

Friday, May 04, 2007

Hintsandtips: Using a basic KML template

I use a really basic KML template to wrap KML based results for use with Google Earth. Just replace the Name and Description fields with text, and the Results field with co-ordinates (in "lon,lat,alt" format).

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
    <placemark>
        <name>Name</name>
        <description>Description</description>
        <style>
            <linestyle>
                <width>4</width>
            </linestyle>
        </style>
        <linestring>
            <tessellate>1</tessellate>
            <altitudemode>clampToGround</altitudemode>
            <coordinates>
                Results
            </coordinates>
        </linestring>
    </placemark>
</kml>

To convert the .kml file into a .kmz file, just ZIP it, e.g. 'zip -9 -m file.kmz file.kml'.

Thursday, May 03, 2007

Planning: GPS Logger MG - Release 0.95

I'd be interested to hear any comments or suggestions for the next release. The current plans are to:

  • introduce CSV binary output compression and generic output compression;
  • verify 'pass' mode operation with a number of GPS applications (Google Earth, Microsoft Autoroute, etc);
  • support direct to local filesystem (rather than to emulated FAT filesystem) for emulator;
  • support more extensive unit tests using NMEA GPS sample data sets;
  • make various minor code level performance and resource usage improvements;
  • make various minor emulator signal and termination handling cleanups;
  • perform power consumption tests in different modes and against original firmware.
Anything else? Any defects to fix?

Internals: GPS NMEA sentence samples

You could do as I did, and spend an hour or two searching for and selecting data (a Google search for '$GPGGA'), or you could use the following links from which there is about 700MB of raw sample data.

I've been using this material to (a) assess LZARI performance, and (b) test the robustness of the GPS Logger NMEA parser.

Hopefully you'll find them useful, and by the time you do, hopefully they won't have disappeared. Unfortunately because of licensing reasons, I can't make your life easier by tarring up my sample set and redistributing it.

If you're aware of any other useful material, please let us all know.

Wednesday, May 02, 2007

Internals: Supporting generic LZARI output compression

One objective I have for the next release is to improve compression performance. This will lead to lower SD card write cycles, thus improving power performance. There are two strategies involved. The first is to introduce a compression mechanism for the binary CSV output format. The second is to introduce a general compression algorithm for use with any output format.

Overview

For the general compression algorithm, I've chosen LZARI (using code written by Haruhiko Okumura on 4/7/1989 under a free license) from several candidates. It is reasonably fast, has low resource requirements, provides good compression performance, and is adaptable for embedded use.

LZARI works to reduce redundancy by compressing repeated character strings (the LZ part), and passing the resulting 'symbols' through an arithmetic encoder (the ARI part). Character strings are stored in a binary tree for fast matching.

Implementation

There was a substantial amount of work involved in adapting the code for the GPS Logger MG:

  • all non-local variables were extracted out to a single context structure for allocation from a single chunk of SRAM, and to preserve state across function calls;
  • all variables were analysed and converted to a quantifiable size (either 8, 16 or 32 bits, rather than simply "int") to optimise variable memory use;
  • the API was changed from being stream based (file I/O operations) to being data based (read/write data blocks) [which turned out to be the most difficult challenge, requiring the introduction of state logic];
  • the end of encoded data is now signified by the use of a specific compression symbol rather than relying upon knowing and storing the uncompressed data size ahead of encoding;
  • selective compilation of either or both decode or encode routines was enabled, as the release build requires encode only, whereas the debug build requires both (the unit tests do decoding to verify that encoding works);
  • fine grained assert and lint checking was introduced, to improve quality.
Evaluation

To fine tune the implementation for use, I needed to evaluate possible variations of the tunable compression parameters against real sample data. I wanted to find a tradeoff between good performance but low variable memory requirements (currently 24K of SRAM is available as a file cache, and I don't want the variable memory requirements to reduce this by too much).

There are two tunable compression parameters. The buffer-size specifies the size of the pool of data from which strings can be matched. The max-match-size specifies the maximum length that a matched string can be. In theory, the greater these are, the greater the potential compression ratio, but the trade off is that variable memory requirements grow.

Using buffer-sizes of 256 or 512 with max-match-sizes of 48, 64, 96 or 128, resulted in a range of variable memory requirements from 5328 to 8352 bytes.

The sample data was collected from the internet, being various NMEA logs captured for scientific, educational, personal or other purposes. From some 20 sources, a total of about 700MB was gathered. The data was tested by compressing it (a) as originally found (with all manner of standard or proprietary NMEA sentences), (b) by selecting valid NMEA sentences only (i.e. ^$GP with valid checksums), or (c) by selecting valid GGA NMEA sentences only.

Conclusions

The results show that the average compression ratios achieved were (a) 20%-25%, (b) 16%, and (c) 25%-27%. Using a buffer-size of 512 required 50% more processing time than a buffer-size of 256, even though compression performance improved by less than 5%. There was little impact on processing time with different max-match-size. From the 6 million NMEA sentences in (b), I found that the maximum sentence length was 103 bytes, meaning that a max-match-size of 96 was a better choice than 128.

For this reason, I elected to use a buffer-size of 256, and a max-match-size of 96. It provides good compression, requires only 5760 bytes of variable memory, and lower processing overhead.

If LZARI is enabled (using a new configuration option 'compress' which defaults to 'no'), the maximum file cache size becomes 20K rather than 24K, but the data it holds is now compressed by up to 4.4x. Practically, this means that 20K of compressed data in the cache corresponds to 88K of uncompressed data, which previously would have required 3.7x cache (at 24K). This means that using LZARI can reduce SD card writes by 3.7x, roughly 4x, which should have a significant impact on power performance.

Report: Driving in Portugal

I've just returned from a family trip to Portugal, where we did a lot of driving. I'm pleased to report that 9 days of driving (some 1000km overall) was successfully captured to KML using Release 0.91. Battery life was also very impressive but I did not keep a detailed observation.

There was only one minor issue: converting the default CSV format "lat,lon,alt,tim" to KML involves removing "tim" and transposing "lat" and "lon" to obtain "lon,lat,alt". That's a little annoying. In future the default CSV format will change to "lon,lat,alt,tim" so that only one step will be required.

Here's one of the trips, where we drove up to Figueira da Foz, then down south to the beach, before heading back via. Pombal to our cottage in Penela.


I can really recommend central Portugal, it was enjoyable for me in year 2000 as a bachelor backpacker, and just as good now with a wife, toddler and rental car.

Release: GPS Logger MG - Release 0.91

This is to announce Release 0.91 of GPS Logger MG. This is the first public release. Refer to the userguide for details of features.

The package (firmware, userguide, etc) is here (400kb): http://tanundra.com/downloads/personal/gps_logger_mg-0.91.zip

The userguide (same as in the zip file) is here (90kb): http://tanundra.com/downloads/personal/gps_logger_mg-0.91.pdf

(Original: http://forum.sparkfun.com/viewtopic.php?t=6606)

Announce: First post

Welcome to the first post. The purpose of this blog is to document and discuss the development and operational use of the GPS Logger MG firmware, designed for the SparkFun series of GPS Loggers. I welcome your suggestions and comments.