Weather Data Logger

Mar. 2013

By Ofer Faigon
www.bitFormation.com

Weather Data Logger
A DIY / reverse engineering project

Background

A few years ago I bought a simple weather station for my home. The system consists of two devices: an outdoor unit which measures temperature and humidity outside the house and transmits the readings to an indoor unit that displays them on a small LCD. The indoor unit also displays indoor humidity and temperature, as well as barometric pressure, date and time and a few other tidbits.

La Crosse outdoor unit (sensor + transmitter); La Crosse WS9152 indoor unit (receiver + display); La Crosse WS9128 indoor unit (receiver + display)
The outdoor unit (left), shown installed on a wall and covered with a sun/rain protector cup, can be purchased alone. The indoor unit of the La Crosse WS9128 (right) displays received temperature and humidity as well as its own measurements of the indoor conditions. A similar system, the La Crosse WS9152 (middle), does not measure or display the outdoor humidity. Everything in this article applies equally to both models.

The problem with this system is that it does not log the data, nor does it have a connector that allows you to log the data on a PC. When you get up in the morning you have no way of knowing what the minimum temperature at night was or when humidity started rising. Advanced models with PC connectivity exist, but they are much more expensive and require proper installation in a suitable location for their additional rain gauge and wind speed/direction sensors to function correctly.

I decided to try to capture the transmitter's signal and see if I could feed it to my PC and decode it. Searching the web for the signal spec returned nothing. The manufacturer does not publish any technical details, and nobody in the relevant forums seemed interested enough to reverse engineer the signal and publish it on the web (or even to be aware of this type of low-end weather station). The only detail I could find was the frequency the unit transmits on - 433 MHz - or so it said in the user's manual.

Capturing the Signal

It turns out that 433 MHz (or rather, 433.92 MHz) is one of four frequencies commonly used for short range remote controls and sensors, such as car alarms and garage door openers. Looking around I found that I could buy a tiny receiver called RR3-433.92 that is hard-tuned to this frequency for the reasonable price of roughly $20 at my local electronic parts shop (that was a few years ago; the current price is considerably lower). The plan was to connect this receiver to the PC sound card input and try to figure out what the received signals mean.

The RR3-433.92 hybrid card

The RR3-433.92 is a small hybrid component that is very easy to use. It has connections for power, antenna and output:

Connecting the RR3-433.92 hybrid card
Typical application (from the RR3 datasheet) - all it requires is a single 5V power source and an antenna.

The RR3 datasheet can be found on the web in many places. Here is one: http://www.seapraha.cz/download/rr3.pdf

Power Supply

According to the RR3 datasheet, the device draws about 2.5 mA. Four 2500 mAh rechargeable batteries wired in series supply 4.8V but still hold only 2500 mAh between them, so under the best conditions they would drain completely in about 1000 hours, or roughly six weeks. Since I wanted the whole setup to be as maintenance free as possible, I opted to use a small transformer instead (unfortunately, I still have to replace batteries in the transmitter once a year).

I used an old 6V DC stabilized transformer that I found in the old parts box. According to the RR3 spec, any voltage between 4.5V and 5.5V would do and the current it draws is so low (3 mA max) that even the smallest transformers can supply it. Using 6V instead of the max 5.5V is not a very good idea, but I did not have any 5V supply handy and from my experience this difference never caused any of my projects to malfunction.

Using a stabilized supply meant one less thing to worry about - the 50Hz component of a simpler power supply would probably interfere with the circuit operation, although I haven't tested this.

Signal Level

In a typical setup, the receiver's output is fed into a microcontroller chip which uses a simple program to decode the received signals and drive an LCD (as in the weather sensor case) or control a motor (as in the garage door opener case). To keep things simple I did not want to include a microcontroller in my setup. Instead, I chose to feed the signal directly into the PC sound card. This choice is somewhat adventurous, because the characteristics of a digital signal designed to be fed into a microcontroller are quite different than those expected by an analog PC sound card input.

The RR3 spec says that the "high" output signal is 3.6V, which is considerably more than the 1V expected by the line input of a standard PC sound card. The "low" signal is somewhere between 0 and 0.6V. Driving a +/-1V input circuit with a 0-4V signal may cause problems. It is possible to use a pair of resistors as a voltage divider that passes only 1/4 of the voltage to the sound card. Any pair with a ratio of roughly 1:3 in the 10 KOhm range should do, with the sound card input connected across the smaller resistor.
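The divider arithmetic can be checked with a few lines of Python. This is only a sketch: the 30 KOhm and 10 KOhm values are an illustrative pick of a 3:1 pair in the suggested range, not values taken from the actual setup, and the sound card's own input impedance is ignored.

```python
def divider_output(v_in, r_series, r_shunt):
    """Voltage across r_shunt when v_in drives r_series and r_shunt in series."""
    return v_in * r_shunt / (r_series + r_shunt)


# Assumed example values: any pair with roughly a 3:1 ratio behaves the same way.
R_SERIES = 30_000  # ohms, between the RR3 output and the sound card input
R_SHUNT = 10_000   # ohms, between the sound card input and ground

for label, volts in (("low, worst case", 0.6), ("high", 3.6)):
    attenuated = divider_output(volts, R_SERIES, R_SHUNT)
    print(f"{label}: {volts} V -> {attenuated:.2f} V")
```

With these values the 3.6V "high" level arrives at the sound card as 0.9V, just inside the 1V range of a line input.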

To be honest, I only thought about this issue after hooking up everything, so the current system connects the RR3-433.92 output directly to the sound card input. This never caused any signal decoding problem (other than some minor effects described later) because we are only interested in telling a high signal from a low one, so truncation of the signal is not important. Such truncation would, of course, cause unacceptable distortion had it been an audio signal. Whether the sound card input will go up in smoke or not is yet to be seen, but after more than six years of continuous operation I tend to believe it won't.

Antenna

For an antenna I used a short (30 cm) wire. Having the correct length would probably give better reception, but I figured this length is similar to the antennas I see on garage doors around the neighborhood, so it should be good enough.
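For reference, one common rule of thumb for a simple wire antenna is a quarter of the wavelength. The sketch below works that out for 433.92 MHz. Treating a quarter-wave wire as the "correct length" is an assumption made here for illustration, and the calculation ignores the wire's velocity factor and its surroundings.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second


def quarter_wave_cm(freq_hz):
    """Length of a quarter wavelength in free space, in centimetres."""
    wavelength_m = SPEED_OF_LIGHT / freq_hz
    return wavelength_m / 4 * 100


print(f"{quarter_wave_cm(433.92e6):.1f} cm")
```

This comes out at a little over 17 cm, so the 30 cm wire is longer than a quarter wave but shorter than a half wave.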

Before committing to any permanent arrangement I wanted to test if I could receive anything at all, so I used alligator connectors instead of soldering the wires.

No additional components necessary
A short piece of wire for an antenna, 4 rechargeable batteries supplying 4.8V and a set of headphones are all that is needed to start listening for the signal.

Making Sense of the Signal

Once everything was hooked up I started listening to the received signals and looking at them using Audacity, a free and open source sound editor. It took me quite some time to realize that signals on this frequency are invariably very short chirps. This makes a lot of sense, because the same frequency is shared by a large number of transmitter-receiver pairs. It appears that they avoid interfering with each other by being idle 99.9% of the time and transmitting for a very short time when they are activated.

As I listened I grew familiar with the various local signals. I could hear every time the neighbor across the street opened his garage door. I could even tell who is opening the gate in the next building because each person has a different way of pressing the remote control button. After a while one sound stood out: a pair of chirps totaling less than a second that repeated once every 60 seconds. Some experimentation with taking out the sensor batteries and re-installing them confirmed that these chirps are indeed the signal I was looking for.

Waveform of some typical static noise and a pair of chirps
A pair of chirps in a sea of typical static noise. The numbers at the bottom are seconds. You can listen to it here.

In order to analyze the signal I thought at first that I had to figure out what modulation method it used, if any. Then I realized that it doesn't really matter since the RR3-433.92 probably takes care of any demodulation. The signal I was feeding into the PC sound card was almost surely a stream of high/low voltage levels. The supporting evidence for this realization was the mention of a comparator in the RR3-433.92 datasheet. The question now was how these high and low voltages encode the transmitted information.

My first assumption was that it would be a serial protocol similar to that of RS-232, with some baud rate, a start bit and one or two stop bits. I then spent a few days analyzing frequencies and looking for start and stop bits. When nothing made sense I started looking for other possible encoding methods. Collecting statistics on "high" state durations provided the breakthrough. I noticed many short pulses whose lengths formed two clearly distinct groups: short pulses lasting 0.67 mSec and long pulses lasting 1.33 mSec.
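The kind of statistic that exposed the two pulse groups can be sketched as follows. This is not the original analysis code: the function names, the zero threshold, the 44100 samples per second rate and the synthetic test signal are all assumptions made for illustration.

```python
from collections import Counter


def high_run_lengths(samples, threshold=0):
    """Length, in samples, of every run of consecutive samples above threshold."""
    runs, current = [], 0
    for s in samples:
        if s > threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def duration_histogram(samples, rate_hz, threshold=0):
    """Count "high" durations, rounded to the nearest 0.01 ms."""
    counts = Counter()
    for n in high_run_lengths(samples, threshold):
        counts[round(n * 1000.0 / rate_hz, 2)] += 1
    return counts


# Tiny synthetic recording at an assumed 44100 samples per second.
RATE = 44100
signal = []
for width in (30, 59, 30, 30, 59):        # about 0.68 ms and 1.34 ms
    signal += [12000] * width + [0] * 44  # each pulse followed by a ~1 ms gap

for ms, count in sorted(duration_histogram(signal, RATE).items()):
    print(f"{ms:5.2f} ms  x{count}")
```

On a real recording the histogram would also contain noise, but the two peaks near 0.67 mSec and 1.33 mSec should stand out.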

Arbitrarily decoding short pulses as "0" and long ones as "1", I recorded the bit sequences from a few dozen chirps and it became clear that each chirp contained a sequence of 44 bits and that these sequences were very similar to one another.

At this point it was possible to lower the audio sampling rate to a level that would be just enough to tell the difference between a "0" and a "1". There was no point in oversampling. I wrote some code that decoded chirps into bits and spent a few hours building a library of bit sequences and what the display unit showed after each sequence was transmitted.
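A quick way to judge how low the sampling rate can go is to count how many samples a short and a long pulse span at each candidate rate. The sketch below does that for a few common rates. The list of rates is an illustrative choice; 11025 is the rate passed to arecord in the capture script shown later in this article.

```python
SHORT_MS = 0.67  # duration of a short pulse
LONG_MS = 1.33   # duration of a long pulse


def samples_per_pulse(rate_hz):
    """Return (samples in a short pulse, samples in a long pulse) at rate_hz."""
    per_ms = rate_hz / 1000.0
    return SHORT_MS * per_ms, LONG_MS * per_ms


for rate in (44100, 22050, 11025, 8000):
    short, long_ = samples_per_pulse(rate)
    print(f"{rate:6d} Hz: short = {short:5.1f} samples, long = {long_:5.1f} samples")
```

At 11025 samples per second the two pulse types are still about 7 samples apart, which leaves room for a sample or two of jitter.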

The process of compiling a dictionary that associates transmitted bit sequences with their temperature/humidity meanings was quite tedious. For one, I found that the display unit only listens to the transmission and updates itself once every 5 minutes. To figure that out I had to watch the display for a long time while heating and cooling the sensor. There is also a hint provided by the display itself: once every 5 minutes it shows a small receiver-on symbol for 60 seconds. My early thought was that this symbol appears whenever the receiver hears the transmission, but that did not make much sense given the short duration of the chirps. Later I realized that the display unit turns on the receiver for one minute every five minutes to conserve power and that the symbol indicates when the receiver is turned on. Keeping the receiver on for 60 seconds ensures it will hear one transmitted chirp.

Another obstacle on the road to compiling the dictionary was that radio signals are notoriously unreliable. Any spark within hundreds of meters, any neighbor opening or closing the garage door, distant lightning and any number of other mysterious sources of noise easily corrupt the signal and cause the display unit to continue showing the previous numbers. Worse yet, I had no idea what the signal should look like and could not tell a clean sample from a corrupt one. I therefore had to collect several samples for each temperature+humidity combination and even then be prepared to have an unknown number of incorrect samples in the collection.

To collect samples over a range of temperatures and humidities that is wide enough to draw conclusions from, I recorded samples during the day and night at various times and even put the sensor in the fridge, over a boiling kettle and in my lap for a while.

When I had a dictionary of a couple dozen samples that I could reasonably trust, I started looking for possible encodings. Eventually I figured out enough details to make decoding possible. Over time I improved the decoding software and corrected minor details that I got wrong until the software performed as well as, or better than, the display unit.

The Findings

  • The two chirps are two identical copies of the same signal, presumably transmitted twice to make the transmission more noise resistant.
  • Each chirp is made of two shorter bursts with a small gap between them. One of the bursts carries the temperature reading and the other carries the humidity reading.
  • Each burst is made of a series of 44 pulses. A pulse is either short, lasting 0.67 mSec, or long, lasting 1.33 mSec. The gap between pulses is always 1 mSec. The shortest possible burst is made of 44 short pulses and would last about 72 mSec, and the longest possible burst is made of 44 long pulses and would last about 102 mSec.
  • The data is encoded in the widths of the pulses: a long pulse means "1" and a short pulse means "0" (this interpretation is of course arbitrary; it can be inverted at the cost of a trivial change in the decoding software). Each burst, therefore, carries 44 bits of data.
  • The 44 bits can be interpreted as 11 groups of 4 bits each, with this structure:

    hh t xx vvv vv c

    hh  - Constant packet header (1111 0101).
    t   - Packet type. 0001 = humidity measurement, 1111 = temperature.
    xx  - Transmitter ID and value parity. ID in the first 7 bits, parity in the last bit (xx = iiii iiip). Parity bit is 1 if the number of 1 bits in the 3-digit value vvv is even, 0 if odd.
    vvv - The sensor reading, multiplied by 10 and bit-inverted (that is, the value 0010 is represented by 1101, etc. This suggests that my choice of long=1 and short=0 may have been wrong). In humidity, the value is always nn.0 representing relative humidity in the range 0%-99%. In temperature, 50 is added to the temperature in Celsius to allow temperatures in the range -50.0 .. +49.9 to be represented.
    vv  - The high two digits of the sensor value again.
    c   - Sum of all previous digits, plus 9, modulo 16.
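The packet layout above can be exercised with a small encode/decode round trip. This is a sketch written from the description, not the downloadable decoder. It assumes each digit of the value is a decimal digit stored in one bit-inverted 4-bit group, and that the checksum is taken over the ten preceding groups exactly as received. The function names and the sample ID are hypothetical.

```python
HEADER = [0xF, 0x5]
TYPE_HUMIDITY, TYPE_TEMPERATURE = 0x1, 0xF


def checksum(first_ten):
    """Sum of the ten preceding 4-bit groups, plus 9, modulo 16."""
    return (sum(first_ten) + 9) % 16


def parity_bit(value_groups):
    """1 if the three value groups contain an even number of 1 bits, else 0."""
    ones = sum(bin(g).count("1") for g in value_groups)
    return 1 if ones % 2 == 0 else 0


def encode(kind, sensor_id, reading):
    """Build a synthetic 44-bit packet string (for testing the decoder only)."""
    offset = 50 if kind == "temperature" else 0
    tenths = int(round((reading + offset) * 10))
    digits = [tenths // 100, tenths // 10 % 10, tenths % 10]
    value = [15 - d for d in digits]                 # bit-inverted digits
    xx = (sensor_id << 1) | parity_bit(value)
    t = TYPE_TEMPERATURE if kind == "temperature" else TYPE_HUMIDITY
    groups = HEADER + [t, xx >> 4, xx & 0xF] + value + value[:2]
    groups.append(checksum(groups))
    return "".join(format(g, "04b") for g in groups)


def decode(bits):
    """Return (kind, sensor_id, reading), or None if any check fails."""
    if len(bits) != 44:
        return None
    g = [int(bits[i:i + 4], 2) for i in range(0, 44, 4)]
    value = g[5:8]
    xx = (g[3] << 4) | g[4]
    if g[:2] != HEADER or g[10] != checksum(g[:10]):
        return None
    if g[8:10] != value[:2] or (xx & 1) != parity_bit(value):
        return None
    digits = [15 - v for v in value]                 # undo the bit inversion
    if any(d > 9 for d in digits):
        return None
    tenths = digits[0] * 100 + digits[1] * 10 + digits[2]
    if g[2] == TYPE_TEMPERATURE:
        return ("temperature", xx >> 1, (tenths - 500) / 10)
    if g[2] == TYPE_HUMIDITY:
        return ("humidity", xx >> 1, tenths / 10)
    return None


packet = encode("temperature", 0x55, 21.5)
print(packet, decode(packet))
```

Flipping any single bit of a packet changes the sum by a nonzero amount modulo 16, so the checksum alone rejects every single-bit error. The parity bit and the repeated digits give additional protection against multi-bit errors.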

    44 pulses with their bit and hex values
    By adjusting the input volume of the sound card and keeping the audio cable away from noise sources the signal can be made quite clean. This is a single burst of 44 pulses with their bit and hex values.

    My guess is that similar rain gauge and wind speed sensors transmit the same data format with two other values for t, but I have never had a chance to verify this assumption.

    The transmitter ID is a mechanism intended to create an association between a transmitter and a receiver. When a transmitter is loaded with batteries it selects a random 7-bit value. When a receiver unit is loaded with batteries, it listens for the first few valid chirps it can hear and remembers that 7-bit value. From that moment on, the receiver will ignore any packet that has a different 7-bit ID. This mechanism allows several pairs of instruments to co-exist in close proximity without affecting one another (other than corrupt each other's signals if their timing is unlucky and they happen to transmit at the exact same time).

    Packaging

    The RR3 and the three sockets (antenna, 3.5 mm audio out and power supply) fit in a plastic toothpick box. The RR3 is so small that there is no need to even attach it to the case - the wires soldered to it are rigid enough to hold it in place. It is easy to cross the output wires, feeding a signal with the inverse polarity to the sound card. Since it is much easier to modify a text file than to swap soldered wires, I included a switch in the decoding program to handle this potential situation.

    The Decoding Software

    At this point I was ready to write the full decoding software. The proof-of-concept code was written in Python. The decoding is done in several phases: first the input samples are classified as high or low levels and some cleanup is done, then consecutive high and low samples are counted to generate a series of 0/1/silence tokens. These tokens are fed into a decoder that looks for sequences of 44 non-silence tokens surrounded by two silence ones. Each 44-bit sequence is translated into 11 nybbles and tested for a valid header, correct parity and checksum and matching sensor values. If a test fails, the sequence is silently discarded. If all tests pass, the sequence is passed on to the next phase which decodes the value and prints it in human readable form.
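The first phases described above (levels, run counting, tokens, 44-token bursts) can be sketched roughly as follows. This is an illustration, not the downloadable program: the zero threshold, the silence limit of three gap lengths, the "S" token for silence and all function names are assumptions, and the cleanup step is left out.

```python
RATE_HZ = 11025
SHORT_MS, LONG_MS, GAP_MS = 0.67, 1.33, 1.0
BURST_BITS = 44


def run_lengths(levels):
    """Collapse a sequence of booleans into [level, length] runs."""
    runs = []
    for level in levels:
        if runs and runs[-1][0] == level:
            runs[-1][1] += 1
        else:
            runs.append([level, 1])
    return runs


def tokenize(samples, rate_hz=RATE_HZ, threshold=0):
    """Turn raw samples into a list of "0", "1" and "S" (silence) tokens."""
    per_ms = rate_hz / 1000.0
    split = (SHORT_MS + LONG_MS) / 2 * per_ms  # boundary between "0" and "1"
    silence = 3 * GAP_MS * per_ms              # assumed: longer lows are silence
    tokens = []
    for level, length in run_lengths(s > threshold for s in samples):
        if level:
            tokens.append("1" if length > split else "0")
        elif length > silence:
            tokens.append("S")
    return tokens


def find_bursts(tokens, size=BURST_BITS):
    """Return bit strings of exactly `size` tokens enclosed by two silences."""
    found, current = [], None                  # None until the first silence
    for token in tokens:
        if token == "S":
            if current is not None and len(current) == size:
                found.append("".join(current))
            current = []
        elif current is not None:
            current.append(token)
    return found


def synthesize(bits, rate_hz=RATE_HZ):
    """Build a clean test signal: silence, one burst, silence."""
    per_ms = rate_hz / 1000.0
    samples = [0] * 100
    for bit in bits:
        width_ms = LONG_MS if bit == "1" else SHORT_MS
        samples += [10000] * round(width_ms * per_ms)
        samples += [0] * round(GAP_MS * per_ms)
    return samples + [0] * 100


bits = "10" * 22
print(find_bursts(tokenize(synthesize(bits))) == [bits])
```

Each bit string returned by `find_bursts` would then go through the header, parity, checksum and repeated-digit tests before being accepted.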

    The cleanup in the first phase shapes the input a little to undo the damage introduced by the physical path that the signal goes through. The sound card input apparently filters very low frequencies and might cause the first few samples in each burst to have a different level than the rest, as shown in the following picture. Every horizontal line decays exponentially towards zero.

    Waveform decaying towards zero
    Passing a digital signal through an analog path produces exponential decay curves, but the signal is still clear enough to be recognized correctly and decoded.
    Waveform with DC component intact
    With its original DC component, the signal would have looked like this.
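The decay can be reproduced with a simple first-order high-pass filter, which is a reasonable stand-in for an AC-coupled audio input. The filter coefficient below is an arbitrary assumption chosen to make the effect visible. It is not a measured property of any particular sound card.

```python
def high_pass(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


# A flat "high" level followed by a flat "low" level.
square = [1.0] * 20 + [0.0] * 20
for i, y in enumerate(high_pass(square)):
    if i % 5 == 0:
        print(f"sample {i:2d}: {y:+.3f}")
```

The flat "high" level sags toward zero, and when the input drops the output swings below zero and then creeps back up, which matches the shape of the recorded waveform.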

    My experience was that it is common for the first few "1" pulses to run together and form a single long pulse, most probably because of the exponential decay curve and the feeding of an unattenuated 0-4V signal into a +/- 1V input. A "low" signal can go as high as 0.6V, which can easily be interpreted as "high". To allow correct decoding of the signal, the first phase also breaks certain large pulse lengths into a sequence of between 2 and 4 "1" tokens.

    Up to four '1' pulses fused together
    At the start of a burst the voltage jumps up and the "low" signal is interpreted as "high", causing a few pulses to be fused. This can be corrected most easily in the decoding software. Next time around I will experiment with installing a pair of resistors to attenuate the signal before feeding it to the sound card.
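One way to split a fused pulse is to work out how many long pulses, with their 1 mSec gaps swallowed, would add up to the measured duration. The sketch below does this. The rounding formula is an assumption made for illustration, and the actual program may well use fixed length thresholds instead.

```python
SHORT_MS, LONG_MS, GAP_MS = 0.67, 1.33, 1.0
MAX_FUSED = 4  # the text mentions fused runs of 2 to 4 pulses


def expand_pulse(duration_ms):
    """Translate one measured "high" duration into a list of bit tokens.

    n fused long pulses last n * LONG_MS + (n - 1) * GAP_MS, so
    n = (duration + GAP_MS) / (LONG_MS + GAP_MS), rounded to an integer.
    Anything longer than MAX_FUSED pulses is treated as noise (empty list).
    """
    if duration_ms <= (SHORT_MS + LONG_MS) / 2:
        return ["0"]
    n = round((duration_ms + GAP_MS) / (LONG_MS + GAP_MS))
    if n > MAX_FUSED:
        return []
    return ["1"] * max(n, 1)


for ms in (0.67, 1.33, 3.66, 6.0, 8.3, 20.0):
    print(f"{ms:5.2f} ms -> {expand_pulse(ms)}")
```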

    It turned out that the Python 2.3 prototype was fast enough, consuming a mere 3% of the CPU time on an ancient PC with a single core Celeron CPU running at 900 MHz, but I still preferred a compiled C version. The C rewrite of the same code consumes about 1/100 of the resources the prototype needed. On my present machine it is not even measurable.

    Since I did not want to interfere with my ability to play sounds on my PC, I salvaged the sound card from an old PC and added it to my system. In fact, my PC currently has five sound cards: the on-board Intel hardware, a USB headset, a USB web-cam with integrated microphone, the old Trident card I dedicated to the weather logger, and one weird device that is probably created by the tv/video-capture card driver.

    A short shell script* finds the correct sound card, adjusts its input volume, selects the correct input and runs the capture-decode-log pipeline of commands. I've arranged for this script to be run automatically whenever I turn on the machine.

    * My system runs Linux, so I use shell scripts and I feed the live audio signal to the decoding program using Linux commands. There are ways to achieve the same effect on Windows systems, although probably not as easily. The decoding program, however, will work just fine when compiled and run on Windows because it does not deal with any I/O by itself.

    #! /bin/bash -x
    # Listen to the weather sensor transmissions, decode them and
    # append the text form to a file whose name is yyyy/mm-dd
    # This file can be run at boot time by a sysv-init script:
    #     /etc/init.d/weather-sensor start|stop
    # or at login time by a symbolic link from ~/.kde/Autostart/

    CARD="TRID4DWAVENX"
    DEVICE=$( cat /proc/asound/cards | grep -F '['"$CARD" | awk '{print $1;}' )

    cd /home/me/projects/weather_station

    amixer -q -c ${DEVICE} sset 'Master' 45%
    amixer -q -c ${DEVICE} sset 'Capture' 50%
    amixer -q -c ${DEVICE} sset 'Line' cap
    amixer -q -c ${DEVICE} sset 'Line' 50%

    arecord -t raw -D front:CARD=${CARD} -c 1 -f S16_LE -r 11025 -N |
        ./decode - $1 |
        python ws_distribute_data.py &
    disown

    The /proc/asound/cards pseudo file lists all the sound cards installed on the system and allows finding which card was assigned to which device number.

    amixer is a command-line interface to the ALSA mixer. The first amixer command sets the output volume so that I can hear the incoming signal through a pair of broken earphones sitting behind my PC. The other amixer commands select the input channel on the sound card and set its volume to what I found to work best through trial and error.

    The arecord command is an ALSA program that can configure a sound card and capture raw input from it. The captured input is written out as a sequence of 16-bit binary values at a rate of 11025 samples per second. This stream is fed into the decode program, which interprets it and writes out a text line whenever it recognizes a valid chirp. The final step - ws_distribute_data.py - is a script that reads input lines and writes each one to a file whose name contains the current date in a directory whose name is the current year.
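The last stage of the pipeline could look roughly like the sketch below. It is a guess at the behavior described above, not the actual ws_distribute_data.py: the function names, the `root` and `clock` parameters and the exact mm-dd file name format are assumptions.

```python
import os
import time


def log_path(now, root="."):
    """Path of the log file for the given struct_time: <root>/yyyy/mm-dd."""
    return os.path.join(root, time.strftime("%Y", now), time.strftime("%m-%d", now))


def distribute(lines, root=".", clock=time.localtime):
    """Append each incoming line to the file for the date it arrived on."""
    for line in lines:
        path = log_path(clock(), root)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Reopen per line so a date change at midnight switches files.
        with open(path, "a") as log:
            log.write(line if line.endswith("\n") else line + "\n")


# As the last stage of a pipeline this would be called as:
#     import sys
#     distribute(sys.stdin)
```

Opening the file once per line is wasteful in general, but with roughly one reading per minute it keeps the code simple and makes the midnight rollover automatic.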


    The source code for the C decode program can be downloaded here. The C code does not depend on any library or environment, so it can be compiled and run on any operating system.

    The source code for the quick-and-dirty experimentation Python decode program can be downloaded here.


    Odds and Ends

    The receiver with all wires soldered
    The RR3 receiver with all wires soldered in place:
    • Orange: +5V
    • Black: ground (0V)
    • White: antenna
    • Yellow (left): digital signal out
    • Yellow (right): not connected.
    Signal with weak pulses
    Part of the problem in decoding the signal was that I did not know what I was looking for, and I could not tell a bad signal from a good one. Here is a typical example of an early recording showing a distorted signal in which the short pulses were much weaker than the long ones. Perhaps this was caused by a low sampling rate and some low-pass filtering.
    Signal with broken pulses
    Another example of a bad signal. The nine pulses in this image are hard to recognize because the noise was strong enough to break each pulse into a series of spikes.
    A daily temperature and humidity graph
    A cron job that runs every 15 minutes reads the most recently recorded measurements and updates a nice graph.

    One improvement I was thinking about is reporting the transmitter ID (or ignoring packets with the wrong ID), so I can place an additional transmitter in the freezer to get a log of the temperatures there into a separate graph.