A-LNT: A Wireless Sensor Network Platform for Low-Power Real-Time Voice Communications

Combining wireless sensor networks and voice communication into a multidata hybrid wireless network suggests possible applications in numerous fields. However, voice communication and sensor data transmissions have significant differences. Meanwhile, high-speed massive real-time voice data processing poses challenges for hardware design, protocol design, and especially power management. In this paper, we present a wireless audio sensor network platform A-LNT and study and discuss key elements for systematic design and implementation: node hardware design, low-power voice codec and processing, wireless network topology, hybrid MAC protocol design based on superframe, radio channel allocation, and clock synchronization. Furthermore, we discuss energy management methods such as address filtering and efficient power management in detail. The experimental and simulation results show that A-LNT is a lightweight, low-power, low-speed, and high-performance wireless sensor network platform for multichannel real-time voice communications.


Introduction
WSNs (wireless sensor networks) consist of randomly placed or planned low-power wireless sensor nodes that monitor physical or environmental parameters [1]; these nodes are usually battery powered. WSNs have the advantages of low power, low cost, and self-networking, and they need no wiring or additional power supply. WSNs have been widely applied in environmental monitoring, intelligent transportation, automatic control, and so on [2]. Scientists and engineers are facing new demands and challenges with the deployment of WSN systems: precise real-time personal orientation, medical monitoring systems, WAN access and data fusion, and so forth. Voice communications have a wide range of potential applications in these WSN systems, such as regular staff patrols and inspections, medical advice and counseling, broadcast and notification, and emergency voice communications. Although research on WMSNs (wireless multimedia sensor networks) [3] has been carried out for ten years, high power and bandwidth requirements limit WMSN development [4]. At present, most WMSN platforms are based on 802.11 platforms [5][6][7] and powered by high-capacity batteries or an external power supply.
In the last few years, multimedia components have become common and cheap with the rapid development of MEMS (microelectromechanical systems) and mobile Internet. Meanwhile, improved compression ratios and reduced power consumption have made it possible to achieve low-power, low-bandwidth voice communication in WSNs [8]. However, several problems must be solved in combining voice communications with traditional WSNs. First, voice communication data and WSN data differ significantly in their transmission features: audio data are real-time, unlike ordinary WSN data, and a voice communication lasts much longer than a sensor data transmission and occupies the channel for its whole duration. Moreover, audio sensor nodes are power-hungry and occupy massive bandwidth in data transmission. Furthermore, frequent high-speed real-time transmission of audio data poses challenges for WSN radio channel management, protocol design, hardware design, and energy management.

Related Works
There are a few suitable protocols and platforms for WASNs (wireless audio sensor networks) [9] at present [10][11][12][13][14]. Gabale et al. present LiT, a TDMA (time division multiple access)-based MAC protocol [10], and implement the MAC on the 802.15.4 platform Tmote. The evaluation of LiT shows quick flow setup, low packet delay, and suitability for real-time applications; however, the speech coder chosen is G.723.1 with a coding bit rate of 6.3 kbps, and the power consumption of the speech codec is not considered. In 2012, the Lo3 system based on LiT was reported [15], and it showed that such a system bodes well under the cost and power constraints of rural regions; the audio codec is SPEEX [16] with a data rate of 5.9 kbps, and the expected lifetime is 5 days. Most WSN applications need a longer lifetime.
Li et al. study audio element detection in audio sensor networks [11], where audio is treated as a special kind of sensing data. In [12], speech codecs for high-quality voice over ZigBee applications are discussed. Zhao et al. design and implement an enhanced surveillance platform with a low-power WASN; three kinds of audio sensors are discussed [13]. A voice network protocol based on the session initiation protocol, using a TDMA/TDD MAC protocol over an IEEE802.15.4 PHY, is presented in [14]; it is suitable for voice communications in both small- and large-scale networks. This group has carried out further research on a full-duplex voice mixer for multiple users [17] and on multiuser voice communications [18].
Most of the above solutions are based on IEEE802.15.4/ZigBee protocol and an 8-bit RF SOC CC2430/CC2530; the ZigBee protocol is complex and huge, and the full protocol stack requires more than 100 Kbytes of flash and 5 Kbytes of ROM in CC2430/CC2530.

Hardware Realization
A-LNT is a WSN platform. It has the typical characteristics of WSNs: low power, self-organizing network, environmental parameter monitoring, and reliable data transmission. Meanwhile, the platform can carry real-time voice communications without affecting sensing data transmissions. There are three types of nodes in our network: a central node (CNODE) for network establishment and management; sensor nodes (DNODEs), which are wireless terminals placed on a target position or person for monitoring environmental and physiological parameters; and audio sensor nodes (ANODEs), which are wireless sensor terminals with audio communication functions. CNODE and DNODEs are typical nodes in WSNs, and ANODE is a new type of DNODE that we introduce for voice communications. All nodes are built from an MCU, a power management unit, an RF transceiver, a voltage monitoring unit, sensors, and batteries. ANODE and CNODE have additional parts for audio communications: an audio processing unit, a display unit, and a user input unit. In order to simplify hardware design and embedded software programming, we choose the same MCU and RF transceiver for all nodes.
The MCU chosen is the MSP430F2618, a 16-bit ultralow-power RISC MCU; the MCLK is up to 16 MHz, and the wake-up time from low-power mode to active mode is less than 1 µs, which is suitable for dealing with frequent audio contents. The chip provides an 8-channel 12-bit ADC (analog-to-digital converter) with internal reference, an internal temperature sensor, and 4 USCIs (serial communication interfaces), which can meet the sensor interface needs of most WSNs. At present, we use the internal temperature sensor and one ADC channel for voltage monitoring; 6 ADC channels are available for additional sensors. We chose the CC2500 as the RF transceiver. It works in the ISM band of 2.4 GHz to 2.4835 GHz. The maximum wireless speed is 500 Kbps, and the current consumption is 17.0 mA in the RX state, 21.1 mA at 0 dBm in the TX state, and 400 nA in the sleep state.
DNODEs and ANODEs have different supply schemes. DNODEs are connected to the batteries directly. This is an ideal way to power low-cost, low-power DNODEs, as the power management unit introduces no energy loss. However, it is not suitable for powering ANODEs, as the audio codec and audio amplifier require a low-noise, stable power supply. A high-PSRR (power supply rejection ratio) LDO is necessary for ANODEs. Directly connecting the LDO to the batteries would increase current consumption and reduce the available battery capacity, so a high-performance step-down DC-DC converter, the TPS62203, is added to the circuit in order to improve efficiency.
In practical applications, users may want to turn off the terminal equipment when they finish their work. We use a low-VCEsat (BISS) transistor, the PBSS5320T from NXP Semiconductors, and a small-signal PNP transistor 9014 to realize a load switch. Although PMOS (P-channel MOSFET) transistors are popular in load switch designs, we chose a BISS transistor because it is ESD insensitive and has a constant VBE of about 650 mV, so the voltage measurement circuit is easy to realize. The power control circuit workflow is as follows: when the batteries are connected to the board, the BISS transistor is off; when the tact switch 2 is pressed, the BISS transistor is on, then the MCU turns on 3, the board works normally, and the MCU monitors the voltage between 4 and 5. When 2 is pressed or the battery voltage is lower than the threshold voltage for 30 seconds, the MCU turns off 2 High PSRR LDO and the board is powered down. Figure 1 shows the power management circuit schematic.
The last important part of the hardware design is the audio processing unit, which consists of an audio codec, a microphone, an audio amplifier, and a communication interface. The audio codec algorithm should satisfy the following requirements: low power, low bit rate, and robustness for wireless communication. The CVSD (continuously variable slope delta modulation) algorithm meets all the above requirements and is an ideal solution for wireless voice communication; even when the bit error rate reaches 10%, the MOS (mean opinion score) is greater than 3. The CVSD algorithm is a simple algorithm based on the PCM (pulse code modulation) algorithm, and it has been widely applied in digital voice conferences and digital cordless telephones [19,20]. The codec chip chosen is the CMX649 [21], a low-power full-duplex codec; the typical operating current is 2.4 mA at 3.0 V, and the codec bit rate is 15.625 Kbps in the design. The codec exchanges audio contents with the MCU through SPI.
By now, we have finished the test-board hardware design. All test-boards in A-LNT are on a 2-layer FR-4 PCB with a board thickness of 1 mm. The DNODE test-board is a tiny board that consists of the MSP430F2618, the CC2500, 2 AAA batteries, and the respective peripheral circuits. The ANODE test-board is shown in Figure 2. The CNODE uses the same board as the ANODE with different embedded software. We will discuss the A-LNT MAC protocol design in the next section.

Protocol Design and Algorithm Realization
We divide the A-LNT protocol and software design into 3 parts: network topology, MAC protocol, and network management. Network topology is the foundation of the entire protocol design; it determines the radio channel allocation and data transmission management strategies. The MAC protocol is the most important part of the A-LNT software design; it consists of hybrid channel access and management based on superframe, clock synchronization design, address filtering, the address allocation rule, and packet priority setting. We start with the network structure design.

Network Structure Design. As mentioned above, A-LNT is based on WSN and has the typical characteristics of WSNs: low power, self-organizing network, environmental parameter monitoring, and reliable data transmission. Meanwhile, the platform can carry out real-time voice communications without affecting sensing data transmissions. Sensing data requires reliable transmission, but latency is not critical. We ensure correct data transmission by applying the acknowledgment mechanism in WSNs. Voice communication is real-time and needs strict clock synchronization. In addition, the real-time requirements and the large volume of data transmitted in voice communications mean that the network topology and protocol must be simple and efficient. Some packet loss and data error are acceptable in voice communications; acknowledgments are unnecessary and would degrade performance.
In short, in order to meet the requirements of voice communications and sensing data transmission at the same time, the MAC protocol should be clock synchronous and should keep the two types of data from interfering with each other. We designed the wireless network structure with these factors in mind; the network has a star topology, as shown in Figure 3. CNODE is in charge of network management, node management, and clock synchronization. DNODEs measure environmental parameters and upload data to CNODE periodically. ANODEs upload sensing data in the same way as DNODEs. A-LNT supports three types of voice communications. In most conditions, voice communications between ANODEs should be peer-to-peer (P2P) in order to reduce wireless transmission pressure. If two ANODEs are too far apart to communicate directly, audio packets can be forwarded by CNODE (PCP). The last voice communication type is voice conference (VCF); in this mode, only one ANODE or CNODE is active at any moment while all other ANODEs are listening.

MAC Protocol Design.
In order to realize the MAC protocol, we should first determine the clock synchronization frequency and the superframe time. In wireless multimedia networks, TDMA is an efficient and popular method to ensure QoS (quality of service) and maximize the use of wireless bandwidth [22,23]. The transmitter sends multimedia contents in its assigned time slot while the receiver is listening, so they must be synchronous. Clock synchronization is critical in the TDMA mechanism [24]. All nodes in A-LNT are synchronized when they join the network. However, the clock error increases over time because of crystal tolerance and MCU clock accuracy. The clock error would lead to radio channel conflict, transmission failure, and system error, so periodic clock synchronization is necessary to keep the WASN working normally. The synchronization frequency should be a tradeoff decided by the audio processing period and the clock error. For a low-cost ±20 ppm crystal, in the worst case, the time error will reach 2 ms in 50 seconds; for a ±5 ppm crystal, the time error will reach 2 ms in 200 s, which is about 3 minutes.
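The drift figures above can be reproduced with a short calculation. This is a minimal sketch, assuming the worst case means the two crystals err in opposite directions, so the relative error is twice the tolerance; the function names are illustrative and not part of A-LNT.

```python
def worst_case_drift_s(tolerance_ppm: float, elapsed_s: float) -> float:
    """Worst-case relative clock error between two nodes after elapsed_s seconds.

    Assumption: the two crystals deviate in opposite directions, so the
    relative error grows at twice the rated tolerance.
    """
    return 2 * tolerance_ppm * 1e-6 * elapsed_s


def max_sync_interval_s(tolerance_ppm: float, error_budget_s: float) -> float:
    """Longest time between resynchronizations that keeps the error in budget."""
    return error_budget_s / (2 * tolerance_ppm * 1e-6)


if __name__ == "__main__":
    print(worst_case_drift_s(20, 50))     # about 0.002 s for a +/-20 ppm crystal
    print(max_sync_interval_s(5, 2e-3))   # about 200 s for a +/-5 ppm crystal
```

Under this assumption, a 2 ms error budget corresponds to resynchronizing at least every 50 s with ±20 ppm crystals and every 200 s with ±5 ppm crystals.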
The superframe time is decided by the audio sampling period and the wireless period. In A-LNT, the codec bit rate is 15.625 Kbps; during voice communications, the audio codec continuously generates a bit stream for the MCU and demands the same number of bits from the MCU. In order to reduce complexity and power consumption, the encoded audio content generated by the codec in one period should be smaller than the TX buffer size, which is 64 bytes. We design the superframe period T as 20.48 ms, which is also the audio sampling period; 40 bytes are sent to the MCU in one T. In programming, we simplify operation by sending data to the codec whenever data are received from the codec; no additional timer or synchronization is required in this way. The audio data processing is mainly done in the SPI interrupt function, where the MCU sends 1 byte of received audio content to the codec for decoding each time it receives 1 byte of encoded audio from the codec.
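The relation between the codec bit rate, the superframe period, and the TX buffer can be checked with integer arithmetic. The sketch below uses only the figures quoted above; the constant and function names are illustrative.

```python
CODEC_BIT_RATE_BPS = 15_625   # CVSD codec bit rate used in the design
SUPERFRAME_US = 20_480        # superframe period T, in microseconds
TX_BUFFER_BYTES = 64          # TX buffer size quoted in the text


def bytes_per_superframe(bit_rate_bps: int, period_us: int) -> int:
    """Encoded audio bytes produced by the codec in one superframe."""
    bits = bit_rate_bps * period_us // 1_000_000
    return bits // 8


if __name__ == "__main__":
    n = bytes_per_superframe(CODEC_BIT_RATE_BPS, SUPERFRAME_US)
    print(n, n < TX_BUFFER_BYTES)
```

With these values, one superframe carries 320 bits, that is 40 bytes, which leaves room in a 64-byte buffer for packet header fields.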
In voice communications, an ANODE sends encoded audio data periodically and receives wireless audio data from another ANODE. In order to reduce hardware requirements and guarantee communication quality, we introduce four data buffers for cross access; two buffers store received audio data, and the other two store encoded audio data. The maximum voice time delay is less than 2T.
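The cross-access idea can be illustrated with a ping-pong (double) buffer: one buffer fills byte by byte while the other, already full, is handed to the radio or the codec. This is a minimal sketch in Python rather than the embedded firmware, and the class name and its interface are hypothetical.

```python
class PingPongBuffer:
    """Two fixed-size buffers: one fills while the other is consumed."""

    def __init__(self, size: int) -> None:
        self._buffers = [bytearray(size), bytearray(size)]
        self._filling = 0  # index of the buffer currently being filled
        self._pos = 0      # next write position in that buffer

    def push(self, byte: int):
        """Store one byte; return the completed block when a buffer fills, else None."""
        buf = self._buffers[self._filling]
        buf[self._pos] = byte
        self._pos += 1
        if self._pos < len(buf):
            return None
        # Buffer is full: switch to the other one and hand this one over.
        self._filling ^= 1
        self._pos = 0
        return bytes(buf)


if __name__ == "__main__":
    # One pair for encoded audio to transmit, one pair for received audio,
    # giving the four buffers described in the text (40 bytes per superframe).
    encoded_audio = PingPongBuffer(40)
    received_audio = PingPongBuffer(40)
    for sample in range(40):
        block = encoded_audio.push(sample)
    print(len(block))
```

Because a block is only released once it is complete and is consumed during the following period, a scheme of this kind is consistent with a delay bound of about two superframes.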
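The superframe is divided into several time slots for sensing data transmission, network management packet transmission, and audio data transmission. The number of time slots is decided by the packet processing time. In order to get a precise packet processing time, a packet processing time model is introduced whose terms are the MCU processing time and the radio time on the transmitter and receiver sides, together with the packet length and the radio transmission speed. The time slot for network management and data transmission should be much longer than the audio time slot for two reasons. First, acknowledgment is usually necessary in network management, which needs about twice the time. Second, when the network works in low-speed mode, nodes do not have a precise high-speed crystal or main clock, so they need a longer safety time interval and more packet processing time.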
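A rough feel for the slot budget can be obtained from a simple additive timing sketch. The figures for packet length (46 bytes), radio speed (500 kbps), SPI speed (4 Mbps), and synthesizer calibration time (721 µs) are the ones reported for voice communication on this platform; the additive form, the 8 bytes of radio overhead for preamble and sync word, and all function names are assumptions made for illustration, not the paper's model.

```python
def airtime_s(payload_bytes: int, overhead_bytes: int, radio_bps: float) -> float:
    """Time on air for the payload plus radio-inserted overhead bytes."""
    return (payload_bytes + overhead_bytes) * 8 / radio_bps


def spi_time_s(payload_bytes: int, spi_bps: float) -> float:
    """Time to move the payload between MCU and radio over SPI."""
    return payload_bytes * 8 / spi_bps


def send_time_s(payload_bytes: int, overhead_bytes: int, radio_bps: float,
                spi_bps: float, calibration_s: float) -> float:
    """Assumed additive model: SPI transfer + calibration + airtime.

    MCU software overhead is deliberately not modeled here.
    """
    return (spi_time_s(payload_bytes, spi_bps)
            + calibration_s
            + airtime_s(payload_bytes, overhead_bytes, radio_bps))


if __name__ == "__main__":
    t = send_time_s(46, 8, 500e3, 4e6, 721e-6)  # 8 overhead bytes is an assumption
    print(f"{t * 1e3:.3f} ms")
```

Under these assumptions the sketch gives about 1.68 ms, which sits below the reported 1.95 ms sending time; the difference would have to come from MCU processing and other overhead that the sketch leaves out. Both values fit inside a 2.4 ms voice subframe.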
In multimedia communication applications, multimedia sensor nodes should be precisely clock synchronized with each other. However, high precision requires a higher MCU clock, more expensive hardware, more current consumption, and a shorter battery lifetime. In order to reduce power consumption, the audio processing units are shut down after voice communication has finished, and a superframe-based hybrid MAC protocol is introduced.
This MAC protocol mechanism consists of 4 key components: (1) The superframe is divided into a data subframe and voice subframes. DNODEs listen to radio channels and send data only in the data subframe; voice communications are carried out only in voice subframes. (2) The network adopts low time synchronization accuracy and a lower node MCLK (main clock) to reduce energy consumption. The data subframe times are automatically adjusted to the network load, and the CSMA/CA mechanism is adopted to manage radio channels. (3) When there are audio data transmissions, the node MCLK is increased to full-speed mode and high-precision time synchronization is adopted to ensure network performance. (4) Radio channels are allocated by the central node using the TDMA mechanism in voice subframes.
In detail, A-LNT consists of 1 CNODE, up to 16 ANODEs and 64 DNODEs, and an optional computer. The superframe T is 20.48 ms, and the high main frequency is 16 MHz; the superframe is divided into 1 data subframe (6.08 ms) and 6 voice subframes (2.4 ms each). When there are voice communications, ANODEs and CNODE listen to radio channels and send encoded audio content in the specified voice subframes, and all DNODEs sleep. This platform supports up to 3-way P2P voice communications, or 1-way PCP voice communication, or a VCF including CNODE and all ANODEs; the typical voice delay is less than 10 ms, and the delay is less than 40 ms in the worst case. 32 T compose a management cycle TT. The data subframe in superframe 0 of the cycle is a CNODE slot for network management. The data subframes in superframes 1-16 are DNODE slots for network management, network heartbeat, and sensing data transmission. The data subframes in superframes 17-31 are voice communication control and management slots. Every 6.55 s (i.e., 320 T; the worst error at 20 ppm is 256 µs), CNODE broadcasts a polling packet POLL; all nodes should receive the packet, get the time information, adjust their time to agree with CNODE, and then send the reply packet ACK POLL in the specified slot. If a node has sensor data to upload, it sends them at the same time by sending the packet ACK POLL DATA instead of ACK POLL; CNODE should send the reply packet DATA REVED to the node immediately after receiving the packet. For example, CNODE sends POLL at TT0.
Nodes number 1-16 should send the reply ACK POLL/ACK POLL DATA in TT0, nodes number 17-32 should send the reply ACK POLL/ACK POLL DATA in TT1, and all replies should be finished in TT3. If CNODE does not receive ACK POLL/ACK POLL DATA from a node for three successive management cycles, CNODE deletes the node. The priority design rules are as follows: CNODE has the highest priority, DNODEs have the highest priority in their allocated slots, and other data are sent sequentially according to priority within the data subframe. The high-speed crystal is shut down when there is no voice communication, the node MCLK drops to about 2 MHz, and TT increases to 65.5 s (i.e., 3200 T; the worst error at 20 ppm is 2.56 ms). All nodes wake up for the CNODE slot when the cycle is a CNODE polling cycle and go to sleep until it is time to send the reply packet. When a new node appears, it applies for a channel through the CSMA/CA mechanism.
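The superframe arithmetic above can be verified with a few lines. This is a minimal sketch using the durations quoted in the text; the constant and function names are illustrative, and the slot numbering (voice subframes 1 to 6 following the data subframe) is an assumption about ordering.

```python
SUPERFRAME_US = 20_480      # superframe period T
DATA_SUBFRAME_US = 6_080    # one data subframe
VOICE_SUBFRAME_US = 2_400   # each voice subframe
VOICE_SUBFRAMES = 6


def voice_slot_start_us(k: int) -> int:
    """Start offset of voice subframe k (1..6), assuming it follows the data subframe."""
    if not 1 <= k <= VOICE_SUBFRAMES:
        raise ValueError("voice subframe index must be in 1..6")
    return DATA_SUBFRAME_US + (k - 1) * VOICE_SUBFRAME_US


def interval_s(superframes: int) -> float:
    """Duration of a given number of superframes, in seconds."""
    return superframes * SUPERFRAME_US / 1_000_000


if __name__ == "__main__":
    # The subframes exactly fill the superframe.
    print(DATA_SUBFRAME_US + VOICE_SUBFRAMES * VOICE_SUBFRAME_US == SUPERFRAME_US)
    print(interval_s(320), interval_s(3200))  # polling intervals: full-speed, low-speed
```

The two intervals come out as 6.5536 s and 65.536 s, matching the 6.55 s and 65.5 s polling periods quoted for full-speed and low-speed operation.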
Address filtering is applied in A-LNT to reduce the total system power consumption. In a wireless network, all active nodes listen to radio channels; in most cases, only one node is the target node, while the other nodes receive a useless packet and waste time unpacking and handling it. Address filtering is introduced to reduce wireless data processing time: the wireless packet is unpacked and handled only when the address matches; otherwise, the packet is abandoned. Address filtering reduces the processing time relative to complete reception. This adaptive hybrid channel allocation method is an effective solution to the contradiction between multimedia communications and system power consumption. Figure 4 shows the basic scheme of the A-LNT channel allocation and address filtering protocol.
The other important pieces of information about A-LNT are as follows.
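The principle of address filtering can be sketched as an early exit in the packet handler: the destination byte is checked first, and the rest of the packet is touched only on a match. The packet layout below (a length byte that counts the address and payload, then a destination address byte, then the payload) and the broadcast address 0x00 are assumptions for illustration; the A-LNT packet formats are given in Figure 5.

```python
from typing import Optional

BROADCAST_ADDRESS = 0x00  # assumed broadcast address, for illustration only


def handle_packet(raw: bytes, my_address: int) -> Optional[bytes]:
    """Return the payload if the packet is for this node, else None.

    Assumed layout: raw[0] = length of (address + payload),
    raw[1] = destination address, raw[2:] = payload.
    """
    if len(raw) < 2:
        return None
    length, destination = raw[0], raw[1]
    if destination not in (my_address, BROADCAST_ADDRESS):
        return None  # address mismatch: abandon without unpacking the payload
    return raw[2:1 + length]


if __name__ == "__main__":
    packet = bytes([4, 0x12, 0xAA, 0xBB, 0xCC])
    print(handle_packet(packet, my_address=0x12))  # payload is returned
    print(handle_packet(packet, my_address=0x13))  # None: not the target node
```

Every non-target node leaves the handler after reading two bytes, which is where the saving in processing time comes from as the number of listening nodes grows.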
For more information about packet types, refer to Figure 5.
The state machine of ANODE is shown in Figure 6. ANODEs sleep most of the time and go to listening mode at specified times. From listening mode, a node can enter one of three voice communication modes, and the audio codec works only in voice communication modes. In our design, the work is mainly done by CNODE. It chooses an available radio channel and waits for radio packet receiving events. When a packet is received, CNODE performs the corresponding operation according to the packet type. The network management process pseudocode is shown in Algorithm 1.

Network Management Protocol
The simplified flowchart of node joining is shown in Figure 7.
In order to join A-LNT, a wireless node should seek an active CNODE, send APPLY to CNODE, get the time information, and synchronize with CNODE. The process is shown in Algorithm 2.
The simplified flowchart of node joining is shown in Figure 8.
By now, the A-LNT MAC protocol design is finished; it is simple and efficient, and it consumes limited RAM and flash resources, as follows: CNODE: 17 KB ROM, 0.5 KB RAM; ANODE: 11 KB ROM, 0.4 KB RAM; DNODE: 2.6 KB ROM, 0.1 KB RAM. We do not discuss the voice communication details here; they are lengthy and are attached at the end of the paper. In the next section, we carry out experiments to verify the A-LNT platform performance and discuss the results.

Results and Conclusion
First, we measured the operating currents of the three types of nodes (Table 1). The current consumption is mainly determined by the audio unit and the radio unit. It could be lowered by reducing the output power and receiving sensitivity of the CC2500, turning down the earphone volume, and powering down the LCM.
Then, we studied the battery lifetime in theory. In order to simplify the calculation, we assume that the battery maintains a constant OCV (open circuit voltage) of 1.5 V and an RI (internal resistance) of 150 mΩ. The battery capacity is 2300 mAh. The batteries are three alkaline batteries in series. The ANODE currents vary with the operation mode: TX, RX, sleep, and audio. The average TX time is 2 ms every 1000 T; the RX time is 3 ms every 1000 T. For boards with only an LDO, the battery lifetime calculation includes Ildo, the supply current of the LDO; in the design, the LDO is the XC6204B30 and Ildo is 70 µA.
For boards with DC/DC converters, the battery lifetime is calculated with an output voltage of 3.3 V and a converter efficiency of 95%. Figure 9 shows the calculation results. The DC/DC converter extends battery lifetime by more than 29%. If the voice communication time is 30 minutes per day, ANODEs could work for more than 60 days without changing batteries. It is also possible to connect more batteries in series or to use higher-voltage batteries to extend node working time.
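The size of the DC/DC benefit can be estimated from power balance alone. The following is a minimal sketch, assuming an ideal comparison: the load current is the same in both cases, an LDO-only board draws the full load current from the battery, a DC/DC board draws the load current scaled by Vout / (efficiency × Vbat), and battery internal resistance, LDO supply current, and voltage droop are neglected. The function name is illustrative.

```python
def dcdc_lifetime_gain(v_bat: float, v_out: float, efficiency: float) -> float:
    """Fractional lifetime gain of a DC/DC front end over a bare LDO.

    Idealized: with an LDO alone, battery current equals load current;
    with a DC/DC converter, battery current is scaled by
    v_out / (efficiency * v_bat). Lifetime is inversely proportional
    to battery current for a fixed capacity.
    """
    battery_current_ratio = v_out / (efficiency * v_bat)
    return 1.0 / battery_current_ratio - 1.0


if __name__ == "__main__":
    # Three 1.5 V cells in series, 3.3 V converter output, 95% efficiency.
    gain = dcdc_lifetime_gain(v_bat=4.5, v_out=3.3, efficiency=0.95)
    print(f"{gain:.1%}")
```

With the 4.5 V battery stack, 3.3 V output, and 95% efficiency, this idealized estimate comes out near 29.5%, which is in line with the lifetime extension of more than 29% reported from the full calculation.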
The minimum input voltage is calculated by the following equation: $V_{in,min} = V_{out,max} + I_{L,max} \times (r_{DS(on),max} + R_L)$, where $I_{out,max}$ = 53 mA; $V_{out}$ = 3.3 V; $L$ = 10 µH; $f$ = 1 MHz; $r_{DS(on),max}$ = 670 mΩ; the power inductor is the CDRH5D28NP-100N from Sumida, and $R_L$ = 65 mΩ. The result shows that $V_{in,min}$ is less than 3.4 V; with three alkaline batteries in series, the node can extract almost all of the energy.
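The minimum-input-voltage check can be worked through numerically. This is a sketch under stated assumptions: the peak inductor current is taken as the maximum output current plus half the ripple, the ripple is estimated with the standard buck-converter expression Vout × (1 − Vout/Vin) / (L × f) evaluated at a 4.5 V input, and the maximum output voltage is taken equal to the nominal 3.3 V. None of these intermediate steps is stated in the text, and the function names are illustrative.

```python
def inductor_ripple_a(v_in: float, v_out: float,
                      inductance_h: float, f_sw_hz: float) -> float:
    """Peak-to-peak inductor ripple current of an ideal buck converter."""
    return v_out * (1.0 - v_out / v_in) / (inductance_h * f_sw_hz)


def min_input_voltage(v_out_max: float, i_out_max: float, ripple_a: float,
                      r_ds_on_max: float, r_inductor: float) -> float:
    """V_in,min = V_out,max + I_L,max * (r_DS(on),max + R_L)."""
    i_l_max = i_out_max + ripple_a / 2.0  # assumed peak inductor current
    return v_out_max + i_l_max * (r_ds_on_max + r_inductor)


if __name__ == "__main__":
    ripple = inductor_ripple_a(v_in=4.5, v_out=3.3,
                               inductance_h=10e-6, f_sw_hz=1e6)
    v_min = min_input_voltage(v_out_max=3.3, i_out_max=0.053, ripple_a=ripple,
                              r_ds_on_max=0.670, r_inductor=0.065)
    print(f"ripple = {ripple * 1e3:.0f} mA, V_in,min = {v_min:.3f} V")
```

Under these assumptions the ripple is about 88 mA and the minimum input voltage about 3.37 V, consistent with the stated result of less than 3.4 V.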
The load switch circuit simulation is carried out with TINA-TI 9.0; the results are shown in Figure 10. The current consumption of the BISS transistor is about 255 µA, and the VBEsat is about 50 mV when the load current is 50 mA. The node shutdown current consumption is only 2.21 µA. The power management circuitry has virtually no impact on the node power consumption. The communication distance between ANODEs is longer than 70 meters indoors and 120 meters outdoors. The results are measured under the following conditions: line of sight, about 1.5 meters above ground, and with the nodes placed on a table or carried by a person.
Finally, the address filtering performance is studied by measuring RF packet processing times. We measure packet processing times with a pair of nodes. Node A sends the same packet every 100 ms, 200 times. Meanwhile, node B stays in the RX state; when an RF packet is detected, a timer starts counting until the packet has been received and processed, and the timer count is then stored in an array. The processing time is calculated by averaging all 200 counts. The processing times for different packet lengths are shown in Table 2.
The timing accuracy is 5 µs in the above measurements. The total processing time saving with the parameters in Table 2 versus the number of nodes is shown in Figure 11.
Here, B means bytes, HS means "high speed," and LS means "low speed." It can be seen from Figure 11 that the processing time of all active nodes in the WSN is reduced further as the number of nodes and the packet length increase. When the number of ANODEs reaches 6, the time spent is reduced to 20% of that without address filtering. Address filtering is therefore an efficient method to reduce network power consumption.
In conclusion, we have presented a low-power WASN platform A-LNT from hardware realization to protocol design. The network has a star topology and three voice communication modes: P2P, PCP, and VCF. The audio codec is CVSD 15.625 kbps; cross accessing and data buffer pool

A.2. PCP Mode. PCP mode is similar to P2P mode; the differences are that CNODE allocates 4 time slots and all audio contents are forwarded by CNODE. The simplified flowchart is shown in Figure 14.
The schematic diagram of P2P and PCP is shown in Figure 15.
A.3. VCF Mode. VCF is initiated by CNODE. Any ANODE that wants to speak should send the VCF ASK command to CNODE; CNODE checks whether there is an active speaker and sends the corresponding reply to the asker. The VCF can be ended only by CNODE. The simplified flowchart is shown in Figures 16 and 17.
The schematic diagram of P2P and PCP is shown in Figure 18.

Figure 4: A-LNT channel allocation and address filtering diagram.

Figure 9: Battery lifetime versus voice communication time.

Figure 11: Time spent versus number of nodes.
The terms of the packet processing time model are the sending packet processing time, the receiving packet processing time, the transmitter MCU processing time, the radio sending time, the receiver MCU processing time, the wireless receiving time, the packet payload length in bits, the preamble bits, sync word, and other data inserted automatically by CC2500, and the radio transmission speed. The packet processing time is decided by the MCU main clock, data transmitting speed, packet length, and radio transmission speed. During voice communications, the MCU main clock is 16 MHz, the SPI speed is 4 Mbps, the audio packet length is 46 bytes, and the wireless speed is 500 kbps. The sending packet processing time is 1.95 ms, which includes the synthesizer calibration time of 721 µs. So the time slot for voice communication should be longer than 1.95 ms, and the time slot for network management and data transmission should be much longer than the audio time slots for two reasons, the first being that acknowledgment is usually necessary.

The last part of MAC protocol design is network management. The network management protocol takes on the role of the following:

Table 2: Processing times of different packet lengths and hardware.
if ACT NODE, play encoded voice; else play voice data in PLAYBUFF.