The most recent network technologies are enabling a variety of new applications, thanks to the provision of increased bandwidth and better management of Quality of Service. Nevertheless, telemedical services involving multimedia data are still lagging behind, due to the concern of the end users, that is, clinicians and also patients, about the low quality provided. Indeed, emerging network technologies should be appropriately exploited by designing the transmission strategy focusing on quality provision for end users. Stemming from this principle, we propose here a context-aware transmission strategy for medical video transmission over WiMAX systems. Context, in terms of regions of interest (ROI) in a specific session, is taken into account for the identification of multiple regions of interest, and compression/transmission strategies are tailored to such context information. We present a methodology based on H.264 medical video compression and Flexible Macroblock Ordering (FMO) for ROI identification. Two different unequal error protection methodologies, providing higher protection to the most diagnostically relevant data, are presented.
The most recent network technologies are enabling a variety of new applications thanks to the provision of increased bandwidth and better management of Quality of Service. Nevertheless, telemedical services involving multimedia data are still lagging behind, due to the concern of the end users, that is, clinicians and also patients, about the low quality provided. This is particularly true of wireless and mobile telemedicine services. Wireless and mobile telemedicine underpins applications such as the transmission of video data from an ambulance, the rapid retrieval and remote display of video data stored in hospital databases, and remote (first-level) diagnosis in rural areas, for example, robotic teleultrasonography and telesurgery.
One of the key challenges is the ability to stream medical video over wireless channels. Although wireless multimedia telemedicine services have been proposed before in [
For instance, cardiac ultrasound loops require a very large bandwidth. In diagnostic cardiology it would be desirable to store approximately 30 seconds of dynamic heart images per patient (i.e., three sections of the heart and 10 seconds for each section). Even if frames are digitized with
Medical video compression techniques are thus required. For telemedical applications, such techniques must offer high fidelity in order to avoid the loss of vital diagnostic information. To achieve this, lossless compression techniques are often considered, but they have the disadvantage of low compression ratios. Therefore, when transmission is over band-limited and error-prone channels, a compromise must be made between compression fidelity and protection and resilience to channel errors and packet loss. It has been estimated that lossy compression ratios of 1:5 to 1:29 do not result in a lowering of diagnostic accuracy [
However, we can consider three types of lossless compression: information lossless compression, perceptually lossless compression, and diagnostically lossless compression. The first is limited by the entropy (mean information) of the source; the second is such that losses are not perceived by the human eye; the third is such that the diagnosis made on the basis of the image/video sequence is not affected by compression.
In this paper, our goal is to achieve diagnostically lossless compression and transmission of medical video. We focus on ultrasound video, although some considerations are still valid for different types of sources.
We propose to exploit all the available context information (type and goal of ongoing/stored examination, status of the patient, transmission scenario) to design an appropriate transmission system for diagnostically lossless ultrasound video transmission over WiMAX systems. For instance, coronary heart disease can be diagnosed by measuring and scoring regional motion of the heart wall in ultrasound images of the left ventricle of the heart [
The paper is structured as follows. Section
In the last few years, wireless Metropolitan Area Networks have gained momentum. IEEE 802.16/WiMAX (Worldwide Interoperability for Microwave Access) [
The IEEE 802.16 standard offers broadband wireless access over long distances. Since 2001, WiMAX has evolved from 802.16 to 802.16d for fixed wireless access, and to the new IEEE 802.16e standard with mobility support [
The latter is generally referred to as mobile WiMAX. Mobile WiMAX adds significant enhancements, including improvement of NLOS coverage by utilizing advanced antenna diversity schemes and hybrid automatic repeat request (hARQ); the adoption of dense subchannelization, thus increasing system gain and improving indoor penetration; the use of adaptive antenna system (AAS) and multiple input multiple output (MIMO) technologies to improve coverage; the introduction of a downlink subchannelization scheme, enabling better coverage and capacity tradeoff. This brings potential benefits in terms of coverage, power consumption, self-installation, frequency reuse, and bandwidth efficiency. The 802.16e standard encompasses five Quality of Service classes for different types of traffic/applications.
In particular, for medical applications in emergency areas it is important to have an easy setup of the infrastructure. At the same time, QoS is critical in medical applications, thus proper prioritization and scheduling policies should be adopted in order to enable reliable and high-quality transmission of possibly critical medical data. In [
WiMAX has dramatically improved with respect to previous systems in terms of features which are critical for medical applications: high end-to-end quality; robustness and reliability (the system cannot break down under stress and the connection cannot be lost); and security (transmission of medical data should be secure and the privacy of medical data must be preserved: medical data or patient identification cannot be disclosed indiscriminately, and the fact that different health care providers have different access rights has to be considered).
However, the baseline WiMAX scheme lacks error protection beyond PHY/MAC, and unequal error protection is not considered at PHY/MAC; hence video sequences can be largely affected by errors and packet losses. In order to improve the video quality, strong channel coding should be used at the PHY layer, but this would result in low spectral efficiency. In addition, if unequal error protection is not available, the video quality will degrade significantly when a mobile subscriber station (MSS) experiences shadow fading, temporal fading, or interference. The idea of unequal error protection is to apply more robust channel coding to more important video content. Therefore, the MSS can at least decode some important video frames, for example, I frames and diagnostically important content.
For this reason we propose in this paper to adopt an unequal loss protection strategy at the application layer, to improve packet error resilience for ultrasound video sequences transmitted over a WiMAX system. The advantages of an unequal loss protection at the application layer are mainly the availability of detailed source information at this layer (no need to pass such information through the OSI protocol stack) and standard compatibility (PHY/MAC layers are standardized in WiMAX).
Teleultrasound systems for remote diagnosis have been proposed in the last ten years [
More challenging scenarios include Ultrasound guided remote telesurgery [
In [
Some projects and demonstrations are ongoing on multimedia telemedical application through WiMAX systems. The goal of the European IST project WEIRD [
The goal of most of the aforementioned projects is/was to demonstrate the transmission of medical data over a standard network, with no effort to tailor the characteristics of the transmission system to the specificity of the transmitted data.
One of the first works addressing the need of taking the specific characteristics of the medical application into account in the design of the transmission system was [
The importance of considering a specific cross-layer strategy designed with the goal of maximizing the diagnostic quality of the received information was first identified in [
Since most networks deal with a limited amount of bandwidth, scaling techniques are introduced to send less data over the network with as little inconvenience as possible for the user. One of these techniques is region-of-interest (ROI) coding. ROI coding divides an image into multiple parts, the most important part, called the ROI, typically being the one the user is observing.
ROI coding can be used to encode objects of interest with a higher quality, whereas the remainder of the image can be regarded as background information and can be encoded more coarsely. The advantage of this method is that the image parts that the viewer is looking at can be transmitted with a higher quality. The result is that the overall viewing experience remains satisfactory, while the transmission can be performed at lower bitrates.
Another advantage of ROI coding is that ROIs can be transmitted first. This can be realized by the use of slices (e.g., if slice group 0 is transmitted first, placing the ROI in slice group 0 should make it arrive first at the decoder side). When network congestion occurs, the probability of receiving a frame that contains at least what the viewer most likely wants to see is higher with ROI-coded imagery than without ROI coding.
The ROI can be defined by the user (e.g., the clinician) by means of a mouse click or an eye-tracking device, or it can be predicted by content recognition algorithms.
Medical video sequences typically consist of an area which is critical for the diagnosis and a surrounding area which provides the context, but is not critical for the purpose of the diagnosis. ROI coding appears thus as a natural methodology for medical video and ROI definition can be performed according to contextual information, either automatically or by the clinician.
Different image and video coding standards enable ROI definition, under different names. The image compression standard JPEG2000 [
In the more recent H.264 standard [
Indeed, the concept of ROI is not often exploited in the design of compression and transmission strategies. Reference [
The following section describes a useful way to implement ROIs in the H.264 video coding standard.
The H.264 standard [
A video frame consists of macroblocks which can be grouped into slices. A slice contains at least one macroblock and it can include all the macroblocks in the video frame. Using FMO, groups of macroblocks consisting of one or more slices, known as slice groups, are formed according to a specific strategy. FMO was mainly developed with the goal of improving error concealment. The FMO mode, in conjunction with advanced error concealment methods applied at the decoder, maintains the visual impact of the losses at a low level even at loss rates up to 10%. Apart from predefined patterns, fully flexible macroblock ordering (explicit mode) is also allowed, where the macroblock classification can be changed dynamically throughout the entire video sequence based on the video content. Examples of slice groups obtained through the FMO tool are reported in Figure
Examples of FMO patterns.
The idea behind FMO is that if a slice gets corrupted, and the macroblocks within this slice are dispersed across the frame, it will be easier to conceal the lost macroblocks than if they were contiguous.
However, according to our experience, in the case of medical video, error concealment is not necessarily beneficial, since it may hide important irregularities present in the original video.
For this reason, in this paper we consider FMO as a means to perform ROI implementation in H.264 and not with the purpose of error concealment.
The standard includes seven modes to map macroblocks (MBs) to a slice group and we will consider in the following the explicit mode (Type 6), allowing the user to associate each of the macroblocks to a slice group independently. The pattern information is included in the Picture Parameter Set (PPS).
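To make the explicit mode more concrete, the following minimal sketch builds a macroblock-to-slice-group map from a list of ROI rectangles. It is an illustration only, not the encoder configuration used in this work: the function name `build_slice_group_map`, the rectangular ROI description, and the example frame size are assumptions introduced here (a fan-shaped sector would in practice need a finer mask than a rectangle).

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def build_slice_group_map(width, height, roi_rects):
    """Assign every macroblock of a frame to a slice group (row-major order).

    roi_rects is a list of (x0, y0, x1, y1) pixel rectangles, with x1 and y1
    exclusive, ordered by priority: rectangle i defines slice group i.
    Macroblocks that overlap no rectangle go to one extra background group.
    This rectangle-based description is a hypothetical simplification.
    """
    mbs_x = (width + MB_SIZE - 1) // MB_SIZE
    mbs_y = (height + MB_SIZE - 1) // MB_SIZE
    background = len(roi_rects)
    mb_map = []
    for my in range(mbs_y):
        for mx in range(mbs_x):
            # Pixel extent of the current macroblock.
            px0, py0 = mx * MB_SIZE, my * MB_SIZE
            px1, py1 = px0 + MB_SIZE, py0 + MB_SIZE
            group = background
            for idx, (x0, y0, x1, y1) in enumerate(roi_rects):
                # A macroblock joins the first (highest priority) ROI it overlaps.
                if px0 < x1 and x0 < px1 and py0 < y1 and y0 < py1:
                    group = idx
                    break
            mb_map.append(group)
    return mbs_x, mbs_y, mb_map


if __name__ == "__main__":
    # Toy 64x48 frame: a small central ROI inside a larger second ROI.
    cols, rows, groups = build_slice_group_map(
        64, 48, [(16, 16, 48, 32), (0, 0, 64, 32)]
    )
    for r in range(rows):
        print(groups[r * cols:(r + 1) * cols])
```

In this sketch a macroblock that only partially overlaps an ROI is assigned to that ROI, which errs on the side of protecting diagnostically relevant pixels at the cost of a slightly larger high-priority slice group.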
The FMO tool has already been used by a few authors for the purpose of ROI definition and unequal error protection. In [
Due to the characteristics of video coding methodologies and standards [
Examples of cross-layer methodologies include: rate control [
In [
In this paper, unequal error protection is performed at the application layer through erasure codes. On the one hand, UEP at the application layer keeps compatibility with the WiMAX standard, since the MAC/PHY layers do not require modifications. On the other hand, erasure codes allow the recovery of lost packets at the application layer, where bit-error correction codes would be useless, since lower layer protocols remove packets with erroneous bits, unless MAC-lite [
This section presents a detailed formulation of the problem. The reader can refer to Table
Summary of important symbols used.
Definition | Symbol |
---|---|
Slice index | |
ROI index | |
Frame index | |
Length of slice | |
Average slice length | |
Max slice length | |
Number of ROIs | |
Number of frames per GOP | |
Number of slices in ROI | |
Number of pixels in ROI | |
Quantization parameter(s) for slice | |
Service class | |
Generic probability of packet loss | |
Residual probability of packet loss after RS coding | |
Probability of packet loss for service class | |
Generic mean error burst length (in packets) | |
Transition prob. | |
Transition prob. | |
Mean error burst length for service class | |
Encoding bits for slice | |
RS symbol size | |
RS code block length (in | |
RS data block size (in | |
RS code rate | |
RS code rate for service class | |
Transmission rate for service class | |
Transmission time per frame | |
Luminance of pixel | |
Luminance of pixel |
(a) Cardiac ultrasound image with ROIs (manually selected) highlighted: three regions of interest. (b) Cardiac ultrasound image with ROIs highlighted: two regions of interest.
The automatic detection of the fan area is not trivial, as its position and size vary between frames and from clip to clip. Although a fan-shaped mask can be detected for each frame, we assume the fan area is uniform across all frames. We therefore construct a fan-shaped mask by finding the union of the individual masks identified. It is possible to adopt a similar procedure for clip-to-clip variations, by identifying a “universal” mask.
With the purpose of a context-aware design of the compression and transmission scheme, we identify three ROIs in each ultrasound video sequence (see Figure ):

ROI 1: diagnostically most important area, identified by the clinician (see, e.g., Figure );

ROI 2: fan-shaped sector (see Figure );

ROI 3: black background with patient data and, in some cases, the associated ECG waveform.
In the following, we will also consider ROI 2 and ROI 3 jointly processed, as in Figure
In particular, ROI 1 is selected by the medical specialist according to context information such as type of examination and a priori knowledge on the disease to diagnose. ROI 2 can be selected automatically.
We consider two alternative options for compression and transmission of ROI 3.

(1) We extract the information in the background prior to transmission. Information in the background is typically text data, for example, about the patient, the instrument used in the examination, and the section of the organ visualized. The associated ECG wave can also be displayed in the background area, with the ECG sample corresponding to the visualized image highlighted with a bar in the waveform. This information can be extracted prior to transmission, and both the text and the ECG waveform can be separately compressed. When the DICOM standard is used, such information can easily be separated from the rest of the image.

(2) We do not extract such information from the background prior to transmission, and we transmit ROI 3 as a separate ROI or in the same transmission class as ROI 2.
In the first option, data and ECG waveform are separately encoded. When there is no requirement for high resolution for the diagnosis of a specific disease, ECG waveform is typically sampled at 360 Hz with a resolution of 11 bits per sample. In some cases the information from different (up to eight) channels obtained from different leads is needed. The waveform of a single channel occupies
When such information is removed and separately encoded, and application layer FEC is adopted, we propose we embed such information in the padding bits needed to have a regular code structure (see Section
Note that synchronization between ECG data and ultrasound images is important, in order for the specialist to correlate the visualized image with the corresponding wave in the ECG signal. The ECG signal provides context information to the medical specialist. For instance, it is essential for a specialist to synchronize the measurement of the diameter of vessels to the R-wave spikes in the ECG trace, to eliminate the effects of periodic changes in diameter caused by the normal changes in blood flow with every heartbeat.
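As a rough check of how little capacity the separately encoded ECG needs, the short sketch below computes the raw (uncompressed) ECG bitrate from the sampling parameters quoted above: 360 Hz, 11 bits per sample, and up to eight channels. The function name is introduced here for illustration, and no compression gain is assumed.

```python
SAMPLE_RATE_HZ = 360   # ECG sampling rate quoted in the text
BITS_PER_SAMPLE = 11   # resolution quoted in the text
MAX_CHANNELS = 8       # up to eight leads may be needed


def ecg_raw_bitrate(channels=1):
    """Raw ECG bitrate in bit/s for the given number of channels."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * channels


if __name__ == "__main__":
    single = ecg_raw_bitrate(1)
    full = ecg_raw_bitrate(MAX_CHANNELS)
    print(f"one channel:    {single} bit/s ({single / 1000:.2f} kbit/s)")
    print(f"eight channels: {full} bit/s ({full / 1000:.2f} kbit/s)")
```

Even with eight uncompressed channels, the ECG stream stays well below the video bitrates of a few hundred kbps considered in the experiments, which is consistent with the idea of carrying it in otherwise unused padding.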
We assume that we compress our medical video sequences according to the H.264 video coding standard, with the aid of the Flexible Macroblock Ordering (FMO) tool for encoding separately the different ROIs. We also assume that each ROI
The transmission channel is characterized by a set of parameters (such as packet loss rate, loss burst length, etc.) as specified in Section
The total transmission time per frame can be calculated as
In the example results reported in Section
The total transmission time per frame can be calculated as
We model here the loss pattern as a two-state Gilbert channel. The Gilbert two-state channel model [
Such a model, depicted in Figure
Gilbert channel model.
In particular:
The transition matrix is given by
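A minimal simulation sketch of such a two-state loss model is given below. It assumes the common simplified reading of the Gilbert model, in which every packet sent in the bad state is lost and none is lost in the good state; the names `p` (good-to-bad transition probability) and `q` (bad-to-good transition probability) are notation chosen here, not necessarily the symbols of this paper. Under that assumption the stationary loss probability is p/(p+q) and the mean loss burst length is 1/q, which lets the two transition probabilities be derived from a target loss rate and mean burst length.

```python
import random


def gilbert_params(loss_prob, mean_burst_len):
    """Map (packet loss probability, mean burst length) to (p, q).

    p: probability of moving from the good state to the bad state.
    q: probability of moving from the bad state to the good state.
    Assumes every packet in the bad state is lost (simplified Gilbert model).
    """
    q = 1.0 / mean_burst_len
    p = q * loss_prob / (1.0 - loss_prob)
    return p, q


def simulate_gilbert(n_packets, p, q, seed=0):
    """Return a list of booleans, True meaning the packet was lost."""
    rng = random.Random(seed)
    bad = rng.random() < p / (p + q)  # start from the stationary distribution
    losses = []
    for _ in range(n_packets):
        losses.append(bad)
        if bad:
            bad = rng.random() >= q   # stay in the bad state with prob. 1 - q
        else:
            bad = rng.random() < p    # enter the bad state with prob. p
    return losses


def mean_burst_length(losses):
    """Average length of the runs of consecutive lost packets."""
    bursts, run = [], 0
    for lost in losses:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


if __name__ == "__main__":
    # 10.14% is the measured packet loss; the mean burst length of 4 packets
    # is an illustrative placeholder, not a measured value.
    p, q = gilbert_params(0.1014, 4.0)
    trace = simulate_gilbert(200_000, p, q, seed=1)
    print(f"p = {p:.4f}, q = {q:.4f}")
    print(f"empirical loss rate:    {sum(trace) / len(trace):.4f}")
    print(f"empirical burst length: {mean_burst_length(trace):.2f}")
```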
We present in the following our application layer unequal error protection strategy. The use of Reed-Solomon (RS) codes is described first, followed by the global UEP strategy adopted.
We consider the use of Reed-Solomon (RS) codes for application-layer FEC. When FEC is used at the application layer, it is necessary to apply erasure codes across video packets; the WiMAX MAC layer discards the whole MAC frame in the event of an error, that is, the erroneous frame at the receiving MAC is never passed on to the higher layer. Therefore, if RS coding is applied within a single packet at the application layer, the erroneous packet will not be available for error detection or correction at the application layer.
Similar to [
Each video or parity packet is transmitted via RTP/UDP/IP and an 802.16e MAC frame; if this frame is discarded at the receiving 802.16e MAC layer due to channel errors, this results in a symbol erasure at the RS decoder in the application layer. The RS decoder at the application layer can correct up to
This FEC scheme introduces delay due to two events. First, the interlacing operation requires that
Note that, since the RS code is systematic, it is not necessary to buffer packets to form RS codewords, but the information symbols can be transmitted directly if a local copy is kept to form the parity check symbols. These computed parity check symbols can then be sent immediately after the information symbols, eliminating interlacing delay at the transmitter. The total interlacing delay would then be the delay at the receiving end alone.
Every data block has its own block sequence number, which is useful at the receiver side, since it provides the RS decoder with the position of the lost block. The RS decoder can then recover up to
If at least
We propose to provide a high protection to the most significant ROI for the purpose of diagnosis (ROI 1) and a lower protection to ROI 2 and the background. Patient data/ECG can either be transmitted as data and compressed ECG in padding bits of ROI 1 and thus strongly protected, or transmitted in ROI 2.
We propose that RS coding be performed GOP by GOP; an RS block will include data from no more than one GOP.
For the selection of the RS block size, the erasure correction capability of the code and the slice size have to be defined first. The selection of slice size
Note that, with the assumptions above, the MAC PDU size is
The selection of the coding rate
Instead of considering models for the impact of losses in the different regions on the global distortion, as is typically done, we give priority to context information when deciding on the protection rate. The relative importance of the region of interest with respect to the background differs between types of examinations; we propose that this weight be provided by the clinician and considered in the selection of the protection rate of the different ROIs.
After application layer unequal error protection, the total number of bits per frame is
It is common practice in video transmission to “conceal” the effect of errors at the receiver side by, for instance, interpolating from neighbouring data in time and space. In the medical field, this practice may not be desirable when a medical doctor is performing a diagnosis. Such concealment practice could be misleading since in this case the specialist cannot factor into his or her decision an awareness of missing and potentially important data.
For this reason, we propose that concealment is applied seamlessly only in ROI 2 and ROI 3, in order to smooth the ROIs that are not diagnostically important. When concealment is applied in ROI 1, we propose to inform the specialist that a specific MB has been concealed, by highlighting concealed MBs in the portion of the video frame belonging to ROI 1. It is in fact important that the specialist can assess his/her confidence in the diagnosis.
The ultrasound video clips used in our experiments are cardiac ultrasonography sequences, partly collected from a hospital and partly from public databases.
The acquired medical video sequence is encoded according to the H.264 standard [
Video coding simulation parameters.
Encoder/Decoder Parameter | Value |
---|---|
Encoder/Decoder | JM reference software codec Version 16.0 |
Profile | Baseline profile |
Test sequences | Guillaume-us |
No. of frames | 70 |
Resolution | |
GOP size | 15 (IPPP |
Quantization parameters | 30/33 |
Reference frames | 1 |
Entropy coding | CAVLC (Context-Adaptive Variable Length Coding) |
Decoder error concealment | JM-FC (JM-Frame Copying) |
Groups of slices are organized with the aid of the flexible macroblock ordering tool, in order to have separate groups of slices for different ROIs. Information about the shape of the different ROIs is stored in the Picture Parameter Set.
The compressed video stream is then protected with RS codes and delivered via RTP/UDP/IP.
We assume robust header compression (RoHC) is adopted to reduce the overhead due to packetization headers and that RTP/UDP/IP headers are compressed via RoHC to three bytes.
The main Radio Access Network parameters of the reference testbed [
WiMAX RAN system parameters.
Parameter | Value |
---|---|
Channel bandwidth | 5 MHz |
Carrier frequency | 3468.5 MHz and 3568.5 MHz |
Number of subcarriers | 512 |
Number of used data subcarriers | 360 |
Cyclic prefix | 1/4 symbol duration |
Frame length | 5 ms |
Channel Coding | Turbo Codes |
Possible modulation and coding schemes | QPSK 1/2, 16 QAM 1/2, 64 QAM 1/2 |
ARQ | No ARQ scheme |
Number of BS antennas | 2 |
BS antenna | type 4 array antenna |
BS antenna height | 22 m |
BS antenna gain | 17 dBi |
BS transmission power | 35 dBm |
BS antenna azimuth | 6° and 276° |
MS antenna gain | 2 dBi |
MS transmission power | 23 dBm |
Vehicular measurement parameters.
Parameter | Value |
---|---|
Distance to BS antenna | 281 m–500 m |
Scenario | Sub-Urban |
Mobile speed | 50 km/h |
Max. channel bandwidth | 500 kbps |
Packets per second | 38 |
WiMAX measurement results considered for simulation.
MIN Delay [s] | MAX Delay [s] | Average Delay [s] | Max Packet Loss Burst Length | Packet Loss [%] |
---|---|---|---|---|
0.017273 | 0.184955 | 0.047212 | 28 | 10.14 |
We consider a vehicular environment, in order to simulate ultrasound video transmission to/from an ambulance. This covers both the case where immediate access to ultrasound examinations stored in the hospital database is needed, and the case where the examination is performed in an ambulance with a portable ultrasound scanner and the relevant video stream is transmitted in real time to the specialist in the hospital.
Focusing on the latter scenario, we consider uplink data transmission. In particular the Gilbert channel model parameters are selected according to the measurements in Table
We compare the following strategies for application layer (unequal) error protection (see Table ):

(1) No application layer protection: in this case all the available bitrate is used for representing the video sequence.

(2) Application layer equal error protection (EEP): in this case a higher protection is uniformly provided to the bitstream through an RS(31,23) code, resulting in a higher robustness in bad channel/network conditions, but in a reduced global quality when channel/network conditions are good.

(3) Application layer ROI-based unequal error protection: as in the case above, this scheme results in a higher robustness in bad channel/network conditions, but in a reduced global quality when channel/network conditions are good. In this case, however, the redundancy is exploited to protect the most important information from the point of view of the diagnosis, and an improved quality in terms of probability to perform a correct diagnosis is expected also when channel conditions are bad.

(4) Application layer ROI-based and prediction-based unequal error protection: this scheme results in a higher robustness in bad channel/network conditions, but in an even more reduced global quality when channel/network conditions are good. In this case, the redundancy is exploited to protect the most important ROI from the point of view of the diagnosis and the most important information for motion compensation prediction (I frames). An improved quality in terms of probability to perform a correct diagnosis is expected also when channel conditions are bad.
In medical applications, the target of the optimization of the transmission system should not be the minimization of distortion in terms of mean square error (or equivalently the maximization of the peak signal-to-noise ratio, PSNR), but the maximization of the probability of performing a correct diagnosis based on the received video sequence. Although not designed for this purpose, according to preliminary studies [
We consider local distortion as in the following:
We then consider a slightly modified version of the SSIM metric in [
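As an illustration of a region-restricted distortion measure, the sketch below computes the PSNR over the pixels of a single ROI only, using the mean squared luminance error within that region. It is a generic formulation written for this purpose, not necessarily the exact expression used in this paper: the function name, the label-mask representation of the ROIs, and the 8-bit peak value of 255 are assumptions.

```python
import math


def roi_psnr(original, received, mask, roi, peak=255):
    """PSNR (dB) restricted to the pixels whose mask label equals `roi`.

    original, received: 2-D lists of luminance values of equal size.
    mask: 2-D list of ROI labels with the same size (hypothetical format).
    peak: maximum luminance value, 255 for 8-bit video (assumed here).
    """
    squared_error = 0
    count = 0
    for row_o, row_r, row_m in zip(original, received, mask):
        for y_o, y_r, label in zip(row_o, row_r, row_m):
            if label == roi:
                squared_error += (y_o - y_r) ** 2
                count += 1
    if count == 0:
        raise ValueError("ROI contains no pixels")
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [[100, 100], [50, 50]]
    received = [[100, 110], [50, 50]]
    mask = [[1, 1], [2, 2]]          # top row is ROI 1, bottom row is ROI 2
    print(f"ROI 1 PSNR: {roi_psnr(original, received, mask, 1):.2f} dB")
    print(f"ROI 2 PSNR: {roi_psnr(original, received, mask, 2)}")
```

Restricting the average to one region prevents a large, easy-to-code background from masking errors in a small diagnostic area, which is the motivation for reporting quality per ROI.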
Numerical results obtained in the conditions described above, and summarized in Table
Application-layer unequal error protection strategies adopted.
Scheme | Strategy | Source bitrate | ROI 1, I frames | ROI 2, I frames | ROI 1, P frames | ROI 2, P frames |
---|---|---|---|---|---|---|
1 | No application layer FEC | 480 kbps | — | — | — | — |
2 | Application layer EEP | 300 kbps | RS(31,23) | RS(31,23) | RS(31,23) | RS(31,23) |
3 | Application layer UEP based on ROIs | 300 kbps | RS(31,16) | — | RS(31,16) | — |
4 | Application layer UEP based on ROIs and prediction | 300 kbps | RS(31,22) | RS(31,22) | RS(31,22) | — |
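To compare the protection levels listed in the table, the sketch below derives, for each RS(n, k) code, the code rate k/n, the maximum number of erased packets per block that can be recovered (n - k), and the parity overhead relative to the protected data. The dictionary layout and the labels "UEP 1" and "UEP 2" for schemes 3 and 4 are conventions assumed here for illustration.

```python
# RS(n, k) code per ROI and frame type, transcribed from the table;
# None means that no application layer FEC is applied.
SCHEMES = {
    "EEP":   {"ROI1_I": (31, 23), "ROI2_I": (31, 23),
              "ROI1_P": (31, 23), "ROI2_P": (31, 23)},
    "UEP 1": {"ROI1_I": (31, 16), "ROI2_I": None,
              "ROI1_P": (31, 16), "ROI2_P": None},
    "UEP 2": {"ROI1_I": (31, 22), "ROI2_I": (31, 22),
              "ROI1_P": (31, 22), "ROI2_P": None},
}


def code_properties(n, k):
    """Basic properties of an RS(n, k) code used as a packet erasure code."""
    return {
        "rate": k / n,               # fraction of useful data
        "max_erasures": n - k,       # recoverable lost packets per block
        "overhead": (n - k) / k,     # parity relative to protected data
    }


if __name__ == "__main__":
    for scheme, classes in SCHEMES.items():
        for name, code in classes.items():
            if code is None:
                print(f"{scheme:6s} {name}: unprotected")
                continue
            props = code_properties(*code)
            print(f"{scheme:6s} {name}: RS{code} rate={props['rate']:.3f} "
                  f"erasures<={props['max_erasures']} "
                  f"overhead={props['overhead']:.1%}")
```

The comparison makes the trade-off explicit: RS(31,16) nearly doubles the bits spent on ROI 1 but tolerates 15 lost packets per block, whereas RS(31,22) and RS(31,23) tolerate 9 and 8 at a much smaller overhead.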
Video quality results.
Metric | Uncoded: Overall | Uncoded: ROI 1 | Uncoded: ROI 2 | EEP: Overall | EEP: ROI 1 | EEP: ROI 2 | UEP 1: Overall | UEP 1: ROI 1 | UEP 1: ROI 2 | UEP 2: Overall | UEP 2: ROI 1 | UEP 2: ROI 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
PSNR (dB) | 33.11 | 30.94 | 34.01 | 34.63 | 30.92 | 36.12 | 34.15 | 33.26 | 33.85 | 34.95 | 33.06 | 35.65 |
SSIM | 0.89511 | 0.81503 | 0.89953 | 0.89881 | 0.81443 | 0.91568 | 0.89845 | 0.85342 | 0.90887 | 0.90424 | 0.84647 | 0.9144 |
Note that the quality of the diagnostically important region of interest is lower than the quality of the background in the unprotected case, due to the different complexity and the use of the same quantization parameter for the different ROIs.
A uniform protection scheme at the application layer (EEP) increases the overall quality of the sequence, but it fails to appreciably increase the quality of the most important ROI for the diagnosis. We highlight here that the schemes where FEC is applied at the application layer are compared with an uncoded scheme where a higher bitrate is adopted in source encoding, in order to allow a fair comparison.
Both UEP schemes improve the quality of ROI 1, at the expense of a decrease in the quality of ROI 2 with respect to the EEP scheme. Scheme UEP 1 provides a slightly higher quality for ROI 1, in terms of both PSNR and SSIM. Scheme UEP 2 provides an improvement of about 0.8 dB in overall PSNR with respect to scheme UEP 1, with a decrease of only 0.2 dB in ROI 1, and it can be preferable in some scenarios, as also confirmed by subjective tests.
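The gains discussed above can be recomputed directly from the video quality table. The sketch below transcribes the reported PSNR and SSIM values and prints the difference of each protected scheme with respect to the uncoded case; the data structure and function name are introduced here for illustration only.

```python
# (overall, ROI 1, ROI 2) values transcribed from the video quality table.
PSNR_DB = {
    "Uncoded": (33.11, 30.94, 34.01),
    "EEP":     (34.63, 30.92, 36.12),
    "UEP 1":   (34.15, 33.26, 33.85),
    "UEP 2":   (34.95, 33.06, 35.65),
}
SSIM = {
    "Uncoded": (0.89511, 0.81503, 0.89953),
    "EEP":     (0.89881, 0.81443, 0.91568),
    "UEP 1":   (0.89845, 0.85342, 0.90887),
    "UEP 2":   (0.90424, 0.84647, 0.9144),
}


def gains(table, baseline="Uncoded"):
    """Difference of every scheme with respect to the baseline scheme."""
    base = table[baseline]
    return {
        name: tuple(round(v - b, 5) for v, b in zip(values, base))
        for name, values in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    for label, table in (("PSNR (dB)", PSNR_DB), ("SSIM", SSIM)):
        print(label)
        for scheme, (overall, roi1, roi2) in gains(table).items():
            print(f"  {scheme:6s} overall {overall:+.5f}  "
                  f"ROI 1 {roi1:+.5f}  ROI 2 {roi2:+.5f}")
```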
Visual results are reported in Figure
Visual results—Frame no. 44 of the test sequence. (a) Original; (b) Uncoded (Scheme
We have proposed in this paper a context-aware transmission strategy for diagnostic-quality ultrasound video transmission over WiMAX systems. Context, in terms of regions of interest (ROI) in a specific session, is taken into account for the identification of multiple regions of interest, and compression/transmission strategies are tailored to such context information. We have presented a methodology based on H.264 medical video compression and FMO for ROI identification. Two different unequal error protection methodologies, providing higher protection to the most diagnostically relevant data, are compared. According to the reported video quality results, the proposed schemes improve the diagnostic region of interest by about 2.1 to 2.3 dB in PSNR and 0.03 to 0.04 in SSIM with respect to the case where such an approach is not adopted, while still improving the overall PSNR (by about 1.0 dB and 1.8 dB for UEP 1 and UEP 2, respectively). This methodology is simple to implement and standard compatible.
This work was partially supported by the European Commission (FP7 Project OPTIMIX INFSO-ICT-21462).