Improvement of BBRv2 Congestion Control Algorithm Based on Flow-aware ECN

Google proposed a new congestion control algorithm (CCA) based on bottleneck bandwidth and round-trip propagation time (BBR), which is considered to open a new era of congestion control. BBR creates a network path model by measuring the available bottleneck bandwidth and the minimum round-trip time (RTT) to maximize delivery rate and minimize latency. The BBRv2 algorithm is a recently updated version by Google, which aims to fix some of the problems in the original BBR (BBRv1) algorithm, such as interprotocol fairness issues, RTT fairness issues, and excessive retransmissions. The BBRv2 evaluation results show that it improves coexistence with loss-based algorithms and alleviates some of the shortcomings of BBRv1. However, when multiple BBRv2 flows enter the same link at different times, fair convergence cannot be achieved, and the RTT fairness problem persists. Based on these problems, we analyze the root cause and propose an improved algorithm, BBRv2+, which uses flow-aware explicit congestion notification (ECN) to quantify queue information and feed back the accurate degree of congestion. The BBRv2+ algorithm avoids blind window constraints and selectively marks packets so that different flows can converge to fairness. In simulation experiments on Network Simulator 3 (NS3), the results show that the BBRv2+ algorithm improves intraprotocol fairness and RTT fairness while preserving bandwidth utilization and interprotocol fairness.


Introduction
With the intensification of the contradiction between user demand and network resources, congestion control in the transmission control protocol (TCP) becomes more and more important. TCP congestion control can effectively improve the management of network resources and the quality of network service. Traditional TCP congestion control algorithms (CCAs) are mainly divided into loss-based CCAs and delay-based CCAs [1,2], such as Reno [3], CUBIC [4], and Vegas [5], but both kinds of CCAs have design defects [6][7][8].
In 2016, Google proposed a hybrid CCA based on bottleneck bandwidth and round-trip propagation time (BBR) [9][10][11]. BBR calculates the bandwidth-delay product (BDP) by measuring the maximum delivery rate (Btlbw) and minimum round-trip propagation time (RTprop) to achieve high throughput and low latency. The BBR implementation released by Google shows that BBR can significantly improve throughput and bandwidth utilization compared to CUBIC. Due to its excellent performance, BBR has gained much attention ever since its release. Mathis et al. [7] claimed that BBR opens a new era in congestion control and obsoletes Jacobson88. However, some evaluations [12][13][14][15][16][17][18] found problems in BBR, such as unfairness when BBR and Reno/CUBIC share a bottleneck, intraprotocol fairness, and RTT fairness. Researchers are making continuous efforts to obtain better performance by modifying some control parameters of BBR [19][20][21][22][23][24]. Google is also constantly optimizing the BBR algorithm and launched the second version of BBR in 2018, called BBRv2 [25]. BBRv2 introduces the packet loss rate and explicit congestion notification (ECN) [26] marking as congestion signals to adjust the congestion window (CWND) and pacing rate. Some evaluations [27][28][29] show that BBRv2 can coexist better with Reno/CUBIC and further reduce retransmissions. However, the RTT fairness problem in BBRv2 persists, and flows with different start times cannot converge fairly.
Obviously, more experiments are needed to ensure that the BBR algorithm has no potential pitfalls and to further improve its practicability. We further analyze the intraprotocol fairness and RTT fairness problems of the BBRv2 algorithm reported in the literature and propose an improved method based on flow-aware ECN, called BBRv2+. BBRv2+ perceives the accurate congestion degree and queue information through ECN, selectively marks ECN, avoids blind CWND adjustment, and can effectively improve the fairness of the algorithm. With the advent of programmable switches [30][31][32][33], network owners can choose to carry new (standard or proprietary) information in packet headers. Queue information can be easily obtained from the switch, and there is no technical difficulty in placing queue occupancy information in the packet header [34]. It is therefore feasible to combine the queue information in the switch with the congestion control algorithm. The main contributions of this paper are as follows: (1) Based on the analysis results of BBRv2, a flow-aware ECN for quantifying queue information and congestion degree is proposed. The BBRv2+ algorithm can selectively judge packet loss and ECN marking, avoid blind CWND restriction, and balance the sending rate between flows. (2) We comprehensively evaluate the BBRv2+ algorithm in terms of CWND, link utilization, intraprotocol fairness, RTT fairness, and interprotocol fairness. The results show that the optimized algorithm can effectively alleviate the fairness problems of BBRv2 without sacrificing its advantages. (3) The evaluation results can help engineers select appropriate parameter configurations when applying the BBR algorithm and provide a reference for the development of future BBR versions.
The rest of this article is arranged as follows. Section 2 briefly introduces related research on BBRv2, and Section 3 elaborates on the details of the BBR algorithm and analyzes the causes of its fairness problems. The theoretical model and derivation of the optimized algorithm are presented in Section 4. Section 5 presents the simulation results and evaluation. Conclusions are drawn in Section 6.

Related Work
After the BBRv1 algorithm was released by Google, there has been a great deal of evaluation and improvement work on it. Hock et al. [12] first evaluated the performance of BBR on high-speed bottleneck links, including RTT fairness, delay, packet loss rate, and coexistence fairness with CUBIC. Scholz et al. [14] analyzed the behavior, performance, and advantages of the BBR algorithm and proposed a publicly available framework for repeatable TCP measurements based on network simulation. Moreover, some researchers have further analyzed the defects of BBR and put forward optimization algorithms.
Google is also constantly optimizing the BBR algorithm and updating the BBRv2 version to fix problems reported in studies of BBRv1. At IETF-102 [25] and IETF-104 [35], BBRv2 changed the method of adjusting the bandwidth probe cycles to improve interprotocol fairness. Google revealed a preview version (called BBRv2 alpha) [36] of the open-source code at IETF-105 [37], encouraging researchers to dig deeper to help evaluate and improve BBRv2. This version improves coexistence with loss-based CCAs and limits bottleneck buffer occupancy to less than 1.5 BDP. At IETF-109 [38], Google proposed a new BBR.Swift algorithm based on the method of using delay as a congestion signal [39]. BBR.Swift can achieve higher fairness and a lower retransmission rate when flows use the same CCA.
Since the updated BBRv2 algorithm is not the final version, the performance of BBRv2 has only been evaluated on simulation platforms. Zhang [27] discussed BBRv1, a BBRv1 variant, and BBRv2 and pointed out that BBRv2 has improved RTT fairness compared with BBRv1 and better coexistence with CUBIC and Reno. However, the channel utilization of BBRv2 is low when the random loss rate is 5%. Gomez et al. [40] used Mininet simulation to conduct an experimental evaluation of BBRv2, and the results showed that BBRv2 improved coexistence with CUBIC and alleviated the RTT unfairness of BBRv1. However, some studies show that there are still problems to be optimized in BBRv2. Kfoury et al. [28] pointed out that BBRv2 cannot quickly detect the available bandwidth in network environments with unstable bandwidth, resulting in low link utilization. Song et al. [41] pointed out that two BBRv2 flows entering the same bottleneck link at different times cannot achieve fair bandwidth sharing in deep buffers: the flow that starts first occupies more bandwidth, which leaves the later flow limited. Nandagiri et al. [42] found that the unfairness between BBRv2 flows with different RTTs is alleviated and packet retransmissions are greatly reduced. However, when the buffer size is large enough, long-RTT flows still consume more bandwidth than short-RTT flows.
With the continuous updating of the BBR algorithm, BBRv2 solves some of the fairness problems and limitations of BBRv1. However, BBRv2 needs more research to improve RTT fairness and intraprotocol fairness, so as to provide more possibilities for the final version of BBRv2 and for TCP congestion control in general.

Detail on Algorithms
3.1. BBRv2 Behavior Analysis. BBR measures the maximum delivery rate and minimum transmission delay alternately to find Kleinrock's optimal operating point [43]. BBR controls congestion by limiting the pacing rate of packets and limiting the inflight data to one BDP, as calculated in

BDP = Btlbw × RTprop. (1)

BBR paces outgoing packets at the latest estimated delivery rate. At the same time, BBR maintains a CWND in order to keep throughput consistent in networks with delayed or aggregated ACKs. BBR takes the maximum bandwidth (delivery rate) of the last 10 round trips as Btlbw, takes the minimum delay measured in the past 10 seconds as RTprop, and adjusts the CWND and pacing rate through the scaling factors cwnd_gain and pacing_gain. The BBRv1 algorithm has four phases, as shown in Figure 1. BBRv2 is optimized on the basis of the original BBRv1, and its ProbeBW phase is further divided into four substages.
In the StartUP phase, the pacing rate and CWND increase by setting cwnd_gain and pacing_gain to 2/ln 2 (about 2.89). The exponential growth of the pacing rate and CWND will lead to queue accumulation on routers. If the newly estimated bandwidth of three consecutive RTTs does not increase by at least 25%, BBR enters the Drain phase. On the basis of BBRv1, BBRv2 adds two additional exit conditions: the packet loss rate and the ECN mark rate. If the packet loss rate or ECN mark rate exceeds its respective threshold, inflight_hi is set to an estimate of the maximum inflight. The ECN mark rate is calculated by equation (2). When the continuous bandwidth detector reaches a stable value or when inflight_hi is set, the flow exits the StartUP phase.
α = (1 − g) × α + g × F, (2)

where g is a weight factor (1/16 in DCTCP) and F is the fraction of packets that are marked in the last window of data.
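As a concrete illustration, the DCTCP-style moving average of equation (2) can be sketched in Python (the function name and the per-window counters are our own; only the update rule and g = 1/16 come from the text):

```python
def update_ecn_rate(alpha: float, marked: int, acked: int, g: float = 1.0 / 16) -> float:
    """Update the smoothed ECN mark rate: alpha <- (1 - g) * alpha + g * F,
    where F is the fraction of packets marked CE in the last window of data."""
    f = marked / acked if acked > 0 else 0.0
    return (1 - g) * alpha + g * f

# Example: from alpha = 0, one fully marked window moves alpha up to g = 0.0625.
alpha = update_ecn_rate(0.0, marked=10, acked=10)
```

Because g is small, a single marked window moves the estimate only slightly, which is what lets BBRv2 distinguish transient marking from a sustained ECN signal.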
In the Drain phase, BBRv1 and BBRv2 behave the same. BBR reduces pacing_gain to ln 2/2 (about 0.35) to clear the queue remaining from the previous phase, while cwnd_gain remains unchanged (2/ln 2). At the end of this phase, the inflight data is less than the estimated BDP.
In the ProbeBW phase, BBRv1 cycles through 8 bandwidth-probing phases (pacing_gain[] = [1.25, 0.75, 1, 1, 1, 1, 1, 1]), and the duration of each pacing_gain is RTprop. In BBRv2, unlike the 8 cycles in BBRv1, ProbeBW is divided into four phases, as shown in Figure 2. The pacing_gain in the Up and Down phases is 1.25 and 0.75, respectively, and that in the Cruise and Refill phases is 1. The bandwidth probing time in BBRv2 is adaptive, which improves fairness when coexisting with Reno and CUBIC.
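The two gain schedules above can be sketched as follows (an illustrative sketch: the gain constants come from the text, while the names and structure are our own):

```python
# BBRv1: 8 ProbeBW phases, each lasting roughly one RTprop.
BBRV1_GAIN_CYCLE = [1.25, 0.75, 1, 1, 1, 1, 1, 1]

# BBRv2: four adaptive ProbeBW phases with their pacing gains.
BBRV2_PHASES = {"Down": 0.75, "Cruise": 1.0, "Refill": 1.0, "Up": 1.25}

def bbrv1_gain(round_idx: int) -> float:
    """BBRv1 steps through the 8 gains cyclically, one per round."""
    return BBRV1_GAIN_CYCLE[round_idx % len(BBRV1_GAIN_CYCLE)]
```

The key difference is that BBRv1's cycle has a fixed period, whereas BBRv2 holds Cruise for an adaptive duration before moving to Refill and Up.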
In the Refill phase, the flow probes for additional bandwidth and inflight capacity to prepare for the Up phase. In the Up phase, if inflight_hi is fully utilized, the flow increases the number of extra packets per round exponentially (1, 2, 4, 8, . . . packets). If the loss rate or ECN mark rate is too high, inflight_hi is reduced to the current number of inflight packets. When inflight_hi or the estimated queue is large enough (inflight data greater than 1.25 times the estimated BDP), the flow exits the Up phase. The flow then enters the Down phase to clear the recently created queue and leave unused headroom. CWND is set to the minimum of inflight_lo and inflight_hi with headroom, as shown in equation (3). When the inflight data is lower than inflight_hi, or equal to or lower than the estimated BDP, the flow exits this phase. In the Cruise phase, bw_lo and inflight_lo are updated every RTT, and packet loss and ECN signals are used so that the sending rate constantly adapts to control the queue level.
cwnd = min(inflight_lo, (1 − kHeadRoom) × inflight_hi), (3)

where kHeadRoom is a constant with a default value of 0.15.
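A minimal sketch of this bound, assuming equation (3) takes the form min(inflight_lo, (1 − kHeadRoom) × inflight_hi); the function name is our own:

```python
K_HEADROOM = 0.15  # default headroom fraction from the text

def cwnd_bound(inflight_lo: float, inflight_hi: float) -> float:
    """CWND is capped at the smaller of inflight_lo and inflight_hi
    with a kHeadRoom fraction of inflight_hi left unused."""
    return min(inflight_lo, (1 - K_HEADROOM) * inflight_hi)
```

The headroom term is what leaves spare bottleneck capacity for other flows entering during the Down phase.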
For the ProbeRTT phase, if a new RTprop sample is not obtained within 10 seconds, BBRv1 enters ProbeRTT, while the interval in BBRv2 is 5 seconds. In this phase, the CWND of BBRv1 is set to 4 MSS and lasts for 200 ms. BBRv2 instead sets CWND to 50% of the BDP to avoid throughput loss.

3.2. Causes of the Fairness Problems.
The evaluation results [40][41][42] of BBRv2 show that BBRv2 works well on bottleneck links with small buffers. Compared with BBRv1, BBRv2 not only improves fairness with other TCP flows but also reduces packet loss. However, BBRv2 still faces problems of fair convergence. On the one hand, there is an RTT fairness problem between flows with different RTTs within the protocol. On the other hand, when BBRv2 flows with the same RTT enter a bottleneck link with a large buffer at different times, there is also a fairness problem.
BBRv2 introduces the packet loss rate and ECN marking rate as congestion signals. In the StartUP phase, if the packet loss rate or ECN marking rate exceeds the predefined threshold, the current estimated BDP is set as the upper limit inflight_hi. When flows with different start times enter the same bottleneck link, the RTprop measurement increases and the packet loss feedback (threshold 2%) is triggered as the later flows enter the link. Then, the flow sets the current estimated BDP as inflight_hi and moves on to the next phase. In the ProbeBW:Up phase, inflight_hi limits the increase of packets in flight due to the presence of the ECN marking rate, and the flow exits the phase early to avoid congestion. Therefore, the later flow cannot measure a higher bandwidth and always remains at the level of its initial probe. If the buffer size is large enough to ensure that the later flow does not experience packet loss, inflight_hi will not be set. This mechanism makes the BBRv2 bandwidth probe sensitive to the buffer size, resulting in flows starting at different times not being able to share bandwidth fairly [41]. Moreover, RTT unfairness persists in BBRv2 because the transmission rate is periodically increased to detect available bandwidth. The bandwidth sample measured by each BBRv2 flow is larger than the actual available capacity, so the BBRv2 flow produces a standing queue similar to BBRv1. The estimated BDP of a long-RTT flow is larger than that of a short-RTT flow, so the short-RTT flow is limited by its CWND first. Although ECN feedback is added in BBRv2 to adjust the size of CWND through inflight_hi and inflight_lo, the queue information is not perceived. The large proportion of the persistent queue occupied by long-RTT flows causes them to squeeze the bandwidth of short-RTT flows.
From the above analysis, it can be seen that the fundamental reason why different flows cannot converge fairly is that the BBRv2 sender restricts CWND based on the packet loss and ECN marking rates. The packet loss threshold can only reflect whether the network is congested (packet loss caused by link errors is ignored here), but it cannot provide the specific degree of congestion of the network. The ECN marking uses a simple marking scheme: when the instantaneous queue length is greater than the preconfigured threshold K, each arriving packet is marked by ECN; each queue has its own threshold, and ECN marking is performed independently for each queue.
This ECN marking approach is flow agnostic because it marks packets based on queue length, regardless of the state of the flows. Such feedback behavior cannot cope with complex and changeable network conditions, so it is necessary to further improve the feedback mechanism in BBRv2.

The Proposed Algorithm: BBRv2+
4.1. Flow-Aware ECN. In order to solve the intraprotocol fairness and RTT fairness problems of BBRv2 congestion control, the sender must adjust the CWND and pacing rate in a fine-grained manner according to the level of network congestion.
This brings about two problems: (1) how the sender accurately learns the degree of network congestion; (2) how the sender reasonably chooses the magnitude of the multiplicative decrease (MD) according to the level of congestion.
Ideally, we want to quantify the size of the queue and sense congestion with each ACK signal. The sending rate can then be adjusted reasonably according to the queue length. When the queue is too long, a large MD (MD coefficient close to 0) can be used to greatly reduce the sending rate and accelerate queue draining. When the queue is short, a small MD (MD coefficient close to 1) can be used to slightly reduce the sending rate and maintain high bandwidth utilization. Based on the above issues, on the basis of the BBRv2 algorithm, FAECN (flow-aware ECN) is introduced to replace the original ECN mechanism and feed back the congestion degree and queue size of the bottleneck link. Figure 3 shows the proposed architecture of BBRv2+. The level of congestion is quantified by the change of RTT [44,45], and Δ represents the load factor of the bottleneck link, which is calculated as the ratio of the current delay of flow i to the maximum delay measured in the link over a period (5 s), as shown in

Δ = T_i / T_max, (4)

where T_i is the current RTT obtained by flow i from the last ACK, and T_max is the maximum RTT in the entire link in each period. Δ = 1 only if the bandwidth and buffer of the bottleneck link are fully utilized; otherwise, Δ < 1. Algorithm 1 describes the statistical process of the load factor. The sender uses the fed-back RTT values to track the maximum and minimum RTTs and calculate the load factor. The congestion degree of the link is divided into low-load and full-load states by the load factor Δ. In the StartUP phase, when the link load is low, even if the packet loss rate exceeds the 2% threshold, the upper limit inflight_hi of CWND is not set and bandwidth probing continues. When the link is fully loaded, the operation of the original BBRv2 is performed.
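The load-factor bookkeeping can be sketched as follows (a Python sketch of Algorithm 1; the class wrapper is our own construction, while the min/max tracking and the ratio follow the text):

```python
class LoadFactor:
    """Per-flow load-factor statistics, updated on every ACK."""

    def __init__(self) -> None:
        self.t_min = float("inf")  # smallest RTT seen in the period
        self.t_max = 0.0           # largest RTT seen in the period
        self.delta = 0.0           # current load factor

    def on_ack(self, rtt_us: float) -> float:
        """Fold one RTT sample into T_min/T_max and recompute
        delta = rtt / T_max, as in equation (4)."""
        self.t_min = min(self.t_min, rtt_us)
        self.t_max = max(self.t_max, rtt_us)
        self.delta = rtt_us / self.t_max
        return self.delta
```

A flow whose latest RTT sits near T_max reports delta close to 1 (full load); a flow whose RTT has fallen back toward T_min reports a small delta (light load).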
In addition, a queue length threshold K needs to be set, and when the queue length exceeds the lower threshold K_min, packets from flows with relatively high sending rates are selectively marked. According to queuing theory, flows with higher sending rates occupy most of the queue buffer. These high-speed flows are slowed down to effectively keep the queue length below the given threshold, while the other, unmarked flows keep increasing their sending rates to prevent buffer underflow. In this way, the bandwidth gap between different flows is reduced. In addition, long-RTT flows often have a higher sending rate than short-RTT flows; when long- and short-RTT flows share the same queue, long-RTT flows are more likely to be marked and slowed down to alleviate congestion, while short-RTT flows can speed up transmission, thus alleviating the RTT unfairness of the algorithm.
FAECN uses the two ECN bits to notify the sender, who determines the level of congestion based on the ECN marking rate and the load factor. In order to distinguish high-speed flows, the sending rate is stored in the Options field of the IP packet (size 16 bits). The definitions of ECN [46] in the IP header are shown in Figure 4. The switch reads the sending rate and compares it with the average rate to decide whether the packet belongs to a high-speed flow. When the queue length exceeds the given minimum threshold K_min and the rate value carried in the packet header is greater than the moving average, the switch marks the arriving packets with ECN. When a packet enters the queue, the switch updates the estimate aveS of the average sending rate of all the flows in the queue, as shown in

aveS = (1 − c) × aveS + c × S, (5)

where S is the sending rate carried in the packet and c is the weight used when updating the new value of aveS, with 0 < c < 1. In this paper, the value of c is 1/8. If S > aveS, the packet is marked by ECN with probability P ∈ [0, 1]. When the queue length changes from K_min to K_max, the marking probability P changes linearly with the queue length.
The switch tracks the changes in the current queue length. If the instantaneous queue length q is smaller than K_min, ECN is not marked. If the queue length q is larger than K_min and smaller than K_max, ECN marking is applied selectively according to the sending rate of the packets. Algorithm 2 represents the process of packet ECN marking. First, the queue threshold is checked for each packet, and then selective ECN marking is carried out in combination with the sending rate in the packet header. The FAECN method can effectively quantify the network queue length. Selectively marking packets based on queue length allows the network to precisely slow down high-speed flows without killing slow flows. Thus, the problem that different flows cannot converge fairly in the BBRv2 algorithm is alleviated. In addition, FAECN simply uses fields in the packet header without adding too much overhead to the switch, making it easy to implement and deploy.
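The switch-side behavior described above might be sketched as follows (our own sketch: the probabilistic marking is reduced to a deterministic rate comparison so the logic is testable; the weight c = 1/8 comes from the text):

```python
C_WEIGHT = 1.0 / 8  # EWMA weight c for the average sending rate

def update_ave_rate(ave_s: float, s: float, c: float = C_WEIGHT) -> float:
    """aveS <- (1 - c) * aveS + c * S, updated as each packet enters the queue."""
    return (1 - c) * ave_s + c * s

def should_mark(q: float, s: float, ave_s: float, k_min: float, k_max: float) -> bool:
    """Mark when the queue exceeds K_max, or when it sits between K_min and
    K_max and the packet belongs to a faster-than-average flow."""
    if q >= k_max:
        return True
    if k_min < q < k_max and s > ave_s:
        return True
    return False
```

Note how a slow flow (s below aveS) is never marked while the queue stays under K_max, which is exactly what lets slow flows keep ramping up while fast flows are throttled.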
In the BBRv2+ algorithm, when the congestion level exceeds the threshold Δ, inflight_hi is set based on the packet loss rate and ECN marking rate. The threshold of the ECN marking rate is 50%, which is consistent with the threshold used in the original BBRv2. On the one hand, the congestion degree threshold avoids blind judgment based on the packet loss threshold and alleviates the convergence problem among different flows. On the other hand, the algorithm selectively carries out ECN marking according to the queue information to achieve a staggered arrangement of the flow rates. When high-speed flows and short flows share the same queue, the high-speed flows are more likely to be marked and slowed down to relieve congestion, and the low-speed flows can speed up and finish quickly. Therefore, compared with the original BBRv2, BBRv2+ can make each flow converge to fairness.

(1) for every ACK do
(2)   if BBR in StartUP phase then
(3)     rtt_us ← (now − sending time)
(4)     if rtt_us < T_min then
(5)       T_min ← rtt_us
(6)     end if
(7)     if rtt_us > T_max then
(8)       T_max ← rtt_us
(9)     end if
(10)    Δ ← rtt_us / T_max // Calculate load factor
(11)  end if
(12) end for

ALGORITHM 1: The statistical process of the load factor.

4.2. Algorithm Model Analysis
Establish a fluid model to simplify the BBR operation mechanism. Suppose that there are n flows with different RTTs passing through a bottleneck link with bandwidth C, where flow_i (i ∈ [1, n]) represents flow i and d_i(t) represents its delivery rate (bottleneck bandwidth) at time t. S_i(t) is the sending rate at time t. R_i(t) represents the round-trip time of flow i at time t, as shown in

R_i(t) = RTprop_i + q_i(t)/C, (7)

where q_i(t)/C denotes the queuing delay. Let Q_i(t) denote the disparity between the inflight data I_i(t) and the delivery capacity D_i(t), as shown in

Q_i(t) = I_i(t) − D_i(t). (8)

It can be seen that if Q_i(t) > 0, the transmission capacity of flow i in the bottleneck link is exceeded. In order to reduce the burden on the bottleneck, flow i should reduce the number of packets injected into the pipeline in the next period through the feedback mechanism. ECN is marked by comparing the sending rate to the average sending rate of all flows passing through the same exit. The calculation of the average rate aveS(t) is shown in

aveS(t) = (1 − c) × aveS(t − 1) + c × S_i(t). (9)

The probability of marking according to the queue length is shown in equations (10) and (11):

P = 0 if q(t) ≤ K_min; P = p(q) if K_min < q(t) ≤ K_max; P = P_max if q(t) > K_max, (10)

p(q) = P_min + (P_max − P_min) × (q(t) − K_min)/(K_max − K_min), (11)

where q(t) is the instantaneous queue length at time t and K is the threshold of the queue length. K_min should be set to a low value to maintain low network latency, and K_max should be set to a high value to maintain low queue oscillation and avoid queue buffer overflow. Based on the standard ECN marking threshold derived from the single-queue model [45,46], consider the single queue of the bottleneck link shared by synchronous flows with the same RTT. In order to make full use of the link bandwidth and realize low latency, the ECN marking threshold K is set as K = C × RTT × λ, where λ is a tunable parameter closely related to the CCA. In this configuration, any queue can independently make full use of the link capacity.
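The linear marking probability described above can be sketched as a small function (assuming P rises linearly from P_min to P_max between K_min and K_max; P_min = 0.5 and P_max = 1 are the values chosen later in this section):

```python
def mark_probability(q: float, k_min: float, k_max: float,
                     p_min: float = 0.5, p_max: float = 1.0) -> float:
    """Piecewise marking probability: 0 below K_min, linear ramp from
    P_min to P_max between K_min and K_max, saturating at P_max above."""
    if q <= k_min:
        return 0.0
    if q >= k_max:
        return p_max
    return p_min + (p_max - p_min) * (q - k_min) / (k_max - k_min)
```

The jump from 0 to P_min at K_min means that once the queue crosses the lower threshold, even the first eligible packet faces a substantial marking probability, which speeds up the feedback to high-rate flows.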
But the problem is that when N queues are busy at the same time, the total buffer occupancy can easily reach N times the standard threshold, resulting in high queueing delay and huge buffer pressure. Therefore, the choice of K should vary with the bandwidth and the nature of the traffic. Generally, K_min is set to a low value to maintain low network delay, while K_max is set to a high value to maintain low queue oscillation, so as to avoid buffer idleness [47][48][49][50]. On the one hand, to prevent K from being underestimated, and on the other hand, considering that the weighted fair-share rate should not be greater than the link capacity, K can be constrained by

K = λ × RTT × min(quantum/T_round, C), (12)

where quantum represents the maximum number of bits that can be sent in each round of the queue, and λ is set to 0.17. T_round represents the completion time of each round, which is calculated by the weighted moving average

T_round = (1 − g_t) × T_round + g_t × T_sample, (13)

where g_t is a parameter in (0, 1) that represents the forgetting speed of the historical value of T_round. Here, g_t is taken as 1/3 to prevent violent fluctuations when the estimated value of K is too small. We assume that each queue maintains a variable T_pre to store the timestamp when the queue completed its service in the previous round. Each time the queue completes its service, it records the current timestamp T_now and calculates a round-time sample T_sample as T_sample = T_now − T_pre; then T_pre is reset to T_now. FAECN maintains the same scale and implementation complexity as ECN, and only one additional register is required per port to store T_round. For the marking probability P, P_min and P_max should be set to larger values when the concurrent traffic is heavy; otherwise, P_min and P_max should be set to smaller values.

(i) Input: instantaneous queue length q, sending rate S in the packet header, average rate aveS
(1) for every arriving packet do
(2)   if q ≥ K_max then
(3)     Mark the packet
(4)   else if K_min < q < K_max and S > aveS then
(5)     Mark the packet with probability P
(6)   end if
(7) end for

ALGORITHM 2: The process of packet ECN marking.
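The round-time estimate and the resulting constraint on K can be sketched as follows (λ = 0.17 and g_t = 1/3 come from the text; the exact form of the constraint, capping the per-queue fair-share rate quantum/T_round at the link capacity C, is our reading of the description):

```python
G_T = 1.0 / 3      # forgetting speed of the T_round moving average
LAMBDA_K = 0.17    # tunable threshold parameter from the text

def update_t_round(t_round: float, t_sample: float, g_t: float = G_T) -> float:
    """Weighted moving average: T_round <- (1 - g_t) * T_round + g_t * T_sample."""
    return (1 - g_t) * t_round + g_t * t_sample

def threshold_k(quantum_bits: float, t_round: float, rtt: float, cap: float) -> float:
    """K = lambda * RTT * min(quantum / T_round, C): the queue's weighted
    fair-share rate, never allowed to exceed the link capacity."""
    fair_rate = min(quantum_bits / t_round, cap)
    return LAMBDA_K * rtt * fair_rate
```

With N busy queues, T_round grows roughly N-fold, so each queue's K shrinks accordingly and the total buffer occupancy stays bounded instead of scaling with N.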
We conducted a simulation experiment; the relationship between the queue length and P_min and P_max is shown in Figure 5.
As seen from Figure 5, a larger P_min can maintain a smaller queue length but leads to instability of the queue range. The average queue length increases significantly with the number of concurrent streams. Therefore, we set P_min = 0.5 and P_max = 1 in the simulation experiment. However, the actual network load is variable, and the number of concurrent streams may vary over time and space. Fixed thresholds K and P cannot satisfy all cases, and further verification and optimization are needed in the future.
Moreover, regarding the adjustment of CWND, from equation (4) we know that when T_i moves toward T_min, Δ decreases to the lowest possible value Δ_min, which means that the link load is light, as shown in

Δ_min = T_min / T_max. (14)

Conversely, if T_i moves in the direction of T_max, Δ increases to the maximum possible value Δ_max, which indicates that the network load is heavy, as shown in

Δ_max = T_max / T_max = 1. (15)

The judgment of the congestion degree depends on Δ; the link-state threshold should be the connection point between the idle state and the full state of the network. The threshold of Δ should be close to 1 so that the sender can use idle bandwidth faster, but it also needs to balance the link convergence time. We set up a 100 Mbps bottleneck link and five flows with 80 ms RTT for the simulation experiment. Figure 6 shows the influence of Δ on link utilization and convergence time. The fairness convergence time is normalized by the minimum fairness convergence time in the simulation. It can be seen from the simulation results that both the link utilization and the fair convergence time increase with Δ. The difference is that the growth rate of link utilization slows down after Δ > 0.7, while the growth rate of the fairness convergence time increases sharply after Δ > 0.8. To balance these two metrics, we set the threshold to 0.8.

Results and Discussion
Based on the implementation of the BBR algorithm framework [36,51,52], a large number of simulation experiments are carried out on the NS3 platform to compare the performance of BBRv1, BBRv2, and BBRv2+. This section describes the results of tests run under different network conditions, where the condition variables include the start time, RTT, and buffer size. The experimental environment is constructed as shown in Figure 7.

CWND Evolution.
The evolution of CWND directly influences other performance metrics, such as throughput, bandwidth utilization, and sharing fairness. In the BBRv2+ algorithm, we judge the link state by optimizing the ECN marking mode and adding the load factor, so as to adjust the CWND limit of the original BBRv2 algorithm. Through NS3 simulation experiments, we verify the CWND size of multiple flows under different packet loss rates (buffer size is 1 BDP). Figure 8 compares the three algorithms BBRv1, BBRv2, and BBRv2+ in terms of CWND evolution, showing the aggregated CWND of each algorithm at packet loss rates of 0% and 1%. It can be seen from Figure 8 that the CWND of BBRv2+ is adjusted more frequently, indicating that BBRv2+ adjusts the size of CWND in a timely manner through fine-grained perception of the flow status. In Figure 8(a), although BBRv1 reaches the maximum CWND earlier than BBRv2 and
BBRv2+, the fluctuation range of its CWND after stabilization is significantly larger than that of BBRv2 and BBRv2+. When BBRv2+ reaches dynamic stability, its CWND is slightly lower than that of BBRv1 but higher than that of BBRv2. In Figure 8(b), the CWND of all three algorithms is reduced, and BBRv2+ shows low sensitivity to packet loss: its CWND is only reduced by about 5% and is the most stable.

Link Utilization.
A single-flow scenario was created in NS3 to evaluate the link utilization of the improved algorithm BBRv2+. The three algorithms were tested for link utilization on links with random packet loss, buffer configurations ranging from 0.1 to 20 BDP, and random packet loss rates of 0% and 1%, respectively. The bandwidth utilization of all flows is calculated according to

U = (Σ_i bytes_i × 8) / (Cap × duration), (16)

where bytes_i is the length of all received packets of flow_i, Cap is the bandwidth of the bottleneck link, and duration is the simulation running time. The experimental results for the link utilization of BBRv1, BBRv2, and BBRv2+ are shown in Figure 9. In Figure 9(a), the link utilization of BBRv2+ is lower than that of BBRv1 but higher than that of BBRv2, and the difference between BBRv2+ and BBRv1 is only about 0.25%. This is because the probing rate in BBRv2 decreases, sacrificing part of the bandwidth. The BBRv2+ algorithm ensures the rate of low-speed flows by selective ECN marking, so the link utilization does not decrease. When the packet loss rate is 1% in Figure 9(b), the link utilization of all three algorithms decreases, and that of BBRv2 is the lowest. In BBRv2, the upper limit of CWND is set by packet loss feedback, while the BBRv2+ algorithm conditions the packet loss threshold judgment on the congestion degree, which avoids a blind CWND limit and maintains the original link utilization. The experimental results show that BBRv2+ has a certain resistance to packet loss without sacrificing link utilization.
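The utilization formula above amounts to the following computation (a straightforward sketch; the function name is our own):

```python
def link_utilization(bytes_per_flow, cap_bps: float, duration_s: float) -> float:
    """Utilization = sum over flows of (bytes_i * 8), divided by the total
    bits the bottleneck could carry: Cap * duration."""
    return sum(bytes_per_flow) * 8 / (cap_bps * duration_s)

# Example: two flows each delivering 625,000 bytes over 10 s on a 1 Mbps
# link fully occupy it: (1,250,000 * 8) / (1e6 * 10) = 1.0.
u = link_utilization([625_000, 625_000], cap_bps=1e6, duration_s=10)
```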

Multiple Flows with Different Start Times and Buffer Sizes.
In the previous analysis, we saw that flows with different start times cannot converge fairly in the BBRv2 algorithm. Therefore, we test the throughput of flows starting at 0 s and 5 s to analyze the effectiveness of the BBRv2+ algorithm. The bottleneck bandwidth of each test is 100 Mbps, and the start times of flow 1 and flow 2 are 0 s and 5 s, respectively. Each experiment consists of flows running the BBRv1, BBRv2, or BBRv2+ algorithm.
In a 0.5 BDP buffer, the throughput of the three algorithms is shown in Figures 10(a) and 10(b). For the BBRv1 algorithm, the throughput of flow 1 and flow 2 is shown in Figure 10(a). Flow 1 quickly obtains bandwidth in the StartUp phase. After flow 2 joins, flow 1 experiences a short-term throughput decline and then remains stable. The throughput difference between flow 1 and flow 2 is about 2.2 Mbps. In Figure 10(b), the throughput difference under BBRv2 increases to approximately 6.7 Mbps. For the BBRv2+ algorithm, the throughput trends of flow 1 and flow 2 are similar to those of the other two algorithms. Although the throughput difference between flow 1 and flow 2 is larger than that in BBRv1, it is smaller than that in BBRv2: the difference is reduced by 32% compared with BBRv2.
In a 5 BDP buffer, the throughput of the three algorithms' flows is shown in Figures 10(c) and 10(d). Compared with the results in the 0.5 BDP buffer, the throughput difference between flow 1 and flow 2 increases. In Figure 10(c), the throughput difference between flow 1 and flow 2 under BBRv1 is larger than that at 0.5 BDP, but it remains small; the throughput of flow 1 and flow 2 can still maintain good fairness. In Figure 10(d), the throughput difference under BBRv2 increases significantly compared with Figure 10(b), and there is obvious bandwidth unfairness.
The throughput difference is about 1.7 times. Compared with BBRv2, the throughput fluctuation of BBRv2+ is relatively large, but the throughput difference is reduced to 1.3 times. Overall, BBRv2+ improves the throughput unfairness of BBRv2.
Building on the scenarios considered so far, a multiflow scenario is established to evaluate the performance of the three algorithms at a congested bottleneck and to simulate a more realistic network. To further verify the effectiveness of BBRv2+, we divide 100 flows into 5 groups and set the start times of the groups to 0 s, 2 s, 5 s, 10 s, and 20 s in turn. With other conditions unchanged, repeated experiments are carried out to calculate the average throughput of each flow. The experimental results are shown in Figure 11.
In a 0.5 BDP buffer, the average throughput of the three algorithms is shown in Figures 11(a), 11(b), and 11(c). For the BBRv1 algorithm, the average throughput of the five flows is shown in Figure 11(a). The flow starting at 0 s quickly obtains bandwidth at the start, but as other flows join, its bandwidth gradually decreases. Each flow starts with a bandwidth peak and then declines. Finally, the five flows gradually share the bandwidth after 60 s, converging to fairness. For the BBRv2+ algorithm in Figure 11(c), the average throughput trend of the five flows is similar to BBRv1 and achieves better convergence than BBRv2. After the bandwidth competition stabilizes, the average throughput of each flow is about 17-21 Mbps, and the throughput difference is reduced by 80% compared with BBRv2.
In a 5 BDP buffer, the average throughput of the three algorithms' flows is shown in Figures 11(d), 11(e), and 11(f). In Figure 11(d), compared with Figure 11(a), the bandwidth difference among the five BBRv1 flows increases, but the difference remains very small, and the flows can still converge well. For the BBRv2 algorithm, the throughput difference among the five flows becomes larger, as shown in Figure 11(e), and there is obvious bandwidth unfairness. The average throughput of the flow starting at 0 s is about 40 Mbps, that of the flows starting at 2 s and 5 s is 19 Mbps and 15 Mbps, respectively, and that of the flows starting at 10 s and 20 s is only 9 Mbps and 7 Mbps. In Figure 11(f), the bandwidth fairness of the five flows under the BBRv2+ algorithm is reduced compared with 0.5 BDP. However, compared with the BBRv2 algorithm, the five flows maintain a relatively fair average throughput, eventually stabilizing at 17-22 Mbps. Although the fairness is not as good as BBRv1, it is greatly improved compared with BBRv2.
We find that the fairness convergence of the BBR algorithms varies with the buffer size. To evaluate the fair-convergence behavior of the BBR algorithms under different buffer sizes, we conducted extensive tests. In this paper, Jain's fairness index is introduced to measure fairness, as described in the literature [53]. Equation (17) shows how to calculate Jain's fairness index:

J(x_1, x_2, . . . , x_n) = (Σ_{i=1}^{n} x_i)^2 / (n · Σ_{i=1}^{n} x_i^2),    (17)
where x_i is the throughput of flow i. For example, in the case of 100 simultaneous flows, n = 100 and i = 1, 2, . . . , 100. The closer Jain's fairness index is to 1, the better the fairness of the bandwidth allocation; the index reflects the throughput differences among flows well.
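The index is straightforward to compute; the following helper (our own sketch, not the paper's code) implements Equation (17) directly:

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
    Equals 1.0 for a perfectly equal allocation and 1/n in the
    worst case where one flow takes all the bandwidth."""
    n = len(throughputs)
    s = sum(throughputs)
    sq = sum(x * x for x in throughputs)
    return (s * s) / (n * sq)

jain_index([20, 20, 20, 20, 20])  # 1.0 (perfectly fair)
jain_index([40, 19, 15, 9, 7])    # ~0.70 (skewed allocation)
```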
We repeated the experiment 10 times, calculating the average throughput of each flow over each test duration (200 s). The average throughput of each group is obtained by averaging the throughput of the 50 flows corresponding to the same start time. Each group thus consists of 10 samples, which avoids accidental experimental errors to a certain extent. Figure 12 shows the average throughput and Jain's fairness index of the three algorithms running in 0.1-100 BDP buffers with flows of different start times.
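The averaging procedure described above can be sketched as follows; the data layout and names are illustrative assumptions, not the paper's tooling:

```python
from statistics import mean

def group_averages(runs, start_times):
    """Average per-group throughput across repeated runs.

    runs: one dict per repetition, mapping flow id to a
          (start_time, throughput) pair
    start_times: the group start times, e.g. [0, 2, 5, 10, 20]
    Returns a dict mapping each start time to the mean throughput
    of all flows (across all runs) that started at that time.
    """
    per_group = {t: [] for t in start_times}
    for run in runs:
        for start, tput in run.values():
            per_group[start].append(tput)
    return {t: mean(v) for t, v in per_group.items()}
```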
As shown in Figure 12(a), the fairness index of the BBRv1 algorithm is close to 1, and the flows form a good bandwidth sharing. For BBRv2 in Figure 12(b), when the buffer size is less than 0.2 BDP, flows with different start times can share bandwidth fairly. However, when the buffer size is greater than 0.2 BDP, the fairness index decreases as the buffer size increases, and the flow that starts first takes up more bandwidth. When the buffer size is greater than 10 BDP, the fairness index is only 0.93. For BBRv2+, the fairness index also decreases with increasing buffer size, but compared with the BBRv2 algorithm, fairness is greatly improved, and the minimum fairness index stays at about 0.99. In general, the fairness index of the BBRv2+ algorithm is 6% higher than that of BBRv2, giving better fairness overall.

RTT Fairness.
Unlike traditional CCAs, longer-RTT flows in BBRv1 occupy more bandwidth than shorter-RTT flows. This RTT bias is a trade-off between low latency and high transmission rate, breaking the notion of finding the optimal operating point at the minimum RTT. Some evaluation results suggest that BBRv2 alleviates the RTT unfairness of BBRv1.
This section compares the RTT unfairness of BBRv1, BBRv2, and BBRv2+ through simulation experiments and evaluates whether BBRv2+ alleviates this limitation. In these tests, 10 ms RTT flows compete with 50 ms RTT flows. Under buffer conditions of 0.1-100 BDP, the average throughput and Jain's fairness index of the flows are shown in Figure 13. Figure 13(a) shows the average throughput and Jain's fairness index when a 10 ms RTT flow competes with a 50 ms RTT flow under the BBRv1 algorithm. As the buffer size increases, the throughput difference between the 10 ms RTT flows and the 50 ms RTT flows grows. When the buffer size is less than 0.5 BDP, flows with different RTTs can share bandwidth fairly, and the fairness index is about 0.99. When the buffer is larger than 0.5 BDP, the bandwidth difference between the two flows increases with the buffer size; when the buffer is larger than 6 BDP, the fairness index is only about 0.63. In Figure 13(b), the average throughput trends of the BBRv2 and BBRv2+ algorithms are similar, and the bandwidth difference between 10 ms RTT flows and 50 ms RTT flows decreases compared with BBRv1. As the buffer size increases, the initial advantage of the 10 ms RTT flows grows and the fairness index drops to about 0.93. However, when the buffer increases to between 7 BDP and 8 BDP, the bandwidth ranking of the 10 ms RTT flows and 50 ms RTT flows reverses, and the 50 ms RTT flows gradually dominate. When the buffer is larger than 10 BDP, the fairness index of BBRv2 is about 0.9, while that of BBRv2+ is 0.94. Across buffer sizes from 0.1 to 100 BDP, the BBRv2+ fairness index remains above 0.94, a 31% improvement over BBRv1 and a 4% improvement over BBRv2, especially in large buffers.

Multiple Flows with Different Start Times and RTTs.
We further carry out hybrid experiments with flows of different RTTs and different start times, verifying the intraprotocol fairness of BBRv1, BBRv2, and BBRv2+. The flows are divided into four groups, configured as shown in Table 1. The experimental results are shown in Figure 14. For the BBRv1 algorithm in Figure 14(a), the unfairness in average throughput mainly comes from RTT unfairness. When the buffer size is less than 0.5 BDP, each flow can compete fairly for bandwidth. As the buffer size increases, the fairness between flows with different RTTs (flow 1 and flow 2, flow 3 and flow 4) decreases, while the fairness between flows with different start times (flow 1 and flow 3, flow 2 and flow 4) remains relatively stable. When the buffer is larger than 10 BDP, the fairness index is only about 0.63. Figure 14(b) shows the experimental results of the BBRv2 algorithm. When the buffer is less than 1 BDP, the four flows compete fairly for bandwidth, with the 10 ms RTT flows slightly dominant. As the buffer grows, the fairness among the four flows decreases. When the buffer size is between 5 BDP and 6 BDP, the bandwidth ranking of the 10 ms RTT flows and 50 ms RTT flows reverses, and the 50 ms RTT flows dominate. Moreover, because of the bandwidth difference caused by the start times, the throughput gap among the four flows grows larger and larger. When the buffer size increases to 10 BDP, the bandwidth difference stabilizes, and the fairness index is approximately 0.9. For the BBRv2+ algorithm, when the buffer is less than 10 BDP, the average throughput trends of the four flows in Figure 14(c) are basically the same as those in Figure 14(b), but the throughput differences among the four flows are smaller. When the buffer is larger than 10 BDP, the BBRv2+ algorithm better balances the bandwidth occupation of the four flows.
Compared with the BBRv2 algorithm, BBRv2+ alleviates the bandwidth unfairness, and the fairness index remains above 0.93. In particular, at deep buffer sizes, its fairness index is 34% higher than BBRv1 and 7% higher than BBRv2. To sum up, the proposed BBRv2+ algorithm achieves better fairness than the BBRv1 and BBRv2 algorithms, with the highest fairness index especially at deep buffer sizes.

Coexistence and Fairness with CUBIC.
In this part, the fairness of BBRv1, BBRv2, and BBRv2+ when coexisting with CUBIC is evaluated under different buffer sizes. Both single-flow and multiflow competition are considered. We design simulation experiments in which a single BBRv1, BBRv2, or BBRv2+ flow competes with a single CUBIC flow, and 50 BBRv1, 50 BBRv2, or 50 BBRv2+ flows compete with 50 CUBIC flows. The average throughput and fairness index results of the three algorithms are shown in Figure 15.
In Figure 15(a), a single BBRv1 flow competes with a single CUBIC flow. When the buffer is less than 0.4 BDP, BBRv1 takes more than 80% of the bandwidth, and the fairness index is only about 0.67. When the buffer size is between 0.5 BDP and 3 BDP, BBRv1's average throughput declines while CUBIC gradually gains the advantage, and the average throughput ranking of BBRv1 and CUBIC finally reverses. In this range, the fairness index also gradually increases, reaching its highest value of 0.99 near 2 BDP. When the buffer size is greater than 3 BDP, the throughput difference between BBRv1 and CUBIC increases, with CUBIC finally occupying about 75% of the bandwidth and the fairness index stabilizing near 0.76. Figure 15(b) shows the average throughput and fairness index trends when a single BBRv2 or single BBRv2+ flow competes with a single CUBIC flow. Compared with BBRv1, BBRv2 improves fairness when coexisting with CUBIC. When the buffer size is up to 10 BDP, BBRv2 maintains good fairness with CUBIC, with a fairness index above 0.92. As the buffer size increases further, the average throughput of CUBIC increases gradually and becomes dominant, but the fairness index stays at about 0.78, significantly better than BBRv1. For the BBRv2+ algorithm, the average throughput trend is very similar to BBRv2. When the buffer is less than 10 BDP, the fairness index remains above 0.92; when the buffer is larger than 10 BDP, it is greater than 0.82. Overall, the fairness of the BBRv2+ algorithm is improved by 6% compared with BBRv1 and 4% compared with BBRv2, indicating that BBRv2+ achieves better interprotocol fairness.
Figure 15(c) shows 50 BBRv1 flows competing with 50 CUBIC flows. BBRv1 has an obvious advantage across buffer sizes; in particular, when the buffer is less than 1 BDP, BBRv1 occupies about 85% of the bandwidth, and the fairness index is only 0.62. When the buffer size is larger than 1 BDP, the average throughput of BBRv1 decreases slightly but still dominates, and the fairness index increases to about 0.94. In Figure 15(d), 50 BBRv2 flows compete with 50 CUBIC flows. Unlike the case in Figure 15(b), the average throughput of BBRv2 is always dominant (in Figure 15(b), CUBIC's average throughput dominates once the buffer exceeds 0.5 BDP). When the buffer size is 1 BDP to 2 BDP, the fairness index reaches above 0.98.
When the buffer is larger than 10 BDP, the fairness index decreases to 0.85. The fairness index first increases and then decreases with the buffer size, finally stabilizing at about 0.84. In general, when competing with CUBIC on the same link, the average throughput and fairness index of BBRv2+ and BBRv2 are basically the same. In single-flow competition, the fairness of BBRv2+ is much better than BBRv1 and slightly better than BBRv2. When multiple flows compete simultaneously, the fairness indices of BBRv2 and BBRv2+ are higher than that of BBRv1 in shallow buffers, while the fairness index of BBRv2+ is 1% lower than that of BBRv2 in deep buffers. The experimental results show that BBRv2+ does not worsen interprotocol fairness.

Conclusions
This paper analyzes in detail the causes of the intraprotocol fairness and RTT fairness problems of the BBRv2 algorithm. Based on BBRv2, BBRv2+ judges packet loss selectively according to the congestion degree, avoiding a blind CWND limitation. In addition, the BBRv2+ algorithm adds a flow-aware ECN (FAECN) marking scheme that marks packets from flows according to queue information. In this scheme, high-speed flows are more likely to be marked and slowed down, while low-speed flows can speed up and finish quickly, so that different flows converge to fairness. On the NS3 platform, extensive simulation experiments are carried out on multiple flows with different start times or RTTs under different buffer sizes. The results show that the algorithm can effectively mitigate the intraprotocol and RTT fairness problems without sacrificing the performance of the BBRv2 algorithm.
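The flow-aware marking idea summarized above might be sketched as follows. The function, its parameters, and the exact weighting are our illustrative assumptions; the paper's concrete marking rule (involving Δ, K_min, and P) is not reproduced here:

```python
import random

def faecn_mark(flow_rate, mean_rate, queue_len, threshold):
    """Sketch of flow-aware ECN marking: once the queue exceeds a
    threshold, mark with a probability that grows both with the
    congestion degree (queue excess) and with how far the flow's
    rate is above the mean, so high-speed flows are marked more
    often while low-speed flows are mostly spared."""
    if queue_len <= threshold:
        return False                                     # no congestion: never mark
    base_p = min(1.0, (queue_len - threshold) / threshold)  # congestion degree
    weight = min(1.0, flow_rate / (2 * mean_rate))          # faster flow -> higher weight
    return random.random() < base_p * weight
```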
As programmable switches become more pervasive, realizing the idea of FAECN in a programmable switch may be feasible, which is an important direction for our future work. In addition, several factors can influence the performance of the BBRv2+ algorithm, such as Δ, K_min, and P. We will further optimize these parameters and plan to use machine learning methods to predict network load so that K_min or P can adapt automatically.

Data Availability
No data were used to support this article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.