Multimedia Archives: New Digital Filters to Correct Equalization Errors on Digitized Audio Tapes



Introduction
The transition required for the information age brings with it the need to transfer preexisting (analogue) multimedia materials into digital form, both to withstand the wear and tear of time and to keep pace with the progression of technology, such as search and retrieval functions offered by increasingly powerful digital tools. Archiving has become an increasingly important goal both in terms of historical documentation and for ease of location and availability. The implications of these needs are particularly complex when it comes to historical music recordings. In this context, research on the preservation and restoration of sound documents has been developed in the information engineering area and, in particular, in the multimedia field, augmenting the innovations introduced for storage and retrieval technologies [1]. These developments have additional implications for the definition of digitization protocols to help ensure maintenance and longevity.
This paper presents the problem of equalization in the active preservation process of audio documents. If the goal of the active preservation and re-recording process is to pursue historical faithfulness, the audio signal must be precisely filtered to take into consideration the recording equalization that is part of the original source audio document [2]. Choosing the correct equalization curve is essential to avoid the proliferation of additional, incorrect versions of the audio documents (referred to in philology as "false witnesses" [3]). The choice is usually made on the basis of both historical information (which is rarely complete and exhaustive) and the experience of the technicians [4], introducing a certain margin of interpretation. We therefore present tools to compensate for errors (in choosing the equalization curve) introduced by the re-recording technicians. In this way, if an archivist or musicologist notices that a preservation master has been produced using the wrong equalization curve, it can be changed without having to recover the original analogue audio document (which may have deteriorated in the meantime).
In Section 2, we present an overview of analogue equalization, illustrating the problems concerned with the user choice. Next, we focus on a case study of the analogue audio tape and explain two equalization standards from a mathematical point of view. In Section 3, these equalizations will be transformed into the digital domain, and in Section 4, we report an experiment assessing the perception of these equalization methods. Based on the results, we propose that a digital correction filter provides a reliable means to compensate for errors made in the digitization process.

The "Equalization Problem".
The term "equalization" can be used to indicate any procedure that involves altering or adjusting the overall frequency spectrum characteristics of the audio signal. The concept of filtering audio frequencies dates back at least to the 1870s. It was first applied in harmonic telegraphs and then later adopted in analogue audio recordings [5]. In analogue audio recordings, a preemphasis curve is applied to the signal stored on the analogue carrier, and an inverse postemphasis curve is applied during the reproducing phase.
Thus, the resulting output signal maintains nearly the flat frequency response of the original input [6], but at the same time, it is characterized by an extension of the dynamic range [7] and an improvement of the SNR [8]. This technique is adopted by several analogue audio technologies due to the limited dynamic range of audio systems [7].
Historically, the adoption of these techniques was not uniform, and several different standards were applied by record manufacturers. To faithfully reproduce recordings, it is necessary to tackle what is referred to as the "equalization problem" [9]. This problem specifically arises when analyzing magnetic tape technology. Several standards exist [4], and this must be considered during playback and digitization to help obtain an "authentic" listening experience, that is, postemphasis filtering (equalization) that corresponds to that of the machines for which the playback was originally intended. The differences between the equalization curves are subtle, and during the digitization process, it may be difficult to determine the "correct" one. Without reliable documentation or test tones, operators involved in the digitization process are forced to choose the equalization aurally [4,9], which may lead to errors. Therefore, there is a possibility that an "incorrect" equalization will be selected in the process of digitizing audio tapes. These issues could be resolved through innovative automatic analysis tools, as recently presented in [10], or through an accurate historical investigation of the recording studio, aiming to identify the original equipment and the relative setup used at the time [11]. The musicological study of sound recordings is often performed directly on the digital copy. If, at this stage, the musicologist has doubts about the type of equalization used during the analogue-to-digital transfer, it is beneficial to provide her/him with corrective tools that enable comparisons between the existing, possibly inauthentic versions and corrected versions. It is not feasible to redigitize audio tapes with the correct equalization on a large scale due to the excessive economic cost of the operation. Furthermore, a number of these heritage items may now be unreadable due to physical degradation [2], making the matter of corrective equalization an urgent one.
The solution proposed in this paper is to create a set of precise digital filters to subtract the "incorrect" equalization curve applied in the digitization process and to add a corrective measure. It is important to specify that these filters must only be used to alter access copies or with access tools such as those presented in [12,13] that filter the signal without performing irreversible changes to the file. That is, they must not alter the preservation copy for any reason.

Case Study.
Equalization standards are usually referred to by the acronym of the organization that proposed the standard. Historically, different standards were most widespread in Europe and the United States. The most prevalent European standard was IEC1 from the International Electrotechnical Commission, alternatively called CCIR after the acronym of the Comité Consultatif International pour la Radio. In the United States, the most prevalent standard was IEC2, also referred to as NAB from the American National Association of Broadcasters. We henceforth refer to these as CCIR and NAB, being the two standards that this paper focuses on. The equalization standards are strictly connected to another parameter that must be correctly configured before the equalization setting: the playback speed. There are 6 standard speeds, but the most common are 15 ips (38.1 cm/s) and 7.5 ips (19.05 cm/s) [4]. The latter speed is used in our work. As can be seen in [14], digitization problems arising from different speed (and therefore equalization) standards on the same open-reel tape are quite widespread. Nevertheless, in this preliminary study, the authors decided not to involve a second variable. Further study will be necessary for correcting both speed and equalization errors. The first step of this work consists of the analysis of the pre- and postemphasis curves for each standard. A postemphasis curve can be expressed as a combination of two curves, described by formula (1), where f is the frequency in Hz and t1 and t2 are the time constants in microseconds [15]. An alternative mathematical representation is formula (2), where ω = 2πf and f is the frequency [16]. The two time constants describe the equalization curve, but in some cases, t1 is ∞. For the 7.5 ips audio tape recording, t1 and t2 are, respectively, ∞ and 70 μs for CCIR but 3180 μs and 50 μs for NAB (see Table 1).
The characteristics of these equalization standards will be analyzed in this paper. Figures 1(a) and 1(b) present the frequency response of pre- and postemphasis curves, respectively, for NAB and CCIR equalization. An incorrect juxtaposition of the pre- and postemphasis significantly alters the spectrum and therefore requires compensation to avoid the loss of accuracy for digitized audio documents. Starting from these analytic formulas, the paper will describe how to create digital filters of the pre- and postemphasis curves to digitally compensate equalization errors in digitized 7.5 ips audio tape recordings.
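As a rough illustration of how the two standards differ at 7.5 ips, the following minimal sketch evaluates a reproducing curve from the time constants quoted above. The closed form used here, 10·log10(1 + (ωt2)²) − 10·log10(1 + 1/(ωt1)²), is an assumption about the sign convention of the standard formulas, and the names `reproducing_gain_db` and `mismatch_db` are hypothetical helpers introduced only for this example.

```python
import math

# Time constants (t1, t2) in seconds for 7.5 ips, taken from the values quoted
# in the text; math.inf encodes the case where t1 is infinite.
TIME_CONSTANTS_7_5_IPS = {
    "CCIR": (math.inf, 70e-6),
    "NAB": (3180e-6, 50e-6),
}


def reproducing_gain_db(f_hz, t1, t2):
    """Reproducing (postemphasis) gain in dB at frequency f_hz > 0.

    ASSUMED form: 10*log10(1 + (w*t2)^2) - 10*log10(1 + 1/(w*t1)^2),
    with w = 2*pi*f_hz. The second term vanishes when t1 is infinite.
    """
    w = 2.0 * math.pi * f_hz
    treble_term = 10.0 * math.log10(1.0 + (w * t2) ** 2)
    if math.isinf(t1):
        bass_term = 0.0
    else:
        bass_term = 10.0 * math.log10(1.0 + 1.0 / (w * t1) ** 2)
    return treble_term - bass_term


def mismatch_db(f_hz, recorded, reproduced):
    """Net gain in dB when a tape recorded with one standard is replayed with another."""
    t1, t2 = TIME_CONSTANTS_7_5_IPS[recorded]
    t3, t4 = TIME_CONSTANTS_7_5_IPS[reproduced]
    # The recording curve is taken as the inverse of the matching reproducing
    # curve, so the net response is the difference of the two reproducing curves.
    return reproducing_gain_db(f_hz, t3, t4) - reproducing_gain_db(f_hz, t1, t2)


if __name__ == "__main__":
    for f in (50.0, 1000.0, 10000.0):
        print(f, "Hz:", round(mismatch_db(f, "CCIR", "NAB"), 2), "dB")
```

Under these assumptions, a CCIR recording replayed with NAB postemphasis loses roughly 3 dB both near 50 Hz and near 10 kHz, which gives a sense of how subtle the error is.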

Signals and a Chain of Filters.
Given an analogue signal x ∈ L²(R), it passes through two steps before digitization: a recording phase and a reproducing phase. An equalization is defined for each step, given by the convolution of the signal with the impulse responses of the recording and reproducing filters, w ∈ L²(R) and r ∈ L²(R), denoted, respectively, by W(x) := x * w and R(x) := x * r. The resulting filter is defined as E(x) := R∘W(x) = (x * w) * r.
Considering the transfer functions of our filters, in this context, a correct equalization E: L²(R) ⟶ L²(R) of a signal has to be a flat equalization, which means that its transfer function is the identity operator, i.e., E = id, where E: C ⟶ C here denotes the transfer function of the filter E. Denoting, respectively, by W and R: C ⟶ C the transfer functions of the recording and reproducing filters W and R, we should have R = W⁻¹. In this project, however, we are dealing with a nonflat equalization E = R∘W, whose transfer function is E = W · R ≠ id since the reproducing curve R is wrongly set. It is therefore necessary to apply a corrective filter F that restores a flat overall response, i.e., F · E = id. Taking advantage of the structure of E, it is possible to express the desired transfer function as F = 1/E = 1/(W · R) (3). This last equality is a solution in terms of the transfer functions obtained from the standard NAB and CCIR equalizations defined in [15,16].

Standard NAB and CCIR Transfer Functions.
From the standard references, the reproducing characteristic curves are given by the magnitude function (2). By definition, (2) is derived from the transfer function of the reproducing analogue filter. Since the standards consider only first-order low-pass and high-pass filters, it is possible to show that the required transfer function R(s), with s ∈ C, has the form given in (4). Computing the squared norm of (4) on the imaginary line iω, where ω ∈ R, yields (5), which is the squared argument in (2). The transfer function W is a complex rational function given by the inverse of R, as stated in (6), where s ∈ C.
Since, in our case, we can infer that the parameters of R are incorrect, equation (3) becomes equation (7), where s ∈ C, t1 and t2 are the parameters of the recording transfer function W, and t3 and t4 are the parameters of the wrongly applied reproducing transfer function R.
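For readers who want explicit expressions, the block below is a hedged reconstruction of equations (4) to (7), not the authors' typeset formulas. It assumes that the reproducing filter combines a first-order high-pass section with time constant t_1 and a first-order treble lift with time constant t_2; this choice is made because it reproduces the pole locations s = −1/t_1 and s = −1/t_4 quoted in the stability discussion of the paper.

```latex
% Reconstruction under stated assumptions (first-order sections, poles as quoted in the text).
\begin{align}
R(s) &= \frac{s t_1 \,(1 + s t_2)}{1 + s t_1}, \tag{4}\\
|R(i\omega)|^2 &= \frac{\omega^2 t_1^2 \,(1 + \omega^2 t_2^2)}{1 + \omega^2 t_1^2}, \tag{5}\\
W(s) &= \frac{1}{R(s)} = \frac{1 + s t_1}{s t_1 \,(1 + s t_2)}, \tag{6}\\
F(s) &= \frac{1}{W(s; t_1, t_2)\, R(s; t_3, t_4)}
      = \frac{t_1 \,(1 + s t_2)(1 + s t_3)}{t_3 \,(1 + s t_1)(1 + s t_4)}. \tag{7}
\end{align}
```

When a low-frequency constant is ∞, the corresponding bass factor tends to 1; for example, R(s) reduces to 1 + s t_2 as t_1 → ∞.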

Filter Stability.
Now that the general structure of the corrective filters has been described, it is necessary to verify whether stable filters are obtained for all possible combinations of the four parameters t1, t2, t3, and t4. From the reference tables for standard NAB and CCIR equalizations [15,16], coefficients t1 and t3 can assume finite values or can be ∞. This means that, considering (7) as a function with parameters t1 and t3, there are four cases, according to whether each of t1 and t3 is finite or ∞. All these filters except the last are stable, as their poles lie at s = −(1/t1) or s = −(1/t4), which are both strictly negative. The fourth case gives an unstable filter with a pole at s = 0.
Clearly, the real case which corresponds to the unstable filter is relevant in applications, as it is the inverse of the chain W · R, where W and R are, respectively, the transfer functions of the CCIR and NAB equalizations (see Table 2 for a summary of all cases). We need to approximate the unstable filter with a stable one that is sufficiently "close" (clarified in the following section) to it, so as to produce a similar equalization. Formally, we digitize this filter via the bilinear transform and digitally approximate it by solving a least-squares minimization problem, as explained in the following.
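The case analysis above can be sketched in a few lines. The code assumes the corrective filter has the form F(s) = t1(1 + s·t2)(1 + s·t3) / (t3(1 + s·t1)(1 + s·t4)), with each bass factor dropping out when its time constant is ∞; that form is an inference from the pole locations quoted in the text, and `corrective_filter_poles` and `is_stable` are hypothetical helper names.

```python
import math


def corrective_filter_poles(t1, t3, t4):
    """Poles of the assumed corrective filter F(s) = 1 / (W(s) * R(s)).

    t1: low-frequency time constant of the recording standard (W).
    t3, t4: time constants of the wrongly applied reproducing standard (R).
    math.inf encodes a missing low-frequency time constant.
    """
    poles = [-1.0 / t4]  # from inverting the treble section of R, present in every case
    if not math.isinf(t1):
        poles.append(-1.0 / t1)  # from inverting the bass section of W
    elif not math.isinf(t3):
        poles.append(0.0)  # t1 infinite, t3 finite: uncancelled pole at the origin
    return poles


def is_stable(poles):
    """A continuous-time filter is stable when every pole has a negative real part."""
    return all(p < 0.0 for p in poles)


# 7.5 ips time constants in seconds, as quoted in the text.
CCIR = (math.inf, 70e-6)
NAB = (3180e-6, 50e-6)

if __name__ == "__main__":
    for name, (rec, rep) in {"NAB->CCIR": (NAB, CCIR), "CCIR->NAB": (CCIR, NAB)}.items():
        poles = corrective_filter_poles(rec[0], rep[0], rep[1])
        print(name, poles, "stable" if is_stable(poles) else "unstable")
```

With these constants, only the combination of a CCIR recording and NAB reproduction yields the pole at the origin, in agreement with the unstable case singled out in the text.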

Digital Approximation of the Unstable Filter.
After the digitization of the transfer function F(s), the MATLAB function "freqz" was used to study the behavior of the unstable filter. Examining its output, it was observed that the frequency response vector reaches ∞ in its first cell, near 0 Hz, as expected. Using a pragmatic approach, this value has been overridden with 0. Since this is an extreme modification of the frequency response vector, we studied whether it is possible to find an approximated stable transfer function, starting from the modified frequency response vector, whose frequency response is close to that of the analogue transfer function, at the very least at audible frequencies.
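The digitization step and the override of the first frequency bin can be imitated without MATLAB. The sketch below is only an illustration under assumptions: the unstable filter is taken to be F(s) = (1 + s·t2)(1 + s·t3) / (s·t3·(1 + s·t4)), the 48 kHz sampling rate and the 480-point grid are arbitrary choices, and `frequency_response` is a plain-Python stand-in for "freqz", not its implementation.

```python
import cmath
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def bilinear_unstable_filter(t2, t3, t4, fs):
    """Digitize the assumed F(s) = (1 + s*t2)(1 + s*t3) / (s*t3*(1 + s*t4)).

    Uses s = 2*fs*(1 - z^-1)/(1 + z^-1); the (1 + z^-1) factors cancel
    because numerator and denominator both have degree two in s.
    """
    k = 2.0 * fs

    def section(t):
        # (1 + s*t) multiplied through by (1 + z^-1)
        return [1.0 + k * t, 1.0 - k * t]

    b = poly_mul(section(t2), section(t3))
    a = poly_mul([k * t3, -k * t3], section(t4))  # s*t3 maps to a pole at z = 1
    return [c / a[0] for c in b], [c / a[0] for c in a]


def frequency_response(b, a, n_points, fs):
    """Evaluate H on n_points frequencies from 0 Hz up to (not including) fs/2."""
    freqs, h = [], []
    for i in range(n_points):
        w = math.pi * i / n_points
        z_inv = cmath.exp(-1j * w)
        num = sum(c * z_inv ** n for n, c in enumerate(b))
        den = sum(c * z_inv ** n for n, c in enumerate(a))
        # The pole at z = 1 makes the denominator vanish at 0 Hz.
        h.append(complex(math.inf, 0.0) if abs(den) < 1e-9 else num / den)
        freqs.append(w * fs / (2.0 * math.pi))
    return freqs, h


def analogue_response(f_hz, t2, t3, t4):
    """Assumed analogue F(s) evaluated at s = j*2*pi*f_hz (f_hz > 0)."""
    s = 2j * math.pi * f_hz
    return (1 + s * t2) * (1 + s * t3) / (s * t3 * (1 + s * t4))


FS = 48000.0  # hypothetical sampling rate
b, a = bilinear_unstable_filter(70e-6, 3180e-6, 50e-6, FS)
freqs, h = frequency_response(b, a, 480, FS)
first_bin_was_infinite = math.isinf(h[0].real)
h[0] = 0j  # the pragmatic override described in the text
```

The modified vector `h`, together with `freqs`, is the kind of input that a least-squares fitting routine would then receive.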
The transfer function we are dealing with is a rational function H: C ⟶ C of the form H(z) = B(z)/A(z), where z ∈ C and B, A: C ⟶ C are complex polynomials of finite degrees m, n ∈ N, with coefficient vectors b ∈ C^(m+1) and a ∈ C^(n+1), respectively. Given a vector of frequency points f ∈ R^l, where l ∈ N, and the corresponding frequency response vector h ∈ C^l, we define the minimization problem of finding the coefficient vectors b and a for which the frequency response of H at the points f best matches h in the least-squares sense. A solution to this problem, given via an algorithm based on a damped Gauss-Newton iterative search method described in [17] and implemented in the MATLAB function "invfreqz", is the pair of coefficient vectors of the stable rational transfer function approximating the unstable filter we found in the previous section. The inputs of this function are the frequency vector f, the frequency response vector h (modified as described at the beginning of this section), the polynomial degrees m and n of the numerator and denominator of the solution, and the number of iterations "iter." We set m and n equal to 2 in order to maintain the same general structure of this kind of filter, and we set "iter" to 10 since at this point, the approximation converges.
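One plausible way to write the fitting problem explicitly is given below. It assumes the unweighted output-error criterion that the iterative mode of "invfreqz" is documented to minimize; the sampling rate f_s used to normalize the frequency points is a symbol introduced here for illustration.

```latex
% Sketch under stated assumptions: unweighted output-error least squares.
H(z) = \frac{B(z)}{A(z)}
     = \frac{\sum_{j=0}^{m} b_j z^{-j}}{\sum_{j=0}^{n} a_j z^{-j}},
\qquad
\min_{b,\,a} \; \sum_{k=1}^{l}
  \left| h_k - \frac{B\!\left(e^{i\omega_k}\right)}{A\!\left(e^{i\omega_k}\right)} \right|^{2},
\qquad
\omega_k = \frac{2\pi f_k}{f_s}.
```

With m = n = 2, the solution has the same second-order structure as the digitized filter it approximates.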
In Figure 2(a), the "bilinear approximated" curve is obtained by applying the MATLAB function "freqz" to the approximated stable transfer function, i.e., the output of "invfreqz." Figures 2(b) and 2(c) quantify the approximation in further detail, focusing on low and high frequencies, respectively. At low frequencies, it is noticeable that the resolution of the "bilinear" curve is poor. However, this study primarily aims to characterize the feasibility of the pragmatic approach noted earlier; the approximation could be improved in subsequent studies. The described approximation method satisfies the stability requirement and produces a transfer function with a frequency response close to the original. Given the nature of approximations, future investigations could lead to different solutions; for example, it is possible to modify the analogue transfer function by adding a pole centered at a very low frequency.
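The alternative mentioned at the end of the paragraph, a pole placed at a very low frequency, can be explored with a short sketch. This is not the method evaluated in the paper: it assumes the same form of the unstable filter as above and simply moves the pole from s = 0 to s = −2π·f_pole, with f_pole = 1 Hz chosen arbitrarily for illustration.

```python
import math


def unstable_response(f_hz, t2, t3, t4):
    """Assumed unstable corrective filter F(s) at s = j*2*pi*f_hz (f_hz > 0)."""
    s = 2j * math.pi * f_hz
    return (1 + s * t2) * (1 + s * t3) / (s * t3 * (1 + s * t4))


def stabilized_response(f_hz, t2, t3, t4, f_pole_hz):
    """Same filter with the pole moved from s = 0 to s = -2*pi*f_pole_hz."""
    s = 2j * math.pi * f_hz
    shifted = s + 2.0 * math.pi * f_pole_hz
    return (1 + s * t2) * (1 + s * t3) / (shifted * t3 * (1 + s * t4))


def deviation_db(f_hz, t2, t3, t4, f_pole_hz):
    """Magnitude difference in dB between the stabilized and the unstable filter."""
    ref = abs(unstable_response(f_hz, t2, t3, t4))
    new = abs(stabilized_response(f_hz, t2, t3, t4, f_pole_hz))
    return 20.0 * math.log10(new / ref)


if __name__ == "__main__":
    # CCIR recording (t2 = 70 us) replayed with NAB (t3 = 3180 us, t4 = 50 us).
    for f in (20.0, 100.0, 1000.0):
        print(f, "Hz:", round(deviation_db(f, 70e-6, 3180e-6, 50e-6, 1.0), 4), "dB")
```

Under these assumptions, the magnitude deviation at 20 Hz is on the order of a hundredth of a decibel, while the gain at 0 Hz becomes finite.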

Assessment of Perception
We conducted an experiment with the aim of assessing the perception of similarity for various equalization curves applied to the same stimulus. We adopted an approach inspired by the MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test, a well-established method for evaluating the quality of several versions of an audio stimulus [18,19]. Our MUSHRA-inspired assessment aimed to investigate whether or not musically trained participants were able to distinguish a stimulus recorded on magnetic tape and digitized with the correct equalization standard (Reference) from (a) the same stimulus digitized with an intentionally incorrect equalization standard (Foil) and (b) the incorrect stimulus subsequently corrected with the digital filters proposed in the previous section. For (b), two separate correction versions are proposed. In the first version, the incorrect stimulus was directly corrected with a MATLAB script, whereas in the second version, an ad hoc web interface adopting the Web Audio API was used to correct the stimulus in order to simulate the use of the filters in web tools for accessing historical audio documents, such as in [13] (see more details in Section 4.3).

Materials.
The experiment contained 8 audio stimuli, listed in Table 3. As will be detailed in Procedures, 6 stimuli were used for assessment, and 2 stimuli were used as training. Each stimulus was a 10-second excerpt of an electroacoustic composition, chosen from important repertoire of the genre. The stimuli were selected to produce a heterogeneous set from a spectral standpoint, with each exhibiting a wide range of frequency combinations and textures. Additionally, half of the stimuli were produced with a NAB preemphasis curve and the other half with a CCIR preemphasis curve.
For each stimulus, there were 6 different equalizations (henceforth filters) provided. See Procedures for production details of each filter. The 6 filters were as follows: (1) "Reference": the correctly produced equalization standard. (2) "Hidden reference": an exact copy of the "Reference" audio, used as an accuracy check. (3) "Anchor": the Reference processed with a low-pass filter. This was easily discernible from the other filters and so was used as a second accuracy check. (4) "Foil": an intentionally incorrect equalization, created by mismatching the recording and reproducing curves. (5) "MATLAB correction": a subsequent correction of the Foil audio using a MATLAB script. (6) "Web Audio API correction": a subsequent correction of the Foil audio using an ad hoc web interface.

Participants.
Twenty-three participants were recruited from an undergraduate music course in Australia. Thirteen participants (57%) were male, and 10 (43%) were female. Participants ranged in age from 18 to 33 years (M = 19.7; SD = 3.7) and were asked how many years of music training they had received (range: 1-20, M = 11.1, and SD = 3.7; all but 1 participant reported 7 or more years of music training). Prospective participants all agreed to participate and completed a written consent form. The study received ethics approval (UNSW Human Ethics Approval HC13015).

Procedures.
Participants were tested in groups of up to 5. Testing was conducted on MacBook Pro laptops (13 inches, mid-2010) with Sennheiser HD280 Pro headphones. The web interface of the test was created by using BeaqleJS, a framework based on HTML5 and JavaScript [20], and the browser Google Chrome was used for all tests. The loudness was set consistently on each laptop, and the loudness toggle keys were locked for each computer. The loudness level was inspected by the research team by measuring the sound pressure level for a pink noise sound file using a Testo 815 meter. Measurements were taken with the following settings: slow time weighting, "A" frequency weighting, a measurement range of 50 to 100 dB, and a "maximum" hold function. Ten measurements were made on each laptop, switching to a second pair of headphones after the fifth measurement. Measurements ranged from 78.2 dB to 81.2 dB across all laptops, with M = 80.5 and SD = 0.9. The experiment consisted of 8 different tests, each concerning one of the 8 stimuli in Table 3. Each test was presented on a single screen of the interface (see Figure 3) and contained all 6 filters for the examined stimulus: Reference, Hidden reference, Anchor, Foil filter (CN foil or NC foil), MATLAB correction, and Web Audio API correction. As per the MUSHRA protocol [19], the Reference filter was always the first filter presented and was clearly labeled, whereas the remaining filters were randomized and unlabeled.
Participants were able to replay each audio file as often as they wished and in any order and were tasked with evaluating the similarity of each of the unlabeled filters in comparison to the Reference. Responses were recorded on 11-point rating scales (0-10) corresponding to "different"; "somewhat different"; "slightly different"; "nearly identical"; and "identical" (see Figure 3). Furthermore, in line with the MUSHRA guidelines [19], participants received a "training" phase, for which 2 of the 8 stimuli were used.
To create the 6 filters, high-quality digital samples from a computer were recorded onto a new tape using a professional Studer A810 with a recording speed of 7.5 ips, using the CCIR preemphasis curve for the first four stimuli and NAB for the second four. After this stage, the Reference filter was obtained for each stimulus by digitizing the recorded samples with the correct inverse analogue filter, i.e., the one matching the curve used during recording. A second version of the Reference (Hidden reference) was also included in the test phase. Starting from the Reference, the Anchor filter was obtained for each stimulus by processing the Reference with a low-pass filter measuring −3 dB at 3.5 kHz as defined by the MUSHRA standard [19]. Next, the signals that were recorded onto the tape for the creation of the Reference were digitized a second time using an incorrect inverse analogue filter, i.e., CCIR as the preemphasis curve and NAB as the postemphasis curve (CN foil), and vice versa (NC foil). The Foil digitization is used to simulate the real-life situation where an incorrect postemphasis curve is selected.
Finally, each Foil filter was compensated, from the spectral point of view, with the two correction filters described in Section 4.1: a MATLAB script and an ad hoc web app based on the Web Audio API. The version obtained by the MATLAB correction implements a high-resolution offline processing of the signal, whereas the version obtained with the web app performs a real-time processing of the signal with ConvolverNode [21]. This node convolves the audio signal with the impulse response of the filter, and the resulting signal was recorded using professional equipment and normalized.

Preliminary Analysis.
As per the MUSHRA guidelines [19], the Hidden reference and the Anchor filter each acted as a reliability test. Any participants who rated the Hidden reference ≤ 7 (constituting a rating of "nearly identical" or lower) were removed from the entire data sample. Similarly, any participants who rated the Anchor ≥ 3 (constituting a rating of "somewhat different" or higher) were removed from the data sample. This produced a subsample of n = 14 reliable participants. Before analysis of results, we listened to each filter for all stimuli.
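The screening rule can be written down compactly. The sketch below assumes that a single failing trial is enough to exclude a participant, which the text does not state explicitly; the participant identifiers and ratings are invented purely for illustration and are not the study's data.

```python
def reliable_participants(ratings):
    """Return the ids of participants who pass both reliability checks.

    ratings maps a participant id to a list of (hidden_reference, anchor)
    rating pairs, one pair per stimulus, on the 0-10 similarity scale.
    ASSUMPTION: one failing trial excludes the participant entirely.
    """
    kept = []
    for participant_id, trials in ratings.items():
        hidden_ok = all(hidden > 7 for hidden, _ in trials)  # excluded if <= 7
        anchor_ok = all(anchor < 3 for _, anchor in trials)  # excluded if >= 3
        if hidden_ok and anchor_ok:
            kept.append(participant_id)
    return kept


# Hypothetical ratings, for illustration only.
example = {
    "P01": [(9, 1), (10, 0)],
    "P02": [(7, 1), (9, 2)],   # Hidden reference rated 7: excluded
    "P03": [(9, 4), (10, 1)],  # Anchor rated 4: excluded
}

if __name__ == "__main__":
    print(reliable_participants(example))
```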
This listening examination suggested that, in all cases, the Web Audio API correction was accompanied by an unintentional, perceivable equalization effect. To further investigate this, long-term average spectrum (LTAS) plots of each filter for all stimuli were calculated. The LTAS plot of the Web Audio API correction filter for each stimulus was visually different from that of the Reference filter, whereas the MATLAB correction filter plot was not. Therefore, we identified a production error in the method used to create the Web Audio API correction filter and so exclude this filter from all subsequent analyses (although for interest, we retain descriptive statistics for the Web Audio API filter in Table 4).

Results and Discussion.
Two separate two-way within-subject ANOVAs were performed, with the first examining the three NAB stimuli and the second examining the three CCIR stimuli. The ANOVAs were used to investigate any differences in similarity (dependent variable), with filter and stimulus each used as within-subject independent variables. Descriptive statistics for each stimulus and filter are reported in Table 4 and are plotted in Figure 4. Anchor filters were not included in the ANOVA analyses for two reasons: (1) this investigation is concerned with interactions between the Reference, Foil, and correction filters, whereas Anchor filters are designed to examine reliability (see previous section); (2) the inclusion of these data, which are consistently rated lower in similarity in comparison to the remaining filters, would likely violate the assumption of normality [22]. Regardless, we included descriptive statistics for the Anchor filters in Table 4 and performed separate paired sample t-tests between ratings for the Hidden reference and the Anchor filters (included in Table 5). As similarity ratings for the Web Audio API correction filter were consistently lower than those for the MATLAB and Foil filters, these data confirm that participants were able to perceive the effects of the production error. The two ANOVAs each produced a significant main effect of the filter: NAB (F(2, 26) = 10.54, p < 0.001, η² = 0.448) and CCIR (F(2, 26) = 5.47, p = 0.010, η² = 0.296).
There were no significant interactions between the independent variables for either of the ANOVAs. We performed two types of post hoc analysis. First, for each stimulus, we examined differences in similarity ratings between the Hidden reference and either the MATLAB or Foil filter; these results are reported in Table 6. However, due to the small sample size (n = 14), this may not produce sufficient statistical power for meaningful analysis [23]. Therefore, we also compared similarity ratings for the Hidden reference filter with the MATLAB and Foil filters, collapsed either across the 3 NAB stimuli or across the 3 CCIR stimuli (see Table 5).
It is evident from the data reported in Tables 4 and 5 that participants were able to distinguish between the Anchor filters and all remaining filters, regardless of the stimulus examined. Post hoc results reported in Table 6 (approach 1) suggest that the MATLAB correction was not perceivable from the Hidden reference for 5 of the 6 stimuli (all except Artikulation). In contrast, the Foil filters were rated statistically lower in similarity than the Hidden reference filter for 3 of the 6 stimuli and approached significance (p = 0.088) for 2 of the remaining 3 stimuli. For the stimulus Visage, similarity ratings for the MATLAB and Foil filters occur in close proximity to each other. As this anomalous result occurs only for this stimulus, we suggest that it may be the result of complex music textures within this composition, which could have created difficulties in differentiating between the filters. For approach 2, where the data were collapsed prior to post hoc testing (either as NAB or CCIR; see Table 5), no significant differences were observed between the Hidden reference and MATLAB filters for either NAB or CCIR stimuli. As significant differences were observed between the Hidden reference and Foil filters for both stimulus types (NAB and CCIR), we conclude that overall, the MATLAB correction appears to be a successful method for compensating existing digitization errors. Furthermore, it is important to note that the Hidden reference was consistently rated lower than the maximum similarity level of 10, despite the audio file being identical to the original Reference file.
This result suggests the presence of a rating bias [24] in which participants appear hesitant to use the extreme ends of the rating scale.
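As a quick consistency check on the effect sizes reported for the two ANOVAs, partial eta squared can be recovered from an F value and its degrees of freedom. The sketch assumes that the reported η² values are partial eta squared and that the filter factor had three levels (Hidden reference, MATLAB correction, and Foil), which is consistent with the reported degrees of freedom for n = 14; both function names are hypothetical.

```python
def within_subject_error_df(n_participants, n_levels):
    """Error degrees of freedom for a one-factor within-subject effect."""
    return (n_participants - 1) * (n_levels - 1)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from F: (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    df_error = within_subject_error_df(14, 3)
    print("error df:", df_error)
    print("NAB: ", round(partial_eta_squared(10.54, 2, df_error), 3))
    print("CCIR:", round(partial_eta_squared(5.47, 2, df_error), 3))
```

Under these assumptions, the recomputed values agree with the figures reported in the text to three decimal places.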

Conclusion
This paper investigated the equalization problem for the active preservation process of audio tape recordings. Proper selection of equalization in the digitization process is essential in preserving the historical authenticity of an audio work, although the differences between the original ("correct") and arbitrary equalizations may be subtle to an untrained listener. We investigated tools to compensate equalization errors introduced in the re-recording process. With these tools, an archivist or musicologist who notices an error in the preservation master (through listening or with automatic tools [3,25]) can make a correction and so provide an authentic listening experience without having to recover the original analogue audio document or perform redigitization. A MUSHRA-inspired test was conducted on six electroacoustic stimuli to investigate perceivable differences between (a) correctly digitized "Reference" versions, (b) two intentionally incorrect "Foil" versions (in terms of the digitization process), (c) easily distinguishable 3.5 kHz "Anchor" filters, and (d) subsequent digital correction filters of the Foil equalizations. Two digital filters were initially presented to compensate equalization errors in the case of 7.5 ips recordings both with NAB and CCIR standards, although one of these correction filters (the Web Audio API correction filter) contained a production error and so had to be removed from analyses. Therefore, the present study was only able to evaluate the validity of the MATLAB correction filter.
Similarity ratings were examined with two ANOVAs, and two distinct post hoc approaches were taken. When data were collapsed either as NAB or CCIR stimuli (prior to post hoc testing), participants were not able to distinguish between the Hidden reference filter and the MATLAB correction filter. In comparison, both types of Foil filters and the Anchor filter produced significantly lower ratings of similarity than the Hidden reference. As such, we conclude that the MATLAB correction filter is a promising method to aid in the preservation of analogue works.
Five design issues were identified. First, future studies should include a larger sample size and aim to incorporate historically informed expert listeners who are highly familiar with and knowledgeable about electroacoustic music. Such an inclusion should increase reliability in comparison to the undergraduate music students who were used in the present study. Second, comparisons with additional correction filters (as was originally intended in this design) would allow further clarification on the accuracy of the MATLAB correction. Third, one of the stimuli in this study (Visage) produced an anomalous result in which ratings for the MATLAB and Foil filters occurred within very close proximity to each other. We suggest that this may be a result of complex music textures within the composition, which could produce difficulty in differentiating between filters. This result highlights the need for future designs to place great care on stimulus selection. Fourth, the results in this study suggest the presence of a rating scale bias in which participants are hesitant to use the extreme ends of the rating scale. Additional rating biases may also be present, such as the range equalizing bias [24]. Specifically, the differences between the clearly discernible Anchor filter compared to the remaining, less-discernible filters might produce a response in which differences between the less-different filters become comparatively difficult to perceive.
Thus, we recommend that future studies adopt a between-subjects design that investigates the impact of Anchor filters on ratings of the remaining filters, such as through a "MUSHRA versus MUSHR" test (with the latter containing no Anchor filters). Finally, while it is beyond the scope of the present paper, future studies could expand this research area by examining additional corrective equalization methods for other equalization standards (that is, other than NAB and CCIR) and at playback speeds other than 7.5 ips. However, in such a case, numerous factors must be considered, such as the changes in curves between equalization standards at various playback speeds, as well as the effect on the frequency response of the filters derived from the change of speed. The preservation and ongoing authentic use of historical audio documents hinges on the application of multimedia information processing tools, with particular attention to the parameters that were used at the time of the recording, as well as their metadata. The tools presented in this paper are aimed at enabling a complete and historically informed use of historical audio (words, sound effects, and music) for multimedia archives.
Data Availability
The ethics approval for this research does not allow for the release of the experimental data used to support these findings, even if anonymized.

Conflicts of Interest
The authors declare no conflicts of interest related to this work.