Norm minimized Scattering Data from Intensity Spectra

We apply the $l_1$ minimizing technique of compressive sensing (CS) to non-linear quadratic observations. For the example of coherent X-ray scattering we provide the formulae for a Kalman filter approach to quadratic CS and show how to reconstruct the scattering data from their spatial intensity distribution.

condition. The diffraction patterns are structure-specific and encode the information about the electron density of the sample and thus the stacking sequence of the atomic bilayers formally by Fourier transform. However, because of sensor physics the phase information is lost in CXDI measurement since the measured intensity pattern is the modulus square of the scattered X-rays and no inverse Fourier transform can be directly applied to recover the stacking sequence. The classical approach with the Patterson function [35] fails as the number of expected randomly distributed stacking faults is too high. Other inversion algorithms [20,24] could be used instead to reconstruct the lost phase information: Although dual space iterative algorithms [21] have been shown to converge under specific conditions [37] there still remain some convergence problems for a number of cases when not enough preliminary information regarding the structure of the object is previously known [15]. Indeed ptychography type of experiments [28,36] were suggested to determine at least the relative phases of the bilayers by interfering adjacent Bragg reflections. But this type of experiments are more difficult to realize, as it requires higher stability in comparison to conventional CXDI and the longer measurement time can, however, influence the sample [19].
Due to the principal loss in phase information the scattering data can be considered to be undersampled and algorithms tailored to this lack of information, like basis pursuit approaches [11] realized by minimizing the ℓ 1 norm of sparse vectors in an appropriate basis or phase retrieval via matrix completion [6,8,22], could be tested for reconstruction from conventional CXDI measurements: As this data are recorded without structured illumination for applying matrix completion we focus on utilizing the ℓ 1 minimizing technique to reconstruct undersampled sparse signals [7,16] as vectors from linear mappings between the signal and the observations. For an overview of this so called compressive sensing (CS) see the textbook [23]. The method of CS aims to a signal's sparse support which can be estimated by e.g. Kalman filtering [41]. As the underlying filter formulas also relate observations and signals to be reconstructed by linear mappings this technique meets CS and was also used to explicitly minimize the ℓ 1 norm by so called pseudo measurements [31]. For an example see the reconstruction from a random sample of Fourier coefficients [32]. Even for non-linear mappings of signals Kalman filtering applies by using Jacobians rather than constant sensing matrices. These extended Kalman filters (EKF) are used for e.g. tracking issues [30] or robotics [13] and match the sensing problem of the quadratic nonlinearities in the spatial intensity distribution of CXDI.
The article is organized as follows: In Section II we setup the observation model for modulus-squared amplitudes of intensity distributions in coherent X-ray scattering and point out the relation to the ℓ 1 minimization. In Section III we give a brief overview of the linear Kalman filter model and show how to incorporate the complex non-analytic ℓ 1 norm as a linearized observation to meet the minimizing strategy of CS. In this framework we prove a convergence concept for the ℓ 1 minimization in the reconstruction scheme. In Section IV we apply our findings to 1D and 2D Fourier data and remark in Section V on future considerations.

A. Intensity Spectra in Coherent Scattering Experiments
Exposing crystals to non-destructive coherent X-ray radiation in d dimensions the amplitude of the elastically scattered radiation is proportional to the Fourier transform [42] of the electron density b(r), where the vector q := k out − k in is a parametrization of the direction where radiation is detected and (multiplied by Planck's constant) also describes the momentum transfer in a kinematical scattering theory; k in is the incident direction whereas k out represents the outgoing direction for radiation with wavevectors ||k out || = ||k in || = 2π λ of wavelength λ: Following standard textbooks, e.g. [42], one deals with two sets of basis vectors to characterize the scattering. The spatial vectors are represented in the basis a 1 , . . . , a d of the grid whereas the reciprocal basis b 1 , . . . , b d is used for wave vectors q, k in , k out relying on the normalization As the units of lengths and wave vectors are carried by the basis sets the scalar products read with dimensionless factors y j and κ k Restricting to a finite grid with n 1 , n 2 . . . , n d lattice sites in each direction with periodic boundary conditions the possible grid positions and wave vectors allowing for a discrete Fourier transform 1 read for all j = 1, 2, . . . , d Thus the scalar product separates into the d spatial dimensions according to 2πk j r j n j , k j , r j ∈ 0, 1, . . . , n j − 1 (II.5) where the k j are related to the lattice positions in direction of the lattice constant a j and r j refer to a discretized wave vector in direction of the corresponding reciprocal basis b j . For example a d = 2 dimensional regular hexagonal lattice {a 1 , a 2 } encloses 120 • -angles: Of course for the application the formulas are only needed up to d = 3 dimensions: Nanowires can be characterized by the stacking sequence of atomic bilayers which are shifted laterally and vertically with respect to each other. In coherent X-ray diffraction summing up all scattered radiation from a certain bilayer perpendicular to the growth direction a 3 yields a complex scattering amplitude x k ∈ C associated with the kth bilayer spanned by a 1 , a 2 . Due to the hexagonal lattice and with respect to an arbitrary reference bilayer there are three different phase factors for the amplitudes left [3,42]. Wave vectors q sensitive to the arrangement of the bilayers are selected from directions related to the Bragg condition by κ 1 − κ 2 = 3N with N, κ 1 , κ 2 ∈ Z yielding 1, exp − 2πi 3 , exp 2πi 3 , which directly relates to the three possible lateral shifts.
With respect to the experimental setup [15] this is satisfied by κ 1 = 0 and κ 2 = 1. Recovering the stacking sequence of this relative phase factors by varying κ 3 directly reveals the Zinc-Blende or Wurtzite structure of the wire along with their stacking faults.
Carrying out the remaining summation over all equidistant bilayers in growth direction a 3 mathematically corresponds to the 1D discrete Fourier transform of the periodically continued complex scattering amplitudes x k combined to the vector x.
Each non-zero entry represents one bilayer and r refers to the discretized component κ 3 of the q vector in the reciprocal basis corresponding to the growth direction. As the outcome of the detector is the measured intensity rather than the amplitude the observation is the squared signal S r 2 = x||T r |x , r = 0, 1, . . . , n − 1 (II. 8) where the Fourier coefficients build a hermitian Toeplitz matrix pq ∈ C n×n , r = 0, 1, . . . , n − 1 (II. 9) showing orthogonality T r T s = nδ rs T r with respect to multiplication and a completeness property T 0 + T 1 + . . . + T n−1 = n½ n with respect to summation. Clearly, due to the squaring the signal (II.8) is invariant both under the transformation x → e iφ x with any global phase φ ∈ R and the reflection x k → x n−1−k , k = 0, . . . , n − 1 on the lattice. As the spatial directions exhibit periodic boundary conditions the signal also remains invariant under discrete translations x k → x k+q , q ∈ Z on the grid.
On the contrary, setting the q component κ 3 corresponding to the growth direction a 3 in the reciprocal basis to zero, (II.1) examines the 2D structure of all the M bilayers simultaneously. If they are assumed to be identical without lateral 2 shifts the scattering amplitudes of the bilayer's lattice sites are 2D real data sets X k 1 k 2 with the discrete Fourier for fixed r 1 = 0, 1, . . . , n 1 −1 and r 2 = 0, 1, . . . , n 2 −1 discretizing κ 1 and κ 2 . Squaring (II.12) yields Toeplitz matrices with Kronecker product structure (A.7) for the detected signal However, in the hexagonal lattice there are three possible lateral shifts yielding additional phase factors in (II.12) for each layer. So carrying out the complete Fourier transform of all bilayers yields a random sequence rather than the constant M in front of (II.12) and (II.13). In the general case the q vectors corresponding to the bilayers can be expressed by q = κ 1 b 1 + κ 2 b 2 . Then the phase factors (II.10) read and their q dependence can be suppressed by choosing wave vectors related to the Bragg condition by κ 1 − κ 2 = 3N with N, κ 1 , κ 2 ∈ Z. One could be misled that this selects a certain amount of data points on the grid spanned by {b 1 , b 2 } allowing for CS techniques. Switching to the discrete version this reduces to one corner of the 1st Brillouin zone, i.e. to the case of one single data point r 1 = r 2 = 0 which is in fact too little for CS.
with multiple row and column indices (p 1 p 2 ) and (q 1 q 2 ) respectively referring to the ordinary rows p 1 , p 2 and columns q 1 , q 2 of the Toeplitz matrices (II.9): The scattering amplitudes in Fourier domain can be assigned with arbitrary phases. As this leaves the given intensity distribution |S| 2 unchanged an related infinite set of broadened scattering data is formed by convolution. But if the signal |S| 2 is assumed to be oversampled the original scattering data in this set appear to be sparse and can be recovered by an ℓ 1 minimization run on e.g. (II.8) or (II.13) -up to translations, reflections or global phases.

A. Kalman Filter Equations
The equations for the Kalman filter usually apply to vectors over the field of real numbers Ê. We will examine next that the equations can also be extended to the field .
Anticipating this in the classical state space approach [33] the vectorial quantity x k is supposed to evolve according to the linear evolution with a fixed matrix G and a determined sequence of evolution matrices A k . In the model above the quantity x k is assumed to be only traceable indirectly by linear observations y k which can be viewed as linear mappings with given sensing matrices C k by The stochastic behaviour of all the considered quantities is modelled by zero-mean Gaussian distributions with positive definite matrices R and Q. These covariance matrices and related mean values can be calculated from normalized Gaussian integrals denoted by ex-pectation or mean values E{•}, The main idea behind Kalman filtering is to invert the observation model (III.2), where the estimation of the quantities x k from the measurements y k shows predictor-corrector structure [33]: The prediction step relates the estimates x − k+1 and x + k by extrapolation, whereas the correction step updates the estimate x − k+1 by relating it to the new measurement y k+1 : As the underlying equations utilize conditional mean values the update can be formulated in terms of covariances An alternative form of the covariance cycle in the correction step above reads in term of the explicit so called Kalman gain matrix K k+1 : The main problem in CS is to minimize ||x|| 1 subject to the constraint y = Cx for a fixed y and C. Because this linear structure matches the evolution equation (III.1a) of Kalman filtering in Subsection III A we want to follow [31] using a pseudo measurement to iteratively minimize the ℓ 1 norm: Here, the kth estimate of the state vector x can be associated by x k and the norm minimization is driven by the fixed constraint y augmented by the lowered norm γ ||x k−1 || 1 from the previous step by a factor of 0 < γ ≤ 1 as an additional observation.
The main issue of applying the Kalman filter to the ℓ 1 minimization procedure consists in suitably linearizing the non-analytic ℓ 1 norm: Solving for |z| in the vicinity of z 0 ∈ C yields for ϕ := arg zz 0 |z 0 | the expression Thus the ℓ 1 norm of any vector z ∈ C n can be linearized in the vicinity of z 0 ∈ C n utilizing its phase information p ∈ C n by Due to vectors over the field C it is necessary to distinguish row vectors p| from the usual column vectors |p := p related by hermitian conjugation. The notation is based on complex scalar products and matrix multiplication, cf. Appendix A.

C. Pseudo Measurements
Involving the observation y = Cx ∈ C m as a constraint like e.g. a Fourier transform the Kalman filter equations (III.4) and (III.5) read (for A = G = ½, u k = 0) for the estimates x k := x + k to the vector x involving a full prediction-correction step with the initial values P 0 ∈ C n×n , x 0 ∈ C n and the constant parameters C ∈ C m×n , R ∈ C (m+1)×n and Q ∈ C n×n to reconstruct x = lim k→∞ x k from a given y as a limiting value. The minimization of the ℓ 1 norm is incorporated into the (pseudo) measurements by where the factor 0 < γ k ≤ 1 adaptively lowers the current ℓ 1 norm solving for a corresponding estimate x k+1 in the next iteration step. For a solution to the linear constraint C ∈ C m×n as initial data can serve (III.12) By inserting (III.10a) into (III.10b) the enhancement of the estimate x k+1 compared to x k reads with respect to the measurement y k+1 ∈ C m+1 of a lowered ℓ 1 norm Using the covariance cycle (III.10a) again the enhancement matrix composed from positive definite matrices R, P and Q on the RHS can be recast into the more suitable 3 form yielding the inequality by components. The key ingredients are the restricted positive eigenvalues of (III.15) bounded from above by the spectral norm ||½ m+1 || 2 = 1, cf. (B.1). What remains to prove for the inequality in (III.15) is the combination C k (P + Q)C H k to be positive semi definite: In the overdetermined case m ≥ n this holds true for C H k C k to be positive definite: As the combination C k (P + Q)C H k ∈ C (m+1)×(m+1) has only rank n the amount of m + 1 − n of its eigenvalues are zero. The positive sign of the remaining eigenvalues can be obtained from a Cholesky factorization of P + Q = F F H with a lower triangular matrix F ∈ C n×n Due to a singular value decomposition [25] of C k F the remaining eigenvalues belong to (C k F ) H (C k F ) = F H (C H k C k )F ∈ C n×n which, however, is along with the prerequisite C H k C k > 0 positive definite because of Theorem [25]. If B ∈ C n×n is positive definite and X ∈ C n×k has rank k, then the matrix Y = X H BX ∈ C k×k is also positive definite.
By virtue of the same theorem the combination C k (P + Q)C H k ∈ C (m+1)×(m+1) is always positive definite in the underdetermined case m < n, which is independent from the sensing matrix C k .
3 a completely factorized version of (III.15) reads for m < n So in each iteration step with adaptive 0 < γ k ≤ 1 the linearized ℓ 1 norm is lowered (or remains at least constant) subject to the approximated constraint y = C k x k .
In the real case (III.8) even holds with '=' yielding for j = m + 1 the relation To show a general convergence let us start with the exact solution x ∞ = x and C ∞ related to it. Because of A = G = ½ with zero shifts u k = 0 in the Kalman filter model (III.1) the constant (pseudo) measurement y ∞ already implies with γ k ≡ 1 a linear convergence of the series for all j = 1, . . . , m + 1 starting with an x 0 in the vicinity of x ∞ . Thus the series . . is said to be weakly convergent and reconstructs the original signal as long as the requirement for sparsity in the CS approach are met. In the framework of weak convergence the limiting value of Cov{x k } read for large k ≫ 1 where '=' represents the special case Q ≡ 0. The limit Q → ∞ is only possible for m ≥ n yielding a constant covariance P k = (C H ∞ R −1 C ∞ ) −1 in each iteration.
For an example of linear compressive sensing in the framework of Kalman filtering see [32] using random samples of Fourier coefficients. Applying the method to a quadratic non-linear observation model is shown for coherent X-ray scattering in the next Section IV.

IV. LINEARIZED OBSERVATION MODEL
Like the ℓ 1 norm in subsection III B we linearize the squared observations (II.8) to apply the Kalman filtering scheme. Differentiating with respect to x k and x k respectively yields and Taylor expansion around z 0 ∈ C n reads up to 1st order S r 2 = 2 Re z 0 ||T r |x − z 0 ||T r |z 0 + . . . , r = 0, 1, . . . , n − 1 .

(IV.2)
With the biases s r = z 0 ||T r |z 0 , r = 0, 1, . . . , n − 1 from the linearization the observation where the real part over all Cx combinations have to be taken. The model in detail reads with the complex sensing matrix C k+1 ∈ C (n+1)×n , biases s k+1 ∈ C n+1 and the given real (pseudo) measurements y k+1 ∈ R n+1 according to A properly chosen factor 0 < γ k ≤ 1 adaptively lowers the ℓ 1 norm in each iteration step k to reconstruct the signal x = lim k→∞ x k as a limiting value from its given squared measurements y| = S 0 2 , . . . , S n−1 2 . Note that {T 0 |x k , T 1 |x k , . . . , T n−1 |x k } ⊂ C n form an orthogonal basis. Thus is As initial data we use the moduli and phases from x sparse symmetrically broadened by a leakage 117 lattice sites and 6 equidistant sparse amplitudes |x j | = 1 with phases from the set As in the original objective the number of bilayers is roughly known we use as initial values x 0 ∈ C n a broadened modulus and phase distribution related to the setting x sparse , cf. Figure 1: With the support S := supp x sparse a leakage for the moduli and phases of x sparse is with shifts 0 ≤ s < 1 a suitable way to also consider the limit s → 0 of exactly known positions. Note that for the phases we only used one fixed value for each occupied lattice site broadened by the leakage. So the main effort is the recovery of the relative phases from the measurements. To drive the ℓ 1 minimization the corresponding entry (here 10 −6 ) in the R matrix has to be much lower in magnitude than the ones (here 10 −4 ) related to the observations of the constraint. When reconstructed amplitudes are said to vanish (e.g. in Figure 2) this means only up to numerical precision where the zero value is dominated by the entries of Q. Empirically, the exponential decay in the lowering factor γ k is chosen such that its values approaches about 0.99, . . . , 0.999 after the maximum number of iterations. A resulting typical smooth convergence of the ℓ 1 norm due to the exponential decay is shown in Figure 4: The constant tail can be used to determine a maximal number of iterations as a stopping criterion. Note that the reconstructed results beyond this guessed number are insensitive to further iterations meaning for the equations (IV.5) to perform fixed point iterations within accuracies related to the covariances Q and P k .
In the nanowire the equidistant bilayers are at fixed positions in growth direction a 3 .
Thus the reconstruction of their complex scattering amplitudes does not suffer from 'off grid' problems in general, see e.g. [12,18,39], if integer multiples of the lattice constants are used for the DFT.
The 2D reconstruction from (II.13) can be calculated with the same algorithm already used for Example 1 above, as the 2D data set of size 12 × 25 can be mapped to a 1D vector by means of (II.14), cf. Figure 5. Like the example from Figure 1, we use as initial values with the same reasoning and broadening parameters the given distribution. Within 2000 iterations and an exponentially decaying lowering factor γ k = 1 − 0.17 · exp(−0.0028k) the reconstruction for noiseless measurements is shown in Figure 6 with empirically good algorithm's covariances P 0 = 0.3 · ½ 300 , Q = 10 −7 · ½ 300 , R = diag(10 −4 , . . . , 10 −4 , 10 −6 ) ∈ R 301×301 . (IV.10) Note that there is a combined translational and reflectional symmetry mapping the vectorized real scattering amplitude distribution onto itself, cf.  ther considerations to also account for e.g. lattice distortions or the finite coherence of the primary beam. As an outlook we presented a 2D pattern reconstruction which, hopefully, can help to investigate if the cross sections of the wires were grown regularly.
Because of the Jacobians building up the sensing matrix C the convergence of the underlying ℓ 1 norm minimization does depend on the reconstructed vector x and may go beyond the resolution issues [39] for even constant sensing matrices. For this reason a thorough analysis on the relation between maximal sparsity (for the examples here about 10% of the available lattice sites) and the Toeplitz matrix (II.9) representing the sensing process for a successful CS is needed. As mentioned in the introduction algorithms in general minimizing the ℓ 1 norm could be of interest retrieving information out from intensity spectra: To compare our results we applied the non-linear version [40] of the primal dual algorithm [9] to the 1D sensing problem above yielding similar results with respect to iteration num- As the reconstruction from quadratic constraints seems to depend only little on the explicit algorithm used for the ℓ 1 minimization, it would be interesting to figure out to which extend the linear matrix completion approach [6,8,22] with respect to the nuclear norm can still be applied in face of seemingly too few numbers of independent observations. Vectors over the field C consist of complex numbers z = x + iy with x, y ∈ R and i 2 = −1.
With a vector |v ∈ C n we usually associate n complex numbers v 1 , v 2 , . . . , v n arranged as a column. With the complex conjugate z := x − iy row vectors v| ∈ C n are considered to be dual to |v by Using such a row decomposition the matrix multiplication of commensurable A ∈ C m×n and B ∈ C n×p can be viewed as mp single scalar products yielding with B = |b 1 , . . . , |b p AB = C = (c ik ) ik ∈ C m×p , c ik = n ℓ=1 a iℓ b ℓk = a i |b k (A.5) including products A|x ∈ C m for B = |x ∈ C n . Thus with the identity ½ n = (δ ij ) ij ∈ C n×n unitary matrices Q = |q 1 , . . . , |q n ∈ C n×n consist of orthonormal (row and) columns defined by the property QQ H = Q H Q = ½ n , Q H Q jk = q j |q k = δ jk , Q H = Q −1 . (A.6) The possibility of multiplying matrices A = (a ij ) ij ∈ C m×n and B = (b rs ) rs ∈ C p×q of arbitrary dimensions is covered by the Kronecker product Especially ℓ 0 is no norm but can be viewed as the number of non-zero entries. The corresponding matrix norms of A ∈ C m×n can be related to the vector norms according to