A Process for Modelling Diffuse Scattering from Disordered Molecular Crystals, Illustrated by Application to Monoclinic 9-Chloro-10-methylanthracene

Diffuse scattering from a crystal contains valuable information about the two-body correlations (related to the nanoscale order) in the material. Despite years of development, the detailed analysis of single crystal diffuse scattering (SCDS) has yet to become part of the everyday toolbox of the structural scientist. Recent decades have seen the pair distribution function approach to diffuse scattering (in fact, total scattering) from powders become a relatively routine tool. However, analysing the detailed, complex, and often highly anisotropic three-dimensional distribution of SCDS remains valuable yet rare because there is no routine method for undertaking the analysis. At present, analysis requires significant investment of time to develop specialist expertise, which means that the analysis of diffuse scattering, which has much to offer, is not incorporated thorough studies of many compounds even though it has the potential to be a very useful adjunct to existing techniques. This article endeavours to outline in some detail how the diffuse scattering from a molecular crystal can be modelled relatively quickly and largely using existing software tools. It is hoped this will provide a template for other studies. To enable this, the entire simulation is included as deposited material.


Introduction
In recent years, much work has been done in making the analysis of single crystal diffuse scattering (SCDS) more routine [1][2][3][4][5][6].While the technique can provide information about many subtle and important structural effects [7], there is no straightforward approach to analysing the data.Analysis of diffuse scattering from powders can use the pair distribution function (PDF) approach [8,9].However, SCDS can take on so many forms and result from so many quite different structural phenomena that a general, unified software application to allow its modelling is very demanding.
SCDS arises from short-range order (SRO) in the crystal.The existence of SRO means that the conventional description of a crystal structure-the unit cell-becomes inadequate, as the cell describes the single body average structure where the SCDS is sensitive to the two-body average, which is not described by a single unit cell.The atomic coordinates of the atoms in the asymmetric unit, and their associated atomic displacement parameters, can no longer be used as parameters of the fitting.To capture the SRO, we must capture the correlations amongst the positions and occupancies of the atoms, which means the model must be large enough to contain a valid statistical sample of the atomic (or molecular) configurations.This typically requires many cells (of the order of tens of thousands) which in turn means there are too many atomic coordinates to fit directly.
There are a range of ways of overcoming this.One is a generalisation of the PDF approach [3][4][5].Another relies on a "Reverse Monte Carlo" (RMC) approach, which does use the atomic coordinates, but whose results can be difficult to interpret if not highly constrained [8,10,11].A third approach parameterises the interactions that determine the atomic coordinates and refines those.Since the interactions are the same from cell to cell, this reduces the number of free parameters to a manageable level.
This last approach, which may be termed a "Monte Carlo" (MC) modelling approach in its current implementation, has been used with some success for quite a few years [7,[12][13][14][15][16][17][18][19].MC is used in a wide variety of contexts, in and out 2 Advances in Condensed Matter Physics of materials science [20][21][22][23][24].The method has been used to tackle a very wide range of problems but has its own difficulties.
First, it requires modelling of interatomic potentials.Ab initio approaches like density functional theory (DFT) and molecular dynamics (MD) struggle to cope with the very large systems required to model the many anisotropic and relatively long-ranged (dozens of unit cells) correlations.The potentials need to be modelled in a way which is not computationally expensive, but this militates against these interactions being highly physically accurate.
Second, identifying the nature of the disorder-what structural features need to be varied, correlated, and made to interact-still requires an expert eye and can be very subtle.This militates against anyone who is not already familiar with SCDS from incorporating its analysis into their materials science studies.
Third, this approach is inevitably computationally intensive as the interactions induce the correlations in the model through an MC simulation; only then can the model be Fourier transformed to calculate the expected diffuse scattering, which means that evaluating a set of model parameters is relatively slow.For a large model, the cycle of MC simulation then Fourier transform calculation can take hours, even on relatively current hardware.This report will focus on modelling SCDS from crystals which can be assembled from discrete units, be they molecules or atoms.It is less applicable to systems which are to be modelled as infinite networks of, for example, corner connected octahedra or similar.Hence, it lends itself to the study of molecular crystals, in which the crystal can be considered as an assembly of discrete molecules, interacting via forces much weaker than those that hold the molecules themselves together, and where none of the atoms is shared between molecules, but each can be assigned to a particular molecule.A single atom can be a molecule in this approach.

The Nature of the Model
The model is a model crystal, an assembly of unit cells typically 32×32×32 unit cells on a side.Such a large assembly is necessary if the model is to capture the statistics of the range of possible local environments and so provide a high quality Fourier transform.If the system is a molecular crystal with  = 4 (a typical value) and around 30 atoms per molecule, this results in ≈10 6 atoms and three times as many position coordinates.
Atoms and molecules are made to interact via simple Hooke's law potentials: where   is the length of vector  connecting atoms on adjacent molecules,  0 is its equilibrium length, and   is its force constant.The sum is over all contact vectors (cv).Thus, choosing the   will determine how the displacements of molecules are related, since the interactions will cause the displacements to interact.The question arises: How do we choose the interactions to use?With the increases in computing power, one answer, which was not practicable until recently, is "use them all."One typical model for intermolecular forces is the Buckingham potential, which can be seen to approach zero relatively quickly as length increases.So this suggests that the   themselves may be parameterised, an approach that has been used and found to show significant benefits in rapidly obtaining a model that gives a good representation of the displacive diffuse scattering [13,25].
This model is implemented using the program ZMC with supplementary toolbox programs to generate the force constants and contact vector lists.These details are outlined more fully in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/878463.ZMC uses a z-matrix to describe the molecule(s), which in essence means that the molecular geometry is described using a local coordinate system, and the assembled molecule is then placed into the unit cell using a set of external coordinates.Since monoclinic 9-chloro-10-methylanthracene (CMA) is effectively rigid (except for rotation of the CH 3 group, and H atoms do not contribute to the X-ray diffraction), no "internal" degrees of freedom were present.
In (1), the   are atomic size-effect parameters which allow that the equilibrium length of a contact vector may depend on the occupancies of the contacting sites.This is not relevant to CMA, which shows occupancy but not an apparent sizeeffect.
To induce chemical SRO (as distinct from correlations amongst atomic/molecular displacements) an occupancy (chemical ordering) simulation must be performed as well as the displacive simulation using ZMC.This is the case for CMA and is discussed below.Some discussion of occupancy simulations in the context of ZMC has been given elsewhere [12] and is also included in the Supplementary Material.

Preliminary Calculations.
The modelling process began with a reliable CIF [26] file of the (average) crystal structure.For CMA, such a file was obtained from the CCDC and reviewed with reference to the literature [27-29].The CIF file was modified.A "dummy" atom was created at the centre of the CMA molecule and used as the molecular origin.Atoms were then reordered to build the molecule logically out from that centre.Since the CH 3 group and the Cl are disordered, two CIF files were used ("1" and "2"), the original and one in which these groups were swapped.
The program Mercury [27] was used to read in each CIF file then pack out the unit cell ( = 4) and read the file out as a mol2 file.This was done for both species, one at a time.
A toolbox program, zmat maker was then applied to each mol2 file and produced a z-matrix and a set of external coordinates for each molecule of each species in the unit cell.Multiple z-matrices were used, though they are essentially the same matrix with the CH 3 and Cl transposed.It has been shown that in CMA the two species (or flipped versions of the one species) occur in the ratio ∼69 : 31 [28][29][30].
A simple toolbox program (make random occ) was used to generate an initial, random occupancy structure for the crystal.The user answers a few simple questions and the program outputs a file suitable for use in ZMC.However, in this case, there is evidence of ordering of the species, and so the file acts as an input into an occupancy simulation.See Section 3.3.
The other input required by ZMC was the contact vector list and the associated force constants.This is discussed in Section 3.2 below.
The output from ZMC was Fourier transformed to obtain the model diffuse scattering using a program, DZMC, built on DIFFUSE [31].This program requires a relatively straightforward input file that describes the region of reciprocal space to be calculated, gives some details of the calculation, and provides scattering factors.

Contact Vectors.
ZMC can be used to obtain an initial list of contact vectors (in this case, using all atoms apart from the artificial molecular origin atom), but this list must then be processed to classify the vectors into sets of symmetryequivalent vectors (types) and to calculate appropriate force constants for each type.
Once the list was obtained, a typical next step was undertaken; the file was imported into a spreadsheet and the contacts were sorted on length and within that on the species they originate on and within that on the species they then contact with.It was assumed that if two vectors were of the same length (to approximately three decimal places) and both vectors acted within the same type of molecular pair (e.g., acted between two "1" molecules) then they were of the same type.A simple toolbox program (Classify) processed the list and grouped vectors according to their length and the species they connected.
In the case of CMA, all vectors up to a length of 7 Å were generated by ZMC resulting in some thousands of vectors.Generally each class of contact contained four or eight vectors, resulting in some thousands of classes.The resulting file was fed into a further program (Springs) which used a simple equation [13] to calculate a force constant for each vector type and output a block of text suitable for insertion into the ZMC input file.Since many of the longer vectors were found to have zero interaction strength using this model, these were excised from both the contact vector list file and the list of force constants.The net result was ∼400 vector types included in the model, although some of these had force constants quite close to zero.

Occupancy Simulation.
A model already existed for the occupancy structure in CMA [28,29].Using the tools presented here, we first generated "contact vectors" using only the (artificial) origin atom of each molecule.Since the occupancies were modelled as interactions between discrete variables (the species numbers, 1 and 2), molecular geometry information was not necessary.This simulation read in a model crystal consisting of the correct proportions of species 1 and 2, set up using make random occs, and then performed a Monte Carlo simulation to create an occupancy structure that possessed the desired correlations and maintained the correct composition.
Figure 1 in [32] defines the occupancy correlations which give a reasonable fit to the data.For use with ZMC the occupancies were encoded in a file which gave the species type (z-matrix type) for each molecular position in each unit cell.In this case, "1" indicated one orientation, and "2" the other.Inducing SRO then reduced to introducing correlations into the array of 1s and 2s.Since there are two possibilities, the species numbers were transformed (internally, within the occupancy MC program) into ±1 variables and an Ising-type simulation was undertaken [8,33,34].If the site occupancies are assigned values of   = ±1, then the energy of the "spin," , is given by where   is the interaction constant for correlations of type  and the inner sum is over all   neighbours of type  (these could be nearest neighbours (NN), second nearest (2NN), and so on).The sign of   determines whether the occupancies tend to be positively or negatively correlated.In these simulations, the correlation magnitudes were specified, and the   iteratively adjusted to deliver the desired correlation structure.
As noted, the terms describing the interacting neighbour types were generated by using the same process as noted in Section 3.2 but using only the "dummy" atom at the molecular origin.The occupancy MC (Occ MC) then visited two randomly chosen sites of different occupancy (but which were otherwise equivalent), calculated their energy, swapped occupancies, calculated again, and kept or rejected the new configuration.The swapping algorithm ensured that the input overall concentrations were maintained.It should be borne in mind that system composition and competition amongst correlations can prevent some correlation values from being obtained (i.e., frustration can occur).

Running the Model.
Once the occupancy structure was established, the Hooke's law contact vectors were established, the crystal geometry was described by the lattice parameters, z-matrices, and "external" z-matrix coordinates, the simulation could be run.This allowed the force constants (  in (1)) to induce displacive correlations in the model.ZMC can then write out the resulting simulation-essentially a very big array of atomic coordinates.The file types that ZMC writes depend on the command line arguments but can include the following: (i) A file suitable for use by DZMC/DIFFUSE to calculate the diffuse scattering.(ii) A file that a future instance of ZMC can read in to resume the simulation.(iii) A file (and some notional macros) suitable for use with DISCUS [8], to allow DISCUS to be used to calculate the diffuse scattering.DISCUS can also calculate pair distribution functions and other properties.
Advances in Condensed Matter Physics (iv) A CIF file describing the average structure, found by averaging over the simulation.
(v) Various forms of statistical output designed to allow plotting of various correlations and structures of the model.
We note that other arguments to ZMC can be used to ask it to generate contact vectors, perform some simple plotting tasks, or provide some built-in help.1(a), 1(d), and 1(g) (first column) show sections of observed diffuse scattering from CMA, measured using the ID-11-B beamline at the Advanced Photon Source.These cuts have been assembled into slices using Xcavate [35,36] and corrected where necessary for beam fluctuations, overexposed pixels, and so on [37].The symmetry of the unit cell has not been applied, as is shown by the missing sections.The white circles indicate the extent of the calculated patterns in the remaining subfigures.

Calculated Diffuse Scattering.
To illustrate the different aspects of the modelling, we present a preliminary model and the final model.The preliminary calculation ("Model 1", see Figures 1(b), 1(e), and 1(h) (second column)), used a model of 32×32×32 unit cells in which the displacive correlations have been induced via the Hooke's law springs with force constants determined using a simple parameterisation [13] and then globally scaled such that the average isotropic ADP (averaged across all atoms) was  iso = 2.0 Å2 (this is the same as scaling the temperature of the MC simulation).The occupancies are random, though they give the correct global average.The thermal diffuse scattering around the Bragg positions mimics that seen in the observations (within limits imposed by model size and computer run time, at least).However, we can see that the broad bands of diffuse scattering most easily seen in the ℎ0 cut (see box in Figure 1(d)) are continuous, whereas in the observations they are collapsed into spots.This indicates that these features are associated with the chemical SRO (in fact, a simulation in which all molecules are of the same type does not show these bands at all) and that the randomness of the SRO in this model is not reflective of the correlations in the real system.Recent results [12,25] suggest that the process for modelling the displacive diffuse scattering is sufficiently reliable that it aids in determining what features are not displacive in origin and are therefore related to chemical SRO.This shows how the modelling helps with identifying the sources of the different scattering features.
Calculated diffuse scattering from a model incorporating SRO is shown in Figures 1(c), 1(f), and 1(i) (third column).This model ("Model 2") uses the same force constants,   , as Model 1, but incorporates the chemical SRO correlation structure from earlier work [28,29,32].The boxed regions in Figures 1(d) and 1(f) show how the correlations cause the diffuse bands to collapse into broad spots, while the other cuts also show subtle modifications that make the calculations look more like the observations.All input files and toolbox programs (and most output files) used to generate both models are included, with additional documentation, in the deposited material.
Points to note are the following: (i) The overall locus of the thermal diffuse scattering (some examples are arrowed) is well reproduced.
(ii) The approximate envelope of the SRO scattering is well reproduced (some examples are boxed).
(iii) No attempt has been made here to refine the model.The   are exactly as given by [13], with the parameters to equation 1 in [13] being  = 11.0, = −0.40,and  = −8.0.The agreement could therefore be improved, but this is not the focus of this particular report.

Conclusions
It is possible to implement a simulation of the diffuse scattering from a system as complex as a molecular crystal showing SRO and displacive correlations and to do so in a matter of hours.This does not, of course, mean that the simulation will capture all the crucial details without considerable refinement.In this case, the occupancy correlations were already known.However, clearly the combination of a useful staring approximation for the   and some insight allowing initial estimates for the occupancy correlations can allow a starting model to be assembled.Some of the programs included in the deposited material are general (e.g., ZMC, DZMC, and zmat maker) while others were derived from existing code and are used specifically for the CMA problem and would probably need modification if used on another problem (e.g., MC occ).Yet even these programs provide templates to enable a researcher to attack new problems from a hopefully useful starting point.
The analysis of diffuse scattering is complex and multivariate.Disorder inherently shows many aspects and can take on many forms.Perhaps expecting a single approach that can tackle disorder in systems as diverse as binary alloys, small molecules, and proteins is too much to expect.ZMC and the accompanying toolbox is an attempt to offer a way of tacking a subset of possible problems and a subset of possible types of disorder.

Figure 1 :
Figure 1: Observed (a, d, and g) and calculated diffuse scattering from CMA. Model 1 (no "chemical" SRO) and Model 2 ("chemical" SRO) are discussed in the text.First row is 0 cut, second is ℎ0, and third is ℎ0.Circles on (a), (d), and (g) indicate the extent of the calculated cuts.Some features are boxed for easier comparison.