R06 Case Study Analysis Powerpoint
Assimilation of ionosonde profiles into a global ionospheric model
Leo F. McNamara,
- Air Force Research Laboratory, Hanscom Air Force Base, Bedford, Massachusetts, USA
- Now at Air Force Research Laboratory, Kirtland Air Force Base, New Mexico, USA.
Gregory J. Bishop,
- Air Force Research Laboratory, Hanscom Air Force Base, Bedford, Massachusetts, USA
Judith A. Welsh
- Air Force Research Laboratory, Hanscom Air Force Base, Bedford, Massachusetts, USA
 The Utah State University Global Assimilation of Ionospheric Measurements (GAIM) ionospheric model has been run for multiday intervals in November 2008 and February/March 2009, to investigate the model's ability to assimilate plasma frequency profiles provided by Digisondes. Ionosondes are currently the only type of assimilation data that can provide information on the profile below the peak of the F2 layer. Attention has been focused on the Republic of South Africa, which has four Digisondes and thus offers a unique validation environment. The model has been run for multiple assimilation data scenarios, some of which include GPS total electron content (TEC) observations, in order to provide benchmarks for testing the profile assimilation. The Hermanus Digisonde was set aside to provide the ground truth, in particular the values of foF2, hmF2, and the width of the F2 layer. The values of these characteristics were also tested at the other three ionosonde sites, since the data assimilation procedures do not usually reproduce the assimilated data exactly. It was found that assimilation of ionosonde data did not improve the accuracy of the GAIM values of foF2 at Hermanus (or Grahamstown) beyond that provided by the GPS TEC data, but these TEC-only errors were already relatively small. However, the ionosonde-only errors were smaller than the relatively large TEC-only errors for Louisvale and Madimbo. Assimilation of ionosonde data did not provide any significant increases in the accuracy of the model values of hmF2 and width of the F2 layer.
 The Utah State University Global Assimilation of Ionospheric Measurements (GAIM) [Scherliess et al., 2004; Schunk et al., 2004; Thompson et al., 2006] model of the ionosphere is run on a routine basis by the Air Force Weather Agency (AFWA), which has tasked the Air Force Research Laboratory (AFRL) with the model's validation. The starting point for the assimilation is a background physics-based model that is driven by real-time geophysical indices, the Ionospheric Forecast Model (IFM [Schunk et al., 1997]). The IFM itself relies on several empirical models, such as those of the thermosphere, thermospheric winds, and electric fields [Sojka et al., 2003]. GAIM assimilates observations such as slant total electron content (TEC) made at ground-based GPS sites, and thus generates an updated three-dimensional worldwide specification of the ionosphere from 92 to 1380 km. The current version of GAIM implemented at AFWA uses a Gauss-Markov technique to assimilate the observations, and generates a new specification of the ionosphere every 15 min between ±60° geographic latitude. This model is identified as GAIM-GM, and we have tested version v2.4.3.
 AFRL has performed extensive validations of the GAIM-GM model against ground truth [see, e.g., McNamara et al., 2007; Decker and McNamara, 2007; McNamara et al., 2008, 2010]. These validations were mostly concerned with assimilation of GPS TEC and DMSP/SSIES in situ electron density observations.
 The plasma frequency (or electron density) versus altitude profiles provided by the Digisonde are a unique type of assimilation data, since they are the only data type that currently provides height (profile) information. (GAIM-GM v2.4.3 does not assimilate radio occultation data, which do provide height information. RO data will be assimilated by v2.8.1, which is currently being validated by AFRL.) The observed slant TEC and UV radiances are all integrated quantities, with no height information, while the in situ DMSP/SSIES electron densities are for a single altitude (∼840 km). The assimilation of Digisonde profiles was addressed in a limited study by McNamara et al., who showed that GAIM-GM favored the TEC observations from a nearby GPS site over the Digisonde information.
 The assimilation by GAIM-GM of both GPS TEC and ionosonde profiles has previously been investigated by Thompson et al. These authors assimilated profiles from 19 globally distributed ionosondes, and used the Bear Lake Observatory (BLO) ionosonde as ground truth. Assimilation of both slant TEC data (from 339 GPS sites) and ionosonde profiles was shown to produce the best comparison to the BLO observations of foF2, while maintaining the fidelity of the global TEC comparison.
 The present analysis takes advantage of the 4-Digisonde network and multiple GPS sites in the Republic of South Africa (RSA). The four Digisondes offer a regional validation capability that is not available elsewhere. With just one Digisonde, there is the invidious choice of either assimilating the profile information, or using it for validation purposes. We have run GAIM-GM for multiple scenarios. For example, we include/exclude GPS TEC observations from within the RSA, GPS TEC observations from “global” sites external to the RSA, and Digisonde profile information from up to three sites. At least one Digisonde (usually Hermanus) is set aside to provide ground truth. However, the assimilation procedures do not guarantee a perfect fit at the other Digisonde sites, so we also investigate the GAIM-GM specifications at those sites. The main study covered an interval in November 2008, with a supplementary study for February/March 2009 to investigate issues that arose from the November study. The ionograms were all autoscaled by ARTIST 5 [Galkin et al., 2007], as distinct from the earlier and less reliable versions of ARTIST used by Thompson et al. The profiles are provided to GAIM-GM as “edp2 files”, with an altitude spacing of 1 km. (The current version of GAIM-GM uses only the 10 km values.) The new network of Digisonde DPS-4D ionosondes currently being deployed by AFWA will all use ARTIST 5.
 Our particular concern in this paper is to investigate the improvements in the GAIM-GM specifications that accrue from having ionosonde profiles available for assimilation, with and without the usual GPS TEC data. We run GAIM-GM with GPS TEC and SSIES in situ electron densities in basically the same manner as it is run by AFWA. These specifications provide a benchmark against which to measure the advantages provided by assimilating ionosonde profiles. All TEC quality control issues are addressed internally by GAIM-GM. However, we have been very careful with the ionosonde profiles that are assimilated, since that data source is our main interest. We are well aware of the limitations of autoscaled ionogram data [McNamara, 2006]. However, we wish to separate the issue of poor assimilation data from that of how much advantage GAIM-GM actually takes of ionosonde profiles.
 Section 2 discusses the Digisonde observations and the plasma frequency profiles that are assimilated by GAIM-GM. The ionograms are processed automatically (autoscaled), and then further processed so that unreliable profiles are not passed to GAIM-GM. Section 3 discusses the correlation between values of ΔfoF2 and of ΔhmF2 for different pairs of Digisondes (Δ is deviation from the monthly median), as well as autocorrelation coefficients for each Digisonde. The utility of the ionosonde information can be expected to be greater when these correlation coefficients are higher. Various assimilation scenarios, with different combinations of assimilation data are described in section 4. Section 5 describes some of the key results for the accuracy of the GAIM-GM values of foF2 at Hermanus, the assigned ground truth Digisonde, while section 6 does the same for hmF2. Section 7 discusses the GAIM-GM values of the F2 layer width or thickness. All calculations presented in this paper are based on edp2 files generated by the program QualScan [McNamara, 2006]. Section 8 compares edp2 files generated by QualScan and ARTIST 5, and discusses their differences. Section 9 presents a summary of the results of the study, while section 10 discusses their implications.
2. The Assimilated Observations
 Table 1 lists the GPS sites and their locations, while Figure 1 provides a simple map of the locations. The Lusaka (zamb) site is listed just to note that this potentially useful site seems to have been closed. Simonstown (simo) has also been closed, but not until after November 2008. There are two Sutherland sites that are essentially collocated. The same holds for Pretoria.
 The Digisonde locations are listed in Table 2, and plotted (separately) in Figure 2. The Hermanus, Louisvale and Grahamstown Digisondes form a roughly equilateral triangle with sides of ∼700 km, so the values of foF2 are expected to be reasonably well correlated [McNamara, 2009]. Actual correlation coefficients for the study intervals are given in section 3. Madimbo is ∼1200 km from Grahamstown and Louisvale, so the correlation coefficients would be expected to be relatively low, with the Madimbo edp2 data having little effect at Hermanus, which we use solely to provide ground truth.
 The Hermanus Digisonde started up in July 2008. The first validation period was chosen to be after this date, and at a time when the more remote Digisondes (Louisvale and Madimbo) were actually operating. Thus the first validation interval ran from 16–30 November 2008 (days 322–335). This interval included a storm-related doubling of the electron densities at all sites on day 330 (25 November 2008). This doubling would not have been expected on the basis of the Kp and Ap indices, and illustrates the value of real-time assimilative models. The second validation interval ran from 11 February through 11 March 2009, after which the productivity of the remote stations decreased substantially.
 All ionograms were specifically autoscaled using ARTIST 5 [Galkin et al., 2007] by Ivan Galkin at the University of Massachusetts Lowell. At the time, only the Hermanus ionograms were routinely scaled using this latest version of ARTIST. The Hermanus and Grahamstown Digisondes generate ionograms every 15 min, while the other two have a 30 min cadence.
 The Digisonde profile information is provided to GAIM-GM in the form of a table of altitude (1 km grid), plasma frequency, electron density, and an estimated uncertainty in the last two. The data files have a standard extension of edp2, which identifies them to GAIM-GM as ionogram profile files. The second line of an edp2 file contains the values of foF2 and hmF2, along with the estimated uncertainties (or confidences). The values of foF2 and hmF2 are not specifically used by GAIM-GM as the peak of the profile because the values given by earlier versions of ARTIST were too unreliable when the GAIM algorithms were being developed. The ARTIST 5 values are much more reliable.
 For the present analysis, the edp2 files were generated by the program QualScan [McNamara, 2006], using the ARTIST 5 autoscaled data. QualScan inverts (i.e., obtains the plasma frequency profile from) the autoscaled ionogram trace using the program POLAN [Titheridge, 1985], but only after performing reasonableness checks (since a nonphysical trace can cause POLAN to fail gracelessly). QualScan assigns a confidence (uncertainty in plasma frequency) at each altitude that is equal to half the difference between the NHPC and POLAN plasma frequencies. (NHPC is the part of ARTIST that deduces the electron density profiles [see Reinisch and Huang, 1983; Huang and Reinisch, 1996].) QualScan is currently an element in the preprocessing of ionograms that provide plasma frequency profiles to AFWA.
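The confidence rule described above can be sketched in a few lines. This is only an illustration, not QualScan's code: the function names and sample values are hypothetical, the absolute value is an assumption (the text says only "half the difference"), and the density conversion uses the standard relation N [m^-3] ≈ 1.24e10 × fp² with fp in MHz.

```python
def half_difference_confidence(nhpc_mhz, polan_mhz):
    """Per-altitude uncertainty (MHz): half the absolute NHPC-POLAN difference."""
    if len(nhpc_mhz) != len(polan_mhz):
        raise ValueError("profiles must share the same altitude grid")
    return [abs(a - b) / 2.0 for a, b in zip(nhpc_mhz, polan_mhz)]


def electron_density_m3(fp_mhz):
    """Electron density (m^-3) from plasma frequency (MHz): N = 1.24e10 * fp^2."""
    return 1.24e10 * fp_mhz ** 2


# Hypothetical plasma frequencies (MHz) at three altitudes of the 1 km grid.
nhpc = [3.0, 4.2, 5.1]
polan = [3.2, 4.0, 5.1]
for sigma, fp in zip(half_difference_confidence(nhpc, polan), polan):
    print(f"fp = {fp:.1f} MHz, sigma = {sigma:.2f} MHz, N = {electron_density_m3(fp):.3e} m^-3")
```

Where the two inversions agree the assigned uncertainty is zero, so any real use would presumably need a floor on the uncertainty; that detail is not described in the text.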
 Some edp2 files contained only a header, either because ARTIST 5 classified the ionogram autoscaling as unreliable, or because QualScan did so. The Hermanus daytime ionograms often have sections of the trace missing on either side of the daytime foF1 cusp. This caused problems with an earlier version of ARTIST 5 and its profiles, as well as with POLAN. The ARTIST 5 issues were quickly resolved, but the missing part of the trace often leads to nonphysical POLAN profiles, so such ionograms are often rejected by QualScan. QualScan produced valid edp2 files (the others were created, but contained only a header) for only ∼80% of the RSA ionograms. The long time gaps between edp2 files (partly from missing ionograms) mean that GAIM-GM has to model the temporal decay of the isolated observations. The percentage of Grahamstown ionograms accepted by QualScan fell to ∼65% between about 0600 and 1200 UT for both study intervals.
3. Correlation Coefficients for foF2 and hmF2
 In a simple picture, the effects of an observation such as that of foF2 and hmF2 will fall off horizontally as exp(-d/D), where d is the horizontal distance from the observation and D is some correlation distance or length. Mostly for convenience, the correlation length is usually set to the distance at which the correlation coefficient drops to 0.7 [Klobuchar and Johanson, 1977]. Using manually scaled ionograms, McNamara found that the correlation length for September/October 2006 for the RSA stations was generally greater than 735 km (the side length for the Hermanus, Louisvale and Grahamstown triangle), and certainly less than 1200 km, the distance from Louisvale and Grahamstown to Madimbo. Thus Louisvale and Grahamstown should contribute to the GAIM-GM specifications at the ground truth station (Hermanus), but Madimbo would be expected to have only an average effect (no space weather effects), if any.
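The relation between the exponential fall-off and the 0.7 correlation length can be made concrete with a small sketch. The function names are hypothetical. Treating D as an e-folding distance and using 735 km as the 0.7 length are assumptions made only for illustration, since the text says the correlation length was generally greater than 735 km.

```python
import math


def influence_factor(distance_km, efold_km):
    """Relative weight exp(-d/D) of an observation at horizontal distance d."""
    if efold_km <= 0:
        raise ValueError("efold_km must be positive")
    return math.exp(-distance_km / efold_km)


def distance_at_level(level, efold_km):
    """Distance at which exp(-d/D) has fallen to `level` (0 < level <= 1)."""
    if not 0 < level <= 1:
        raise ValueError("level must lie in (0, 1]")
    return -efold_km * math.log(level)


def efold_from_correlation_length(corr_length_km, level=0.7):
    """E-folding distance D implied by reaching `level` at corr_length_km."""
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    return -corr_length_km / math.log(level)


# Illustrative only: take 735 km (the triangle side length) as the 0.7 length.
d_scale = efold_from_correlation_length(735.0)
print(round(d_scale), "km e-folding distance")
print(round(influence_factor(1200.0, d_scale), 2), "weight at 1200 km")
```

Under this picture the 0.7 level is reached at roughly 0.36 D, so a correlation length defined at 0.7 is considerably shorter than the e-folding distance of the same exponential.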
 McNamara cautioned against using autoscaled values to determine the correlation lengths, but in the assimilation world the autoscaled data represent the only option. We have therefore calculated the correlation coefficients for the autoscaled data at each UT hour for both foF2 and hmF2 for all station pairs, to estimate the utility of the Digisonde data. More precisely, the correlation coefficients are for the deviations of foF2 and hmF2 from the monthly median values. We consider both cross correlation (between ionosondes) and autocorrelation (versus time lag for the same ionosonde).
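A minimal sketch of the cross-correlation calculation for one station pair at one UT hour follows. It assumes an ordinary Pearson coefficient and simple pairing of days on which both stations have a value; the function names and the sample data are hypothetical, and the authors' actual processing may differ.

```python
from math import sqrt
from statistics import median


def deviations_from_median(values):
    """Subtract the median of the available (non-None) values; keep None gaps."""
    med = median(v for v in values if v is not None)
    return [None if v is None else v - med for v in values]


def paired_correlation(a, b):
    """Pearson r over days on which both stations have a value; returns (r, count)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy), n


# Hypothetical foF2 values (MHz) at one UT hour on five days; None marks a rejected ionogram.
station_a = [5.0, 5.5, None, 6.0, 7.0]
station_b = [4.0, 4.5, 5.2, 5.0, 6.0]
r, count = paired_correlation(deviations_from_median(station_a),
                              deviations_from_median(station_b))
print(f"r = {r:.3f} from {count} paired days")
```

Within a single UT hour the median subtraction only shifts each series by a constant and so leaves r unchanged; it matters once values from different hours are pooled, as in an autocorrelation over time lag. The returned count plays the role of the number of paired values available for each hour.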
3.1. Cross-Correlation Coefficients for foF2
 Figure 3 shows the cross-correlation coefficients (blue upper curve) for deviations of foF2 at Grahamstown and Hermanus, the stations with 15 min cadences, for November 2008. The red lower curve shows the counts for each hour (scaled by 0.01) for which QualScan provided a value of foF2 for both locations.
 The correlation coefficients exceed 0.8 (a comfortably high value; a value of 0.7 is usually taken as a good value) for most of the 24 h, but not for around dawn. With autoscaled data, some of the low correlation coefficients arise from a single outlier value caused by an autoscaling “blunder”. The fact that the ionogram traces are often spread at night could also lead to low correlation coefficients, since ARTIST could handle the spread traces at different sites a little differently. The low value of 0.625 at 0330 UT is actually due to a relatively large uncertainty in the values of foF2 compared with the day-to-day variability at that time, which is only ±0.5 MHz. At 1200 UT, the correlation coefficient is 0.945, and the points for each day lie close to the line of best fit.
 The foF2 correlation coefficients for Hermanus-Louisvale and Grahamstown-Louisvale generally exceed 0.8 from 0800 to 2400 UT (from ∼1000 to 0200 LT), but are very noisy and below ∼0.6 at other times. For Grahamstown-Madimbo, the correlation coefficients are also very noisy. They reach ∼0.6 in the middle of the day, but drop below 0.2 at night. These low values are consistent with the large separation (∼1200 km).
 It should be noted that the correlation coefficients paint a somewhat pessimistic picture of how much a Digisonde can contribute to the GAIM-GM specification at the location of the ground-truth site, which is Hermanus. This is because Hermanus itself has autoscaling uncertainties, somewhat compromising the concept of ground truth.
 The high correlation coefficients for foF2 should manifest themselves in the form of accurate specifications of foF2 at Hermanus and Louisvale when the only data assimilated is the Grahamstown edp2 data (case R18, as described in section 5.3).
3.2. Cross-Correlation Coefficients for hmF2
 The correlation coefficients for hmF2 are significantly lower than those for foF2. Figure 4 shows the November 2008 correlation coefficients for deviations in hmF2 at Grahamstown and Hermanus, the stations with 15 min cadences.
 The correlation coefficients are even more erratic and lower for Grahamstown-Louisvale and Grahamstown-Madimbo. This low correlation is understandable, given that hmF2 is a derived parameter that relies on the validity of the whole autoscaled ionogram trace and the successful inversion of the trace to derive the profile. The accuracy of the inversion process is limited by a lack of specific knowledge of the distribution of ionization below the lowest ionogram frequency, fmin, especially at night, and in the daytime E-F1 valley. Model distributions must be applied in both cases. There is in fact some current concern with the model of the nighttime underlying ionization that is used in QualScan, since it does not match observed Arecibo incoherent scatter profiles (B. W. Reinisch, private communication, 2009). While this issue has yet to be resolved, it is not relevant to the present study because the assimilated and ground-truth profiles are all derived by QualScan, so the analysis is self consistent. The noise level in the values of hmF2 at Hermanus can potentially lead to large GAIM-GM errors in hmF2 at Hermanus.
3.3. Autocorrelation Coefficients for foF2 and hmF2
 The autocorrelation coefficients for deviations of foF2 from the median value for the four Digisondes are shown in Figure 5, as a function of the lag in hours, for November 2008. The autocorrelation coefficients are important because they influence the success with which GAIM-GM can account for missing ionograms.
 Figure 5 shows that the autocorrelation coefficient decreases linearly with the lag, starting at ∼0.9 for 0.25 h (sequential ionograms) and passing through 0.7 at ∼1.5 h. The Madimbo points consistently lie below those for the other Digisondes. The coefficients are somewhat lower for the February/March 2009 interval, passing through 0.7 at ∼1.0 h. For the same interval, the correlations are significantly higher for nighttime ionograms than for daytime ones. For the November 2008 interval, the correlations are similar for day and night, except for Madimbo, which has lower correlations at night.
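An autocorrelation of this kind can be sketched as the Pearson coefficient between a deviation series and a copy of itself shifted by a number of samples. The function name and the sample series below are hypothetical, and gaps are handled simply by skipping pairs with a missing value, which is an assumption rather than the authors' documented procedure.

```python
from math import sqrt


def lagged_autocorrelation(deviations, lag_steps):
    """Pearson r between the series and itself shifted by lag_steps samples.

    None marks a missing sample; only pairs with both values present are used.
    """
    if lag_steps < 1:
        raise ValueError("lag_steps must be at least 1")
    pairs = [(x, y) for x, y in zip(deviations, deviations[lag_steps:])
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical foF2 deviations (MHz) at a 15 min cadence; None is a missing ionogram.
cadence_h = 0.25
series = [0.2, 0.3, 0.5, None, 0.4, 0.1, -0.1, -0.3, -0.2, 0.0, 0.2, 0.3]
for steps in (1, 2, 4):
    r = lagged_autocorrelation(series, steps)
    print(f"lag {steps * cadence_h:.2f} h: r = {r:.2f}")
```

For a station with a 30 min cadence the same function applies with `cadence_h = 0.5`, so a lag of one sample corresponds to 0.5 h.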
 Figure 6 shows the autocorrelation coefficients for deviations of hmF2 from the median value for November 2008. As with the foF2 autocorrelation, the decrease of the hmF2 autocorrelation with increasing lag is approximately linear. However, the coefficient never exceeds the desired value of 0.7. The coefficients are even lower for February/March 2009. They are lower at night than during the day for both study intervals. The lowest daytime autocorrelations are for Hermanus, which is probably a result of the gaps in the ionogram trace around foF1 that cause problems for POLAN (as mentioned at the end of section 2). These low autocorrelation coefficients for ΔhmF2 suggest that the GAIM-GM values of hmF2 will not be a substantial improvement over the values obtained without assimilating edp2 data, especially at night. This point is discussed in section 6.
Numerous RNA motifs play a key role in the transmission of the genetic information. For instance, the trans-activating responsive (TAR) element is a 59 nucleotide long imperfect RNA hairpin located at the very 5′-end of the human immunodeficiency virus (HIV) mRNA. The viral protein trans-activator protein (Tat) binds specifically to a bulge in the upper part of the TAR element and recruits the cellular cyclin T1 and the associated kinase CDK9, at the apical loop of the hairpin. This multi-protein–RNA complex hyperphosphorylates the carboxy terminal domain of the RNA polymerase II, making this enzyme more processive [9,10]. Therefore, specific ligands of the TAR RNA that would inhibit the binding of Tat are expected to severely hinder the transcription of the HIV genome and the development of the retrovirus. The TAR RNA structure was used as a target for the in vitro selection of aptamers.
2.1 R06 aptamer
In vitro selection against TAR identified RNA aptamers recognizing the folded structure. After ten selection rounds the sequenced candidates revealed a consensus octamer, 5′-GUCCCAGA-3′, the six central bases of which were complementary to the TAR RNA loop. Band shift assays were performed between the radiolabelled target and these candidates, leading to a dissociation equilibrium constant of 20–50 nM under the selection conditions. Computer analysis used to predict the secondary structure of these aptamers revealed that they folded as imperfect hairpins. The 8-mer consensus sequence presented in the apical loop includes six bases complementary to the TAR loop. A minimal binding motif could be defined for the aptamer of highest affinity, R06, corresponding to the top part of the 98-nt parent candidate. The truncated aptamer, which retains similar TAR binding properties, is a perfect hairpin with an 8-bp stem and the octameric sequence displayed in its apical loop (Fig. 2A). Enzymatic footprints gave strong evidence for loop–loop interaction or kissing-complex formation between TAR and R06. Further analysis of the target-aptamer complex by band shift assay demonstrated that Watson–Crick base pairing between the loops was crucial for the stability of the bimolecular complex: a point mutation in the R06 loop was detrimental for binding to TAR. The affinity for the target could be restored by introducing a compensatory mutation in the TAR loop. Therefore, R06 can discriminate between two hairpins differing by a single base in the hexanucleotide loop.
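The base-pairing logic of this paragraph can be illustrated with a short sketch. Only the consensus octamer comes from the text. The helper names and the particular point mutation are hypothetical, and the partner loop is simply computed as the reverse complement of the six central bases rather than taken from a stated TAR sequence.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick partners in RNA


def reverse_complement(rna):
    """Sequence (5'->3') that pairs antiparallel with `rna` (5'->3')."""
    return "".join(PAIR[base] for base in reversed(rna))


def watson_crick_pairs(loop_a, loop_b):
    """Count Watson-Crick pairs between two equal-length loops aligned antiparallel."""
    if len(loop_a) != len(loop_b):
        raise ValueError("loops must have the same length")
    return sum(1 for a, b in zip(loop_a, reversed(loop_b)) if PAIR[a] == b)


consensus = "GUCCCAGA"            # 5'-GUCCCAGA-3' from the selection
central = consensus[1:7]          # six central bases: UCCCAG
partner = reverse_complement(central)

mutant = central[:3] + "A" + central[4:]      # hypothetical C-to-A point mutation
compensated = reverse_complement(mutant)      # partner loop with the matching change

print(central, "pairs with", partner, "->", watson_crick_pairs(central, partner), "pairs")
print(mutant, "pairs with", partner, "->", watson_crick_pairs(mutant, partner), "pairs")
print(mutant, "pairs with", compensated, "->", watson_crick_pairs(mutant, compensated), "pairs")
```

Counting pairs captures only the canonical part of the interaction; it says nothing about the stacking and loop-closing effects that the surrounding discussion attributes to the flanking G and A residues.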
However, complementarity between the loops did not fully account for the binding capability of R06, suggesting that non-canonical interactions between TAR and the aptamer were crucial. Compared to the aptamer, the selected octameric sequence was a very poor ligand of TAR with an association constant 2–3 orders of magnitude lower than the full-length aptamer. Thermal denaturation experiments monitored by UV spectroscopy revealed that the complex formed between TAR and the 8-mer consensus sequence was characterized by a melting temperature (Tm) lowered by 27 °C compared to that of the full-length aptamer–TAR complex. Binding to immobilized TAR, monitored by surface plasmon resonance (SPR), could not even be observed when the 8-mer sequence was injected over the sensorchip surface, while at a similar concentration the hairpin aptamer gave rise to clear association and dissociation phases. Therefore, even if the loop–loop interaction was driven primarily by Watson–Crick base pairing, stability of the complex required the interacting region to be presented in a structured context.
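To give a feel for what "2–3 orders of magnitude" means energetically, the ratio of association constants can be converted to a free-energy difference with ΔΔG = RT ln(ratio). The sketch below assumes a temperature of 298.15 K, which is not stated in the text, and the function name is hypothetical.

```python
from math import log

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_affinity_ratio(ratio, temperature_k=298.15):
    """Free-energy penalty (kcal/mol) for a `ratio`-fold lower association constant."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return R_KCAL * temperature_k * log(ratio)


# 2 and 3 orders of magnitude, at the assumed temperature of 298.15 K.
for fold in (100.0, 1000.0):
    print(f"{fold:.0f}-fold weaker binding: about {ddg_from_affinity_ratio(fold):.1f} kcal/mol")
```

This is only arithmetic on the quoted ratio; it does not apportion the penalty between base pairing, stacking, or the structured presentation of the loop.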
The structure of the kissing complex formed by TAR and TAR* (Fig. 2A), a rationally designed hairpin with a six-nucleotide loop complementary to the TAR loop and a 5-bp stem, was shown by NMR spectroscopy to be bent towards the major groove of the loop–loop helix with an angle of 30°. A quasi-continuous stacking of base pairs from one stem helix to the other, through the loop–loop helix, was observed. These structural features were also observed in other kissing complexes derived from NMR experiments [14–16]. In contrast, an almost straight coaxial stacking of the helices was observed in the crystal structure of the 23S RNA kissing complex and in that of the DIS–DIS complex, a kissing complex that constitutes the initial step of the dimerization process of the human immunodeficiency virus type 1 (HIV-1) genome. Interestingly, molecular dynamics (MD) performed on the TAR–TAR* solution structure reconciled these observations. The initial curvature of the structure disappeared during the first nanosecond of the simulation run and TAR–TAR* adopted an almost straight coaxial structure, indicating that this latter conformation was likely the most stable one. The three-dimensional structure of TAR–R06 is not yet known, but one can reasonably expect that this complex adopts an overall conformation similar to that observed for other RNA–RNA loop–loop complexes. In particular, TAR–R06, like other kissing complexes, was recognized by the structure-specific Rop protein encoded by the ColE1 plasmid from Escherichia coli.
The 8 nt loop of the anti-TAR R06 aptamer corresponding to the selected consensus shows selected G and A residues flanking the six nucleotide sequence complementary to the TAR loop, suggesting that they might play a role in the stability of the TAR–aptamer complex. This was demonstrated by investigating the properties of different combinations of loop closing residues [11,12]. Purine, purine combinations gave rise to the least destabilized TAR–aptamer complexes, the A,G combination, for instance, being almost equivalent to the selected G,A one. In contrast, pyrimidine, pyrimidine combinations drastically decreased the stability of the complex. It is worth noting that complex stability was not directly related to the aptamer stem stability. The G,C combination, for instance, that provided the aptamer stem with an additional Watson–Crick base pair, generated an aptamer of increased thermal stability compared to the selected aptamer (ΔTm=+11 °C) but the resulting bimolecular complex with the viral target was less stable (ΔTm=−16 °C). The U,A combination that gave rise to a TAR*-like aptamer resulted in a decrease of 17 °C in the thermal stability of the complex that exactly matched the stability obtained with TAR*, suggesting that the increased stability of the TAR–R06 complex over TAR–TAR* really originated in the G and A residues closing the aptamer loop. This result validated the usefulness of an in vitro combinatorial approach over a rational one to identify high affinity RNA ligands.
Kinetic analysis by SPR further emphasized the stabilizing role of the G and A residues; the decreased stability of the complexes formed with the mutated aptamers originated in a faster dissociation reaction, while the association reaction remained unchanged. In the selection buffer which contained 3 mM Mg2+, binding of TAR* to TAR was hardly detected. Structural studies have suggested that the stabilizing effect of this cation on loop–loop complexes would result from direct binding in two specific metal ion binding sites made by phosphate clusters flanking the major groove of the loop–loop helix [13,14,21].
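The kinetic argument can be followed numerically with the standard 1:1 binding relations Kd = koff/kon and t½ = ln 2/koff. The rate constants below are hypothetical placeholders, not measured values from the SPR study, and the function names are illustrative.

```python
from math import log


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant Kd (M) for 1:1 binding."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def complex_half_life(k_off):
    """Half-life (s) of the complex for first-order dissociation."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return log(2) / k_off


# Hypothetical rates: same association rate, 100-fold faster dissociation for the mutant.
k_on = 1.0e5            # M^-1 s^-1, assumed identical for both aptamers
k_off_parent = 2.0e-3   # s^-1
k_off_mutant = 2.0e-1   # s^-1

for name, k_off in (("parent", k_off_parent), ("mutant", k_off_mutant)):
    kd_nm = dissociation_constant(k_on, k_off) * 1e9
    print(f"{name}: Kd = {kd_nm:.0f} nM, half-life = {complex_half_life(k_off):.1f} s")
```

With the association rate held fixed, any fold change in koff appears as the same fold change in Kd, which is the sense in which a faster dissociation alone accounts for the lower complex stability.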
MD was used recently to investigate the role of the residues closing the aptamer loop. As the three-dimensional structure of TAR–R06 is not established yet, the TAR–TAR* solution structure was taken as the starting structure. The TAR* hairpin was converted into an R06-like aptamer by substituting the U closing the loop on the 5′-side by a G, thus generating the characteristic loop closing G,A combination. The choice of the initial conformation of these residues was obviously crucial. The MD results on TAR–TAR* showed that the UA base pair next to the TAR* loop adopted a conformation characterized by a large C1′–C1′ interglycosidic distance, which likely favoured stacking of the bases at the stem–loop junctions. Taking also into account the preliminary NMR experiments on TAR–R06, which suggested that the GA base pair did not adopt a sheared conformation, the G,A bases were positioned in a cis Watson–Crick/Watson–Crick conformation. The results obtained during the MD simulation runs suggested that the stabilizing role of G,A likely resulted from inter-backbone hydrogen bonds that were not observed in the TAR–TAR* structure and an optimized stacking of the bases at the stem–loop junctions.
Stacking interactions as key structural determinants for stable kissing complexes were also demonstrated in the case of a loop–loop interaction derived from a transient RNA–RNA kissing complex that regulates the replication of the E. coli plasmid ColE1. This was done by introducing, at the stem–loop junctions, the fluorescent probe 2-aminopurine, an analogue that forms with uracil a Watson–Crick base pair isosteric with AU. Kinetic analysis of the interaction by fluorescence-detected stopped-flow experiments showed that loop–loop association followed a two-step mechanism: an initial encounter reaction was followed by a slower kinetic step that might reflect an isomerization reaction for optimizing the stacking interactions at the stem–loop junctions.
Other in vitro selections of RNA candidates against RNA hairpins further supported the idea that purine-purine base pairs might be preferred for closing hairpin loops involved in RNA loop–loop interactions. Scarabino et al. identified hairpin aptamers that bound to the anticodon loop of the yeast tRNAPhe through kissing complex formation. As reported for the R06 anti-TAR aptamer, the 7 nt loop of the aptamer complementary to the anticodon loop was flanked by G and A residues. In vitro selection against the DIS hairpin that regulates dimerization of the HIV-1 RNA genome also selected kissing aptamers showing mostly purine-purine combinations to close the loop. However, the A,A combination was preferred over all others, including GA, confirming that the identified AA sheared pair that flanked the auto-complementary hexanucleotide sequence was crucial for dimerization.
In vitro selection of DNA candidates against TAR RNA also led to aptamers that formed loop–loop complexes with the viral target. However, footprinting studies and NMR experiments showed that the conformation of the RNA–DNA interacting loops differed from that of the TAR–RNA aptamer complex. The loop–loop interaction involved only five base pairs instead of six for TAR–R06. In contrast to this latter complex that has no linking residue to connect the loop–loop helix to the TAR or aptamer stems, one RNA and two DNA residues constituted linkers in the TAR–DNA aptamer complex. Difference in geometry between RNA–RNA and DNA–RNA loop–loop complexes was further supported by the fact that the TAR–DNA aptamer was not recognized by the Rop protein. The RNA–DNA loop–loop helix was even different from that of linear RNA–DNA duplexes as it was not recognized by the E. coli RNase H protein. These results indicate that depending on the chemistry of the random pool, the selection process evolved not only to optimize the Watson–Crick interactions that primarily drive loop–loop interactions but also to favour non-canonical interactions that significantly contribute to the thermodynamic stability of kissing complexes.
2.2 Chemically modified aptamers
In vitro selected DNA or RNA aptamers are ligands that display high affinity and strong selectivity for their target and might be of interest for biological applications such as the regulation of gene expression. When used in a cellular context, the efficiency of the selected candidates inevitably decreases, as their lifetime is drastically reduced by nuclease degradation. Several modifications were developed in the frame of the antisense strategy to circumvent this limitation and even to improve the affinity for the targeted sequences. Two approaches can be used to generate nuclease-resistant aptamers. The first one takes advantage of some unnatural nucleotides that can be enzymatically incorporated by the polymerases during the selection process. Phosphorothioate linkage in place of phosphodiester is compatible with the SELEX enzymes. 2′-Fluoro and 2′-amino pyrimidines are chemically modified nucleotides that were successfully used to generate nuclease resistant aptamers [30,31].
The second approach, which consists in introducing chemical modifications into the selected aptamer after selection, is risky. The chemistry of the random pool dictates the structure of the complex, reflecting the exquisite adaptation of the aptamer to its target. A DNA version of the R06 RNA aptamer is a poor TAR ligand and vice versa. Thus, any modification that alters the geometry of the nucleotides could affect the affinity and specificity of the selected candidates for the target. In this context, modifications that retain the conformation of the parent aptamers will a priori generate good mimics. 2′-O-methyl (2′-OMe), N3′ → P5′ phosphoramidate deoxynucleotide (NP-DNA) and “locked nucleic acid” (LNA) (Fig. 3) modifications confer resistance to nucleases and adopt the N-type (C3′-endo) conformation characteristic of RNA.
Indeed, fully modified 2′-OMe and NP-DNA R06 derivatives formed complexes with TAR that were characterized by similar or even slightly higher affinity constants than the parent RNA–RNA complex [33,34]. Moreover, these derivatives retained the key structural determinants identified by in vitro selection. In particular, the crucial G,A loop-closing combination of the aptamer still contributed to the thermodynamic stability of the complex, indicating that the modified complexes adopted an overall conformation close to that of the selected TAR–RNA complex. However, these chemical modifications introduced subtle changes in the geometry of the resulting kissing complex, and the loop–loop helices differed from those observed with linear hybrids. In contrast to the RNA–RNA kissing complex, neither 2′-OMe– nor NP-DNA–TAR complexes were recognized by the Rop protein. Moreover, the increased stability per modified residue of 2′-OMe–RNA (ΔTm = +0.5 °C) or NP-DNA–RNA (ΔTm = +2.5 °C) versus RNA–RNA linear duplexes was not observed. This further underlined the non-canonical conformation of the loop–loop region.
The results obtained with LNA analogues illustrate that post-selection modification of aptamers is not trivial and that the C3′-endo conformation of the incorporated modified nucleotides does not always guarantee success. LNA is a recently introduced chemical modification that generates the most stable hybrids ever characterized, with a ΔTm of +3 °C and +10 °C per LNA residue upon binding to DNA and RNA, respectively. In contrast to the fully 2′-OMe and NP-DNA versions of R06, the LNA aptamer did not form a stable complex with TAR. Hairpins that displayed an LNA stem with a DNA loop, or vice versa, were not good ligands either (Darfeuille et al., unpublished).
Bearing in mind that the flexibility of the ribose ring is restricted by the 2′-O,4′-C-methylene linkage and that oligomers alternating DNA and LNA nucleotides adopt an overall A-type conformation [37–39], a series of mixmer LNA/DNA hairpins was synthesized. No particular rules dictated the positions at which DNA or LNA residues were introduced. One derivative, LNA06 (Fig. 2C), in which the G and A residues closing the loop are DNA, led to a complex as stable as the one obtained with the parent RNA aptamer. As previously observed with 2′-OMe and NP-DNA modified aptamers, the LNA modification did not generate a complex of increased stability. Further analysis of the interaction between the mixmer DNA/LNA derivative and the viral target showed that stacking interactions at the stem–loop junctions were still crucial and related to the identity (G and A) and the chemistry (DNA) of the nucleotides closing the loop of the hairpin analogue. Surprisingly, the LNA/DNA antisense octamer (corresponding to the hairpin loop) displayed the same affinity for the target as the aptamer, even though its binding kinetics differed, whereas its RNA version hardly bound to TAR RNA. However, kissing interactions provided recognition with higher specificity than antisense interactions, validating the usefulness of in vitro selection (Darfeuille et al., unpublished).
The biological effect of the 2′-OMe and NP-DNA anti-TAR aptamer derivatives was evaluated. Both analogues inhibited Tat-mediated in vitro transcription with an IC50 of about 400 nM, compared to >4 μM for the in vitro selected RNA aptamer [33,34]. This effect was specific, as the R06 analogue with a loop-closing C,U combination, which did not bind to TAR, did not inhibit Tat-mediated transcription. As demonstrated with the NP-DNA hairpin and a Tat peptide, the inhibitory effect was likely related in part to a competition between the viral protein and the chemically modified aptamer for binding to TAR. Interestingly, as the binding sites of these ligands do not overlap, this suggested that interaction of the aptamer analogue induced structural changes that prevent Tat binding or inhibited the conformational changes of the TAR RNA that take place upon Tat binding.
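The potency comparison quoted above can be made explicit with a short calculation. The sketch below is illustrative arithmetic only: it uses the approximate IC50 values given in the text, and the function and constant names are hypothetical. Because the IC50 of the parent RNA aptamer is reported only as a lower bound (>4 μM), the result is a minimum fold improvement, not an exact ratio.

```python
# Illustrative arithmetic only; the values are the approximate IC50s quoted in the text.
IC50_MODIFIED_NM = 400.0       # 2'-OMe and NP-DNA derivatives (about 400 nM)
IC50_PARENT_LOWER_NM = 4000.0  # parent RNA aptamer (> 4 uM, i.e. a lower bound)


def min_fold_improvement(parent_lower_nm: float, modified_nm: float) -> float:
    """Return a lower bound on the potency gain of the modified aptamer.

    The parent IC50 is only bounded from below, so the true ratio
    can be larger than the value returned here.
    """
    return parent_lower_nm / modified_nm


fold = min_fold_improvement(IC50_PARENT_LOWER_NM, IC50_MODIFIED_NM)
print(f"Modified derivatives are at least {fold:.0f}-fold more potent")
```

Under these assumptions the chemically modified derivatives are at least ten-fold more potent inhibitors than the selected RNA aptamer in this assay.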