Secondary Structure Design of Multi-state DNA Machines Based on ...

Secondary Structure Design of Multi-state DNA Machines Based on Sequential Structure Transitions

Hiroki Uejima and Masami Hagiya

Japan Science and Technology Corporation (JST-CREST) and Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
{uejima, hagiya}@is.s.u-tokyo.ac.jp

Abstract. This paper deals with the problem of designing the secondary structure of a multi-state molecular machine in which the formation of repeated DNA hairpin structures changes sequentially, with the aim of implementing more sophisticated DNA nanomachines. Existing methods are insufficient to construct such a large molecular machine using multiple DNA molecules. The method used in this paper validates the changes in formation exhaustively by dividing the secondary structure into hairpin units. It considers the minimum free energy of the structure, the structure transition paths, and the total frequency of optimal and sub-optimal structures. Hence, it can better design base sequences using the principles of thermodynamics.

1 Introduction

1.1 Multi-State Molecular Machine

In the field of DNA nanotechnology, various nanomachines made of DNA molecules have been implemented. Mao et al. [9] proposed a DNA motor based on changes in the structure of the DNA helix (between the B and Z forms) with salt concentration. However, a change in salt concentration affects all the DNA molecules in a solution uniformly, so the motors cannot be operated individually, whatever their base sequences. Moreover, the motor is restricted to two states. Yurke et al. [20] proposed a molecular system called the molecular tweezers, which has two states that result from changes in its DNA secondary structure. This system depends on the base sequences of its DNA molecules and can be operated individually. Simmel and Yurke then implemented a three-state machine [15] by extending their nanoactuator [14]. Unfortunately, it is not obvious how to extend this to general multi-state machines. Yan et al. applied ‘fuelling’, which was established in the aforementioned works, to control the transition between the JX2 and PX DNA tiles [19]. The basis for constructing more generalized nanoscale machines and computing devices is to implement multi-state machines using molecules. This study examines the design of a multi-state molecular machine that undergoes sequential state transitions as a result of multiple inputs.

J. Chen and J. Reif (Eds.): DNA9, LNCS 2943, pp. 74–85, 2004. © Springer-Verlag Berlin Heidelberg 2004

Fig. 1. Repeated hairpin structures in DNA molecules are opened by hybridizing the structures with DNA oligomers in a specific order. Here, the solid and dashed lines of the same color are complementary sequences

Prototype Hairpin-Based State Machine

A conformational state machine is a multi-state machine whose conformation (i.e., secondary structure) indicates its state. We propose a hairpin-based state machine that uses the sequential opening of DNA hairpins to implement a conformational state machine. This system consists of DNA hairpins and oligomers whose sequences appear in the hairpin stems. A DNA oligomer can open the corresponding hairpin structure by invading the hairpin stem via branch migration. The hairpins are concatenated with an additional sticky end and form repeated hairpin structures in a single DNA strand. The entire series of repeated hairpin structures comprises a multi-state machine, which maintains its state with its hairpin structures, and the DNA oligomers function as state transition signals.

Initially, an oligomer can interact only with the hairpin structure at the end of the strand that carries a sticky end. If the sequences of the hairpin stem and oligomer match, the oligomer invades the hairpin structure via branch migration after hybridizing with the sticky end. When the hairpin opens, a new sticky end is revealed, and it plays a role in opening the next hairpin. Consequently, the hairpin structures are opened sequentially by the corresponding oligomers starting from one end of the DNA strand, as depicted in Figure 1.

Following Turberfield et al. [16], who pointed out that bulge loops can inhibit hybridization, a similar state machine can be implemented using bulge loops, as shown in Figure 2. This machine is superior to ours in that the topological configuration of the strands more strongly inhibits the invasion of a bulge loop by an oligomer. Moreover, invasion is also inhibited by the stiffness of the double strand that is formed. Rather than inhibiting hybridization, our machine uses a hairpin to reveal the next single-stranded part after it is opened by an oligomer. Because our machine is made of a single strand, it is simpler; if it is proven to work robustly, it can be used as another type of building block for DNA machines.

Fig. 2. The repeated bulge structures of DNA molecules are opened by hybridizing them with DNA oligomers in a specific order

1.2 Thermodynamic Analysis of DNA Hybridization

Thermodynamic models, such as nearest-neighbor (NN) thermodynamics, can be used to analyze secondary structures quantitatively. NN thermodynamics considers the stability of the secondary structure. It assumes that the stability of a given base pair depends on the identity and orientation of the neighboring base pairs. In this study, we use the thermodynamic parameters reported by John SantaLucia Jr. [13] for the NN model.

The folding problem is to compute the most stable secondary structure into which a given base sequence folds. Zuker et al. [21] proposed a dynamic programming algorithm that solves this problem in polynomial time, O(n³), where n is the length of the base sequence. Their algorithm is one of the algorithms implemented in “the Vienna RNA Package” by Hofacker et al. [7] and in “mfold” by Zuker et al.

The inverse folding problem is to compute a base sequence that folds into a given secondary structure as its most stable structure. The algorithm is based on a simple search in which candidates are evaluated using a folding function that computes the similarity between the target structure and the minimum free energy structure of a sequence. It is also implemented in “the Vienna RNA Package” by Hofacker et al.

The distribution of secondary structures formed by a DNA/RNA molecule in the equilibrium state depends on the partition function of the sequence and the free energy of each structure. Therefore, the minimum free energy structure by itself is not sufficient to predict the actual behavior of a molecule. Wuchty et al. [18] proposed an algorithm that finds the complete set of sub-optimal RNA structures. Their algorithm is a modification of the algorithm used to find the optimal structure. This algorithm was also implemented by Hofacker et al.

As mentioned above, the partition function is one of the most important factors for predicting the behavior of DNA/RNA molecules. McCaskill [10] solved the problem of computing it by using dynamic programming in a manner similar to the folding problem. The time complexity of his algorithm is O(n³). The algorithm is also implemented in “the Vienna RNA Package” by Hofacker et al.

There are two tractable ways to approximate the energy barrier height between two given structures. One method generates transition paths randomly and finds their lowest energy mountaintop. A transition path is generated in such a way that the transition proceeds more frequently in the direction with the smaller increase in the free energy. This algorithm was proposed by Flamm et al. [6]. The other takes advantage of the heuristic proposed by Morgan et al. [11] and is based on a simplified energy model of secondary structures. Their model uses the number of base pairs as the free energy of a structure. Their heuristic efficiently generates a path with a very low energy mountaintop, but guarantees nothing about the properties of the path.
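To make the notion of a "lowest energy mountaintop" concrete, the following minimal sketch scores candidate transition paths under the simplified model in which each base pair lowers the energy by one unit. It does not reproduce the path-generation step of either Flamm et al. or Morgan et al.; the paths are hand-written, hypothetical inputs, and all function names are illustrative assumptions.

```python
from typing import FrozenSet, Sequence, Tuple

BasePair = Tuple[int, int]
Structure = FrozenSet[BasePair]


def simplified_energy(structure: Structure) -> int:
    """Simplified model: every base pair lowers the energy by one unit."""
    return -len(structure)


def path_peak(path: Sequence[Structure]) -> int:
    """Highest energy (the 'mountaintop') reached along one transition path."""
    return max(simplified_energy(s) for s in path)


def barrier_estimate(paths: Sequence[Sequence[Structure]]) -> int:
    """Lowest mountaintop over all candidate paths, measured from the start.

    Assumes every path begins at the same initial structure.
    """
    start_energy = simplified_energy(paths[0][0])
    return min(path_peak(p) for p in paths) - start_energy


def S(*pairs: BasePair) -> Structure:
    """Shorthand for building a structure from (i, j) base pairs."""
    return frozenset(pairs)


# Two hypothetical paths from {(0,9),(1,8)} to {(0,5),(1,4)}.
open_everything_first = [
    S((0, 9), (1, 8)), S((0, 9)), S(), S((1, 4)), S((0, 5), (1, 4)),
]
swap_one_pair_at_a_time = [
    S((0, 9), (1, 8)), S((0, 9)), S((0, 9), (1, 4)), S((1, 4)), S((0, 5), (1, 4)),
]

print(barrier_estimate([open_everything_first, swap_one_pair_at_a_time]))  # 1
```

Because the estimate is a minimum over sampled paths, it is only an upper bound on the true barrier: adding more paths can lower it but never raise it.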

2 Method

First, we introduce the criteria of selectivity and ordinality, which need to be satisfied by our hairpin-based state machine. Then, we explain which structure frequencies should be considered in designing the molecular machine.

2.1 Formalism

The selectivity and ordinality should be guaranteed for any number of hairpin sequences concatenated in any order. These criteria are reduced to conditions involving a minimum of two repeated hairpin structures, because successive hairpin opening is an orderly behavior. Only the combination of a sticky end sequence and a hairpin sequence needs to be verified to guarantee selectivity. As for ordinality, only the combination of two hairpin sequences needs to be verified.

Selectivity

We call the oligomer that opens a hairpin structure an input oligomer (Figure 3 (a), (b)). The input oligomer consists of two parts. The part that hybridizes with the sticky end of a hairpin is called the head (the green part in Figure 3 (a)), and the part that invades and hybridizes with the stem of the hairpin is called the tail (the red part in Figure 3 (a)). The sticky end of a hairpin is also part of the stem of another hairpin.

Some of the notation used for sequences is defined here. Hairpin(s1) is the sequence of the hairpin structure that includes the sequences s1 and s̄1 in its stem, where s̄1 denotes the complementary sequence of s1. Sticky(s1) is the sequence of the stem part of Hairpin(s1), which functions as the sticky end. Therefore, if hairpins are opened from the 5’-end, Sticky(s1) = s̄1. Sticky(s1)Hairpin(s2) indicates that the sequence of the hairpin structure is concatenated at its 5’-end with the sequence of the sticky end. For example, the sequence of the structure shown in Figure 3 (1) is represented by Sticky(s1)Hairpin(s2), where the sequence s̄1 corresponds to the green dotted line, and s2 and s̄2 correspond to the solid and dotted red lines, respectively.
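The notation can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: the complementary sequence is read 5'→3' as the reverse complement, a hairpin is written as stem, loop, complementary stem, the loop is an arbitrary placeholder (`TTTT`), and the sticky end exposed when opening from the 5'-end is the complementary half of the stem. The helper names and example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bar(seq: str) -> str:
    """Complementary sequence read 5'->3' (the reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hairpin(s: str, loop: str = "TTTT") -> str:
    """Hairpin(s): stem s, a placeholder loop, then the complementary stem."""
    return s + loop + bar(s)


def sticky(s: str) -> str:
    """Sticky(s): the stem half of Hairpin(s) left single-stranded once
    Hairpin(s) has been opened from its 5'-end (assumed to be bar(s))."""
    return bar(s)


def sticky_hairpin(s1: str, s2: str, loop: str = "TTTT") -> str:
    """Sticky(s1)Hairpin(s2): sticky end joined to the hairpin's 5'-end."""
    return sticky(s1) + hairpin(s2, loop)


s1, s2 = "GATTC", "ACGGT"  # arbitrary example stems
print(sticky_hairpin(s1, s2))  # GAATCACGGTTTTTACCGT
```

Under these assumptions the 3' half of Hairpin(s1) is exactly Sticky(s1), which is how opening one hairpin exposes the sticky end for the next.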


[Figure 3: panels (a), (b), (1), (1-a), (1-b), (2), (2-a), (2-b)]

Fig. 3. A schematic of the requirements for selectivity

Opener(s1, s2) is the opener sequence that consists of part of Hairpin(s1) as its head and part of Hairpin(s2) as its tail. Therefore, if hairpins are opened from the 5’-end, Opener(s1, s2) is s̄2 s1 or a part of s̄2 s1. For example, the oligomer shown in Figure 3 (a) is represented by Opener(s1, s2), where part of sequence s1 corresponds to the green solid line, and s̄2 corresponds to the red dotted line.

The selectivity of the set of hairpin sequences Hairpin(x) (x = s1, ..., sn) is defined as follows:

– The hairpin structure Sticky(x)Hairpin(y) is opened by the oligomer Opener(x, y) for any x, y ∈ {s1, ..., sn} (x ≠ y). In this case, the hairpin structures are opened properly (Figure 3 (1-a)).
– The hairpin structure Sticky(x)Hairpin(y) is not opened by the oligomer Opener(z, w), where x ≠ z or y ≠ w, for any x, y, z, w ∈ {s1, ..., sn} (x ≠ y, z ≠ w). In this case, the hairpin structures are never opened (Figure 3 (1-b), (2-a), (2-b)).

In case (1-b), the oligomer may invade the hairpin structure without hybridizing with the sticky end and open the hairpin. If we identify the oligomer using its tail sequence, such a situation causes no problem, because the tail of the oligomer agrees with the hairpin. In other words, both (a) and (b) are identical signals for opening the red hairpin structure regardless of the sticky end. Hence, this case is omitted when confirming the selectivity. An invading oligomer without a sticky end is involved in ordinality rather than in selectivity.

Ordinality

Similarly, the ordinality of the set of hairpin sequences Hairpin(x) (x = s1, ..., sn) is defined as follows: The two sequential hairpin structures Hairpin(x)Hairpin(y) are not opened by the oligomer Opener(z, w) for any x, y, z, w ∈ {s1, ..., sn} (x ≠ y, z ≠ w). Namely, a hairpin should not be opened until the adjacent hairpin is opened. The cases in which the hairpin and tail do not agree are not verified because the hairpin is seldom opened in such cases. Therefore, only the cases shown in Figure 4 are verified.
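The selectivity definition translates into a finite list of (structure, oligomer) combinations to check. The sketch below enumerates them for a set of hairpin names, reading the garbled conditions as inequalities (x ≠ y, z ≠ w) and skipping case (1-b), in which the tail matches the hairpin. The function name and the tuple layout are illustrative assumptions; ordinality cases are not enumerated because they depend on Figure 4.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def selectivity_cases(names: Sequence[str]) -> Tuple[List[Tuple[Pair, Pair]],
                                                     List[Tuple[Pair, Pair]]]:
    """Return (must_open, must_stay_closed) combinations.

    Each entry is ((x, y), (z, w)), standing for the structure
    Sticky(x)Hairpin(y) tested against the oligomer Opener(z, w).
    """
    pairs = list(permutations(names, 2))  # ordered pairs with distinct members

    # Matching head and tail: the hairpin must be opened (case 1-a).
    must_open = [((x, y), (x, y)) for x, y in pairs]

    # Tail differs from the hairpin (cases 2-a, 2-b): must stay closed.
    # Case 1-b (y == w but x != z) is deliberately left out.
    must_stay_closed = [
        ((x, y), (z, w))
        for x, y in pairs
        for z, w in pairs
        if y != w
    ]
    return must_open, must_stay_closed


opened, closed = selectivity_cases(["s1", "s2", "s3"])
print(len(opened), len(closed))  # 6 24
```

The number of combinations grows with the fourth power of the number of hairpin sequences, which is why the reduction to two-hairpin conditions matters.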

[Figure 4: panels (A) and (B)]

Fig. 4. A schematic of the requirements for ordinality

2.2 Procedure

First, we describe how selectivity and ordinality are verified based on these definitions. Next, we explain how to calculate the frequencies of the structures corresponding to these criteria. We implemented this procedure using the C programming language with the library of the Vienna RNA package.

Verifying Selectivity

Selectivity can be confirmed using a simple condition: Given the strands of a hairpin structure and an oligomer, their minimum free energy structure is similar to the target structure (the opened or closed hairpin structure). This condition is verified as an instance of the folding problem. To use Zuker’s folding algorithm [21], these two strands are concatenated with virtual bases that cannot hybridize with any base, and they are dealt with as one strand.

A secondary structure can be considered as a set of base pairs. Naturally, the similarity or distance between two structures is defined by the size of the symmetric difference between the sets corresponding to the structures. In short, the distance d(Ω1, Ω2) between secondary structures Ω1 and Ω2 is:

d(Ω1, Ω2) = |Ω1 Δ Ω2| = |(Ω1 ∪ Ω2) \ (Ω1 ∩ Ω2)|.

When the target structure is Ω, a structure Ω′ such that d(Ω, Ω′) ≤ D is regarded as similar to Ω, and the two structures are identified. The threshold D is chosen according to the target structure and its size.

Verifying Ordinality

Selectivity can be confirmed by folding all combinations of sequences as explained in the previous section and checking that the target structure is similar to the optimal one. However, it is impossible to satisfy ordinality in this way. For some combinations of sequences, the structure in which the DNA oligomer hybridizes with and opens the hairpin is the minimum free energy structure, even when the sticky end of the hairpin is not included.
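As an aside on the distance d(Ω1, Ω2) used in the selectivity check, the following minimal sketch computes it as the size of the symmetric difference of two base-pair sets. It assumes structures are written in dot-bracket notation, which the text does not specify; the function names are illustrative and are not those of the authors' program.

```python
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Convert a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def distance(a: str, b: str) -> int:
    """Size of the symmetric difference of the two base-pair sets."""
    return len(base_pairs(a) ^ base_pairs(b))


def is_similar(target: str, candidate: str, threshold: int) -> bool:
    """True when the candidate lies within distance D of the target."""
    return distance(target, candidate) <= threshold


print(distance("((....))", "(......)"))  # 1
print(distance("((....))", "........"))  # 2
```

With a threshold of D = 1, the partially opened stem `(......)` would still be identified with the closed target `((....))`, whereas the fully open strand would not.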
Although a structure violating ordinality might be the minimum free energy structure, a high energy barrier on the transition path leading to the violating structure makes violations of ordinality rare in actual situations. Minimizing the valley depth also makes the violating structure less stable. Therefore, our program checks whether the energy barrier is higher than a given threshold and the energy valley is shallower than a second given threshold.

[Figure 5: energy curve of the transition from structure (a) to structure (b), with barrier height h1 and valley depth h2. Requirement: (barrier height) h1 > B and (valley depth) h2 < V]

Fig. 5. An energy curve of a structure transition violating ordinality

The depth of the energy valley is the difference between the energies of the initial (Figure 5 (a)) and final (Figure 5 (b)) structures. The latter is the hairpin opened by an improper invasion of the oligomer. The barrier height is the lowest energy peak over the structure transition paths. In our program, the barrier height [17] is approximated using the lowest peak for several paths generated with Morgan and Higgs’ algorithm [11]. However, justifying and improving this condition for ordinality requires more precise analyses of hairpin opening, including kinetic analyses. This is left for a future study and is briefly discussed in the final section.

Maximizing the Frequency of a Structure

In addition to requiring that the target structure be similar to the minimum free energy structure, the frequency of the target structure should also be maximized. This frequency can be computed as the sum of the frequencies of structures similar to the target. For example, the frequency F̃selectivity used to verify the selectivity is:

\[ \tilde{F}_{\mathrm{selectivity}} = \sum_{S:\, d(S,T) \le D} F(S), \]

where T is the target structure, i.e., an opened or closed hairpin, and F (S) is the frequency of structure S. This frequency should be maximized to obtain the best sequence. In calculating the frequencies for selectivity and ordinality, the search looks for sub-optimal structures using the algorithm of Wuchty et al. [18]. A structure is sub-optimal if its energy is lower than [minimum free energy] + h. By adopting only sub-optimal structures, we can neglect structures present at low frequency.
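The summed frequency can be sketched as follows, under stated assumptions: structures are given in dot-bracket notation with free energies in kcal/mol, F(S) is taken to be the Boltzmann weight exp(-E/RT), and the normalization runs only over the supplied (sub-optimal) structures rather than over the full partition function. The function names and the example ensemble are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Convert a (balanced) dot-bracket string into a set of base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def distance(a: str, b: str) -> int:
    """Size of the symmetric difference of the two base-pair sets."""
    return len(base_pairs(a) ^ base_pairs(b))


def target_frequency(ensemble: Sequence[Tuple[str, float]], target: str,
                     max_distance: int, temperature_k: float = 310.15) -> float:
    """Sum of F(S) over structures S with d(S, target) <= max_distance.

    F(S) is the Boltzmann weight of S divided by the total weight of the
    supplied ensemble (an approximation of the partition function).
    """
    rt = R_KCAL * temperature_k
    weights = [(s, math.exp(-energy / rt)) for s, energy in ensemble]
    total = sum(w for _, w in weights)
    near = sum(w for s, w in weights if distance(s, target) <= max_distance)
    return near / total


# Hypothetical ensemble with equal energies, so each structure has weight 1/3.
ensemble = [("((....))", 0.0), ("(......)", 0.0), ("........", 0.0)]
print(target_frequency(ensemble, "((....))", max_distance=1))
```

Restricting the ensemble to structures within h of the minimum free energy drops only terms with small Boltzmann weights, which is the justification given in the text for neglecting them.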

[Figure 6: free energy (vertical axis) plotted over the secondary structures (horizontal axis), marking the mfe, the sub-optimal structures within h of the mfe, the target structure, and the structures similar to the target one, whose frequencies are summed]

Fig. 6. The energy landscape of the DNA secondary structures of a sequence

3 Experiment

Based on the criteria introduced in the previous section, a structure design program has been implemented. This section explains the software and the result of its execution.

3.1 Programming

We have developed a secondary structure design program called DNAhairpin, in the C programming language, which uses the library of the Vienna RNA Package [7] mainly for thermodynamic calculations. The original library of the Vienna RNA Package does not support the hybridization of multiple DNA strands. DNAhairpin concatenates multiple strands with virtual bases and regards them as a single strand using functions in the library. We modified several library functions, so that the effect of a loop structure built from virtual bases is ignored in the thermodynamics calculation.

Thermodynamic Parameters

The thermodynamic parameters included in the Vienna RNA Package are for RNA molecules only. The parameters for DNA molecules were obtained by referring to reported physicochemical analyses of DNA hybridization. The thermodynamic parameters required in the library are listed below.

– The free energies and enthalpies of stacked pairs [13].
– The free energy of the interaction between the closing pair of an interior loop and the two unpaired bases adjacent to the helix [1, 2, 3, 4, 12].


Start

Generate hairpin sequences When oligomer intrudes, barrier height>B and valley depth