Theoretical aspects and modelling of cellular decision making, cell killing and information processing in photodynamic therapy of cancer
BMC Medical Genomics volume 6, Article number: S3 (2013)
Abstract
Background
The aim of this report is to provide a mathematical model of the mechanism for making binary fate decisions about cell death or survival during and after Photodynamic Therapy (PDT) treatment, and to supply the logical design for this decision mechanism as an application of rate distortion theory to the biochemical processing of information by the physical system of a cell.
Methods
Based on previously established systems biology models of the molecular interactions involved in the PDT processes, and regarding a cellular decision-making system as a noisy communication channel, we use rate distortion theory to design a time-dependent Blahut-Arimoto algorithm where the input is a stimulus vector composed of the time-dependent concentrations of three PDT-related cell death signaling molecules and the output is a cell fate decision. The molecular concentrations are determined by a group of rate equations. The basic steps are: initialize the probability of the cell fate decision, compute the conditional probability distribution that minimizes the mutual information between input and output, compute the probability of the cell fate decision that minimizes the mutual information, and repeat the last two steps until the probabilities converge. Then advance to the next discrete time point and repeat the process.
Results
Based on the model from communication theory described in this work, and assuming that the death signal processing is activated when any of the molecular stimulants rises above a predefined threshold (50% of the maximum concentration), for 1800 s of treatment the cell undergoes necrosis within the first 30 minutes with probability in the range 90.0% to 99.99%, and in the case of repair/survival it goes through apoptosis within 3 to 4 hours with probability in the range 90.00% to 99.00%. Although there is no experimental validation of the model at this moment, it reproduces some patterns of survival ratios predicted from experimental data.
Conclusions
Analytical modeling based on cell death signaling molecules has been shown to be an independent and useful tool for prediction of the cell survival response to PDT. The model can be adjusted to provide important insights for cellular response to other treatments such as hyperthermia, and diseases such as neurodegeneration.
Background
Conventional cancer therapies include radiation therapy, chemotherapy, surgery, and combinations of these. The treatments themselves have important, even life-threatening, side effects. Chemotherapy is known to impose difficulties because drugs often produce harmful side effects, and X-rays sometimes damage normal tissue. Photodynamic therapy offers an alternative, less invasive treatment for illnesses such as several types of cancer. It involves the use of three basic components [1]: a photosensitizer, a light-absorbing molecule that is activated by the second element, light of a corresponding wavelength; third, molecular oxygen is consumed during the photochemical reaction to produce cytotoxic agents, thus destroying neoplastic tissue. It is accepted that cell photokilling (induced in cultured cells) may involve all three main cell death morphologies described, i.e. apoptotic, necrotic and autophagic cell death [2]. Dynamic modeling of cell fate exists, for apoptosis/necrosis [3] and for autophagy [4], in a single cell model. In a previous work [5] we established a model of oxygen transport and cell killing in Type II PDT. This model can be directly linked to these cell fate models to provide a coherent model of the major biochemical events in PDT on the basis of the major components and the main features of the intracellular interactions. Based on existing systems biology models [3–5] it is possible to develop a detailed molecular interaction diagram that summarizes the major biochemical features of the photochemical processes, together with a corresponding system of molecular interactions, rate equations, reaction constants and initial conditions. In the modeling and simulation sections we briefly summarize these major facts of the cell biochemistry of Type II PDT, since the focus of this contribution is the study of the cell decision mechanism of a single cell model in response to PDT treatment and the probability of cell survival.
Information can be defined in terms of its ability to increase the probability of something being true [6], and it is carried on a channel, which is a physical mechanism for communication. A channel is distinguished by having a limit on its ability to carry information and by the fact that it is susceptible to random interference, called noise [6]. Whenever energy is transferred, information is transferred. In PDT, light energy is absorbed by the photosensitizers and then transferred to oxygen and other molecules, through a cascade of reactions in the environment of a cell [7]. The PDT treatment parameters act as a "source" generating the input information that the system of molecular network and interactions within a cell must communicate to the "receiver", or the cell [7]. Information is encoded by the parameters of the light and the photosensitizer doses as the source "words" or "code" (death signals) and is transformed into a form, through activated photosensitizers, that can be transmitted through the "channel" of molecular interactions. When decoded by molecular "thresholds", the input information can be converted to a channel output that has the form of a cell's state in terms of necrosis, apoptosis, autophagy or survival [7]. The performance/efficiency of such a bio-communication system, and its usefulness for modeling the experimental data, is quantified through the assignment of numerical values to the variations and errors that the system may produce. In particular, the mechanism that governs the generation of the source signal, and the distortion measure that penalizes the bio-coding errors and determines the fidelity of the reproduction of the cell killing signal, need to be identified through measurable quantities (functions).
Ultimately, the goal is to design and model an optimal treatment strategy that, through the scheme of intracellular biochemical reactions, may lead to reproduction of the PDT death signal output after processing by the cell, with an average distortion that does not exceed a specified upper level D, for a single tumor cell model (or in general, a tumor cell population).
The treatment pattern of the a priori setting parameters (light density, photosensitizer concentration, etc.) is related to the data bio-compression of the death signal through molecular interactions, and the classification of the signal as cell death or cell survival is done with a possible statistical error that is assigned a numerical penalty: the distortion function or distortion measure d [7]. The distortion function d does not in itself wholly determine a cell decision. What is important is the relationship between the distortion function and the prior probability distributions of cell death signaling molecules. It is possible to have two different distortion functions which lead to the same decision, when the prior probability distributions associated with each compensate for the details of each distortion function. Combining the three elements of the prior probability (distributions of molecular concentrations), the cell data (distortion tolerance of the cellular system), and the distortion function then allows cell fate decisions to be based on minimizing the mutual information between input and output. The minimization of the mutual information as an application of rate distortion theory to decision making mechanisms in biology has been adapted for testing a framework for designing and analyzing binary decision-making strategies in cellular systems [8], for the information-theoretic characterization of the optimal gradient sensing response of cells [9], and for the rate distortion approach to protein symmetry [10], among other applications. The mutual information of two random variables, in general, measures the information that X (input) and Y (output) share: it measures how much knowing one of these variables reduces the entropy of the other. This reduction of the entropy will be compensated by the cell either by interaction with the environment, while it is in a vulnerable state and its survival probability decreases, or by cell death.
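As a minimal numerical illustration of this quantity, the mutual information of a discrete input distribution and a channel matrix can be computed directly. The binary example below is purely illustrative and not part of the PDT model: a noiseless channel shares one full bit, a completely random channel shares none.

```python
import numpy as np

def mutual_information(p_x, Q):
    """Mutual information I(X;Y) in bits for a discrete input
    distribution p_x and channel matrix Q[i, j] = Q(y_j | x_i)."""
    p_xy = p_x[:, None] * Q            # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)             # marginal output distribution
    mask = p_xy > 0                    # zero-probability terms contribute nothing
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask])))

# Illustrative binary channels (not the PDT channel):
p = np.array([0.5, 0.5])
noiseless = np.eye(2)                  # output determined by input
random_ch = np.full((2, 2), 0.5)       # output independent of input
```

Knowing the output of the noiseless channel removes all uncertainty about the input (`mutual_information(p, noiseless)` is 1 bit), while the random channel removes none.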
There are two kinds of cell division: mitosis and meiosis. Mitosis is essentially a duplication process: it produces two genetically identical "daughter" cells from a single "parent" cell. All cells must replicate their DNA prior to cell division. This ensures that each new cell produced receives all of the genetic material necessary to survive and reproduce. Therefore certain information, which is formless, is not composed of matter, and does not change or die, is "carried" from one cell to the next, is "reproduced", and is always present in every cell structure.
John Von Neumann posed and solved the following question: what kind of logical organization is sufficient for an automaton to control itself in such a manner that it reproduces itself? [11]. A cellular automaton is specified by giving a finite list of states for each cell, a distinguished state called the blank state, and a rule which gives the state of a cell at time t+1 as a function of its own state and the states of its neighbors at time t. It consists of a cellular space and a transition function defined over this space. Finite automata constitute the basis of Turing machines [11]. Von Neumann was the first to provide an algorithmic model of a self-reproducing automaton, the Universal Constructor, a self-replicating machine in a cellular automata environment, and (in brief summary) he proved that the construction of this sort of automaton would necessitate the solution of four fundamental problems [11, 12]:
α. to store instructions in a memory;
β. to duplicate these instructions;
γ. to implement an automatic factory ("Universal Constructor"), able to read the memory instructions, and, based on them, to construct the components of the system;
δ. to manage all these functions by means of a central control unit.
A self-reproducing system must contain the program of its own construction. This program is a sort of consistent and complete abstract image of the system. In other words, self-reproduction needs programming and processors (software for information-based replication, and hardware). The solution to the problems mentioned above may be found in living things as observed by modern biology. An efficient mechanism of information storage and an elegant mechanism of duplication of the DNA molecule may be the one and only perfect solution to the twin problems of information storage and duplication for self-replicating automata [12]. But more importantly, Von Neumann understood that any information-based replicator must contain inside itself (among other indispensable things) a symbolic representation of itself, an "image" of itself. The relation between the replicator (hardware) and the image (a structure of symbols, the software) is a functional relation of dependence, since the symbolic representation consists of directives and instructions that must be interpreted by the replicator machinery for constructing a copy of itself. As reported by Luis Rocha in his 2012 Fall lecture notes "Biologically-inspired computing", Indiana University, Von Neumann proposed this scheme before the structure of the DNA molecule was uncovered by Watson and Crick, though after the Avery-MacLeod-McCarty experiment, which identified DNA as the carrier of genetic information.
With respect to Von Neumann's Universal Constructor we need to notice first that the four principles (α, β, γ, δ) mentioned above are irreducible in complexity, and secondly that the concept of symbolic representation-based self-reproduction implies a language (a symbol system, a syntactic code) to be used to map instructions into construction commands for replication. In copying a description, the syntactic aspects are replicated. This indicates that the appropriate framework for the study of such systems could be information theory, as discussed below. It is important to decipher the meaning of information available to a cell as something that determines its activity. Information has no mass, energy, or spatial extension; it cannot be seen, touched, or smelled. Nevertheless it is a distinct, objective entity. The cell, as an information system, has the ability to discriminate and select between cell fates (which is what we call cell decision making). In fact, the manifestation of information can be found in the existence of alphabets (where by alphabet we mean the set of physical states that can be realized in some system), the combination of codes (where by code we mean a collection of letters of alphabets that follow some pattern, i.e. words) and the variety of codes that determine the state of the system.
Methods
A brief summary of major molecular pathways and biochemical events induced by PDT
We briefly note here (the biochemistry of PDT has been extensively studied by several other investigators, and various publications exist in the medical literature) that in PDT there is a contribution of two reaction mechanisms (Type I and Type II), which result in damage mechanisms that depend on oxygen tension and photosensitizer concentration. Here we adopt our previous PDT modeling scheme of Type II PDT, which is considered to be the major reaction mechanism [5]. Singlet oxygen ^{1}O_{2}, a highly reactive state of oxygen, is a cytotoxic agent generated during PDT treatment. Singlet oxygen is produced during PDT via a triplet-triplet energy transfer reaction between ground state molecular oxygen (which is in a triplet state) and the excited triplet state of the photosensitizer (which then returns to its singlet ground state). First, a photosensitizer, a light-absorbing molecule that alone is harmless and has no effect on tissue, is activated by the second element, directed light of a corresponding wavelength that is delivered to the patient [7], selectively targeting the abnormal tissue. Molecular oxygen is consumed during the photochemical reaction to produce cytotoxic agents, thus destroying neoplastic tissue. Besides singlet oxygen and other reactive oxygen species (ROS), activation of the caspase cascades known as "executioner caspases", such as caspase-3, -6 and -7, is the next step of the apoptosis/necrosis process [13]. The active executioner caspases cleave cellular substrates, which leads to the characteristic biochemical and morphological changes observed in dying cells. Active caspases are potent effectors of post-treatment cell apoptosis: for the intrinsic cell death pathway, apoptosis is triggered by intracellular events such as DNA damage and oxidative stress; for the extrinsic cell death pathway, apoptosis is triggered by extracellular stimuli such as TNF and TRAIL. A sharp increase in the levels of caspase-3 indicates the beginning of apoptosis.
A very early step upon illumination is cytochrome c release from the mitochondria into the cytosol of treated cells [7]. A possible correlation exists between the cytochrome c release and the loss of the mitochondrial membrane potential, and might be related to the mitochondrial permeability transition pore (MPTP). Calcium ion release through the MPTP is correlated to cytochrome c loss. PDT has a very subtle effect on the mitochondrial membrane. Cells could die from ATP depletion (necrosis) or indeed follow the apoptotic activation of the caspase pathway. Caspase-3 is the caspase that cleaves a large number of proteins that are involved in cell structure and maintenance, such as PARP. Cleaved PARP has been used as a marker of the apoptotic extent. PDT treatment with Pc 4, BPD, or aluminum phthalocyanine (AlPc) has been shown to lead to cleavage of PARP in different cell lines [7, 14]. A schematic representation of known molecular pathways and biochemical events induced by PDT is given in Figure 1.
Rate distortion theory
An input vector signal will represent the combined stimulus of different PDT cell death inducing molecules (Figure 2). We have previously shown [7] that we can solve an ODE group that describes the major molecular interactions and levels of concentrations during PDT treatment in the time domain and obtain the normalized concentrations $X\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right),{x}_{3}\left(t\right)\right)$ where:
PDT treatment starts at t = 0, and ends at a later time t = t_{ d } (this can be 10 to 30 minutes and determines the optical dose for a given photon density). Observation ends at t = t_{max} = 30 hours. We define $p\left({x}_{1},{x}_{2},{x}_{3}\right)$ as the probability by which the source produces the "word" (normalized molecular concentration levels) $x\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right),{x}_{3}\left(t\right)\right)$[15] at time t . For a continuous distribution, starting with ε = 0.001 or ε < 0.001 we define the bump function approximation to the delta function, in the phase space of the normalized concentrations:
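A standard choice of such a bump function, assumed here to be the classical Friedrichs mollifier on the three-dimensional phase space of normalized concentrations, centered at the current concentration vector $X(t)$ (the exact normalization used in the original simulations may differ), is:

$${\eta}_{\epsilon}\left(x\right)=\frac{1}{{\epsilon}^{3}}\,\eta\left(\frac{x-X\left(t\right)}{\epsilon}\right),\qquad \eta\left(z\right)=\begin{cases}C\,\mathrm{exp}\left(\frac{1}{{|z|}^{2}-1}\right), & |z|<1,\\ 0, & |z|\ge 1,\end{cases}$$

with the constant $C$ chosen so that $\int \eta \, dz = 1$; this $\eta_{\epsilon}$ is smooth, supported in the ball of radius $\epsilon$ about $X(t)$, and converges to the Dirac delta in the sense of measures as $\epsilon \to 0$.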
We note that this is a mollifier of the Dirac delta function (η_{ ε } converges to δ in the sense of measures, ${\eta}_{\epsilon}\underset{}{\overset{{L}_{1}}{\to}}\delta $), and for values of ε small enough, the radius of the compact support is shorter than the step size of the spatial grid of the simulations. We initialize the marginal probability distribution of the cell decision with binary values for the variable y as equal to:
A distortion measure is defined as $d\left({x}_{1},{x}_{2},{x}_{3}|y\right)$, which is a measure of the penalty charged for reproducing the strength of the cell death signal described by the vector stimulus $x=\left({x}_{1},{x}_{2},{x}_{3}\right)$ by the decision $y$, and thus quantifies how disadvantageous a given decision $y$ is in response to a given stimulus $x$. The most common distortion measures are Hamming distortion measures of the form:
where the first inequality holds for at least one ${x}_{i}$ and d_{1}, d_{2} are real positive numbers. The distortion function describes the goals of a decision-making pathway by quantifying how disadvantageous, or ''distorted,'' a decision $y$ is in response to a stimulus $x=\left({x}_{1},{x}_{2},{x}_{3}\right)$. In our case, suppose that when all molecular concentrations are below a threshold ${x}_{i}^{th}$, the cell should not die; for at least one concentration greater than its fixed threshold ${x}_{i}^{th}$, the cell should die. In practice, the thresholds may not be clear and a cell can be forgiven for making either decision in response to a stimulus close to the threshold. To represent this situation, a graded distortion function needs to be used [8]. The mechanisms by which data are gathered, stored, and utilized by the cell are poorly understood, and rate distortion theory may provide some insight into this function [8]. The "decompressed" data strength (cell decision) may be different from the original data (level of cell death stimulation). Typically, there is some distortion between the original and reproduced signal. This distortion measure may be cell dependent, or time dependent, describing essential features of a cell, such as how a cell estimates the state of its environment, how it quantifies alternative decisions, and how it relates these decisions to the maximization of the fitness of the population [8]. In other words, the distortion measure is the penalty for an incorrect classification of the level of a molecular concentration, which leads to errors in the stimulus pattern recognition, which in this frame is the assignment of a cell fate probability to the given input by the biomolecular reactions.
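A minimal sketch of such a threshold (Hamming-type) distortion function for a binary decision alphabet is given below; the penalty values `D1`, `D2` and the 50%-of-maximum thresholds are illustrative assumptions, not the paper's fitted quantities:

```python
import numpy as np

# Illustrative thresholds x_i^th (50% of the normalized maximum) and
# penalties d1, d2 for the two possible misclassifications.
X_TH = np.array([0.5, 0.5, 0.5])
D1, D2 = 1.0, 1.0

def distortion(x, y):
    """Penalty d(x1, x2, x3 | y): zero for the 'correct' decision,
    d1 or d2 for the incorrect one."""
    stimulated = bool(np.any(np.asarray(x) > X_TH))  # any signal above threshold?
    if stimulated:
        return 0.0 if y == "die" else D1     # cell should die; surviving costs d1
    return 0.0 if y == "survive" else D2     # cell should survive; dying costs d2
```

A graded variant would replace the hard `np.any(... > X_TH)` comparison with a smooth function of the distance to the threshold, so stimuli near ${x}_{i}^{th}$ incur intermediate penalties for either decision.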
For the purposes of information theory a discrete channel is described by a probability transition matrix $Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)$, where Q is the conditional probability of receiving the output signal letter $y$ given that the input letter signals $\left({x}_{1},{x}_{2},{x}_{3}\right)$ were transmitted. As a conditional probability it is related to the probability distributions of the random vector $x=\left({x}_{1},{x}_{2},{x}_{3}\right)$ and the random variable $y$ by the equation:
In the minimization of the mutual information defined below, the conditional probability matrix Q will be calculated through a relation that is derived by the method of Lagrange multipliers [8, 15]:
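In standard Blahut-Arimoto form [8, 15], this relation and the accompanying update of the decision probability (which we take to correspond to the equations numbered (5) and (6) in the text) read:

$$Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)=\frac{q\left(y\right)\,{e}^{s\cdot d\left({x}_{1},{x}_{2},{x}_{3}|y\right)}}{\sum _{{y}^{\prime}}q\left({y}^{\prime}\right)\,{e}^{s\cdot d\left({x}_{1},{x}_{2},{x}_{3}|{y}^{\prime}\right)}},\qquad q\left(y\right)=\sum _{{x}_{1},{x}_{2},{x}_{3}}{p}_{X}\left({x}_{1},{x}_{2},{x}_{3}\right)\,Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)$$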
where s is taken to be a negative number. This is the Lagrange multiplier for the method of calculus of variations, which is used to find the optimal cell decision probability q and conditional probability Q by minimizing the average mutual information between source (stimulus vector) and receiver (cell/cell decision) [16]. (Finding extrema of a function is a common problem, but difficulties often arise when one wishes to maximize or minimize a function subject to fixed outside conditions or constraints. The method of Lagrange multipliers is a powerful tool for solving this class of problems without the need to explicitly solve the conditions and use them to eliminate extra variables.) We calculate the strategy, as defined by equations (5) and (6), that minimizes the average mutual information between the input $\left({x}_{1},{x}_{2},{x}_{3}\right)$ (death stimuli, normalized concentrations) and the output (decision $y$), and the decision probability for cell survival (Figure 3) or cell death, by implementing a time-dependent optimization Blahut-Arimoto algorithm (Figure 2). The average mutual information is defined as [15, 16]:
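In the present notation, the standard definition [15, 16] reads:

$$I\left({p}_{X},Q\right)=\sum _{{x}_{1},{x}_{2},{x}_{3}}\sum _{y}{p}_{X}\left({x}_{1},{x}_{2},{x}_{3}\right)\,Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)\,\mathrm{log}\frac{Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)}{q\left(y\right)}$$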
The rate distortion function R(D) is defined as
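Following the standard formulation in [15, 16], this definition is the constrained minimum of the mutual information:

$$R\left(D\right)=\underset{Q\in {Q}_{D}}{\mathrm{min}}\,I\left({p}_{X},Q\right)$$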
where Q_{ D } is defined as the collection of all conditional probabilities (strategies) $Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)$ such that d(Q) ≤ D, where the expected distortion is given by [16]
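In the present notation, the standard expression for the expected distortion [16] is:

$$d\left(Q\right)=\sum _{{x}_{1},{x}_{2},{x}_{3}}\sum _{y}{p}_{X}\left({x}_{1},{x}_{2},{x}_{3}\right)\,Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)\,d\left({x}_{1},{x}_{2},{x}_{3}|y\right)$$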
The function R(D) describes the amount of information needed to be preserved by this biochemical data compression scheme of the source output, which is given in the form of levels of molecular concentrations, so that reproduction of the death/survival signal can subsequently be generated from the compressed data with average distortion less than or equal to some specified value D (the rate distortion function R(D) was originally defined as the effective rate at which the source produces information and passes it to the "user", subject to the constraint that the "user" can tolerate an average distortion D [16]). According to [8], the complexity and (metabolic) cost of a channel generally varies directly with its capacity, and a less complex strategy is more likely to be followed and realized by a biological system. With minimal information I and a cell-dependent distortion measure d, the optimal strategy incorporates randomness to generate biological variation. With respect to the distortion measure d, following [8], one can note that while this analysis requires knowing the probability distribution of stimuli as well as the conditional probability distributions (strategies), an experimental estimate of these distributions through survival curves can produce an approximation of the distortion function around which the pathway is optimized. Starting from:
And solving for the distortion measure, we get:
We can pick $d\left({x}_{1},{x}_{2},{x}_{3}|y={y}_{1}\right)$ such that
And this determines the distortion measure.
Logical design
Computation of the cell decision discrete probability $q\left(y=survival\right)$, the distortion D and the rate distortion function R(D) is performed using the method of Lagrange multipliers. The problem is solved computationally using the Blahut-Arimoto algorithm. We summarize the basic steps:

a)
Assign a value ε that determines the accuracy of the algorithm.

b)
Initialize the exponential function ${e}^{s\cdot d\left({x}_{1},{x}_{2},{x}_{3}|y\right)}$.

c)
Initialize the probability distribution of the decision ${q}^{0}\left(y\right)=q\left(y\right)$ as a binary distribution as in equation (4).

d)
Given $q\left(y\right)$ compute the conditional probability distribution $Q\left(y|{x}_{1},{x}_{2},{x}_{3}\right)$ that minimizes the mutual information $I\left({p}_{X},Q\right)$ while satisfying the condition of equation (6).

e)
Update the probability $q\left(y\right)$ that minimizes the mutual information by using equation (5).

f)
Calculate the distortion D given by equation (9).

g)
Calculate the rate distortion function given by [16]:
$$R(D)=s\cdot D-{\displaystyle \sum _{{x}_{1},{x}_{2},{x}_{3}}{p}_{X}({x}_{1},{x}_{2},{x}_{3})\cdot \mathrm{log}\left({\displaystyle \sum _{y}q(y)}\cdot {e}^{s\cdot d({x}_{1},{x}_{2},{x}_{3}|y)}\right)}$$(14)
The convergence of the algorithm depends on the difference between two bounds, the lower bound T_{ L } and the upper bound T_{ U } ([15], Theorem 7), with R(D) estimated by their midpoint (T_{ U } + T_{ L })/2. The algorithm gives better convergence when the difference between these two bounds is small in absolute value:

h)
Iterate steps (a-g) for all simulation times t; this yields ${q}^{t}\left(y=survival\right)={q}_{survival}\left(t\right)$ for one set of values of the PDT treatment parameters: photon density ρ, drug concentration [S_{0}] and initial molecular oxygen concentration ${\left[{}^{3}{O}_{2}\right]}_{i}$.

i)
(optional) Vary (${\left[{}^{3}{O}_{2}\right]}_{i}$, [S_{0}], [F]), where ${\left[{}^{3}{O}_{2}\right]}_{i}$ is the molecular oxygen concentration in the cell, [S_{0}] is the concentration of ground state photosensitizer, and [F] is the fluence (see [5]), and obtain the corresponding survival curve ${q}_{survival}=q({[}^{3}{O}_{2}]_{i},[{S}_{0}],[F],t)$ for optimization of fluence/drug dose modeling parameters for the biomolecular mechanism studied and the given choice of molecular components of the stimulus vector. Fit to experimental data of survival curves to find the optimal range for the parameters.
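As a concrete illustration, steps (a)-(g) for a single time point can be sketched as below. The binary decision alphabet, the distortion matrix `d`, the stimulus distribution and the value of the multiplier `s` are illustrative assumptions rather than the paper's fitted quantities, and the convergence test on `q` stands in for the bound-based criterion of [15], Theorem 7.

```python
import numpy as np

def blahut_arimoto(p_x, d, s=-2.0, eps=1e-6, max_iter=1000):
    """One time point of the Blahut-Arimoto iteration (steps a-g).
    p_x : stimulus probabilities, shape (n_x,)
    d   : distortion matrix d[i, j] = d(x_i | y_j), shape (n_x, 2)
    s   : negative Lagrange multiplier (slope of the R(D) curve)
    Returns the decision distribution q(y), distortion D and rate R(D)."""
    A = np.exp(s * d)                        # step (b): e^{s*d(x|y)}
    q = np.array([0.5, 0.5])                 # step (c): initial binary q(y)
    for _ in range(max_iter):
        denom = A @ q                        # sum_y q(y) e^{s*d(x|y)}
        Q = A * q / denom[:, None]           # step (d): minimizing Q(y|x)
        q_new = p_x @ Q                      # step (e): update q(y)
        if np.max(np.abs(q_new - q)) < eps:  # simplified convergence check
            q = q_new
            break
        q = q_new
    D = float(np.sum(p_x[:, None] * Q * d))  # step (f): expected distortion
    R = float(s * D - p_x @ np.log(A @ q))   # step (g): rate, as in eq. (14)
    return q, D, R
```

For a symmetric two-stimulus example with unit cross-penalties and a strongly negative `s`, the optimal decision distribution stays uniform, the distortion is near zero, and the rate approaches log 2 nats, as expected for a nearly lossless binary source.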
Survival functions and predator-prey models
Survival functions can be derived using a predator-prey model. The predator-prey model has been used for the description of the survival probability in dynamic energy budget models [17] under the assumption that the per capita death rate has two contributions: a constant loss due to random misfortunes, and a density-dependent loss due to predation, with a Holling Type II functional form. This model was designed to predict the growth and reproduction patterns of a species based on the characteristics of individual organisms, particularly the strategy used to allocate resources. The model takes an individual-based approach where all members of the prey population are "copies" of one individual, and each "copy" could be the "model individual" itself. The use of a predator-prey model (a continuous model used for the simulation of discrete population dynamics) for the modeling of survival probability (a continuous variable) suggests the quantization of survival probability. Indeed, the quantization of probability has been proposed by other authors [18, 19]. The existence of the "chance-quantum" (c.q.) implies certain axioms [Go 43]. For example, if the probability of an event is equal to or greater than one c.q., it may ultimately occur; if an event has a calculated probability of less than one c.q., it will not occur; for an event having an appreciable probability (equivalent to many c.q.), a change in surrounding conditions leading to a computed change in probability of less than one c.q. will in fact cause no change in the probability of the event; etc.
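A minimal sketch of this death-rate structure is given below: the per capita death rate combines a constant loss with a density-dependent Holling Type II predation term, and the survival probability is integrated with a forward Euler step. All parameter values (`m`, `a`, `h`, `P`) are illustrative assumptions, not quantities fitted to PDT data.

```python
import numpy as np

def survival_curve(t_max=10.0, dt=1e-3, m=0.05, a=0.8, h=2.0, P=1.0):
    """Survival probability S(t) of a 'model individual' whose per capita
    death rate is m (constant random misfortune) plus a Holling Type II
    predation term a*P / (1 + a*h*N), with attack rate a, handling time h,
    predator density P, and prey 'density' N identified with S itself."""
    n_steps = int(t_max / dt)
    S = np.empty(n_steps + 1)
    S[0] = 1.0                                     # every cell starts alive
    for k in range(n_steps):
        N = S[k]
        death_rate = m + a * P / (1.0 + a * h * N)
        S[k + 1] = S[k] - dt * death_rate * S[k]   # forward Euler step
    return S
```

The resulting curve decreases monotonically from 1, and the Type II term makes the per capita death rate rise as the population of surviving "copies" shrinks, a saturation effect a pure exponential decay does not capture.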
Survival Units Duality refers to the idea that the life of a cell (or survival probability) can be discretized (quantized) in quanta of life (survival units), which are assumed here to be the basic units of life in every cell. New cells are produced by existing cells, and therefore the termination of a cell does not allow one to assign any morphological or biochemical characteristics to the life of the cell itself, since these characteristics can only be considered as the manifestations of the monitoring, interaction and response of the cell, as a biochemical unit undividedly united to cellular life ("life units"), to the extracellular environment. Cellular life (survival probability) is a set of life units (survival probability quanta), where each cellular life unit contains the whole complete life of the cell in itself, therefore allowing the cell to repair itself after any loss of survival units due to the attack of cell death inducers or other factors (Figure 3). Therefore, the survival probability of a cell is a set that contains itself within each of its elements (survival probability quanta or units). This idea is not new in mathematics. It is in accordance with Von Neumann's idea that any information-based replicator must contain inside itself a symbolic representation of itself, an "image" of itself.
Russell's paradox and information
A set is a collection of objects, or elements. Sets are defined by the unique properties of their elements, and sets and elements may not be mentioned simultaneously, since sets are determined by their elements and therefore one notion has no meaning without the other. Bertrand Russell, while working on his Principles of Mathematics (1903), discovered a paradox arising from Frege's set theory that leads to a contradiction [20]. It says "the set of all sets which are not members of themselves contains itself." In mathematical terms, let $S=\left\{x:x\notin x\right\}$; then S ∈ S ⇔ S ∉ S. Although the precise rules for set formation have been under intense investigation and several different logical systems have been proposed, sets that contain themselves as elements, like S, are definitely ruled out as "abnormal". Based on the work of Russell and Whitehead, Kurt Gödel was able to show that a theorem could be stated within the context of Russell and Whitehead's system that was impossible to prove within that system [21]. Gödel's Incompleteness Theorem states that in any consistent system of axioms rich enough to express arithmetic there are mathematical statements that can neither be proved nor disproved within the system.
The need for the distinction between two kinds of collection can be traced back to the work of Schröder and Cantor [22]:
"If we start from the notion of a definite multiplicity of things, it is necessary, as I discovered, to distinguish two kinds of multiplicities (by this I always mean definite multiplicities). For a multiplicity can be such that the assumption that all of its elements "are together" leads to a contradiction, so that it is impossible to conceive of the multiplicity as a unity, as "one finished thing". Such multiplicities I call absolutely infinite or inconsistent multiplicities.... If on the other hand the totality of the elements of a multiplicity can be thought of without contradiction as "being together", so that they can be gathered together into "one thing", I call it a consistent multiplicity or a "set"."
Cantor's conclusions are the ancestors of today's distinction between classes and sets, as they appear in the work of Von Neumann [23]. For von Neumann all sets are classes, but not all classes are sets, and those classes that are not sets, the so-called proper classes, cannot themselves be members [22]. In Von Neumann's axiomatization theory, some major advantages are [22]: there are extensions for the predicates 'set', 'non-self-membered set', 'well-founded set', 'ordinal'; there is a well-determined collection of all the Zermelo-Fraenkel sets; and there is a domain for quantification over sets. Further, the Axiom of Choice is provable in von Neumann's system. Several issues, both technical and intuitive, have been reported with respect to this system. A discussion can be found in [22], and here we only mention the consequence of this theory that the concept of class has no extension (based on the axioms of this system, there is no class of all classes, and therefore the problem has just been pushed back). Therefore the paradox essentially remains unresolved.
In mathematical logic, it is suggested that problems that are essentially the same must be resolved by the same means, and similar paradoxes should be resolved by similar means. This is the principle of uniform solution [281]. Two paradoxes can be thought to be of the same kind when (at a suitable level of abstraction) they share a similar internal structure, or because of external considerations such as the relationships between the paradoxes [281]. The question arises as to the existence of other paradoxes that are of the same kind as Russell's paradox. Russell focused on the underlying structure of the paradoxes and saw them all as paradoxes of impredicativity. The "inclosure schema" was proposed by Priest as a formal schema that can be used to classify paradoxes [24]. Although the schema will not be analyzed in this work, the conclusion is very interesting: Russell's paradox is of one kind with the "sorites" paradox (the paradox of the "heap"). This paradox is attributed to Eubulides of Miletus (4th century BC), a pupil of Euclid, and appears when one considers a heap of sand from which grains are removed. Is it still a heap when only one grain remains? If not, when did it change from a heap to a non-heap? These two paradoxes are neighboring paradoxes, and it has been suggested that we should not just consider the internal structure of the paradoxes, although that is undoubtedly important, but also their external relationships, that is, the relationships to other nearby paradoxes [281]. The way nearby neighbors (paradoxes of one kind) respond or fail to respond to proposed treatments tells us something about what makes the whole family tick and about their structural similarity [281]. The question "when is the cell dead?" indicates confusion between cessation of organic coherence and cessation of cellular activity. When a cell irrevocably loses its organization, it is dead. The point at which it becomes irrevocably damaged is related to the sorites problem.
It is known that continuous-time Markov processes are used for the formulation of stochastic predator-prey models that are based on within-individual variation [25]. Within-individual variation, under the name of "demographic stochasticity", has been used in the theory of adaptive dynamics. The theory of adaptive dynamics aims at describing the dynamics of the dominant trait in a population, called the 'fittest' trait. The main approach is through stochastic or individual-centered models which, in the limit of large populations, can be transformed into integro-differential equations or partial differential equations [26]. Stochastic simulations using a finite-size population involve extinction phenomena operating through demographic stochasticity, which acts drastically on small populations [26]. These simulations involve a unit for the minimal survival population size, corresponding to a single individual. In general, though, typical stochastic and deterministic simulations do not fit each other and give rather different behaviors in terms of branching patterns. It has been observed that the notion of demographic stochasticity does not occur in general in deterministic population models, and an alternative has been proposed in order to include a similar notion in these models: the notion of a survival threshold [27], which allows some phenotypical traits of the population to vanish when represented by too few individuals. In particular, through investigations of simple and standard Lotka-Volterra systems that describe the evolution of the distribution of phenotypic traits in time, it has been shown that the inadequacy of deterministic models in handling extinction phenomena through demographic stochasticity can be corrected by the introduction of a survival threshold, leading to a mimicking of the extinction probability due to demographic stochasticity in small subpopulations, while hardly influencing the dynamics of large subpopulations [26].
In this framework, the above principle implies (at the extreme) that densities corresponding to less than one individual are undesirable [26], indicating that the link between the continuous (large populations) and the discrete (small subpopulations), between existence (survival) and vanishing (extinction through demographic stochasticity), is correlated with the existence of a survival threshold in the model.
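The survival-threshold idea can be illustrated with a minimal sketch. Python is used purely for illustration; the logistic equation stands in for the Lotka-Volterra systems of [26, 27], and the parameter values and threshold are hypothetical:

```python
import numpy as np

def simulate_with_threshold(n0, r, K, threshold, dt=0.01, steps=5000):
    """Euler integration of logistic growth dn/dt = r*n*(1 - n/K);
    any density that falls below the survival threshold is set to zero,
    mimicking extinction by demographic stochasticity."""
    n = n0
    traj = []
    for _ in range(steps):
        n = n + dt * r * n * (1.0 - n / K)
        if n < threshold:
            n = 0.0  # the subpopulation is too sparse to persist
        traj.append(n)
    return np.array(traj)

# a sparse subpopulation below the threshold is driven extinct,
# while a large one is essentially unaffected and reaches K
small = simulate_with_threshold(0.005, r=1.0, K=1.0, threshold=0.01)
large = simulate_with_threshold(0.5, r=1.0, K=1.0, threshold=0.01)
```

The threshold only acts near extinction, so the large-population dynamics are hardly influenced, which is exactly the behavior reported in [26].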
Furthermore, this hybrid approach to survival, as a continuous-discrete function with a survival threshold assigned to a population, raises the following question: is there an internal quantization scheme that relates the continuous models for large populations, with survival thresholds, to the discrete models for small populations? The coexistence of both features, continuity and quantization, in a single process appears in the study of the conditional survival probabilities of a firm (the computation of the conditional survival probability of the firm from an investor's point of view, i.e., given the "investor information"). Callegaro and Sagna used a quantization procedure to analyze and compare the spread curves under complete and partial information in new and more general settings, in their work on applications of optimal quantization methods for nonlinear filtering to credit risk. The probability quantization theory they used was based on an earlier study of the local quantization behavior of absolutely continuous probabilities [28]. That study analyzes the L^{r} quantization error estimates for L^{r}(P) codebooks for absolutely continuous probabilities P and Voronoi partitions satisfying specific conditions. But the origins of the theory developed there can be traced back to electrical engineering and image processing, in particular to digitizing analog signals and compressing digital images [29]. Therefore, at the heart of the study of survival probabilities we find a theory of quantization as analog-to-digital conversion and as data compression. An analog signal is a continuous signal that transmits information as a response to changes in a physical phenomenon and uses a continuous range of values to represent information, whereas digital signals are discrete-time signals generated by digital modulation that use discrete or discontinuous values to represent information.
The quality of a quantizer can be measured by the goodness of the resulting reproduction of a signal in comparison to the original. This is accomplished with the definition of a distortion measure that quantifies cost or distortion resulting from reproducing the signal, and the consideration of the average distortion as a measure of the quality of a system, with smaller average distortion meaning higher quality [29].
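As a toy illustration of average distortion (not part of the published model), consider a uniform scalar quantizer under squared-error distortion; the source distribution and level counts below are hypothetical:

```python
import numpy as np

def uniform_quantize(x, n_levels, lo=0.0, hi=1.0):
    """Map each sample to the nearest of n_levels reproduction points
    of a uniform quantizer on [lo, hi]."""
    step = (hi - lo) / n_levels
    centers = lo + step * (np.arange(n_levels) + 0.5)
    idx = np.clip(((x - lo) / step).astype(int), 0, n_levels - 1)
    return centers[idx]

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100_000)        # analog source samples
d4 = np.mean((x - uniform_quantize(x, 4)) ** 2)    # coarse reproduction
d16 = np.mean((x - uniform_quantize(x, 16)) ** 2)  # finer reproduction
# smaller average distortion = higher-quality reproduction;
# for a uniform source the theoretical value is step**2 / 12
```

Finer quantization lowers the average distortion, illustrating the quality criterion of [29].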
This is precisely the framework we adopt in this work to study and analyze the process of cell survival during treatment. It suggests an organic connection among an axiomatic system foundation, a predator-prey rate equation and information-theoretic signal processing.
Mathematical modeling scheme
Following [3], biochemical reaction equations were derived from the following scheme for the apoptosis/necrosis pathways. For a given binary reaction i, the biochemical equation is represented by the general mass-action paradigm
$A+B\underset{{k}_{-i}}{\overset{{k}_{i}}{\rightleftharpoons}}C$
where ${k}_{i}$, ${k}_{-i}$ are the forward and reverse reaction rates. For the autophagy pathway, we follow [4] and the neural network modeling method as described in [4, 30]. We use an intermediate modeling strategy that employs nonlinear ODEs to describe protein regulatory networks but is not tied to specific reaction mechanisms and rate constants. More precisely, we use ODEs of the form
$\frac{d{X}_{i}}{dt}={\gamma}_{i}\left(F\left(\sigma {W}_{i}\right)-{X}_{i}\right),\phantom{\rule{1em}{0ex}}{W}_{i}={\omega}_{i0}+\sum _{j}{\omega}_{ij}{X}_{j}$
where ${X}_{i}$ is the expression level of a molecular concentration, $0\le {X}_{i}\le 1$, and $F\left(\sigma {W}_{i}\right)=\frac{1}{1+{e}^{-\sigma {W}_{i}}}$ is a sigmoidal function that varies from 0 (when $W\ll -\frac{1}{\sigma}$) to 1 (when $W\gg \frac{1}{\sigma}$). The parameter σ controls the steepness of the sigmoidal function at its inflection point. ${W}_{i}$ is the net effect on molecule i of all molecules in the network. The coefficient ${\omega}_{ij}$ is less than 0 if molecule j inhibits the expression of molecule i, greater than 0 if molecule j activates molecule i, and equal to 0 if molecule j has no effect on molecule i. This equation has the great advantage that it is subject to all the powerful analytical and simulation tools of nonlinear ODEs, yet, in the limit of large σ, it behaves like a discrete Boolean network [30]. When σ ≫ 1, ${X}_{i}$ tends to flip (on a timescale ≈ γ^{-1}) between 0 and 1, and the dynamical system approximates a Boolean network [30].
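A minimal numerical sketch of this class of equations, assuming the regulatory form dX_i/dt = γ_i(F(σW_i) − X_i) with W_i = ω_{i0} + Σ_j ω_{ij}X_j as in [30]; the two-node network, its weights and biases are hypothetical and not taken from the paper:

```python
import numpy as np

def F(w, sigma):
    """Sigmoid that approaches a Boolean on/off switch as sigma grows."""
    return 1.0 / (1.0 + np.exp(-sigma * w))

def euler_step(X, omega, bias, gamma, sigma, dt):
    """One Euler step of dX_i/dt = gamma_i * (F(sigma * W_i) - X_i),
    with W_i = bias_i + sum_j omega_ij * X_j."""
    W = bias + omega @ X
    return X + dt * gamma * (F(W, sigma) - X)

# hypothetical two-node mutual-inhibition pair (a toggle switch):
# omega[i, j] < 0 means molecule j inhibits molecule i
omega = np.array([[0.0, -1.0],
                  [-1.0, 0.0]])
bias = np.array([0.5, 0.5])
gamma = np.array([1.0, 1.0])
X = np.array([0.9, 0.1])
for _ in range(2000):          # integrate to t = 20 with dt = 0.01
    X = euler_step(X, omega, bias, gamma, sigma=20.0, dt=0.01)
# with sigma >> 1 the pair settles into a nearly Boolean state, X ~ [1, 0]
```

With large σ the continuous system behaves like the discrete Boolean network it approximates, as stated in [30].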
Modeling and simulation
A group of rate equations based on a molecular diagram (Figure 1) can be used to quantitate the time evolution of the following molecular species in a Type-II PDT process: the photosensitizer (Photofrin) in the ground state S_{0} and the singlet and triplet excited states S_{1} and T; oxygen molecules in the triplet ground and singlet excited states ^{3}O_{2} and ^{1}O_{2}; death ligands such as TRAIL and TNF; the inactive receptor complex R*; FLICE-like inhibitory protein flip; procaspase-8 and procaspase-10, inactive, both as C8; the bifunctional apoptosis regulator Bar; (cleaved) active caspase-8 and caspase-10, C8*; procaspase-3 and procaspase-7, inactive, both as C3; procaspase-6 (inactive caspase-6), C6; (cleaved) active caspase-3 and caspase-7, C3* (Figure 4), and active caspase-6, C6*; the X-linked inhibitor of apoptosis XIAP; Poly(ADP-ribose) polymerase PARP (Figure 5), a DNA damage repair enzyme, here a substrate of active caspase-3 C3*; the BH3 interacting-domain death agonist Bid, a substrate of cleaved caspase-8, in its inactive form; the anti-apoptotic protein Bcl-2 (Figure 6); the Bcl-2-associated X protein in its inactive form Bax and its active form Bax*; Bax in the mitochondrial compartment as Baxm; cytochrome c inside the mitochondria in the mitochondrial compartment CyCm and cytochrome c released from the mitochondria but remaining in the mitochondrial compartment CyCr; cytochrome c in the cellular compartment CyC; the second mitochondria-derived activator of caspases.
Smac and Smac/Diablo, released from the mitochondria but remaining in the mitochondrial compartment, Smacr; the apoptosis activating factor Apaf-1, a substrate of CyC, in its inactive form Apaf-1 and its active form Apaf*; the inactive form of procaspase-9, C9; the apoptosome Apop, which is the complex Apaf*:C9; inositol-requiring protein 1, IRE1; JUN N-terminal kinase, JNK; death-associated protein kinase, DAPK; Beclin, a mediator of autophagy phosphorylated by DAPK, BECN1; the tumor suppressor protein p53; the intracellular concentration of calcium, Ca2+; the proteases Cathepsin (Cath) and Calpain; and the inositol 1,4,5-trisphosphate receptor IP3R or IPR3. Even though the singlet excited oxygen molecules ^{1}O_{2} may play a critical role by themselves, it is well known that other ROS species are also involved in the cytotoxicity of PDT, with mitochondria as possible source and target sites; ^{1}O_{2} should therefore be interpreted as representative of ROS (Figure 7). The molecular equations, together with the definitions of the coefficients and the values used to mathematically model the molecular network described above, have been detailed in [3–5] and have been presented by the author in his dissertation thesis (see the legend of Figure 1 below). A group of 70 equations can be solved to characterize the main molecular interactions involved in Type-II PDT. We use the stiff ordinary differential equation (ODE) solver (ode15s) of MATLAB (The MathWorks, Natick, MA) to obtain the solution vector as a function of illumination time t from the start of illumination at t = 0 to 1800 s. Experimental verification of the quantities that describe the levels of all these molecular concentrations can be very difficult if not impossible, and they relate only indirectly to the ultimate consequence of PDT for cell killing.
In [5] we introduced a cell killing model that related the molecular concentrations of the singlet oxygen and the unoxidized receptors to the cell survival ratio, which can be measured with an in vitro cell model. A system of 70 ordinary differential equations (ODEs) was solved numerically to characterize the main molecular interactions involved in Type-II PDT. The output of this equation group is the time-dependent levels of the molecular concentrations for the stimulus vector $x\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right),{x}_{3}\left(t\right)\right)$, corresponding to singlet oxygen ^{1}O_{2}, cPARP and caspase-3. The concentrations were normalized with respect to their maximum values, so their range is [0, 1].
The total time for the simulations was up to 30,000 s to monitor post-treatment cell killing. We used the MATLAB stiff solver (ode15s) to obtain the solution vector as a function of illumination and observation times, from the start of illumination at t = 0 to 1800 s (end of illumination) and from 1800 to 30,000 s. The same software (MATLAB) is used to produce the simulations for the decision mechanism of the single-cell model design. The output of the time-dependent Blahut-Arimoto algorithm is the cell survival probability (Figure 3). The distortion measure d, which quantifies how disadvantageous a decision y is in response to the stimulus vector $x=\left({x}_{1},{x}_{2},{x}_{3}\right)$, is defined by $d\left(\left({x}_{1},{x}_{2},{x}_{3}\right),y=\text{survival}\right)=10$ if ${x}_{i}\ge {x}_{i}^{th}$ for some i, $d\left(\left({x}_{1},{x}_{2},{x}_{3}\right),y=\text{death}\right)={10}^{-1}$ if ${x}_{i}\ge {x}_{i}^{th}$ for some i, and a small number otherwise. The thresholds ${x}_{i}^{th}$ for the normalized concentrations were all set to 0.5. This distortion measure penalizes a cell survival error more heavily than a cell death error, by one order of magnitude, for a given stimulus. For the range of the Lagrange multipliers the equation s = e^{n} was used, and in the simulations n varied over a finite set of integers (a sample of n values from 1 to 20 was taken for the simulations below).
The initial survival probability ${q}^{0}\left(y=\text{surv}\right)=1-{q}_{0}$ was set equal to 0.9. The treatment parameters of the PDT model introduced in our previous work [5] were linked to the input of this algorithm.
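The iteration described above (initialize the output probability, update the conditional decision rule, update the output probability, repeat until convergence) can be sketched as follows. The stimulus distribution, the value β = 1 standing in for the Lagrange multiplier s, and the below-threshold "small number" 0.1 are illustrative assumptions; only the 10 versus 10^{-1} penalty asymmetry is taken from the text:

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, tol=1e-10, max_iter=10000):
    """Rate-distortion Blahut-Arimoto iteration: alternately update the
    decision rule q(y|x) and the output marginal q(y) until convergence.
    p_x is the stimulus distribution, d[i, j] the distortion of deciding
    y_j on stimulus x_i."""
    n_x, n_y = d.shape
    q = np.full(n_y, 1.0 / n_y)              # initial output distribution
    cond = np.full((n_x, n_y), 1.0 / n_y)
    for _ in range(max_iter):
        w = q[None, :] * np.exp(-beta * d)   # unnormalized q(y|x)
        cond = w / w.sum(axis=1, keepdims=True)
        q_new = p_x @ cond                   # updated output marginal
        if np.max(np.abs(q_new - q)) < tol:
            q = q_new
            break
        q = q_new
    return q, cond

# illustrative binary problem: x = 1 means some normalized concentration
# exceeded its 0.5 threshold (probabilities and small penalties hypothetical)
p_x = np.array([0.3, 0.7])                   # P(below), P(above threshold)
d = np.array([[0.1, 10.0],                   # below: penalize deciding death
              [10.0, 0.1]])                  # above: penalize deciding survival
q, rule = blahut_arimoto(p_x, d, beta=1.0)
# rule[1, 1] = P(decide death | stimulated) comes out close to 1
```

The returned `q` plays the role of the cell fate probability whose survival component the model tracks over time; in the paper this inner loop is repeated at each discrete time point as the stimulus concentrations evolve.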
Discussions
The effort to link biochemical pathways and molecular interactions to the behavior of whole cells, and to infer causality from statistical correlation in large data sets in photochemotherapy, is a matter of considerable difficulty, and accounting for all biological variation is a very challenging goal. The existence of more than one PDT tissue destruction mechanism in vivo for the treatment of intraocular retinoblastoma-like tumors has been suggested and documented in [31], where early direct cell damage was followed by subsequent late damage occurring in the tumor tissue left in situ after treatment, resulting in a biphasic pattern in the cell survival curve as a function of time. In [32], experiments on Chinese hamster cells with phthalocyanine dyes and split light fluence indicated that cells can repair sublethal photocytotoxic damage over the course of several hours. Although direct cytotoxicity to the tumor cells has been shown to be relatively small after PDT and to increase with time after treatment [33], examples of in vitro mammalian cell survival curves as functions of exposure time for different photosensitizer concentrations show that for an acute high-dose treatment (the vast majority of PDT treatments) the cell survival ratio decreases to less than 1% in the course of a few minutes.
Discrepancies may be due to many factors such as light attenuation passing through the skin resulting in a relatively lower energy dose to some cells than others or the fact that the tumor vasculature is a primary target of PDT. The local microenvironment might have significant impact on PDT response. Vascular effects can be secondary to cell death or conversely, cell death can be secondary to vascular shutdown. Another factor that might affect the final outcome is the triggering of the immune responses, local or systemic.
According to Langton, the 'logical form' of an organism can be separated from its material basis of construction, and 'aliveness' will be found to be a property of the former, not of the latter [34]. It is the major assumption in the field of Artificial Life that life is a property of the organization of matter, rather than a property of the matter itself. Organization reduces uncertainty through a process of information collection, management and use. A conceptual and mechanistic systems biology mathematical model that is based on information theory can yield valuable insights, since cellular behavior cannot be summarized in population averages [35]. The Blahut-Arimoto model has several features that are consistent with the experimental results. For the parameter s, estimation can be performed using experimental data, and a range of values can be recovered. The shapes of the survival curves and their correlation with the parameter s will depend on the structure of the rate equations, the type of cell decision algorithm adopted and the accuracy of the experimental data. Different values of the parameters will be predictive of different model curve topologies. Although the origin of the cell parameter s in a cell population remains non-identifiable for this biochemical model from the biophysics point of view, high-likelihood predictions can still be made by appropriate choice and calibration of this parameter. The survival probability predicted by the rate distortion function and calculated by the Blahut-Arimoto algorithm, and the variability in the graphs resulting from different values of the parameters, provide a framework for the interpretation of the self-renewal capabilities of the cell and its ability to generate drug resistance.
The model presented in this report is applicable to the study of cell killing mechanisms in other cases, such as hydrogen peroxide (H_{2}O_{2}) induced cell death in neurodegenerative diseases. Neuronal death observed in neurodegenerative disorders has been shown to be related to free radical damage, and the mechanisms by which reactive oxygen species may damage or kill neurons have been investigated with a series of experiments designed to document events associated with H_{2}O_{2} induced cell death in primary neuronal culture [36]. Moreover, this model provides a conceptual frame for the study of hyperthermia-induced cell death. Hyperthermia also induces apoptosis in a wide range of cancer cells [37]. The way hyperthermia initiates the intrinsic pathway of apoptosis is not yet completely elucidated, but it is known to involve the transmission of the temperature elevation signal to the mitochondrion through proteins belonging to the Bcl-2 family. Recent preclinical developments show the importance of heat shock proteins and other proteins interfering with and regulating the intrinsic and extrinsic pathways of apoptosis [38]. It has been suggested that intracellular de novo synthesis and polymerization of both RNA and DNA molecules, as well as protein synthesis, are decreased in vitro at temperatures between 42 and 45°C in a dose-dependent manner. Whereas RNA and protein synthesis recover rapidly after termination of heat exposure, DNA synthesis is inhibited for a longer period [39, 40]. The heat shock induces an aggregation of denatured proteins at the nuclear matrix, owing to the insolubility of cellular proteins after heat-induced protein unfolding, entailing an enhancement of the nuclear protein concentration. The increase of the nuclear protein content by heat consequently affects several molecular functions (including DNA synthesis and repair) when a certain thermal dose is exceeded. A variation of the threshold dose among distinct cell lines is to be expected.
Cells are surrounded by electromagnetic fields, and the ion distributions inside and outside the cells are at different concentrations depending on their charge and their type (healthy cells or tumor cells). Heat is transported by means of the extracellular space, ionically bound, to the intracellular space. After the compensation capacity of the cell is exceeded, it is natural to expect that a very small change in the temperature within and outside the cell membrane will be sufficient to affect or block the metabolic processes, or even to denature the proteins (Celsius 42+, Koln, Germany, modalities and procedural technologies for clinical hyperthermia). Now, heat is not a property of a system or body; it is always associated with a process of some kind and is synonymous with heat flow and heat transfer. Its characteristic feature is that it increases the entropy. According to Shannon's theory, entropy measures the information contained in a message (entropy is often used as a characterization of the information content of a data source). Assuming that heat increases the conformational entropy, which implies the thermal denaturation of proteins, and that heat shock proteins downregulate antioxidants and therefore upregulate oxidative stress, as demonstrated by several experimental studies, leading to lipid peroxidation of the lipid components, a model similar to the one presented above for PDT can be derived for making binary cell fate decisions (death/survival) as a result of heat exposure (hyperthermia). There are obvious similarities with PDT, such as the fact that heat toxicity is the result of oxidative stress.
A computational model based on information theory (information enters the cell in this case in the form of increasing entropy, a word with the informational connotation of "disorganization") can be used not only to quantitatively describe how the heat shock signal is transformed into a cell death probability (the activation of the thermal signal processing occurring when any of a set of predefined molecular stimulants increases above a predefined threshold), but also to elucidate, from a qualitative viewpoint, the main factors causing cell death after heat exposure.
Information implies both facts and the transmission of facts. The definition of information is content-neutral, and in Shannon's rate distortion theory information is interpreted as what reduces uncertainty. It also presupposes knowledge of a priori probabilities. These probabilities need to be designed or calculated in a way that reflects the variety of environmental stimuli. It is important to decipher the meaning of the information available to a cell as something that determines its activity. Information has no mass, energy, or spatiotemporal extension. Nevertheless it is a distinct, objective entity. This entity can be traced through detectable differences. For example, the cell, as an information system, has the ability to discriminate and select between cell fates (which is what we call cell decision making). In fact, the manifestation of information can be found in the existence of alphabets (where by an alphabet we mean the set of physical states that can be realized in some system), the combination of codes (where by a code we mean a collection of letters of alphabets that follow some pattern, i.e., words) and the variety of codes that determine the state of the system.
In the work of James G. Miller [41] on living systems, it was postulated that by the information input of its charter or genetic input, or by changes in behavior brought about by rewards and punishments from its suprasystem, a system develops a preferential hierarchy of values that gives rise to decision rules which determine its preference for one internal steady-state value rather than another. This was defined as the purpose of the system, which will also have an external goal related to its purpose. Therefore the goal of the system is determined by a system on a higher level. This was confirmed in the study of a cell model in the work of Perkins and Swain [42], who characterized cellular decision-making as having three main tasks: a cell must (1) estimate the state of its environment by sensing stimuli; (2) make a decision informed by the consequences of the alternatives; and (3) perform these functions in a way that maximizes the fitness of the population. Porter and Iglesias suggested the distortion theory framework, providing a complementary perspective on decision-making that regards these three tasks as a single process. According to Porter and Iglesias [8], the distortion measure d defines accurate sensing (task 1) by how heavily it penalizes small mistakes, and it quantifies the disadvantages of alternative decisions (task 2); the expected distortion describes how accurate sensing must be (task 1) and how much disadvantage the cell can afford in making a decision (task 2); the resulting optimal strategies fulfill task 3 by making choices with decisiveness proportional to the information available. The existence of a suprasystem that determines the goal of the system of a cell (which can be the tumor cell population for tumor growth, the healthy tissue surrounding the cell performing regulatory functions such as immune dynamics and angiogenesis, or an unidentified entity) is reflected in the structure of the distortion measure.
The condition of an "observer" distinguishable from the system that determines the goal of the system is a prerequisite for the definition of information in cybernetics by Wiener, which is founded on the issues of control and communication.
A generalization of the concept of mutual information used in this report is the interaction information. The "interaction information" [43] is a generalization of the mutual information that expresses the amount of information (redundancy or synergy) bound up in a set of variables, beyond that which is present in any subset of those variables; in one common convention, for three variables, $Q\left(X;Y;Z\right)=I\left(X;Y|Z\right)-I\left(X;Y\right)$.
Q measures associations between variables, not the direction of transmission: nothing is gained formally by distinguishing transmitters from receivers, so it goes beyond the Shannon framework of linear transmissions [44]. An interaction is a regularity, a pattern, a dependence present only in the whole set of events, but not in any subset. It is symmetric and undirected, so directionality no longer needs to be explained by, e.g., causality [45]. Positive interaction information implies synergy between variables and has been associated with the non-separability of a system in quantum physics [46] and with the origin of synergy in relationships between neurons [47]. Q measures the amount of influence on the relationship between X and Y resulting from the introduction of Z [45]. It is the amount of information that is common to all variables but not present in any subset. The interaction information may provide the appropriate framework for the study of the radiation-induced bystander effect (unirradiated cells exhibit irradiation effects as a result of signals received from nearby irradiated cells), a well-established consequence of exposure of living cells to radiation [48]. Although cell-to-cell communications in normal and carcinogenic cells have been discussed extensively [49, 50], and it is believed that in general cell-to-cell regulatory signals are conducted by chemical and electrical signals (Gap Junctional Intercellular Communication (GJIC) or Distant Signaling Intercellular Communication (DSIC)) [50], it remains an assumption that these signals are propagated by Brownian diffusive motion, because this yields relatively satisfactory results in simulations of bystander effects [50].
To understand the bystander synergistic effect in the case of radiation, we observe that if X is a cell state (target cell), Y is some parameter that represents radiation, and Z is a cell state (neighbor cell), then Q(X, Y, Z) can be understood as the difference between the decrease in entropy of Z achieved by the joint attribute XY (cell irradiation event) and the expected decrease in entropy under the assumption of independence between X and Y.
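The interaction information can be computed directly from a joint distribution. A short sketch, using the identity Q(X;Y;Z) = I(X;Y|Z) − I(X;Y) expressed in joint entropies; the XOR example is a standard illustration of pure synergy and is not from the paper:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (any shape)."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def interaction_information(pxyz):
    """Q(X;Y;Z) = I(X;Y|Z) - I(X;Y) from a joint pmf pxyz[x, y, z];
    equivalently H(XY)+H(XZ)+H(YZ) - H(X)-H(Y)-H(Z) - H(XYZ)."""
    hx = entropy(pxyz.sum(axis=(1, 2)))
    hy = entropy(pxyz.sum(axis=(0, 2)))
    hz = entropy(pxyz.sum(axis=(0, 1)))
    hxy = entropy(pxyz.sum(axis=2))
    hxz = entropy(pxyz.sum(axis=1))
    hyz = entropy(pxyz.sum(axis=0))
    return hxy + hxz + hyz - hx - hy - hz - entropy(pxyz)

# pure synergy: Z = X xor Y with independent fair X and Y;
# no pair of variables carries information, but the triple does
p_xor = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p_xor[x, y, x ^ y] = 0.25
Q = interaction_information(p_xor)   # positive Q indicates synergy
```

In the bystander setting, X, Y and Z would be the target-cell state, the radiation parameter and the neighbor-cell state, with a positive Q indicating the synergistic dependence discussed above.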
Conclusions
In this study a model of a cell decision mechanism is proposed which captures certain observed characteristics of cell behavior during photoirradiation and pharmacological treatment (Type-II PDT), using rate distortion theory to quantify the goals of a binary decision process (cell survival or cell death). The main components of the model are the time-dependent distribution of molecular stimuli, the distortion function (or measure), the conditional probability of the cell decision strategy, the cell survival probability, the expected distortion and the rate distortion function, which quantifies a limit on how well the goals can be achieved given the stimulation. The results are independent of the biological mechanism by which the cell strategy is implemented, and the Blahut-Arimoto algorithm is used to derive optimal pathways. The model requires the probability distribution of the stimuli as its input. For a variety of Lagrange multipliers there is a corresponding variety of optimal pathways, but an approximation of the distortion function around which the pathway is optimized is possible, based on algebraic properties of the algorithm (the distortion constraint) and numerical and experimental data [8]. Intracellular molecular interactions can be studied computationally with the purpose of extracting useful conclusions. In this report we used a previously developed systems biology model that includes detailed molecular pathways induced by PDT treatment leading to cell death, which we coupled to a cell decision-making algorithm based on the mutual information between cell death stimulation and cell response as the output of a biomolecular communication channel. This line of research can be relevant to future improvement and management of cancer treatment methodologies.
The cell survival probability is modeled as the output of an optimization process of transmitting the death signal through a communication channel with possible environmental and/or inherent distortion. Modeling results can be compared directly to experimental results that are based on the levels of measurable molecular concentrations and cell survival ratios, for optimization of the unknown parameters and/or for the design of different in vitro studies of PDT. This modeling establishes a framework that may also be able to address questions such as why cell types, although they share the same genome, are in general stable entities, represented by different observable cell fates with certain rules that govern their molecular dynamics, and do not gradually change into other forms.
References
 1.
Sharman WM, Allen CM, van Lier JE: 'Role of activated oxygen species in photodynamic therapy'. Methods in enzymology. 2000, 319: 376-400.
 2.
Buytaert E, Callewaert G, Vandenheede JR, Agostinis P: 'Deficiency in apoptotic effectors Bax and Bak reveals an autophagic cell death pathway initiated by photodamage to the endoplasmic reticulum'. Autophagy. 2006, 2 (3): 238-240. 10.4161/auto.2730.
 3.
Albeck JG, Burke JM, Spencer SL, Lauffenburger DA, Sorger PK: 'Modeling a snap-action, variable-delay switch controlling extrinsic cell death'. PLoS biology. 2008, 6 (12): 2831-2852.
 4.
Tyson JJ, Baumann WT, Chen C, Verdugo A, Tavassoly I, Wang Y, Weiner LM, Clarke R: 'Dynamic modelling of oestrogen signalling and cell fate in breast cancer cells'. Nat Rev Cancer. 2011, 11 (7): 523-532. 10.1038/nrc3081.
 5.
Gkigkitzis I, Feng Y, Yang C, Lu JQ, Hu XH: 'Modeling of Oxygen Transport and Cell Killing in Type-II Photodynamic Therapy'. Photochem Photobiol. 2012, 88 (4): 969-977. 10.1111/j.1751-1097.2012.01145.x.
 6.
Howland JL: 'Introduction to cell physiology: information and control'. 1968, Macmillan.
 7.
Gkigkitzis I, Hu XH: 'A model of cellular decision making in photodynamic therapy of cancer'. Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on: 4-7 October 2012. 2012, 1-5. 10.1109/BIBM.2012.6392704.
 8.
Porter JR, Andrews BW, Iglesias PA: 'A framework for designing and analyzing binary decision-making strategies in cellular systems'. Integr Biol (Camb). 2012, 4 (3): 310-317. 10.1039/c2ib00114d.
 9.
Andrews BW, Iglesias PA: 'An information-theoretic characterization of the optimal gradient sensing response of cells'. PLoS computational biology. 2007, 3 (8): e153. 10.1371/journal.pcbi.0030153.
 10.
Wallace R: 'A rate distortion approach to protein symmetry'. Biosystems. 2010, 101 (2): 97-108. 10.1016/j.biosystems.2010.05.002.
 11.
Von Neumann J, Burks AW: 'Theory of self-reproducing automata'. 1966, University of Illinois Press.
 12.
Denton M: 'Evolution: a theory in crisis'. 1986, Adler & Adler, 1st U.S. edn.
 13.
Gougeon ML, Kroemer G: 'Charming to death: caspase-dependent or independent?'. Cell death and differentiation. 2003, 10 (3): 390-392. 10.1038/sj.cdd.4401199.
 14.
Luo Y, Kessel D: 'Initiation of apoptosis versus necrosis by photodynamic therapy with chloroaluminum phthalocyanine'. Photochemistry and photobiology. 1997, 66 (4): 479-483. 10.1111/j.1751-1097.1997.tb03176.x.
 15.
Blahut RE: 'Computation of Channel Capacity and Rate-Distortion Functions'. IEEE T Inform Theory. 1972, 18 (4): 460-473.
 16.
Berger T: 'Rate distortion theory; a mathematical basis for data compression'. 1971, Prentice-Hall.
 17.
Ledder G, Logan JD, Joern A: 'Dynamic energy budget models with size-dependent hazard rates'. J Math Biol. 2004, 48 (6): 605-622. 10.1007/s00285-003-0263-1.
 18.
Burns RC, Lawson AD: 'Quantized Probability Circuit Design Principles Applied to Linear Circuits'. IEEE T Reliab. 1964, R-13 (2): 16.
 19.
Lomsadze YM: 'Relativistically Invariant Formulation of Theory of Quantized Probability Amplitude Field'. Nucl Phys. 1962, 37 (1): 147.
 20.
Link G (Ed.): 'One hundred years of Russell's paradox: mathematics, logic, philosophy'. 2004, Walter de Gruyter.
 21.
Van Heijenoort J: 'From Frege to Gödel; a source book in mathematical logic, 1879-1931'. 1967, Harvard University Press.
 22.
Simmons K: 'Sets, classes and extensions: A singularity approach to Russell's paradox'. Philos Stud. 2000, 100 (2): 109-149. 10.1023/A:1018666804035.
 23.
von Neumann J: 'An axiomatization of set theory'. Aut Aut. 1997, 280-81: 107-123.
 24.
Priest G: 'Beyond the limits of thought'. 2002, Clarendon/Oxford University Press.
 25.
Chesson P: 'Predator-Prey Theory and Variability'. Annu Rev Ecol Syst. 1978, 9: 323-347. 10.1146/annurev.es.09.110178.001543.
 26.
Perthame B, Gauduchon M: 'Survival thresholds and mortality rates in adaptive dynamics: conciliating deterministic and stochastic simulations'. Math Med Biol. 2010, 27 (3): 195-210. 10.1093/imammb/dqp018.
 27.
Masutani K: 'Effects of survival thresholds upon one-dimensional dynamics of single-species populations'. B Math Biol. 1993, 55 (1): 1-13. 10.1007/BF02460292.
 28.
Graf S, Luschgy H, Pages G: 'The Local Quantization Behavior of Absolutely Continuous Probabilities'. Ann Probab. 2012, 40 (4): 1795-1828. 10.1214/11-AOP663.
 29.
Gray RM, Neuhoff DL: 'Quantization'. IEEE T Inform Theory. 1998, 44 (6): 2325-2383. 10.1109/18.720541.
 30.
Tyson JJ, Novak B: 'Functional Motifs in Biochemical Reaction Networks'. Annu Rev Phys Chem. 2010, 61: 219-240. 10.1146/annurev.physchem.012809.103457.
 31.
Winther J: 'Photodynamic therapy effect in an intraocular retinoblastoma-like tumour assessed by an in vivo to in vitro colony forming assay'. British journal of cancer. 1989, 59 (6): 869-872. 10.1038/bjc.1989.184.
 32.
Ben-Hur E, Rosenthal I, Leznoff CC: 'Recovery of Chinese hamster cells following photosensitization by zinc tetrahydroxyphthalocyanine'. Journal of photochemistry and photobiology B, Biology. 1988, 2 (2): 243-252. 10.1016/1011-1344(88)80007-1.
 33.
Selman SH, Kreimer-Birnbaum M, Chaudhuri K, Garbo GM, Seaman DA, Keck RW, Ben-Hur E, Rosenthal I: 'Photodynamic treatment of transplantable bladder tumors in rodents after pretreatment with chloroaluminum tetrasulfophthalocyanine'. The Journal of urology. 1986, 136 (1): 141-145.
 34.
Langton CG (Ed.): 'Artificial life: the proceedings of an interdisciplinary workshop on the synthesis and simulation of living systems, held September, 1987 in Los Alamos, New Mexico'. 1989, Addison-Wesley Pub. Co.
 35.
Gunawardena J: 'Signals and systems: Towards a systems biology of signal transduction'. P IEEE. 2008, 96 (8): 1386-1397.
 36.
Whittemore ER, Loo DT, Watt JA, Cotman CW: 'A detailed analysis of hydrogen peroxide-induced cell death in primary neuronal culture'. Neuroscience. 1995, 67 (4): 921-932. 10.1016/0306-4522(95)00108-U.
 37.
Rong Y, Mack P: 'Apoptosis induced by hyperthermia in Dunn osteosarcoma cell line in vitro'. Int J Hyperthermia. 2000, 16 (1): 19-27. 10.1080/026567300285394.
 38.
Beere HM: 'Death versus survival: functional interaction between the apoptotic and stress-inducible heat shock protein pathways'. The Journal of clinical investigation. 2005, 115 (10): 2633-2639. 10.1172/JCI26471.
 39.
Streffer C: 'Aspects of metabolic change after hyperthermia'. Recent results in cancer research. 1988, 107: 7-16. 10.1007/978-3-642-83260-4_2.
 40.
Falk MH, Issels RD: 'Hyperthermia in oncology'. Int J Hyperthermia. 2001, 17 (1): 1-18. 10.1080/02656730150201552.
 41.
Miller JG: 'Living systems'. 1995, University Press of Colorado.
 42.
Perkins TJ, Swain PS: 'Strategies for cellular decision-making'. Molecular systems biology. 2009, 5: 326.
 43.
McGill WJ: 'Multivariate Information Transmission'. Psychometrika. 1954, 19 (2): 97-116. 10.1007/BF02289159.
 44.
Leydesdorff L: 'Interaction information: linear and nonlinear interpretations'. Int J Gen Syst. 2009, 38 (6): 681-685. 10.1080/03081070902993038.
 45.
Jakulin A, Bratko I: 'Analyzing attribute dependencies'. Lect Notes Artif Int. 2003, 2838: 229-240.
 46.
Cerf NJ, Adami C: 'Entropic Bell inequalities'. Phys Rev A. 1997, 55 (5): 3371-3374. 10.1103/PhysRevA.55.3371.
 47.
Brenner N, Strong SP, Koberle R, Bialek W, van Steveninck RRD: 'Synergy in a neural code'. Neural Comput. 2000, 12 (7): 1531-1552. 10.1162/089976600300015259.
 48.
Lorimore SA, Wright EG: 'Radiation-induced genomic instability and bystander effects: related inflammatory-type responses to radiation-induced stress and injury? A review'. International journal of radiation biology. 2003, 79 (1): 15-25. 10.1080/713864986.
 49.
Trosko JE, Ruch RJ: 'Cell-cell communication in carcinogenesis'. Frontiers in bioscience: a journal and virtual library. 1998, 3: d208-236.
 50.
Nikjoo H, Khvostunov IK: 'A theoretical approach to the role and critical issues associated with bystander effect in risk estimation'. Human & experimental toxicology. 2004, 23 (2): 81-86. 10.1191/0960327104ht422oa.
Acknowledgements
The author wants to thank biomedical physicist Dr. XinHua Hu of ECU Physics Department for his direction and great help with the references and to acknowledge his assistance in the implementation of the Matlab codes. He also wants to thank medical physicist Carlos Austerlitz for his support and discussions during the preparation of this paper.
Declarations
The publication costs for this article were funded by the corresponding author, Ioannis Gkigkitzis.
This article has been published as part of BMC Medical Genomics Volume 6 Supplement 3, 2013: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Medical Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcmedgenomics/supplements/6/S3.
Author information
Additional information
Competing interests
The author declares that he has no competing interests.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Gkigkitzis, I. Theoretical aspects and modelling of cellular decision making, cell killing and information processing in photodynamic therapy of cancer. BMC Med Genomics 6, S3 (2013) doi:10.1186/1755-8794-6-S3-S3
Keywords
 Mutual Information
 Average Mutual Information
 Molecular Concentration
 Distortion Function
 Rate Distortion