Experimental Assessment of the Reliability for Watermarking and Fingerprinting Schemes

We introduce the concept of reliability in watermarking as the ability to assess that a probability of false alarm is very low and below a given significance level. We propose an iterative and self-adapting algorithm which estimates very low probabilities of error. It performs much more quickly and accurately than a classical Monte Carlo estimator. The article finishes with applications to zero-bit watermarking (probability of false alarm, error exponent) and to probabilistic fingerprinting codes (probability of wrongly accusing a given user, code length estimation).


INTRODUCTION
Watermark decoders are in essence stochastic processes. There are at least three sources of randomness: the unknown original content (for blind decoders), the unknown hidden message, and the unknown attack the watermarked content has undergone. The output of the decoder is thus a random variable, and this leads to a very disturbing fact: there will be errors in some decoded messages. This also holds for watermark detectors, which have to decide whether the content under scrutiny has been watermarked or not.

Watermarking reliability
In order to be used in an application, a watermarking technique must be reliable. We introduce here the concept of reliability as the guarantee not only that these inherent errors happen very rarely, but also that their frequency or probability is assessed to be below a given level. Here are two application scenarios where a wrong estimation of the probability of error could lead to a disaster.

Copy protection
Assume commercial contents are encrypted and watermarked, and that future consumer electronics storage devices have a watermark detector. These devices refuse to record watermarked content. The probability of false alarm is the probability that the detector considers an original piece of content (which has not been watermarked) as protected. The movie that a user shot during his holidays could thus be rejected by his storage device. This absolutely non-user-friendly behavior really scares consumer electronics manufacturers. In the past, the Copy Protection Working Group of the DVD forum evaluated that at most one false alarm should happen in 400 hours of video [1]. As the detection rate was one decision per ten seconds, this implies a probability of false alarm on the order of 10^−5. An accurate experimental assessment of such a low probability of false alarm would require feeding a real-time watermarking detector with nonwatermarked content for 40 000 hours, that is, more than 4 years! Proposals in response to the CPTWG's call were, at that time, never able to guarantee this level of reliability.

Fingerprinting
In this application, users' identifiers are embedded in purchased content. When content is found in an illegal place (e.g., a P2P network), the right holders decode the hidden message, find a serial number, and can thus trace the traitor, that is, the customer who illegally broadcast his copy. However, the task is not that simple because dishonest users might collude. For security reasons, anticollusion codes have to be employed. Yet, these solutions (also called weak traceability codes [2]) have a nonzero probability of error (defined as the probability of accusing an innocent). This probability should of course be extremely low, but it is also a very sensitive parameter: anticollusion codes get longer (in terms of the number of bits to be hidden in content) as the probability of error decreases. Fingerprint designers have to strike a tradeoff, which is hard to conceive when only a rough estimation of the probability of error is known. The major issue for fingerprinting algorithms is that embedding long sequences also implies assessing reliability on a huge amount of data, which may be practically unachievable without using rare event analysis.

Prior works
Estimating probabilities of false alarm, or bit and message error rates, has been a concern of watermark designers since the beginning. A first choice is to derive a closed-form formula for this probability. However, this is often impossible, or only possible under the restrictive assumption that the real data fit the statistical model used in the derivation. The decoder's decision is often based on a score, which is written as a sum of many, more or less independent, random extracted features. This explains the abusive resort to the central limit theorem to establish the probability that this score reaches a given value: the score is supposed to be Gaussian distributed, and the probability is expressed with the erf function [3, 4]. However, even if the conditions for applying the CLT are sometimes fulfilled, the convergence rate to the Gaussian law is crucial; it depends on the third moment of the extracted features (in the simplest case), as stated by the Berry-Esséen bound [5]. Roughly speaking, estimating a small probability of error amounts to integrating the tail of the distribution of the score, precisely where the Gaussian approximation provided by the CLT is very poor. A better way is to calculate upper bounds (Chernoff's bound, union bound, nearest neighbor bound). The issue is then the tightness of the bound, which is usually good only over a precise range of parameter values. Numerical approximations of the probability formula also exist, like the Beaulieu method [6] or the DFT method [7], used when the score is a sum of i.i.d. random variables. However, they need the true pdf or the characteristic function of these variables.
When these approaches are not possible, the last resort is experimental estimation. However, we have seen in watermarking articles [3, 4, 8, 9] that experimental estimations were only based on the Monte Carlo (MC) method, which is very inefficient for a low probability of error. This naive approach consists of running n experiments and counting the number of times k that the decoder failed. The probability of error p is then estimated by the frequency at which errors happened on this set of experiments: p̂ = k/n. Let X_i denote the result of the ith experiment: X_i equals 1 if there is a decoding error, and 0 otherwise. X_i is then a Bernoulli random variable with mean p, so that the estimator p̂ is unbiased and its variance asymptotically goes to zero. However, one needs at least around p^−1 experiments to make it work and, even worse, its relative standard deviation is given by √Var[p̂]/E[p̂] ≈ 1/√(pn). Hence, for a decent accuracy of the estimator, reflected here by its relative standard deviation, n must be taken several times bigger than p^−1. This method is thus inadequate for estimating a probability below 10^−6 because it then needs millions of experiments.
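To fix ideas, here is a minimal Python sketch (toy value of p; simulated Bernoulli trials stand in for decoder failures) illustrating why the naive MC estimator needs n several times bigger than p^−1:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(p, n):
    """Naive Monte Carlo: frequency of errors over n simulated decoder trials."""
    errors = rng.random(n) < p          # X_i = 1 with probability p
    return errors.mean()

p = 1e-2                                # true error probability (toy value)
n = 200_000                             # must be several times larger than 1/p
p_hat = mc_estimate(p, n)
rel_std = 1.0 / np.sqrt(p * n)          # predicted relative standard deviation
print(p_hat, rel_std)
```

With these toy values the relative standard deviation is already about 2%; for p = 10^−6 the same accuracy would require hundreds of millions of trials.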
As far as we know, almost nothing concerning the experimental estimation of low probabilities of error has been done by the watermarking community, although there exist better methods than simple MC simulations [10]. They have been successfully applied, for instance, to estimate frequencies of packet losses in digital communications [11]. We guess that, sadly, the watermarking community is more image processing oriented, so that people usually ignore these recent tools. Yet, the application of these methods to watermarking is not trivial, because they assume a proper statistical model which may not be suitable for watermarking. There is a need for very general or self-adaptive methods resorting to as few assumptions as possible.
This article proposes a new experimental method for estimating low probabilities of error in watermarking applications. It is a strong adaptation of a complex method whose good properties have been theoretically proven in [12]. Section 2 presents this adaptation as simply as possible, and Section 3 experimentally validates the approach on a very simple scenario of watermarking detection (a.k.a. zero-bit watermarking). Section 4 applies the method to more difficult problems: the experimental measurement of error exponents under attacks, and the reliability of the Tardos code in the fingerprinting application.

OUR ALGORITHM
Before explaining the method, let us first describe zero-bit watermarking in a few lines. A watermark detector receives two types of contents: original contents and watermarked (possibly attacked) contents. It decides the type of the piece of content under scrutiny based on the observation of L features extracted from the content, whose values are stacked in a vector x. It calculates the likelihood that this vector is watermarked thanks to a score function d(·) : R^L → R, and decides that the content is watermarked if d(x) is above a given threshold τ. The probability of false alarm p_fa is the probability that d(x) > τ whereas the piece of content has not been watermarked. The probability of false negative p_fn is the probability that d(x) < τ when the content is indeed watermarked. The reliability of the watermarking technique is assessed if p_fa is below a small level. From a geometrical point of view, let us define the acceptance region A ⊂ R^L as the set of vectors x such that d(x) > τ.
For the sake of simplicity, we explain the proposed method as applied to zero-bit watermarking. The key idea of our experimental method is to gradually encourage the occurrences of the rare event (here a false alarm) by generating a sequence of events which are rarer and rarer. In terms of probability, we factorize the very small probability to be estimated into a product of bigger probabilities, which are thus easier to estimate. Our estimator can be written as p̂_fa = ∏_{i=1}^{N} p̂_i.

Key idea
To factorize a probability into a product of bigger probabilities, we use the following trick: let A_N = A be the rare event, and A_{N−1} a related event such that when A_N occurs, A_{N−1} has also occurred. However, when A_{N−1} occurs, it does not imply that A_N is true. Hence, A_{N−1} is a less rare event than A_N. This justifies the first equality in the following equation, the second one being just the Bayes rule:

Prob[A_N] = Prob[A_N ∩ A_{N−1}] = Prob[A_N | A_{N−1}] · Prob[A_{N−1}].

Repeating the process, we finally obtain

Prob[A_N] = Prob[A_1] · ∏_{j=2}^{N} Prob[A_j | A_{j−1}],

provided that {A_j}_{j=1}^{N} is a sequence of nested events. Knowing that the estimation of a probability is easier when its value is bigger, we have succeeded in decomposing a hard problem into N much easier problems.
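As a numerical sanity check of this factorization, the conditional probabilities telescope back to the rare-event probability. A small sketch with a Gaussian score, toy nested events A_j = {X > τ_j}, and exact tail probabilities from scipy:

```python
from scipy.stats import norm

# Nested events A_j = {X > tau_j} for increasing thresholds (toy values)
taus = [0.0, 1.0, 2.0, 3.0, 4.0]

# p_1 = Prob[A_1], then p_j = Prob[A_j | A_{j-1}] = Prob[A_j] / Prob[A_{j-1}]
probs = [norm.sf(taus[0])]
for t_prev, t in zip(taus, taus[1:]):
    probs.append(norm.sf(t) / norm.sf(t_prev))

product = 1.0
for q in probs:
    product *= q

print(probs)      # every factor is much larger than the final probability
print(product)    # equals Prob[X > 4] ~ 3.17e-5
```

Each factor is easy to estimate by plain MC, while their product is a probability that plain MC could not reach with the same budget.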
This decomposition is very general, but the construction of the sequence of nested events is usually not a simple task. An exception is when the rare event A_N admits a geometrical interpretation: A_N occurs when x ∈ A_N. A sequence of nested events then translates into a sequence of subsets A_1 ⊃ A_2 ⊃ · · · ⊃ A_N. The task is even simpler in zero-bit watermarking because an indicator function of these events can be as follows: x ∈ A_j if d(x) > τ_j. Nested events are created by a sequence of increasing thresholds: τ_1 < τ_2 < · · · < τ_N = τ.

Generation of vectors
The first step of our algorithm estimates p_1 = Prob[A_1]. In practice, N is large enough so that this probability is not lower than 0.1. Then, a classical MC estimator is efficient. However, the variance of the estimator p̂_1 has a strong impact on the variance of p̂_fa. Therefore, the number of trials n_1 is several times bigger than p_1^−1, while being far less than p_fa^−1, the order of magnitude of the number of trials needed for a direct MC estimation of the probability of false alarm.
For this first step, we must generate n_1 vectors x. Either we have a good statistical model, that is, the pdf p_X of the extracted features accurately captures reality; then it is not difficult to generate synthetic data, that is, pseudorandom vectors that follow the statistical behavior of the extracted features. Or we feed the watermarking detector with n_1 natural images. We count the number k_1 of vectors whose score is higher than τ_1.
As a byproduct, this first step yields not only an estimator but also a generator of the event A_1. It is not very efficient, as it produces vectors x ∼ p_X and then selects those vectors belonging to the set A_1. Hence, approximately only p_1 n_1 occurrences of the event A_1 are produced.

Replication of vectors
The issue of the second step is the estimation of the probability p_2 = Prob[A_2 | A_1]. We set the threshold τ_2 just above τ_1, so that this probability is large (typically not lower than 0.1). Once again, an MC estimator is p̂_2 = k_2/n_2, where k_2 is the number of vectors x of the set A_1 which also belong to A_2. We need to generate n_2 vectors x distributed according to p_X and lying in the set A_1. We could use the first step of the algorithm to generate these vectors, but it is not efficient enough as such.
We then resort to a so-called replication process, which almost multiplies by a factor ρ the size of a collection of vectors lying in a particular region of the space. For each vector of this collection, we make ρ copies of it, and then we slightly modify the copies in a random manner. This modification process must leave the distribution of the data invariant: if its inputs are distributed according to p_X, its outputs follow the same distribution. A modified copy is likely to stay in the set A_1 if the modification is light. However, we check whether this is really the case, and go back to the original vector if not. The replication process is thus a modification (line 15 in Algorithm 1) followed by a filter (line 16). It leaves the distribution of the vectors invariant.
Since we have run the first step, we have k_1 vectors in A_1. We choose a replication factor ρ_1 ≈ p̂_1^−1, more or less conserving the same number of vectors, because the second probability to be estimated has the same order of magnitude as the first one, and n_1 was enough for that purpose. We calculate the scores of the ρ_1 k_1 modified copies. We keep the copies whose score is bigger than τ_1, and replace the other ones by their original vectors. This makes the n_2 = ρ_1 k_1 input vectors of the MC estimator. Again, these first two steps lead to an estimator p̂_2 of p_2, but also to a means of generating occurrences of the event A_2.
The core of the algorithm is thus the following. A selection process kills the vectors (named particles in statistics) whose score is lower than an intermediate threshold τ_i; they are replaced by branching the selected particles. A replication process proposes randomly modified particles and keeps only the ones that are still above the intermediate threshold. Selection and replication steps are iterated to estimate the remaining probabilities p_3, ..., p_N.

Adaptive threshold
The difficulty is now to give appropriate values to the thresholds {τ_i}_{i=1}^{N−1} and to the set sizes {n_i}_{i=1}^{N}. The probabilities to be estimated must not be very weak, in order to maintain reasonable set sizes. Moreover, it can be shown that the variance of p̂_fa is minimized when the probabilities {p_i}_{i=1}^{N} are equal. However, to set the correct values of the thresholds, we would need the map τ = F^−1(p), which we do not have; otherwise, we would already know the value of p_fa = F(τ).
Algorithm 1: estimation of the probability of false alarm.

Require: subroutines GENERATE, HIGHER SCORE & MODIFY
 1: for i = 1 to n do
 2:   x_i ← GENERATE(p_X); dx_i ← d(x_i);
 3: end for
 4: N ← 1;
 5: τ_N ← HIGHER SCORE(dx, k); τ̃ ← τ_N;
 6: while τ̃ < τ and N < N_max do
 7:   t ← 1;
 8:   for i = 1 to n do
 9:     if dx_i ≥ τ̃ then
10:       y_t ← x_i; t ← t + 1;                  (selection of the k best vectors)
11:     end if
12:   end for
13:   for i = 1 to n do
14:     x_i ← y_⌈ik/n⌉;                          (replication)
15:     z ← MODIFY(x_i);                         (modification)
16:     if d(z) ≥ τ̃ then x_i ← z; end if        (filter)
17:     dx_i ← d(x_i);
18:   end for
19:   N ← N + 1; τ_N ← HIGHER SCORE(dx, k); τ̃ ← τ_N;
20: end while
21: k_N ← |{i : dx_i > τ}|; return p̂_fa = (k/n)^{N−1} · k_N/n;

The idea is to set the thresholds adaptively. The number of vectors is constant over all the experimental rounds: n_i = n for all i ∈ {1, ..., N}. The threshold τ_i takes the value such that k_i = k; k and n are thus the parameters of the algorithm. The estimated probabilities are then all equal to p̂_i = k/n. It means that the selection process sorts the scores in decreasing order and adaptively sets τ_i as the value of the kth highest score. Vectors whose score is below this threshold are removed from the stack and replaced by copies of vectors above it. This means that the size of the stack is constant and that the replication factors {ρ_i} are all equal to n/k (k divides n). All the vectors in the stack are independently modified. The modification of a vector is accepted if its new score is still above the threshold.
The last step is reached when τ_i > τ. Then we set N = i, τ_N = τ, and k_N as the number of scores above τ, so that, for this last iteration, p̂_N = k_N/n. At the end, the probability of false alarm is estimated by

p̂_fa = (k/n)^{N−1} · k_N/n.

We stress the fact that, formally, N, k_N, and {τ_i}_{i=1}^{N} are indeed random variables, and their estimations in the algorithm should be denoted by N̂, k̂_N, and {τ̂_i}_{i=1}^{N}. For the sake of clarity, we did not use different notations from the deterministic case of the preceding section. The number of iterations is expected to be

N ≈ ⌈log p_fa / log(k/n)⌉.

The total number of detector trials is nN, which scales logarithmically with respect to p_fa^−1, whereas a classical MC estimator would need at least p_fa^−1 trials. The method is given in pseudocode in Algorithm 1. Note that the thresholds {τ_i} are stored in memory. This is not useful when estimating p_fa, but it gives a nice byproduct for ROC curves: the map p = F(τ) is estimated through the points {(p_j, τ_j)}_{j=1}^{N−1}, with p_j = (k/n)^j. From [12], the method inherits the asymptotic properties of consistency and asymptotic normality.

Algorithm 2: the SELECT subroutine.

Require: subroutine GENERATE RAND PERM
π ← GENERATE RAND PERM(k);
for i = 1 to n do
  Index(i) ← π(mod(i, k) + 1);
end for
return vector Index;

We can also show that, in the asymptotic regime, the bias decreases inversely proportionally to n:

E[(p̂_fa − p_fa)/p_fa] ≈ α n^−1,

where α is always a positive number. A remarkable fact is that the bias is positive, so that the estimations tend to overestimate the probability of the rare event. In concrete situations, the rare event often corresponds to a catastrophic scenario to be prevented, and overestimating is then a nice property.
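The complete loop can be sketched in a few lines of Python. This is a toy transcription, not the authors' implementation: the score is simply d(x) = x for a scalar Gaussian particle, so that the true value Prob[X > τ] is known; survivors are branched by random selection (in the spirit of Algorithm 2) rather than exact n/k copies; μ is a fixed, hand-tuned replication strength; and a few modification sweeps per level are added to improve mixing in this toy setting:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_pfa(d, generate, modify, tau, n=1000, k=500, n_max=200):
    """Adaptive multilevel estimator of Prob[d(x) > tau] (toy, scalar particles)."""
    x = generate(n)
    scores = d(x)
    n_levels = 0
    while n_levels < n_max:
        # Adaptive intermediate threshold: the k-th highest score
        order = np.argsort(scores)[::-1]
        tau_i = scores[order[k - 1]]
        if tau_i >= tau:
            break
        # Selection: keep the k best particles, branch them back up to n
        idx = order[:k][rng.integers(0, k, size=n)]
        x, scores = x[idx], scores[idx]
        # Replication: propose modifications, keep them only if still above tau_i
        for _ in range(5):  # a few sweeps improve mixing (toy choice)
            prop = modify(x)
            prop_scores = d(prop)
            ok = prop_scores > tau_i
            x = np.where(ok, prop, x)
            scores = np.where(ok, prop_scores, scores)
        n_levels += 1
    k_last = int(np.count_nonzero(scores > tau))
    return (k / n) ** n_levels * k_last / n

mu = 0.3
p_hat = estimate_pfa(
    d=lambda x: x,
    generate=lambda n: rng.standard_normal(n),
    modify=lambda x: (x + mu * rng.standard_normal(x.shape)) / np.sqrt(1 + mu**2),
    tau=4.0,
)
print(p_hat)   # true value is Prob[X > 4] ~ 3.17e-5
```

With n = 1000 and k = 500, the estimate lands within a few tens of percent of the true tail probability, for a total budget of only nN detector calls.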

Some improvements
This subsection proposes some improvements of Algorithm 1. If n is huge, the subroutine HIGHER SCORE must be carefully implemented. In general, an efficient way is to use the quick-sort algorithm, whose average complexity is O(n log n). If n = 2k, a median algorithm is recommended because its average complexity is O(n).
The most restrictive point so far is that, in Algorithm 1, k must divide n: the k selected vectors {y_i}_{i=1}^{k} are each replicated n/k times (line 14). To remove this restriction, we create a subroutine SELECT which randomly picks n vectors from the set {y_i}_{i=1}^{k}. Each y_i is thus replicated a random number of times, whose expectation is n/k. Algorithm 2 shows an implementation where each vector is selected at least once, and where it is selected a constant number of times if k divides n.
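A possible Python transcription of Algorithm 2's SELECT routine (0-based indices replace the mod(i, k) + 1 of the pseudocode): a random permutation of the k survivors is repeated cyclically, so each survivor is picked ⌊n/k⌋ or ⌈n/k⌉ times, and exactly n/k times when k divides n:

```python
import numpy as np

def select(n, k, rng):
    """Pick n indices from {0, ..., k-1}: a random permutation of the k
    survivors is repeated cyclically over the n slots."""
    perm = rng.permutation(k)
    return np.array([perm[i % k] for i in range(n)])

rng = np.random.default_rng(0)
idx = select(10, 3, rng)
counts = np.bincount(idx, minlength=3)
print(sorted(counts))   # [3, 3, 4]: each survivor replicated 3 or 4 times
```

Only the identity of the over-replicated survivors is random; the stack size stays exactly n.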

APPLICATION TO ZERO-BIT WATERMARKING
This part first applies the method to a well-known watermarking detector for which there exist bounds and a numerical method to derive the probability of false alarm. This allows us to benchmark the method and to fine-tune its parameters.
We have selected the absolute value of the normalized correlation [13] as the score function, so that x is deemed watermarked if

d(x) = |x^T u| / ‖x‖ > τ,

where u is a secret vector whose norm equals one. A geometrical interpretation shows that the acceptance region is a two-sheet hypercone whose axis is given by u and whose angle is θ = cos^−1(τ) (with 0 < θ < π/2). Hence, for an isotropic distribution whose probability density function only depends on the norm of x, the probability of false alarm is the proportion of the solid angle of the hypercone compared to the solid angle of the full space, p_fa = I_L(θ)/I_L(π/2), with I_L(θ) = ∫_0^θ sin^{L−2}(φ) dφ. The authors of [9] derived a simple program numerically calculating this probability of false alarm based on iterative integration by parts. Yet, when implemented in MATLAB or standard C, this program fails to calculate very weak probabilities, due to computer precision limits. For this reason, they proposed upper and lower bounds based on Gaussian and Fisher Z-statistic approximations, respectively. Indeed, a much better way is to resort to the Fisher-Snedecor F-distribution. If x belongs to the acceptance region, then the angle between x and u is smaller than θ or bigger than π − θ. Instead of a cosine, we translate this in terms of tangent:

x ∈ A  ⟺  ‖(I − P)x‖² / ‖Px‖² < tan²θ.

Here, P is the projector matrix, Px = (x^T u)u, and I the L × L identity matrix. We suppose that x is a white Gaussian noise, as an example of isotropic distribution: p_X = N(0, I_L). Therefore, its projections on complementary subspaces, Px and (I − P)x, are independent and Gaussian distributed. This implies that ‖Px‖² and ‖(I − P)x‖² are chi-square distributed with one and L − 1 degrees of freedom, respectively, and that the random variable F = ‖(I − P)x‖² / ((L − 1)‖Px‖²) has a Fisher-Snedecor F-distribution. Consequently, p_fa = Prob[F < (L − 1)^−1 tan²θ]. This is by definition the cumulative distribution function, whose expression relies on the incomplete beta function. After simplifications, we have

p_fa = I(1 − τ²; (L − 1)/2, 1/2),

with I(x; a, b) the regularized incomplete beta function.
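The final formula is easy to evaluate numerically. The sketch below (scipy's special.betainc is the regularized incomplete beta function) cross-checks it against the equivalent Fisher-Snedecor form and reproduces the value p_fa = 4.704 × 10^−11 used in the next section for L = 20 and τ = 0.95:

```python
from math import acos, tan

from scipy.special import betainc
from scipy.stats import f

def pfa_normalized_correlation(L, tau):
    """Prob[|x^T u| / ||x|| > tau] for white Gaussian x: I(1 - tau^2; (L-1)/2, 1/2)."""
    return betainc((L - 1) / 2, 0.5, 1 - tau**2)

L, tau = 20, 0.95
p1 = pfa_normalized_correlation(L, tau)

# Equivalent form: Prob[F < tan^2(theta) / (L - 1)] with F ~ F(L - 1, 1)
theta = acos(tau)
p2 = f.cdf(tan(theta) ** 2 / (L - 1), L - 1, 1)
print(p1, p2)   # both ~ 4.704e-11
```

Unlike the iterative integration by parts of [9], this evaluation does not lose precision for very weak probabilities.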
MATLAB or Mathematica provides far more accurate evaluations than the bounds proposed in [9]. The random vector x being a white Gaussian noise, the replication process modifies x into z = (x + μn)/√(1 + μ²), with n ∼ N(0, I_L). Hence, z is still a white Gaussian noise with unit variance. This defines the GENERATE and MODIFY processes of Algorithm 1.
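The distribution invariance of this MODIFY step is easy to check empirically (toy sample size, μ = 0.2 as in the next section):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 0.2

x = rng.standard_normal(500_000)                          # x ~ N(0, 1)
z = (x + mu * rng.standard_normal(x.shape)) / np.sqrt(1 + mu**2)

print(z.mean(), z.var())   # ~ 0 and ~ 1: z is still white Gaussian, unit variance
```

The copy z stays correlated with x (the correlation is (1 + μ²)^−1/2), which is exactly what makes it a light modification rather than an independent redraw.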

Experimental investigation number 1: strength of the replication
The main shortcoming of our algorithm is that the parameter μ needs manual fine-tuning. The algorithm as described above works fine for the problem studied in this section when μ = 0.2. This value sets the strength of the replication process, which can be expressed as the expected square distance between the modified vector z and its initial value x.
Here, this square distance equals 2L(1 − (1 + μ²)^−1/2). The strength of the replication fixes the dynamics of the system, and a tradeoff has to be found between two undesirable effects. The goal of this subsection is to experimentally show and explain these two effects, and to find a trick to circumvent the manual fine-tuning shortcoming. The other parameters are set as follows: L = 20, τ = 0.95, and n = 6400. This gives p_fa = 4.704 × 10^−11. We have found that μ = 0.2 is a correct value because the algorithm then behaves as expected. However, greater or lower values have negative impacts, as we will see.
As the estimator goes, the set A_i gets smaller and smaller, and the modification process is more and more likely to move vectors out of this set when the strength is too big. Let us define the filtering rate of the replication process as the number of times a modification is refused divided by n. Figure 1 shows this filtering rate along the iteration number. Typically, a factor μ greater than 0.5 (red curves) yields a filtering rate of 100% for the last iterations. This implies that the particles and their scores are no longer renewed. Thus, the threshold τ_j saturates and the algorithm does not converge; it stops thanks to the constraint on the maximum number of iterations, N_max = 100. We seize the opportunity of this case study, where the true map p = F(τ) is known, to plot the relative error along the ROC curve, (p_j − F(τ_j))/F(τ_j), in Figure 2. Red curves were simulated for a strong replication strength, μ = 0.7. We observe that, when the filtering rate is too high, the relative error has a peak followed by an exponential decay towards −1. The peak is explained by the fact that the vectors and their scores are no longer renewed, so that the thresholds quickly converge towards the supremum of these scores. Once the thresholds saturate at this supremum, F(τ_j) becomes fixed, and the relative error has an exponential decay due to the term p_j. When the latter becomes negligible compared to F(τ_j), the relative error tends to −1. The impact of a small μ is not noticeable in the filtering rate, which stays far below the saturation phenomenon (see Figure 1). Yet, Figure 2 shows very strong relative errors (blue curves) in the first iterations. Factor μ is so weak that the replicated particles are almost located at the same place as the previous ones. This prevents us from exploring the space, due to a low dynamic, and from moving the vectors towards the acceptance region. Hence, the scores of the replicated particles are almost the same as the previous ones. This is almost as
if μ = 0, that is, classical Monte Carlo. The behavior of the relative error is then strongly dependent on the initialization process which yields the first set of particles. The selection process keeps a thinner and thinner portion (k/n)^j of this initial cloud of particles, and the intermediate threshold converges to the maximum of the initial scores. Once this is achieved, the intermediate thresholds saturate at this maximum value, and we again observe an exponential decay toward −1 (Figure 2, blue curves). To check our explanations, we plotted the values of the intermediate thresholds in Figure 3. For the iteration number giving a maximum of relative error in Figure 2, we plot a circle centered on the maximum score of the initial particles. This illustrates the dependence between this maximum initial score and the saturated intermediate threshold.
The best tradeoff can be stated in the following terms: find the maximum value of μ such that the filtering rate is below a given level. We modify Algorithm 1 as follows: μ is set to one at the beginning. For each iteration, we measure the filtering rate. If the latter is bigger than the level, we reduce the value of μ and repeat the iteration until the filtering rate is below the level. The value of μ is thus now found adaptively. However, the number of detection trials is no longer fixed. Experimentally, we decrease μ by a factor of 1.1 anytime the filtering rate is above 0.7.
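The adaptation rule itself is a one-liner. A hypothetical helper (the name and signature are ours, not the paper's), applied before repeating an iteration:

```python
def adapt_mu(mu, filtering_rate, level=0.7, shrink=1.1):
    """Shrink the replication strength whenever the modification step
    is refused too often (filtering rate above the target level)."""
    return mu / shrink if filtering_rate > level else mu

print(adapt_mu(1.0, 0.9))   # refused too often: mu shrinks to ~0.909
print(adapt_mu(0.2, 0.3))   # acceptable rate: mu unchanged
```

In the full algorithm, the iteration whose filtering rate exceeded the level is then re-run with the reduced μ, so the total number of detection trials becomes random.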

Experimental investigation number 2: p = k/n
The parameter p strikes a tradeoff between the speed and the accuracy of the estimator. Equation (4) tells us that the lower p is, the faster the estimation of p_fa. However, (6) and (7) show that the relative variance and the bias are decreasing functions of p.
Observe first the excellence of the estimator: n = 12 800 particles (last point on the curves) represent around 1 000 000 detection trials. Surprisingly enough, the measured variance and bias follow the laws (6) and (7), known for the asymptotic regime, even for a low number of particles. The bias is not measured with enough precision with only 1000 trials for n = 12 800, because its order of magnitude is 0.001 times the value of p_fa. Yet, the asymptotic regime is only achieved if the estimations are Gaussian distributed. An Anderson-Darling test [14] reveals that this is the case only for the biggest values of n. This happens sooner for p closer to one, according to the scores of Table 1: the estimations {p̂_fa^(i)} are deemed Gaussian distributed when n equals 6400 for p = 3/4, whereas this hypothesis is clearly rejected for p = 1/2.
Our conclusions from this experiment are the following. There are two typical use cases of our algorithm. If the user looks for the order of magnitude of the probability to be estimated, then the choice p = 1/2 with around n = 2000 particles gives a fast estimation (around 68 000 detection trials). This is especially true since the variance (6) and the bias (7) are not drastically bigger than the ones for p = 3/4.
If the issue is to assess an estimation with a given accuracy and confidence range, then the estimator must be in the asymptotic regime where the pdf of the estimation error is known. This experiment shows that a ratio of 3/4 (i.e., closer to one) is advised. Each estimation lasts longer but, in the end, this is the quickest way to achieve the asymptotic regime.
This experimental work also stresses what we still do not know: the standard deviation and the bias are in practice bigger than the theoretical expressions. This is normal, as the latter are theoretical lower bounds; however, we do not know the scaling factor. Other experiments showed that it does not depend on L. We suppose, however, that there is a strong relationship with the detection score function. This prevents us from establishing confidence ranges supported by the Gaussian distribution in the asymptotic regime. Such a strategy implies heavy experimental work, where a hundred estimations are needed in order, first, to confirm the asymptotic regime and, second, to estimate the standard deviation. Then, the probability of false alarm is lower than Mean({p̂_fa^(i)}) + 2 · Std({p̂_fa^(i)}) with a confidence of 97.7%. A faster way to yield a confidence interval is to observe the number of iterations of several independent estimations. For p = 1/2 and n ≥ 800, more than two thirds of the estimations end at N = 34 iterations (see Figure 6), which gives a confidence interval of [p^{N+1}, p^N] = [2.91, 5.82] × 10^−11. For p = 3/4 and n ≥ 1600, more than two thirds of the estimations end at N = 82 iterations (see Figure 7), which gives a confidence interval of [p^{N+1}, p^N] = [4.26, 5.69] × 10^−11. Once again, a bigger p provides more accurate results, but at the cost of slower estimations.
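The interval endpoints quoted above are simply consecutive powers of p = k/n, which is quick to verify:

```python
# Confidence interval endpoints as consecutive powers of p = k/n
for p, n_iter in ((1 / 2, 34), (3 / 4, 82)):
    lo, hi = p ** (n_iter + 1), p ** n_iter
    print(f"p = {p}: [{lo:.3g}, {hi:.3g}]")
```

This also shows why a bigger p gives a tighter interval: consecutive powers of 3/4 are closer together than consecutive powers of 1/2.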

Error exponents for zero-rate watermarking scheme
A watermarking scheme is deemed sound if its probability of false alarm and its probability of false negative decrease exponentially with the dimension L of the signals, under an embedding power constraint. Within this class, the comparison of two watermarking schemes can be based on their exponential decay rates, that is, their error exponents, defined as follows:

E_fa(τ) = −lim_{L→∞} (1/L) log p_fa(τ),  E_fn(τ) = −lim_{L→∞} (1/L) log p_fn(τ).

There are very few watermarking schemes whose error exponents have closed-form expressions [13]; for instance, additive spread spectrum with a single-nappe hypercone detection region, and the improved sign embedder with a dual-nappe hypercone detection region. Furthermore, these theoretical expressions do not foresee a noisy channel (i.e., an attack) in the calculation of E_fn(τ). In practice, it is extremely hard to estimate these error exponents because huge values of L imply extremely low probabilities of error if the watermarking scheme is good. This is no longer a problem with our algorithm, and we simply estimate the error exponents by Ê_fa(τ) = −log p̂_fa/L and Ê_fn(τ) = −log p̂_fn/L for a given, big enough L.
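For the normalized-correlation detector of Section 3, the closed-form p_fa lets us check how fast −log p_fa/L approaches its limit. The sketch below is under the white Gaussian host assumption; the limiting value −½ log(1 − τ²) follows from the large-L asymptotics of the incomplete-beta expression. It illustrates why a finite-L ratio slightly overestimates the exponent:

```python
import numpy as np
from scipy.special import betainc

def pfa(L, tau):
    """Closed-form Prob[|x^T u| / ||x|| > tau] for white Gaussian x."""
    return betainc((L - 1) / 2, 0.5, 1 - tau**2)

tau = 0.5
asymptotic = -0.5 * np.log(1 - tau**2)   # limiting exponent for this detector
Es = []
for L in (500, 2000, 4000):
    E_fa = -np.log(pfa(L, tau)) / L      # finite-L estimate, as in the text
    Es.append(E_fa)
    print(L, E_fa)
# E_fa slowly approaches the limit from above, which is why a linear regression
# of log p_fa over several values of L is more accurate than a single large L.
```

The same finite-L effect is one of the two shortcomings discussed below for the measurements of Figure 8.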
For the false negative, the rare event is that a watermarked (and possibly attacked) vector has a score below a small threshold. At each step, the estimator sets τ_j as the kth lowest score; hence, the intermediate thresholds are indeed decreasing. We can also study the impact of an attack on E_fn as soon as the attack vector n has a statistical model with the two following properties: (i) we are able to generate vectors distributed according to p_N, and (ii) there exists a modification process with a controllable strength that leaves this distribution invariant.
A particle is now a pair of vectors {x, n}, and its score is the detection function applied to the attacked and watermarked vector: d(z) = d(w(x) + n), where w(·) : R^L → R^L is the watermark embedding function. The replication process changes both vectors of the particle, each with its own distribution-invariant modification process. Another technical detail is that our algorithm is run only once, storing the intermediate thresholds in order to estimate the mapping {Ê_fn(τ_j), τ_j}. The same holds for the false alarm error exponents {Ê_fa(τ_j), τ_j}. An interpolation finally gives {Ê_fa(τ_j), Ê_fn(τ_j)}. The experimental setup is the following: L = 4000, and the host vectors are Gaussian distributed with variance σ²_X = 1. The embedding power equals P_e = 0.1. We test three watermarking schemes: the additive spread spectrum scheme with d(x) = x^T u/‖x‖, the "improved" sign embedder with d(x) = |x^T u|/‖x‖ as detailed in [13], and the JANIS scheme of order 2 [8]. For the first two schemes, the relationship between E_fa(τ) and the threshold τ is perfectly known [13]. However, there is no expression for E_fn(τ) under an attack (here a Gaussian white noise with variance σ²_N = 0.1). For the JANIS scheme, we have to estimate both E_fa(τ) and E_fn(τ). Figure 8 shows the results.
From an experimental point of view, the measurements are good, with only a small inaccuracy, for which we blame two shortcomings. First, L is not big enough, and the ratio −L^−1 log p̂_fa (idem for the false negative exponent) does not perfectly reflect the rate of the exponential decay. A better way would be to estimate log p̂_fa for several values of L and to estimate the exponent with a linear regression. Second, these plots were obtained very rapidly with our algorithm working with a stack of only n = 3200 vectors and k = n/2. Therefore, the accuracy of the estimation of p̂_fa itself is not at its best; but we are indeed interested in showing that error exponents can be measured very rapidly. The experimental curves for the additive and improved watermarking schemes have the right shape (in particular, for the improved scheme, E_fn goes to infinity when E_fa goes to zero). In the same way, the range of the measurements is limited by the maximum number of iterations allowed, which is here set to 200.
From a watermarking point of view, it is quite difficult to say which scheme performs best. All of them share the same detection complexity. The improved scheme has the advantage of an infinite E_fn when there is no attack. The JANIS performance curve seems better only at high E_fa. Performances of course collapse in the presence of an attack, but JANIS then appears to be the most robust of the three compared schemes.

APPLICATION TO PROBABILISTIC FINGERPRINTING CODE
Fingerprinting is the application where a content server gives personal copies of the same content to N different buyers. c of them are dishonest users, called colluders, who mix their copies to yield a pirated content. A binary fingerprinting code is a set of N different m-bit sequences {x_i}_{i∈[N]}. Each sequence, identifying one user, is hidden in the corresponding personal copy with a watermarking technique. When a pirated copy is found, the server retrieves an m-bit sequence and accuses some users or nobody. There are two kinds of error: accusing an innocent (i.e., a false alarm), and accusing none of the colluders (i.e., a false negative). The designers of the fingerprinting code must assess the minimum length of the code so that the probabilities of error are below the significance levels: p_fa < ε_1 and p_fn < ε_2. One of the best fingerprinting codes is a probabilistic code proposed by Tardos [15], where m = O(c² log(1/ε_1)). Before Tardos' work, the existence of such a short code was only proven theoretically; Tardos was the first to exhibit a construction, which is, moreover, surprisingly simple. The main point of interest for us is that the accusation is based on the computation of scores and their comparison to a threshold. Consequently, this fingerprinting code is very well suited to our algorithm.

New accusation strategy
In the Tardos probabilistic fingerprinting code, the accusation is focused: the detector decides whether a given user is guilty or not. It calculates the user's score from his code sequence and the sequence recovered in the pirated copy, and the user is deemed guilty when his score is higher than a threshold. The size of the collusion c and the probabilities ε_1 and ε_2 are the inputs of the code; the outputs are the code length m and the value of the threshold T.

We think that this approach is not adapted in practice. We believe that the length of the code sequence to be embedded in a piece of content is not tunable but fixed by the payload of the watermarking technique and the length of the content. It is clear that the longer the sequence, the better the accusation process works. But, in practice, there is certainly a wide range of sequence lengths due to a wide diversity of contents. In the same way, it might be complicated to derive the right value of the threshold for different sequence lengths.

We therefore propose a different approach. Once we have recovered the sequence y in the pirated copy, we calculate the scores of all the users to whom the content has been delivered and accuse the most likely guilty users, that is, the ones with the highest scores. In the sequel, consider that user j is accused because j = arg max_{i∈[N]} S_i. There is no longer any need for a threshold. However, we can no longer guarantee a probability ε_1. To be fair, the output of our accusation process is the index j of the most likely guilty user together with the probability of making an error, that is, the probability that an innocent gets a score bigger than or equal to S_j knowing the pirated sequence y: Prob(SCORE(x, y) ≥ S_j | y).
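The max-score accusation can be sketched as follows. This is a hedged illustration, not the paper's code: the per-symbol weights follow the symmetric scoring of Škorić et al. [17], the bias distribution and all sizes are assumed, and a single pirate redistributing his copy verbatim stands in for a real collusion.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 100, 2048
# Per-position biases p_k (Tardos-like arcsine shape, cutoffs assumed).
p = np.sin(rng.uniform(0.05, np.pi / 2 - 0.05, m)) ** 2
X = (rng.random((N, m)) < p).astype(np.int8)   # the N users' code sequences
y = X[0].copy()                                # assume user 0 leaked his copy

def scores(X, y, p):
    """Symmetric Tardos scores (Skoric et al.) of all users against y."""
    gp = np.sqrt((1 - p) / p)            # weight of a '1' symbol
    gq = np.sqrt(p / (1 - p))            # weight of a '0' symbol
    s1 = np.where(X == 1, gp, -gq)       # per-symbol score when y_k == 1
    s0 = np.where(X == 0, gq, -gp)       # per-symbol score when y_k == 0
    return np.where(y == 1, s1, s0).sum(axis=1)

S = scores(X, y, p)
accused = int(np.argmax(S))              # the most likely guilty user
```

An innocent's score has zero mean and variance m, while the pirate accumulates only positive per-symbol contributions, which is why the arg max singles him out so reliably.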
We use our algorithm to estimate this probability, where x is distributed as a code sequence. Particles in this framework are now binary sequences of length m. The GENERATE and SCORE functions are given by the construction of the code and its accusation method [15]. One very important fact is that the symbols of a code sequence are statistically independent, each distributed according to its own law. The MODIFY function is thus very simple: we randomly select a fraction μ of the symbols (the parameter μ sets the replication strength) and regenerate them according to their own law. These nondeterministic changes leave the distribution of the code sequence invariant.
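A minimal sketch of this MODIFY step, assuming the symbols are Bernoulli(p_k) with given per-position biases p_k (names and values are illustrative): refreshing a random fraction μ of positions from their own law leaves the sequence distribution invariant.

```python
import numpy as np

def modify(x, p, mu, rng):
    """Regenerate a random fraction mu of the symbols of x from Bernoulli(p_k)."""
    x = x.copy()
    idx = rng.random(x.size) < mu                        # positions to refresh
    x[idx] = (rng.random(idx.sum()) < p[idx]).astype(x.dtype)
    return x

rng = np.random.default_rng(1)
m = 1000
p = rng.uniform(0.1, 0.9, m)                 # per-position biases (assumed given)
x = (rng.random(m) < p).astype(np.int8)      # a binary code sequence
x2 = modify(x, p, mu=0.2, rng=rng)           # a replicated particle
```

Only resampled positions can change, and each resample keeps its own marginal law, so the particle cloud stays correctly distributed while still exploring new sequences.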

Code length
We now come back to the original Tardos focused accusation process. Typical papers about the Tardos code [16-18] aim to find the tightest lower bound on the length, that is, to find the lowest constant A such that m > A c² log(1/ε_1) implies that the constraints on the probabilities of error are fulfilled. In the original Tardos paper, ε_2 is set to ε_1^{c/4}, and A = 100 for his asymmetric code. In our experiments, we use the symmetric version of the Tardos code as proposed by Škorić et al. [17], where A = 50 thanks to the symmetry. The goal of this subsection is to experimentally find this constant A, which is a priori challenging because ε_1 is very low.
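As a worked instance of this bound (illustrative numbers, not results from the paper), the length m > A c² log(1/ε_1) for c = 5 colluders and ε_1 = 10⁻⁶ can be computed directly:

```python
import math

def tardos_length(c, eps1, A):
    """Smallest integer length m satisfying m > A * c^2 * log(1/eps1)."""
    return math.ceil(A * c * c * math.log(1.0 / eps1))

m_asym = tardos_length(c=5, eps1=1e-6, A=100)   # Tardos' original constant
m_sym = tardos_length(c=5, eps1=1e-6, A=50)     # symmetric code: half the length
```

Halving the constant A halves the required sequence length, which is exactly what is at stake in pinning A down experimentally.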
The experiment is twofold. First, we need to estimate the mapping between ε_1 and the threshold T for different couples (c, m). We generate c code sequences of length m. The collusion strategy is to randomly pick the symbols of the pirated copy y among the c colluders' sequences. Then, we estimate the curve T = F⁻¹(Prob(SCORE(x, y) > T | y)) with our algorithm. Indeed, the target threshold is fixed to a very high value so that the algorithm stops after N_max iterations. The obtained mapping is in fact N_max couples (T_j, (k/n)^j). However, this holds for one particular occurrence of y.
Therefore, we need to integrate this conditional probability over K different sequences {y_i}_{i∈[K]}. Each time, c code sequences are drawn independently and uniformly to forge a pirated sequence. The jth threshold is averaged over the K estimates.
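The forgery step can be sketched as follows (an illustration under assumed parameters): each symbol of y is taken from a colluder chosen uniformly at random at that position, via an allocation sequence a.

```python
import numpy as np

rng = np.random.default_rng(2)
c, m = 4, 512
# Per-position biases (Tardos-like arcsine shape, cutoffs assumed).
p = np.sin(rng.uniform(0.05, np.pi / 2 - 0.05, m)) ** 2
X = (rng.random((c, m)) < p).astype(np.int8)   # the c colluders' sequences
a = rng.integers(0, c, size=m)                 # allocation: who supplies each symbol
y = X[a, np.arange(m)]                         # y(k) = x_{a(k)}(k)
```

Note that this collusion respects the marking assumption: wherever all c colluders carry the same symbol, y necessarily carries it too.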
The second part of the experiment measures the false negative probability ε_2. A particle is now a set of c Tardos sequences {x_1, ..., x_c} plus an allocation sequence a which dictates the way to produce y from the c sequences: y(k) = x_{a(k)}(k) for all k ∈ [m]. The SCORE function is more complicated. From a particle, it generates the pirated sequence y thanks to the allocation sequence, and calculates the c accusation sums. The score of the particle is then their mean or their maximum. Tardos and his followers based their analysis on the mean of the colluders' scores because this leads to tractable equations. The rationale is that if the mean is above the threshold, then there is at least one colluder whose score is higher than the threshold. However, the probability that the mean is below the threshold T is a very rough estimate of the probability of false negative ε_2. We follow Tardos' choice of the mean in order to appreciate the refinement of the constant A given by our experimental investigation compared to the constants found by Tardos and his followers via Chernoff bounds. However, we can also set the score of a particle to the maximum of the c accusation sums in order to really estimate ε_2 as the probability that none of the c colluders gets caught. The rest of the second part works like the first part: we are interested in estimating the mapping T = F⁻¹(Prob(max_{i∈[c]} SCORE(x_i, y) < T)) (with the max or the mean) using our algorithm. The experiment is run K times, and the intermediate thresholds are averaged for a better precision. Then, we plot in the same figure (see Figure 9) the false positive mapping and the false negative mapping, except that for the latter the probability is raised to the power 4/c to provide a fair comparison with previous evaluations of the constant A, where ε_2 was set to ε_1^{c/4}. The intersection of the two mappings at a point (T_0(m, c), ε_0(m, c)) implies that it is possible to find a threshold such that ε_1 = ε_0(m, c) and

Algorithm 2: SELECT: random selection of n objects among a set of size k.

Figure 2: Relative errors for the same estimator runs as used in Figure 1.

Figure 3: Typical evolution of the intermediate thresholds when μ is too small. Intermediate thresholds are taken from the same estimator runs as used in Figure 1, with μ = 0.01.

Figure 6: Confidence intervals shrink as the number of particles increases. Percentage of estimations over 1000 runs, for k/n = 1/2.

Figure 7: Confidence intervals shrink as the number of particles increases. Percentage of estimations over 1000 runs, for k/n = 3/4.
The indices stored in a are independent and identically distributed over [c]. The MODIFY function is twofold: it independently modifies the c Tardos sequences as described above, and it modifies the allocation sequence by randomly selecting a fraction μ of the indices and redrawing them according to the uniform law over [c].

Table 1: Anderson-Darling test on 1000 estimation runs, for p = 1/2 and p = 3/4. The hypothesis of a Gaussian distribution is accepted when the score is lower than 0.752, at significance level 0.05.