attacks module¶

The Attack class, providing a universal abstract interface describing attacks, and many implementations of it.

class cleverhans.attacks.ABCMeta(name, bases, namespace, **kwargs)[source]¶

Bases: type

Metaclass for defining Abstract Base Classes (ABCs).

Use this metaclass to create an ABC. An ABC can be subclassed directly, and then acts as a mix-in class. You can also register unrelated concrete classes (even built-in classes) and unrelated ABCs as ‘virtual subclasses’ – these and their descendants will be considered subclasses of the registering ABC by the built-in issubclass() function, but the registering ABC won’t show up in their MRO (Method Resolution Order) nor will method implementations defined by the registering ABC be callable (not even via super()).

register(subclass)[source]¶

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

class cleverhans.attacks.Attack(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: object

Abstract base class for all attack classes.

construct_graph(fixed, feedable, x_val, hash_key)[source]¶

Construct the graph required to run the attack through generate_np.

Parameters

fixed – Structural elements that require defining a new graph.
feedable – Arguments that can be fed to the same graph when they take different values.
x_val – symbolic adversarial example
hash_key – the key used to store this graph in our cache

construct_variables(kwargs)[source]¶

Construct the inputs to the attack graph to be used by generate_np.

Parameters: kwargs – Keyword arguments to generate_np.
Returns: Structural arguments Feedable arguments Output of arg_type describing feedable arguments A unique key

generate(x, **kwargs)[source]¶

Generate the attack’s symbolic graph for adversarial examples. This method should be overriden in any child class that implements an attack that is expressable symbolically. Otherwise, it will wrap the numerical implementation as a symbolic operator.

Parameters

x – The model’s symbolic inputs.
**kwargs –
optional parameters used by child classes. Each child class defines additional parameters as needed. Child classes that use the following concepts should use the following names:

clip_min: minimum feature value clip_max: maximum feature value eps: size of norm constraint on adversarial perturbation ord: order of norm constraint nb_iter: number of iterations eps_iter: size of norm constraint on iteration y_target: if specified, the attack is targeted. y: Do not specify if y_target is specified.

If specified, the attack is untargeted, aims to make the output class not be y. If neither y_target nor y is specified, y is inferred by having the model classify the input.

For other concepts, it’s generally a good idea to read other classes and check for name consistency.

Returns

A symbolic representation of the adversarial examples.

generate_np(x_val, **kwargs)[source]¶

Generate adversarial examples and return them as a NumPy array. Sub-classes should not implement this method unless they must perform special handling of arguments.

Parameters

x_val – A NumPy array with the original inputs.
**kwargs –
optional parameters used by child classes.

Returns

A NumPy array holding the adversarial examples.

get_or_guess_labels(x, kwargs)[source]¶: Get the label to use in generating an adversarial example for x. The kwargs are fed directly from the kwargs of the attack. If ‘y’ is in kwargs, then assume it’s an untargeted attack and use that as the label. If ‘y_target’ is in kwargs and is not none, then assume it’s a targeted attack and use that as the label. Otherwise, use the model’s prediction as the label and perform an untargeted attack.

parse_params(params=None)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Parameters: params – a dictionary of attack-specific parameters
Returns: True when parsing was successful

class cleverhans.attacks.BasicIterativeMethod(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.projected_gradient_descent.ProjectedGradientDescent

The BasicIterativeMethod attack.

cleverhans.attacks.BoundaryAttackPlusPlus(model, sess, dtypestr='float32', **kwargs)[source]¶: A previous name used for HopSkipJumpAttack.

class cleverhans.attacks.CallableModelWrapper(callable_fn, output_layer)[source]¶

Bases: cleverhans.model.Model

A wrapper that turns a callable into a valid Model

fprop(x, **kwargs)[source]¶: Forward propagation to compute the model outputs. :param x: A symbolic representation of the network input :return: A dictionary mapping layer names to the symbolic

representation of their output.

class cleverhans.attacks.CarliniWagnerL2(model, sess, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This attack was originally proposed by Carlini and Wagner. It is an iterative attack that finds adversarial examples on many defenses that are robust to other attacks. Paper link: https://arxiv.org/abs/1608.04644

At a high level, this attack is an iterative attack using Adam and a specially-chosen loss function to find adversarial examples with lower distortion than other attacks. This comes at the cost of speed, as this attack is often much slower than others.

Parameters

model – cleverhans.model.Model
sess – tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Return a tensor that constructs adversarial examples for the given input. Generate uses tf.py_func in order to operate over tensors.

Parameters

x – A tensor with the inputs.
kwargs – See parse_params

parse_params(y=None, y_target=None, batch_size=1, confidence=0, learning_rate=0.005, binary_search_steps=5, max_iterations=1000, abort_early=True, initial_const=0.01, clip_min=0, clip_max=1)[source]¶

Parameters

y – (optional) A tensor with the true labels for an untargeted attack. If None (and y_target is None) then use the original labels the classifier assigns.
y_target – (optional) A tensor with the target labels for a targeted attack.
confidence – Confidence of adversarial examples: higher produces examples with larger l2 distortion, but more strongly classified as adversarial.
batch_size – Number of attacks to run simultaneously.
learning_rate – The learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.
binary_search_steps – The number of times we perform binary search to find the optimal tradeoff- constant between norm of the purturbation and confidence of the classification.
max_iterations – The maximum number of iterations. Setting this to a larger value will produce lower distortion results. Using only a few iterations requires a larger learning rate, and will produce larger distortion results.
abort_early – If true, allows early aborts if gradient descent is unable to make progress (i.e., gets stuck in a local minimum).
initial_const – The initial tradeoff-constant to use to tune the relative importance of size of the perturbation and confidence of classification. If binary_search_steps is large, the initial constant is not important. A smaller value of this constant gives lower distortion results.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.DeepFool(model, sess, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

DeepFool is an untargeted & iterative attack which is based on an iterative linearization of the classifier. The implementation here is w.r.t. the L2 norm. Paper link: “https://arxiv.org/pdf/1511.04599.pdf”

Parameters

model – cleverhans.model.Model
sess – tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(nb_candidate=10, overshoot=0.02, max_iter=50, clip_min=0.0, clip_max=1.0, **kwargs)[source]¶

Parameters

nb_candidate – The number of classes to test against, i.e., deepfool only consider nb_candidate classes when attacking(thus accelerate speed). The nb_candidate classes are chosen according to the prediction confidence during implementation.
overshoot – A termination criterion to prevent vanishing updates
max_iter – Maximum number of iteration for deepfool
clip_min – Minimum component value for clipping
clip_max – Maximum component value for clipping

class cleverhans.attacks.ElasticNetMethod(model, sess, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This attack features L1-oriented adversarial examples and includes the C&W L2 attack as a special case (when beta is set to 0). Adversarial examples attain similar performance to those generated by the C&W L2 attack in the white-box case, and more importantly, have improved transferability properties and complement adversarial training. Paper link: https://arxiv.org/abs/1709.04114

Parameters

model – cleverhans.model.Model
sess – tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Return a tensor that constructs adversarial examples for the given input. Generate uses tf.py_func in order to operate over tensors.

Parameters

x – (required) A tensor with the inputs.
kwargs – See parse_params

parse_params(y=None, y_target=None, beta=0.01, decision_rule='EN', batch_size=1, confidence=0, learning_rate=0.01, binary_search_steps=9, max_iterations=1000, abort_early=False, initial_const=0.001, clip_min=0, clip_max=1)[source]¶

Parameters

y – (optional) A tensor with the true labels for an untargeted attack. If None (and y_target is None) then use the original labels the classifier assigns.
y_target – (optional) A tensor with the target labels for a targeted attack.
beta – Trades off L2 distortion with L1 distortion: higher produces examples with lower L1 distortion, at the cost of higher L2 (and typically Linf) distortion
decision_rule – EN or L1. Select final adversarial example from all successful examples based on the least elastic-net or L1 distortion criterion.
confidence – Confidence of adversarial examples: higher produces examples with larger l2 distortion, but more strongly classified as adversarial.
batch_size – Number of attacks to run simultaneously.
learning_rate – The learning rate for the attack algorithm. Smaller values produce better results but are slower to converge.
binary_search_steps – The number of times we perform binary search to find the optimal tradeoff- constant between norm of the perturbation and confidence of the classification. Set ‘initial_const’ to a large value and fix this param to 1 for speed.
max_iterations – The maximum number of iterations. Setting this to a larger value will produce lower distortion results. Using only a few iterations requires a larger learning rate, and will produce larger distortion results.
abort_early – If true, allows early abort when the total loss starts to increase (greatly speeds up attack, but hurts performance, particularly on ImageNet)
initial_const – The initial tradeoff-constant to use to tune the relative importance of size of the perturbation and confidence of classification. If binary_search_steps is large, the initial constant is not important. A smaller value of this constant gives lower distortion results. For computational efficiency, fix binary_search_steps to 1 and set this param to a large value.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.FastFeatureAdversaries(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This is a fast implementation of “Feature Adversaries”, an attack against a target internal representation of a model. “Feature adversaries” were originally introduced in (Sabour et al. 2016), where the optimization was done using LBFGS. Paper link: https://arxiv.org/abs/1511.05122

This implementation is similar to “Basic Iterative Method” (Kurakin et al. 2016) but applied to the internal representations.

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

attack_single_step(x, eta, g_feat)[source]¶

TensorFlow implementation of the Fast Feature Gradient. This is a single step attack similar to Fast Gradient Method that attacks an internal representation.

Parameters

x – the input placeholder
eta – A tensor the same shape as x that holds the perturbation.
g_feat – model’s internal tensor for guide

Returns

a tensor for the adversarial example

generate(x, g, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
g – The target value of the symbolic representation
kwargs – See parse_params

parse_params(layer=None, eps=0.3, eps_iter=0.05, nb_iter=10, ord=inf, clip_min=None, clip_max=None, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

layer – (required str) name of the layer to target.
eps – (optional float) maximum distortion of adversarial example compared to original input
eps_iter – (optional float) step size for each attack iteration
nb_iter – (optional int) Number of attack iterations.
ord – (optional) Order of the norm (mimics Numpy). Possible values: np.inf, 1 or 2.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.FastGradientMethod(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This attack was originally implemented by Goodfellow et al. (2014) with the infinity norm (and is known as the “Fast Gradient Sign Method”). This implementation extends the attack to other norms, and is therefore called the Fast Gradient Method. Paper link: https://arxiv.org/abs/1412.6572

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Returns the graph for Fast Gradient Method adversarial examples.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(eps=0.3, ord=inf, loss_fn=<function softmax_cross_entropy_with_logits>, y=None, y_target=None, clip_min=None, clip_max=None, clip_grad=False, sanity_checks=True, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

eps – (optional float) attack step size (input variation)
ord – (optional) Order of the norm (mimics NumPy). Possible values: np.inf, 1 or 2.
loss_fn – Loss function that takes (labels, logits) as arguments and returns loss
y – (optional) A tensor with the true labels. Only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None. Labels should be one-hot-encoded.
y_target – (optional) A tensor with the labels to target. Leave y_target=None if y is also set. Labels should be one-hot-encoded.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value
clip_grad – (optional bool) Ignore gradient components at positions where the input is already at the boundary of the domain, and the update step will get clipped out.
sanity_checks – bool, if True, include asserts (Turn them off to use less runtime / memory or for unit tests that intentionally pass strange input)

class cleverhans.attacks.HopSkipJumpAttack(model, sess, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

HopSkipJumpAttack was originally proposed by Chen, Jordan and Wainwright. It is a decision-based attack that requires access to output labels of a model alone. Paper link: https://arxiv.org/abs/1904.02144 At a high level, this attack is an iterative attack composed of three steps: Binary search to approach the boundary; gradient estimation; stepsize search. HopSkipJumpAttack requires fewer model queries than Boundary Attack which was based on rejective sampling. :param model: cleverhans.model.Model :param sess: tf.Session :param dtypestr: dtype of the data :param kwargs: passed through to super constructor. see parse_params for details.

generate(x, **kwargs)[source]¶: Return a tensor that constructs adversarial examples for the given input. Generate uses tf.py_func in order to operate over tensors. :param x: A tensor with the inputs. :param kwargs: See parse_params

generate_np(x, **kwargs)[source]¶

Generate adversarial images in a for loop. :param y: An array of shape (n, nb_classes) for true labels. :param y_target: An array of shape (n, nb_classes) for target labels. Required for targeted attack. :param image_target: An array of shape (n, **image shape) for initial target images. Required for targeted attack.

See parse_params for other kwargs.

parse_params(y_target=None, image_target=None, initial_num_evals=100, max_num_evals=10000, stepsize_search='geometric_progression', num_iterations=64, gamma=1.0, constraint='l2', batch_size=128, verbose=True, clip_min=0, clip_max=1)[source]¶

Parameters

y – A tensor of shape (1, nb_classes) for true labels.
y_target – A tensor of shape (1, nb_classes) for target labels.

Required for targeted attack. :param image_target: A tensor of shape (1, **image shape) for initial target images. Required for targeted attack. :param initial_num_evals: initial number of evaluations for

gradient estimation.

Parameters

max_num_evals – maximum number of evaluations for gradient estimation.
stepsize_search –
How to search for stepsize; choices are ‘geometric_progression’, ‘grid_search’. ‘geometric progression’ initializes the stepsize

by ||x_t - x||_p / sqrt(iteration), and keep decreasing by half until reaching the target side of the boundary. ‘grid_search’ chooses the optimal epsilon over a grid, in the scale of ||x_t - x||_p.
num_iterations – The number of iterations.
gamma – The binary search threshold theta is gamma / d^{3/2} for l2 attack and gamma / d^2 for linf attack.
constraint – The distance to optimize; choices are ‘l2’, ‘linf’.
batch_size – batch_size for model prediction.
verbose – (boolean) Whether distance at each step is printed.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.LBFGS(model, sess, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

LBFGS is the first adversarial attack for convolutional neural networks, and is a target & iterative attack. Paper link: “https://arxiv.org/pdf/1312.6199.pdf” :param model: cleverhans.model.Model :param sess: tf.Session :param dtypestr: dtype of the data :param kwargs: passed through to super constructor

generate(x, **kwargs)[source]¶: Return a tensor that constructs adversarial examples for the given input. Generate uses tf.py_func in order to operate over tensors. :param x: (required) A tensor with the inputs. :param kwargs: See parse_params

parse_params(y_target=None, batch_size=1, binary_search_steps=5, max_iterations=1000, initial_const=0.01, clip_min=0, clip_max=1)[source]¶

Parameters

y_target – (optional) A tensor with the one-hot target labels.
batch_size – The number of inputs to include in a batch and process simultaneously.
binary_search_steps – The number of times we perform binary search to find the optimal tradeoff- constant between norm of the purturbation and cross-entropy loss of classification.
max_iterations – The maximum number of iterations.
initial_const – The initial tradeoff-constant to use to tune the relative importance of size of the perturbation and cross-entropy loss of the classification.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.MadryEtAl(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.projected_gradient_descent.ProjectedGradientDescent

The attack from Madry et al 2017

class cleverhans.attacks.MaxConfidence(model, sess=None, base_attacker=None)[source]¶

Bases: cleverhans.attacks.attack.Attack

The MaxConfidence attack.

An attack designed for use against models that use confidence thresholding as a defense. If the underlying optimizer is optimal, this attack procedure gives the optimal failure rate for every confidence threshold t > 0.5.

Publication: https://openreview.net/forum?id=H1g0piA9tQ

Parameters

model – cleverhans.model.Model
sess – optional tf.session.Session
base_attacker – cleverhans.attacks.Attack

attack(x, true_y)[source]¶: Runs the untargeted attack. :param x: The input :param true_y: The correct label for x. This attack aims to produce misclassification.

attack_class(x, target_y)[source]¶: Run the attack on a specific target class. :param x: tf Tensor. The input example. :param target_y: tf Tensor. The attacker’s desired target class. Returns:

A targeted adversarial example, intended to be classified as the target class.

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – Keyword arguments for the base attacker

parse_params(y=None, nb_classes=10, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Parameters: params – a dictionary of attack-specific parameters
Returns: True when parsing was successful

class cleverhans.attacks.Model(scope=None, nb_classes=None, hparams=None, needs_dummy_fprop=False)[source]¶

Bases: object

An abstract interface for model wrappers that exposes model symbols needed for making an attack. This abstraction removes the dependency on any specific neural network package (e.g. Keras) from the core code of CleverHans. It can also simplify exposing the hidden features of a model when a specific package does not directly expose them.

O_FEATURES = 'features'¶

O_LOGITS = 'logits'¶

O_PROBS = 'probs'¶

fprop(x, **kwargs)[source]¶: Forward propagation to compute the model outputs. :param x: A symbolic representation of the network input :return: A dictionary mapping layer names to the symbolic

representation of their output.

get_layer(x, layer, **kwargs)[source]¶: Return a layer output. :param x: tensor, the input to the network. :param layer: str, the name of the layer to compute. :param **kwargs: dict, extra optional params to pass to self.fprop. :return: the content of layer layer

get_layer_names()[source]¶: Return the list of exposed layers for this model.

get_logits(x, **kwargs)[source]¶

Parameters: x – A symbolic representation (Tensor) of the network input
Returns: A symbolic representation (Tensor) of the output logits

(i.e., the values fed as inputs to the softmax layer).

get_params()[source]¶: Provides access to the model’s parameters. :return: A list of all Variables defining the model parameters.

get_predicted_class(x, **kwargs)[source]¶

Parameters: x – A symbolic representation (Tensor) of the network input
Returns: A symbolic representation (Tensor) of the predicted label

get_probs(x, **kwargs)[source]¶

Parameters: x – A symbolic representation (Tensor) of the network input
Returns: A symbolic representation (Tensor) of the output

probabilities (i.e., the output values produced by the softmax layer).

make_input_placeholder()[source]¶

Create and return a placeholder representing an input to the model.

This method should respect context managers (e.g. “with tf.device”) and should not just return a reference to a single pre-created placeholder.

make_label_placeholder()[source]¶

Create and return a placeholder representing class labels.

This method should respect context managers (e.g. “with tf.device”) and should not just return a reference to a single pre-created placeholder.

make_params()[source]¶: Create all Variables to be returned later by get_params. By default this is a no-op. Models that need their fprop to be called for their params to be created can set needs_dummy_fprop=True in the constructor.

class cleverhans.attacks.MomentumIterativeMethod(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

The Momentum Iterative Method (Dong et al. 2017). This method won the first places in NIPS 2017 Non-targeted Adversarial Attacks and Targeted Adversarial Attacks. The original paper used hard labels for this attack; no label smoothing. Paper link: https://arxiv.org/pdf/1710.06081.pdf

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – Keyword arguments. See parse_params for documentation.

parse_params(eps=0.3, eps_iter=0.06, nb_iter=10, y=None, ord=inf, decay_factor=1.0, clip_min=None, clip_max=None, y_target=None, sanity_checks=True, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

eps – (optional float) maximum distortion of adversarial example compared to original input
eps_iter – (optional float) step size for each attack iteration
nb_iter – (optional int) Number of attack iterations.
y – (optional) A tensor with the true labels.
y_target – (optional) A tensor with the labels to target. Leave y_target=None if y is also set. Labels should be one-hot-encoded.
ord – (optional) Order of the norm (mimics Numpy). Possible values: np.inf, 1 or 2.
decay_factor – (optional) Decay factor for the momentum term.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.Noise(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

A weak attack that just picks a random point in the attacker’s action space. When combined with an attack bundling function, this can be used to implement random search.

References: https://arxiv.org/abs/1802.00420 recommends random search to help identify

gradient masking.

https://openreview.net/forum?id=H1g0piA9tQ recommends using noise as part: of an attack bundling recipe combining many different optimizers to yield a stronger optimizer.

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(eps=0.3, ord=inf, clip_min=None, clip_max=None, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

eps – (optional float) maximum distortion of adversarial example compared to original input
ord – (optional) Order of the norm (mimics Numpy). Possible values: np.inf
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value

class cleverhans.attacks.ProjectedGradientDescent(model, sess=None, dtypestr='float32', default_rand_init=True, **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This class implements either the Basic Iterative Method (Kurakin et al. 2016) when rand_init is set to 0. or the Madry et al. (2017) method when rand_minmax is larger than 0. Paper link (Kurakin et al. 2016): https://arxiv.org/pdf/1607.02533.pdf Paper link (Madry et al. 2017): https://arxiv.org/pdf/1706.06083.pdf

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
default_rand_init – whether to use random initialization by default
kwargs – passed through to super constructor

FGM_CLASS¶: alias of cleverhans.attacks.fast_gradient_method.FastGradientMethod

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(eps=0.3, eps_iter=0.05, nb_iter=10, y=None, ord=inf, loss_fn=<function softmax_cross_entropy_with_logits>, clip_min=None, clip_max=None, y_target=None, rand_init=None, rand_init_eps=None, clip_grad=False, sanity_checks=True, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

eps – (optional float) maximum distortion of adversarial example compared to original input
eps_iter – (optional float) step size for each attack iteration
nb_iter – (optional int) Number of attack iterations.
y – (optional) A tensor with the true labels.
y_target – (optional) A tensor with the labels to target. Leave y_target=None if y is also set. Labels should be one-hot-encoded.
ord – (optional) Order of the norm (mimics Numpy). Possible values: np.inf, 1 or 2.
loss_fn – Loss function that takes (labels, logits) as arguments and returns loss
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value
rand_init – (optional) Start the gradient descent from a point chosen uniformly at random in the norm ball of radius rand_init_eps
rand_init_eps – (optional float) size of the norm ball from which the initial starting point is chosen. Defaults to eps
clip_grad – (optional bool) Ignore gradient components at positions where the input is already at the boundary of the domain, and the update step will get clipped out.
sanity_checks –
bool Insert tf asserts checking values (Some tests need to run with no sanity checks because the

tests intentionally configure the attack strangely)

class cleverhans.attacks.SPSA(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This implements the SPSA adversary, as in https://arxiv.org/abs/1802.05666 (Uesato et al. 2018). SPSA is a gradient-free optimization method, which is useful when the model is non-differentiable, or more generally, the gradients do not point in useful directions.

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

DEFAULT_DELTA = 0.01¶

DEFAULT_LEARNING_RATE = 0.01¶

DEFAULT_SPSA_ITERS = 1¶

DEFAULT_SPSA_SAMPLES = 128¶

generate(x, y=None, y_target=None, eps=None, clip_min=None, clip_max=None, nb_iter=None, is_targeted=None, early_stop_loss_threshold=None, learning_rate=0.01, delta=0.01, spsa_samples=128, batch_size=None, spsa_iters=1, is_debug=False, epsilon=None, num_steps=None)[source]¶

Generate symbolic graph for adversarial examples.

Parameters

x – The model’s symbolic inputs. Must be a batch of size 1.
y – A Tensor or None. The index of the correct label.
y_target – A Tensor or None. The index of the target label in a targeted attack.
eps – The size of the maximum perturbation, measured in the L-infinity norm.
clip_min – If specified, the minimum input value
clip_max – If specified, the maximum input value
nb_iter – The number of optimization steps.
early_stop_loss_threshold – A float or None. If specified, the attack will end as soon as the loss is below early_stop_loss_threshold.
learning_rate – Learning rate of ADAM optimizer.
delta – Perturbation size used for SPSA approximation.
spsa_samples – Number of inputs to evaluate at a single time. The true batch size (the number of evaluated inputs for each update) is spsa_samples * spsa_iters
batch_size – Deprecated param that is an alias for spsa_samples
spsa_iters – Number of model evaluations before performing an update, where each evaluation is on spsa_samples different inputs.
is_debug – If True, print the adversarial loss after each update.
epsilon – Deprecated alias for eps
num_steps – Deprecated alias for nb_iter.
is_targeted – Deprecated argument. Ignored.

generate_np(x_val, **kwargs)[source]¶

Generate adversarial examples and return them as a NumPy array. Sub-classes should not implement this method unless they must perform special handling of arguments.

Parameters

x_val – A NumPy array with the original inputs.
**kwargs –
optional parameters used by child classes.

Returns

A NumPy array holding the adversarial examples.

class cleverhans.attacks.SaliencyMapMethod(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

The Jacobian-based Saliency Map Method (Papernot et al. 2016). Paper link: https://arxiv.org/pdf/1511.07528.pdf

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

Note

When not using symbolic implementation in generate, sess should be provided

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(theta=1.0, gamma=1.0, clip_min=0.0, clip_max=1.0, y_target=None, symbolic_impl=True, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

theta – (optional float) Perturbation introduced to modified components (can be positive or negative)
gamma – (optional float) Maximum percentage of perturbed features
clip_min – (optional float) Minimum component value for clipping
clip_max – (optional float) Maximum component value for clipping
y_target – (optional) Target tensor if the attack is targeted

class cleverhans.attacks.Semantic(model, center, max_val=1.0, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

Semantic adversarial examples

https://arxiv.org/abs/1703.06857

Note: data must either be centered (so that the negative image can be made by simple negation) or must be in the interval [-1, 1]

Parameters

model – cleverhans.model.Model
center – bool If True, assumes data has 0 mean so the negative image is just negation. If False, assumes data is in the interval [0, max_val]
max_val – float Maximum value allowed in the input data
sess – optional tf.Session
dtypestr – dtype of data
kwargs – passed through to the super constructor

generate(x, **kwargs)[source]¶

Generate the attack’s symbolic graph for adversarial examples. This method should be overriden in any child class that implements an attack that is expressable symbolically. Otherwise, it will wrap the numerical implementation as a symbolic operator.

Parameters

x – The model’s symbolic inputs.
**kwargs –
optional parameters used by child classes. Each child class defines additional parameters as needed. Child classes that use the following concepts should use the following names:

clip_min: minimum feature value clip_max: maximum feature value eps: size of norm constraint on adversarial perturbation ord: order of norm constraint nb_iter: number of iterations eps_iter: size of norm constraint on iteration y_target: if specified, the attack is targeted. y: Do not specify if y_target is specified.

If specified, the attack is untargeted, aims to make the output class not be y. If neither y_target nor y is specified, y is inferred by having the model classify the input.

For other concepts, it’s generally a good idea to read other classes and check for name consistency.

Returns

A symbolic representation of the adversarial examples.

class cleverhans.attacks.SparseL1Descent(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This class implements a variant of Projected Gradient Descent for the l1-norm (Tramer and Boneh 2019). The l1-norm case is more tricky than the l-inf and l2 cases covered by the ProjectedGradientDescent class, because the steepest descent direction for the l1-norm is too sparse (it updates a single coordinate in the adversarial perturbation in each step). This attack has an additional parameter that controls the sparsity of the update step. For moderately sparse update steps, the attack vastly outperforms Projected Steepest Descent and is competitive with other attacks targeted at the l1-norm such as the ElasticNetMethod attack (which is much more computationally expensive). Paper link (Tramer and Boneh 2019): https://arxiv.org/pdf/1904.13000.pdf

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(eps=10.0, eps_iter=1.0, nb_iter=20, y=None, clip_min=None, clip_max=None, y_target=None, rand_init=False, clip_grad=False, grad_sparsity=99, sanity_checks=True, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

eps – (optional float) maximum distortion of adversarial example compared to original input
eps_iter – (optional float) step size for each attack iteration
nb_iter – (optional int) Number of attack iterations.
y – (optional) A tensor with the true labels.
y_target – (optional) A tensor with the labels to target. Leave y_target=None if y is also set. Labels should be one-hot-encoded.
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value
clip_grad – (optional bool) Ignore gradient components at positions where the input is already at the boundary of the domain, and the update step will get clipped out.

:param grad_sparsity (optional) Relative sparsity of the gradient update: step, in percent. Only gradient values larger than this percentile are retained. This parameter can be a scalar, or a vector of the same length as the input batch dimension.

Parameters

sanity_checks –

bool Insert tf asserts checking values (Some tests need to run with no sanity checks because the

tests intentionally configure the attack strangely)

class cleverhans.attacks.SpatialTransformationMethod(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

Spatial transformation attack

generate(x, **kwargs)[source]¶: Generate symbolic graph for adversarial examples and return. :param x: The model’s symbolic inputs. :param kwargs: See parse_params

parse_params(n_samples=None, dx_min=- 0.1, dx_max=0.1, n_dxs=2, dy_min=- 0.1, dy_max=0.1, n_dys=2, angle_min=- 30, angle_max=30, n_angles=6, black_border_size=0, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes. :param n_samples: (optional) The number of transformations sampled to

construct the attack. Set it to None to run full grid attack.

Parameters

dx_min – (optional float) Minimum translation ratio along x-axis.
dx_max – (optional float) Maximum translation ratio along x-axis.
n_dxs – (optional int) Number of discretized translation ratios along x-axis.
dy_min – (optional float) Minimum translation ratio along y-axis.
dy_max – (optional float) Maximum translation ratio along y-axis.
n_dys – (optional int) Number of discretized translation ratios along y-axis.
angle_min – (optional float) Largest counter-clockwise rotation angle.
angle_max – (optional float) Largest clockwise rotation angle.
n_angles – (optional int) Number of discretized angles.
black_border_size – (optional int) size of the black border in pixels.

class cleverhans.attacks.VirtualAdversarialMethod(model, sess=None, dtypestr='float32', **kwargs)[source]¶

Bases: cleverhans.attacks.attack.Attack

This attack was originally proposed by Miyato et al. (2016) and was used for virtual adversarial training. Paper link: https://arxiv.org/abs/1507.00677

Parameters

model – cleverhans.model.Model
sess – optional tf.Session
dtypestr – dtype of the data
kwargs – passed through to super constructor

generate(x, **kwargs)[source]¶

Generate symbolic graph for adversarial examples and return.

Parameters

x – The model’s symbolic inputs.
kwargs – See parse_params

parse_params(eps=2.0, nb_iter=None, xi=1e-06, clip_min=None, clip_max=None, num_iterations=None, **kwargs)[source]¶

Take in a dictionary of parameters and applies attack-specific checks before saving them as attributes.

Attack-specific parameters:

Parameters

eps – (optional float )the epsilon (input variation parameter)
nb_iter – (optional) the number of iterations Defaults to 1 if not specified
xi – (optional float) the finite difference parameter
clip_min – (optional float) Minimum input component value
clip_max – (optional float) Maximum input component value
num_iterations – Deprecated alias for nb_iter

cleverhans.attacks.clip_eta(eta, ord, eps)[source]¶

Helper function to clip the perturbation to epsilon norm ball. :param eta: A tensor with the current perturbation. :param ord: Order of the norm (mimics Numpy).

Possible values: np.inf, 1 or 2.

Parameters: eps – Epsilon, bound of the perturbation.

cleverhans.attacks.fgm(x, logits, y=None, eps=0.3, ord=inf, loss_fn=<function softmax_cross_entropy_with_logits>, clip_min=None, clip_max=None, clip_grad=False, targeted=False, sanity_checks=True)[source]¶

TensorFlow implementation of the Fast Gradient Method. :param x: the input placeholder :param logits: output of model.get_logits :param y: (optional) A placeholder for the true labels. If targeted

is true, then provide the target label. Otherwise, only provide this parameter if you’d like to use true labels when crafting adversarial samples. Otherwise, model predictions are used as labels to avoid the “label leaking” effect (explained in this paper: https://arxiv.org/abs/1611.01236). Default is None. Labels should be one-hot-encoded.

Parameters

eps – the epsilon (input variation parameter)
ord – (optional) Order of the norm (mimics NumPy). Possible values: np.inf, 1 or 2.
loss_fn – Loss function that takes (labels, logits) as arguments and returns loss
clip_min – Minimum float value for adversarial example components
clip_max – Maximum float value for adversarial example components
clip_grad – (optional bool) Ignore gradient components at positions where the input is already at the boundary of the domain, and the update step will get clipped out.
targeted – Is the attack targeted or untargeted? Untargeted, the default, will try to make the label incorrect. Targeted will instead try to move in the direction of being more like y.

Returns

a tensor for the adversarial example

cleverhans.attacks.optimize_linear(grad, eps, ord=inf)[source]¶

Solves for the optimal input to a linear function under a norm constraint.

Optimal_perturbation = argmax_{eta, ||eta||_{ord} < eps} dot(eta, grad)

Parameters

grad – tf tensor containing a batch of gradients
eps – float scalar specifying size of constraint region
ord – int specifying order of norm

Returns

tf tensor containing optimal perturbation

cleverhans.attacks.projected_optimization(loss_fn, input_image, label, epsilon, num_steps, clip_min=None, clip_max=None, optimizer=<cleverhans.attacks.spsa.TensorAdam object>, project_perturbation=<function _project_perturbation>, early_stop_loss_threshold=None, is_debug=False)[source]¶

Generic projected optimization, generalized to work with approximate gradients. Used for e.g. the SPSA attack.

Args:

param loss_fn

A callable which takes input_image and label as arguments, and returns a batch of loss values. Same interface as TensorOptimizer.

param input_image

Tensor, a batch of images

param label

Tensor, a batch of labels

param epsilon

float, the L-infinity norm of the maximum allowable perturbation

param num_steps

int, the number of steps of gradient descent

param clip_min

float, minimum pixel value

param clip_max

float, maximum pixel value

param optimizer

A TensorOptimizer object

param project_perturbation

A function, which will be used to enforce some constraint. It should have the same signature as _project_perturbation.

param early_stop_loss_threshold

A float or None. If specified, the attack will end if the loss is below early_stop_loss_threshold.

Enabling this option can have several different effects:

Setting the threshold to 0. guarantees that if a successful attack is found, it is returned. This increases the attack success rate, because without early stopping the optimizer can accidentally bounce back to a point where the attack fails.

Early stopping can make the attack run faster because it may run for fewer steps.

Early stopping can make the attack run slower because the loss must be calculated at each step. The loss is not calculated as part of the normal SPSA optimization procedure. For most reasonable choices of hyperparameters, early stopping makes the attack much faster because it decreases the number of steps dramatically.

param is_debug

A bool. If True, print debug info for attack progress.

Returns:

adversarial version of input_image, with L-infinity difference less than: epsilon, which tries to minimize loss_fn.

Note that this function is not intended as an Attack by itself. Rather, it is designed as a helper function which you can use to write your own attack methods. The method uses a tf.while_loop to optimize a loss function in a single sess.run() call.

cleverhans.attacks.reduce_max(*args, **kwargs)¶: Issues a deprecation warning and passes through the arguments.

cleverhans.attacks.reduce_mean(*args, **kwargs)¶: Issues a deprecation warning and passes through the arguments.

cleverhans.attacks.reduce_sum(*args, **kwargs)¶: Issues a deprecation warning and passes through the arguments.

cleverhans.attacks.softmax_cross_entropy_with_logits(sentinel=None, labels=None, logits=None, dim=- 1)[source]¶: Wrapper around tf.nn.softmax_cross_entropy_with_logits_v2 to handle deprecated warning

cleverhans.attacks.vatm(model, x, logits, eps, num_iterations=1, xi=1e-06, clip_min=None, clip_max=None, scope=None)[source]¶

Tensorflow implementation of the perturbation method used for virtual adversarial training: https://arxiv.org/abs/1507.00677 :param model: the model which returns the network unnormalized logits :param x: the input placeholder :param logits: the model’s unnormalized output tensor (the input to

the softmax layer)

Parameters

eps – the epsilon (input variation parameter)
num_iterations – the number of iterations
xi – the finite difference parameter
clip_min – optional parameter that can be used to set a minimum value for components of the example returned
clip_max – optional parameter that can be used to set a maximum value for components of the example returned
seed – the seed for random generator

Returns

a tensor for the adversarial example

cleverhans.attacks.wrapper_warning()[source]¶

Issue a deprecation warning. Used in multiple places that implemented attacks by automatically wrapping a user-supplied callable with a CallableModelWrapper with output_layer=”probs”. Using “probs” as any part of the attack interface is dangerous. We can’t just change output_layer to logits because: - that would be a silent interface change. We’d have no way of detecting

code that still means to use probs. Note that we can’t just check whether the final output op is a softmax—for example, Inception puts a reshape after the softmax.

automatically wrapping user-supplied callables with output_layer=’logits’ is even worse, see wrapper_warning_logits

Note: this function will be removed at the same time as the code that calls it.

cleverhans.attacks.wrapper_warning_logits()[source]¶: Issue a deprecation warning. Used in multiple places that implemented attacks by automatically wrapping a user-supplied callable with a CallableModelWrapper with output_layer=”logits”. This is dangerous because it is under-the-hood automagic that the user may not realize has been invoked for them. If they pass a callable that actually outputs probs, the probs will be treated as logits, resulting in an incorrect cross-entropy loss and severe gradient masking.

cleverhans.attacks.xrange¶: alias of range

attacks module¶

CleverHans

Navigation

Related Topics