Agilkia API Docs

TraceSet

class agilkia.TraceSet(traces: List[agilkia.json_traces.Trace], meta_data: Dict[str, Any] = None)

Bases: object

Represents a set of traces, either generated or recorded.

Typical usage is to create an empty TraceSet and then add traces to it one by one:

traces = agilkia.TraceSet([], meta_data = {"author":"MarkU", "dataset":"Example 1"})
for i in ...:
    traces.append(agilkia.Trace(...))

Once all traces have been added, the TraceSet should be considered read-only (except for adding meta-data and clustering information). If you want to create subsets of the traces, it is recommended to create those as new TraceSet objects.

Invariants:
  • forall tr:self.traces (tr._parent is self) (TODO: set _parent to None when a trace is removed?)
  • self.meta_data is a dict with keys: date, source at least.
Public data fields include:
  • self.traces: List[Trace]. However, the iteration, indexing, and len(_) methods have been lifted from the trace list up to the top-level TraceSet object, so you may not need to access self.traces at all.
  • self.meta_data: MetaData. Or use get_meta(key) to get an individual meta-data value.
  • self.version: str. Version number of this TraceSet object.
  • self.cluster_labels: optional list giving a cluster number for each trace.
    That is, self.cluster_labels[i] is the number of the cluster that trace self.traces[i] (or equivalently, self[i]) belongs to.
  • self.cluster_linkage: optional hierarchical clustering (SciPy linkage matrix).
append(trace: agilkia.json_traces.Trace)

Appends the given trace into this set of traces. This also sets its parent to be this trace set.

arff_type(pandas_type: str) → Union[str, List[str]]

Maps each Pandas data type to the closest ARFF type.

create_clusters(data: pandas.core.frame.DataFrame, algorithm=None, normalizer=None, fit: bool = True) → int

Runs a clustering algorithm on the given data and remembers the clusters.

Note that clustering results are now saved into JSON files.

Parameters:
  • data – a Pandas DataFrame, typically from get_trace_data(), with the i’th row of the DataFrame being for the i’th trace in this set of traces.
  • algorithm – a clustering algorithm (default is MeanShift()).
  • normalizer – a normalization algorithm (default is MinMaxScaler).
  • fit – True means fit the data into clusters, False means just predict clusters assuming that the algorithm and normalizer have already been trained.
Returns:

The number of clusters generated.

extend(traces: List[agilkia.json_traces.Trace])

Appends all the given traces into this set of traces. This also sets their parents to be this trace set.

get_all_actions(event_to_str=None)

Returns a sorted list (with duplicates removed) of all the action strings that occur in the events of this trace set.

Parameters: event_to_str (Event -> str) – an optional feature-extractor function that maps each event to a single string. The default is just to return the action name of the event. This can be used to customise the column names in the DataFrame generated by get_trace_data.
get_cluster(num: int) → List[agilkia.json_traces.Trace]

Gets a list of all the Trace objects that are in the given cluster.

get_clusters() → List[int]

Get the list of cluster numbers for each trace.

Precondition: self.is_clustered()

classmethod get_default_meta_data() → Dict[str, Any]

Generates some basic meta-data such as date, user and command line.

get_event_chars()

Gets the event-to-char map that is used to visualise traces.

This maps each action name to a single character. If set_event_chars has not been called, this getter will calculate and cache a default mapping from action names to characters.

get_meta(key: str) → Optional[Any]

Returns requested meta data, or None if that key does not exist.

get_num_clusters() → int

Return the number of clusters. Zero means not clustered.

get_trace_data(method: Union[str, Callable[[agilkia.json_traces.Event], str]] = 'action_counts', columns: List[str] = None) → pandas.core.frame.DataFrame

Returns a Pandas table of statistics/data about each trace.

The resulting table can be used as training data for machine learning algorithms. The method specifies the feature-encoding function for each trace. Any missing data values are replaced by zeroes. The default method is Trace.action_counts, which does a bag-of-words encoding using the event action names. It is equivalent to:

lambda tr: tr.action_counts()

or (expanding out the default Event-to-string mapping of action_counts):

lambda tr: tr.action_counts(event_to_str=(lambda ev: ev.action))

As another example, if you wanted to encode pairs of events, you could do it using a method function like this:

lambda tr: Counter([f"{tr[i].action}_{tr[i+1].action}" for i in range(len(tr) - 1)])
Parameters:
  • method – the feature encoding method to use for each trace. This method must return a Dict[str, number] for some kind of number. The method can be specified by name, if one of the existing methods in the Trace class is what you want. The default is the action_counts() method, which corresponds to the bag-of-words algorithm, counting one string from each Event. Alternatively, method can be a user-defined function that takes a Trace and returns its features (a Dict[str, int]).
  • columns – optional list of column names. This can be used to reorder or remove or add columns. (Any added columns will be filled with zeroes).
Returns:

A table of data that can be used for clustering or machine learning. If columns is not specified, the columns of the table will be in alphabetical order. The i’th row of the table is the data for the i’th trace in this set.
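As a rough illustration of the encoding described for get_trace_data, the sketch below applies the pair-of-events feature function to two toy traces, then lays the counts out as one row per trace with alphabetically ordered columns and zeroes for missing values. Plain lists and SimpleNamespace objects stand in for Trace and Event, and to_rows and make_trace are hypothetical helpers, so this only mirrors the documented behaviour and is not the agilkia implementation.

```python
from collections import Counter
from types import SimpleNamespace


def pair_counts(tr):
    """Count adjacent action pairs, like the pair-encoding lambda in the text."""
    return Counter(f"{tr[i].action}_{tr[i+1].action}" for i in range(len(tr) - 1))


def to_rows(traces, method, columns=None):
    """Hypothetical helper: one row of numbers per trace, zero for missing keys."""
    features = [method(tr) for tr in traces]
    if columns is None:
        # Default: every key that was seen, in alphabetical order.
        columns = sorted({key for f in features for key in f})
    rows = [[f.get(col, 0) for col in columns] for f in features]
    return columns, rows


def make_trace(*actions):
    """Stand-in for a Trace: a list of objects with an .action attribute."""
    return [SimpleNamespace(action=a) for a in actions]


traces = [make_trace("Login", "Buy", "Buy", "Logout"), make_trace("Login", "Logout")]
columns, rows = to_rows(traces, pair_counts)
```

Passing an explicit columns list to the hypothetical to_rows reorders, drops, or adds columns, with added columns filled with zeroes, which is the behaviour the columns parameter is documented to have.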

classmethod load_from_json(file: pathlib.Path) → agilkia.json_traces.TraceSet

Load traces from the given file.

This upgrades older trace sets to the current version if possible.

message(msg: str)

Print a progress message.

save_to_arff(file: pathlib.Path, name=None) → None

Save all the events in all traces into an ARFF file for machine learning.

Parameters:
  • file – the path of the file to save into. Should end with ‘.arff’.
  • name – optional relation name to identify this data inside the ARFF file. The default is the base name of ‘file’.
save_to_json(file: pathlib.Path) → None

Saves this TraceSet into the given file[.json] in JSON format.

The file extension is forced to be .json if it is not already that. The file includes a version number so that older data files can be updated if possible.

set_clusters(labels: List[int], linkage: numpy.ndarray = None)

Record clustering information for the traces in this TraceSet.

The set of flat clusters must be given - one cluster number for each Trace.

If hierarchical clusters are supplied (as a linkage array), then the flat clusters are typically a cut through that tree.

After this method has been called, the flat clusters will be saved in self.cluster_labels. If the linkage argument is not None, then the hierarchical clustering information will be saved in self.cluster_linkage, which records the binary clustering tree in a compact format. This SciPy linkage array is directly useful for drawing dendrograms and calculating various statistics (see https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html). It can be converted to an explicit tree of ClusterNode objects if needed, via:

scipy.cluster.hierarchy.to_tree(self.cluster_linkage)
Parameters:
  • labels (List[int]) – an array of cluster numbers (0..), containing one number for each Trace.
  • linkage (np.ndarray, optional) – an optional scipy linkage array that encodes a binary hierarchical tree. The default is None, as hierarchical clustering is optional.
Raises:

Exception – if labels is not the same length as the number of traces, or the cluster label numbers are not contiguous in the range 0..n for some n, or if any arguments are malformed.

Returns:

None.
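The conditions on labels can be sketched as follows. This is a minimal stand-alone check, assuming that "contiguous" means every number from 0 up to the largest label is used at least once. The function names are hypothetical, and ValueError is used here only as one concrete kind of Exception. The second helper shows the documented relation that labels[i] is the cluster of the i’th trace.

```python
from typing import List


def check_cluster_labels(labels: List[int], num_traces: int) -> int:
    """Hypothetical check: return the number of clusters, or raise on bad labels."""
    if len(labels) != num_traces:
        raise ValueError(f"expected {num_traces} labels, got {len(labels)}")
    used = set(labels)
    num_clusters = len(used)
    # Contiguous means every number in 0..num_clusters-1 is used at least once.
    if used != set(range(num_clusters)):
        raise ValueError(f"cluster labels are not contiguous from 0: {sorted(used)}")
    return num_clusters


def traces_in_cluster(traces: list, labels: List[int], num: int) -> list:
    """Select the traces whose label equals num (labels[i] belongs to traces[i])."""
    return [tr for tr, label in zip(traces, labels) if label == num]
```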

set_event_chars(given: Mapping[str, str] = None)

Sets up the event-to-char map that is used to visualise traces.

This will calculate a default mapping for any actions that are not in given. For good readability of the printed traces, it is recommended that extremely common actions should be mapped to ‘small’ characters like ‘.’ or ‘,’.

If given is None, then meta data “action_chars” will be used as a basis instead. If that is also None, then all action characters will be calculated using the global default_map_to_chars() function.

Parameters: given – optional pre-allocation of a few action names to chars.
set_meta(key: str, value: Any) → Optional[Any]

Sets the requested meta data, and returns the old value if any.

to_pandas() → pandas.core.frame.DataFrame

Converts all the traces into a single Pandas DataFrame (one event/row).

The first three columns are ‘Trace’ and ‘Event’, which give the number of the trace and the position of the event within that trace, and ‘Action’, which is the name of the action of the event.

Each named input value is recorded in a separate column. For outputs, by default there are just ‘Status’ (int) and ‘Error’ (str) columns.

visualize_clusters(algorithm=None, fit: bool = True, xlim=None, ylim=None, cmap=None, markers=None, markersize=None, filename: str = None, block: bool = True)

Visualize the clusters from create_clusters().

Parameters:
  • algorithm – the visualization algorithm to map data into 2D (default TSNE).
  • fit – True means fit the data, False means algorithm is pre-trained, so use it to just transform the data into 2D without fitting the data first. Note that TSNE does not support fit=False yet. If you want fit=False, use another dimension-reduction algorithm like PCA(…).
  • xlim (Pair[float,float]) – optional axis limits for the X axis.
  • ylim (Pair[float,float]) – optional axis limits for the Y axis.
  • cmap (Union[ColorMap,str]) – optional color map for the cluster colors, or the name of a color map. See https://matplotlib.org/3.1.1/tutorials/colors/colormaps.html. Default is ‘brg’, which has a wide range of colors going from blue through red to green, and prints in black and white okay - though very non-linear - because it does not go all the way to white.
  • markers (matplotlib.markers.MarkerStyle) – optional marker styles for clusters. If this is a string, then the i’th character in the string will be used for the i’th marker style. See https://matplotlib.org/3.1.1/api/markers_api.html for the available marker characters. Note that clusters will be drawn from 0 up to n-1, so later clusters will be on top. Also, the earlier clusters tend to have more elements. One approach to improve readability is to use line-based shapes (from “1234+x|_”) for the first few clusters (which have many points), and then filled shapes (from “.o<^>vsphPXd*”) for the later clusters (which have few points). Note also that you can use a space for the marker character of a cluster if you want to not display that cluster at all. However, if your markers string is shorter than the number of clusters, all remaining clusters will be displayed using the “o” marker.
  • markersize (float) – size of the markers in points (only when markers is a str). The default seems to be about 6 points.
  • filename (str) – optional file name to save image into, as well as displaying it.
  • block (bool) – True (the default) means wait for user to close figure before returning. False means non-blocking.
Limitations: if you call this multiple times with different numbers of clusters, the color map will not be exactly the same.
with_traces_grouped_by(name: str = None, key: Callable[[agilkia.json_traces.Event], str] = None, property: bool = False, allow_missing: bool = False) → agilkia.json_traces.TraceSet

Returns a new TraceSet with each trace grouped into shorter traces.

It generates a new trace for each distinct key value.

Parameters:
  • name – the name of an input. This is a convenience parameter that is a shorthand for key=(lambda ev: ev.inputs[name]) if property=False or for key=(lambda ev: ev.meta_data.get(name, None)) if property=True.
  • key – a function that takes an Event object and returns the groupby key string.
  • property (deprecated) – True means name is a meta-data field, not an input.
  • allow_missing – True allows key to return None, meaning that that event will be silently discarded. False means it is an error for key to give None.
Returns:

a new TraceSet, usually with more traces and shorter traces.
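The grouping rule can be sketched on a plain list of events. This is only an illustration under stated assumptions: group_events is a hypothetical helper, SimpleNamespace objects stand in for Event, the groups are returned in first-seen key order (the real ordering is not stated here), and ValueError stands in for whatever error the library raises when key returns None.

```python
from types import SimpleNamespace


def group_events(events, key, allow_missing=False):
    """Split one event sequence into one list per distinct key value."""
    groups = {}  # dicts keep insertion order, so groups appear in first-seen order
    for ev in events:
        k = key(ev)
        if k is None:
            if allow_missing:
                continue  # silently discard this event
            raise ValueError(f"key function returned None for {ev!r}")
        groups.setdefault(k, []).append(ev)
    return list(groups.values())


# Stand-in events with an 'inputs' dict, as used by the name= shorthand.
events = [
    SimpleNamespace(action="Scan", inputs={"customer": "c1"}),
    SimpleNamespace(action="Scan", inputs={"customer": "c2"}),
    SimpleNamespace(action="Pay", inputs={"customer": "c1"}),
]
grouped = group_events(events, key=lambda ev: ev.inputs["customer"])
```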

with_traces_split(start_action: str = None, input_name: str = None, split: Callable[[agilkia.json_traces.Event, agilkia.json_traces.Event], bool] = None) → agilkia.json_traces.TraceSet

Returns a new TraceSet with each trace in this set split into shorter traces.

It will start a new trace whenever the split function returns True. The split function is called on each adjacent pair of events in each trace, and should return True whenever the second of those events should start a new trace.

The start_action and input_name parameters give shortcuts for common splitting criteria.

Parameters:
  • start_action – the name of an action that starts a new trace. This is shorthand for split=(lambda e1,e2: e2.action==start_action).
  • input_name – the name of an input. Whenever the value of this input changes, then a new trace should be started. This is shorthand for split=(lambda e1,e2: e1.inputs[input_name] != e2.inputs[input_name]).
  • split – a function that is called on each adjacent pair of events to determine if the trace should be split between those two events.
Returns:

a new TraceSet, usually with more traces and shorter traces.
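The splitting rule can be sketched on a plain list of events: the split function sees each adjacent pair, and a True result makes the second event of the pair start a new piece. Here split_events is a hypothetical helper and SimpleNamespace objects stand in for Event, so this shows the rule rather than the library's code.

```python
from types import SimpleNamespace


def split_events(events, split):
    """Cut one event sequence wherever split(previous, current) is True."""
    pieces = []
    current = []
    for ev in events:
        if current and split(current[-1], ev):
            pieces.append(current)
            current = []
        current.append(ev)
    if current:
        pieces.append(current)
    return pieces


events = [SimpleNamespace(action=a) for a in ["Login", "Buy", "Login", "Buy", "Buy"]]
start_action = "Login"
# Same criterion as the start_action shorthand described above.
pieces = split_events(events, lambda e1, e2: e2.action == start_action)
```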

Trace

class agilkia.Trace(events: List[agilkia.json_traces.Event], parent: Optional[agilkia.json_traces.TraceSet] = None, meta_data: Optional[Dict[str, Any]] = None, random_state=None)

Bases: object

Represents a single trace, which contains a sequence of events.

Public data fields include:
  • self.events: List[Event]. However, the iteration, indexing, and len(_) methods have been lifted from the events list up to this Trace object, so you may not need to access self.events at all.
  • self.meta_data: MetaData. Or use get_meta(key) to get an individual meta-data value.
action_counts(event_to_str: Callable[[agilkia.json_traces.Event], str] = None) → Dict[str, int]

Returns a bag-of-words count of all the Events in this Trace.

Firstly, each Event is mapped to a single string using the event_to_str function (the default is just the action name of the Event), and then the resulting strings are counted into a bag-of-words dictionary showing how many times each string occurs in this trace.

Parameters: event_to_str (Event->str) – optional function for converting each Event into the string that is counted. These strings become the keys of the result dictionary. The default is (lambda ev: ev.action).
Returns: A dictionary of counts that can be used for clustering traces.
action_status_counts() → Dict[str, int]

Counts how many times each action-status pair occurs in this trace.

Returns: A dictionary of counts that can be used for clustering traces.
get_meta(key: str) → Optional[Any]

Returns requested meta data, or None if that key does not exist.

to_string(to_char: Dict[str, str] = None, compress: List[str] = None, color_status: bool = False)

Return a one-line summary of this trace, one character per event. See ‘trace_to_string’ for details. NOTE: throws an exception if no to_char map is given and this trace has no parent.

trace_set()

Returns the TraceSet that this trace is part of, or None if not known.

Event

class agilkia.Event(action: str, inputs: Dict[str, Any], outputs: Dict[str, Any], meta_data: Optional[Dict[str, Any]] = None)

Bases: object

An Event is a dictionary-like object that records all the details of an event.

Public data fields include:
  • self.action (str): the full action name.
  • self.inputs (Dict[str,Any]): the named inputs and their values.
  • self.outputs (Dict[str,Any]): the named outputs and their values.
  • self.meta_data (Dict[str,Any]): any extra properties such as “timestamp”. Note: if self.meta_data[“timestamp”] is present, it should be in ISO 8601 format. Or use get_meta(key) to get an individual meta-data value.
error_message

Read-only error message output by this operation. If no output[‘Error’] field is available, this method always returns “”.

status

Read-only status of the operation, where 0 means success. If output ‘Status’ is not available or is not numeric, this method still returns 0.

status_float

Read-only status of the operation, where 0.0 usually means success.

This method is useful for applications that use non-integer status values. If no output ‘Status’ is available or it is not a valid number, NaN is returned.
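The fallback behaviour of status and status_float can be sketched with two stand-alone functions over an outputs dictionary. The function names are hypothetical and the exact conversion rules of the real properties are assumed. Only the documented fallbacks (0 for status, NaN for status_float) are taken from the text above.

```python
import math


def status_of(outputs: dict) -> int:
    """Stand-in for Event.status: 0 when 'Status' is missing or not numeric."""
    try:
        return int(outputs.get("Status", 0))
    except (TypeError, ValueError, OverflowError):
        return 0


def status_float_of(outputs: dict) -> float:
    """Stand-in for Event.status_float: NaN when 'Status' is missing or invalid."""
    try:
        return float(outputs["Status"])
    except (KeyError, TypeError, ValueError):
        return math.nan
```

Because NaN never compares equal to anything, callers of the float variant would need math.isnan to detect a missing status.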

TraceEncoder

class agilkia.TraceEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.encoder.JSONEncoder

An internal class used by TraceSet to encode objects into JSON format.

We use a custom JSON encoder because objects from zeep could not be serialised.

Based on ideas from this blog entry by ‘The Fellow’ (Ouma Rodgers): https://medium.com/python-pandemonium/json-the-python-way-91aac95d4041.

This does not handle XML objects, as they should be decoded via xml_decode first.

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)

RandomTester

class agilkia.RandomTester(urls: Union[str, List[str]], methods_to_test: List[str] = None, input_rules: Dict[str, List[T]] = None, rand: random.Random = None, action_chars: Mapping[str, str] = None, verbose: bool = False)

Bases: object

Does random testing of a given web service.

Give it a URL to a web service (or a list of URLs if there are several web services), and it will read the WSDL specifications from those web services and generate any number of random test sequences to test the methods.

For more sophisticated (user-directed) testing you can also:
  • supply a username and password if login credentials are needed.
  • supply the subset of method names that you want to focus on testing (default is all).
  • supply a set of default input values (or generation functions) for each data type.
  • supply a set of input values (or generation functions) for each named input parameter.

add_web_service(url: str)

Add another web service using the given url.

call_method(name: str, args: Dict[str, Any] = None, meta_data: Optional[Dict[str, Any]] = None)

Call the web service name(args) and add the result to trace.

Parameters:
  • name (str) – the name of the method to call.
  • args (dict) – the input values for the method. If args=None, then this method uses ‘choose_input_value’ to choose appropriate values for each argument value of the method.
  • meta_data – optional meta data to add to the resulting Event.

Before the call, this method replaces some symbolic inputs by actual concrete values. For example, the correct password token is replaced by the real password, which avoids recording the real password in the inputs of the trace.

Returns: The whole Event object created by this method call.
choose_input_value(arg_name: str) → str

Choose an appropriate value for the input argument called ‘arg_name’. If no set of input rules is defined for ‘arg_name’, then ‘generate_input_value’ is called to generate a suitable input value. Subclasses can override this.

Parameters: arg_name (str) – the name of the input parameter.
Returns: a string if successful, or None if no suitable value was found.
decode_outputs(raw) → Dict[str, Any]

Decode the outputs from a web service/site call into a dictionary.

This adds a ‘Status’ entry in the output dictionary, to say if the operation was successful (0) or not (non-zero).

generate_input_value(arg_name: str) → Any

Can be overridden in subclasses to generate smart values for an input argument.

generate_trace(start=True, length=20, methods: List[str] = None) → agilkia.json_traces.Trace

Generates the requested length of test steps, choosing methods at random.

Parameters:
  • start (bool) – True means that a new trace is started (unless current one is empty).
  • length (int) – The number of steps to generate (default=20).
  • methods (List[str]) – only these methods will be chosen (None means all are allowed)
Returns:

the whole of the current trace that has been generated so far.

get_methods() → Mapping[str, Mapping[str, Mapping[str, str]]]

Return the set of all method names in all the web services.

set_username(username: str, password: str = None)

Set the username and (optional) password to be used for the subsequent operations. If password is not supplied, this method will immediately interactively prompt for it.

summary(value) → str

Returns a one-line summary of the given value.

SmartSequenceGenerator

class agilkia.SmartSequenceGenerator(urls: Union[str, List[str]], method_signatures: Dict[str, Mapping[str, Mapping[str, str]]] = None, methods_to_test: List[str] = None, input_rules: Dict[str, List[T]] = None, rand: random.Random = None, action_chars: Mapping[str, str] = None, verbose: bool = False)

Bases: agilkia.random_tester.RandomTester

Generates test sequences from an ML model that suggests what actions can come next.

execute_test(trace: agilkia.json_traces.Trace, max_retry: int = 0)

Executes the given test trace and adds the resulting trace to this set.

Note that if the given trace contains events with missing input values, then suitable input values will be generated using ‘choose_input_value’. Progress messages will be printed if self.verbose is True.

Parameters:
  • trace – the trace to execute (with or without input values).
  • max_retry – retry failed operations up to this number of times, choosing different random input values each time.
generate_all_traces(model, length=5, action_prob=0.01, path_prob=1e-12, partial=True, event_factory=None) → List[agilkia.json_traces.Trace]

Generate all traces that satisfy the given constraints.

Parameters:
  • model – the trained ML model used to predict the next action.
  • length (int) – maximum length of each generated trace.
  • action_prob (float) – only do actions with at least this probability.
  • path_prob (float) – only include paths with at least this total probability.
  • partial (bool) – True means include partial traces. False gives complete traces only.
  • event_factory (string->Event) – Optional event generator, from the string predicted by the model.
Returns:

A list of all the Trace objects that satisfy the given constraints.

Note that all complete traces will have len(tr)<length, whereas all partial traces will have len(tr)==length.
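How the constraints interact can be sketched with a small depth-first enumeration. This is an illustration only: the model is replaced by a hypothetical function from the action prefix to a dictionary of next-action probabilities, traces are plain lists of action names, and the way probabilities are combined is an assumption rather than the library's algorithm. It does reproduce the length property noted above, with complete traces shorter than length and partial traces exactly length long.

```python
END = "<end>"  # end-of-trace token predicted by the (toy) model


def enumerate_traces(next_probs, length=5, action_prob=0.01, path_prob=1e-12, partial=True):
    """Enumerate action sequences whose steps and whole path are probable enough."""
    results = []

    def extend(prefix, prob):
        if len(prefix) == length:
            # Length limit reached before the model said <end>: a partial trace.
            if partial:
                results.append(list(prefix))
            return
        for action, p in sorted(next_probs(prefix).items()):
            if p < action_prob or prob * p < path_prob:
                continue  # prune unlikely actions and unlikely paths
            if action == END:
                results.append(list(prefix))  # a complete trace
            else:
                extend(prefix + (action,), prob * p)

    extend((), 1.0)
    return results


def toy_model(prefix):
    """Hypothetical model: always start with Login, then Buy or stop."""
    if not prefix:
        return {"Login": 1.0}
    return {"Buy": 0.7, END: 0.3}


all_traces = enumerate_traces(toy_model, length=3)
complete_only = enumerate_traces(toy_model, length=3, partial=False)
```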

generate_trace_with_model(model, start=True, length=20, event_factory=None)

Generates one sequence of test steps, choosing actions using the given model. The generated trace terminates either when the model says <end> or after length steps.

Parameters:
  • model (Classifier) – ML model that takes an Event list and predicts next action name.
  • start (bool) – True means that a new trace is started, beginning with a “Login” call.
  • length (int) – The maximum number of steps to generate in one trace (default=20).
  • event_factory (string->Event) – Optional event generator, from the string predicted by the model.
Returns:

the whole of the current trace that has been generated so far.

Agilkia Helper Functions

Most of these global functions are just helper functions for the main agilkia classes. However, some may be useful as standalone functions when building other kinds of tools.

agilkia.json_traces.xml_decode(obj: xml.etree.ElementTree.Element) → Union[str, Dict[str, Any]]

Custom XML decoder that converts XML into a Python dictionary suitable for JSON encoding.

This roughly follows the ideas from: https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html.

For simple XML objects with no attributes and no children, this returns just the text string. For more complex XML objects, it returns a dictionary.

Note that the top-level tag of ‘obj’ is assumed to be handled by the caller. That is, the caller will typically do `d[tag] = xml_decode(obj)` where xml_decode will return either a simple string, or a dictionary.

agilkia.json_traces.default_map_to_chars(actions: Set[str], given: Dict[str, str] = None) → Dict[str, str]

Tries to guess a useful default mapping from action names to single characters.

Parameters:
  • actions – the names of all the actions.
  • given – optional pre-allocation of a few action names to chars. You can use this to override the default behaviour.
Returns:

A map from every name in actions to a unique single character.

agilkia.json_traces.trace_to_string(trace: List[agilkia.json_traces.Event], to_char: Mapping[str, str], compress: List[str] = None, color_status: bool = False) → str

Converts a trace to a short summary string, one character per action.

Parameters:
  • trace – the sequence of JSON-like events, with an “action” field.
  • to_char – maps each action name to a single character. This map must include every action name that appears in the traces. A suitable map can be constructed via TraceSet.get_event_chars().
  • compress – a list of Action names. Repeated events will be compressed if in this list.
  • color_status – True means color the string red where status is non-zero. This uses ANSI escape sequences, so needs to be printed to a terminal.
Returns:

a summary string.

agilkia.json_traces.traces_to_pandas(traces: List[agilkia.json_traces.Trace]) → pandas.core.frame.DataFrame

Collects all events into a single Pandas DataFrame.

Columns include the trace number, the event number, the action name, each input parameter, the result status and error message.

TODO: we could convert complex values to strings before sending to Pandas?

TODO: we could have an option to encode strings into integer properties?

agilkia.random_tester.read_input_rules(file: pathlib.Path) → Dict[str, List[str]]

Reads a CSV file of input values.

The header line of the CSV file should contain the headers Name,Frequency,Value. The Frequency column is optional, and missing frequencies default to 1.

For example if one line contains ‘size,3,100’ and another contains ‘size,2,200’, then the resulting input rules will define a 3/5 chance of size being 100, and a 2/5 chance of it being 200.
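The frequency weighting can be sketched as follows. This is a minimal stand-in that parses CSV text from a string instead of a file. It assumes each value is simply repeated Frequency times in the resulting list, which is one plausible reading of the Dict[str, List[str]] return type and not a confirmed detail of the implementation. parse_input_rules is a hypothetical name.

```python
import csv
import io
import random


def parse_input_rules(text: str) -> dict:
    """Hypothetical parser: repeat each value 'Frequency' times under its name."""
    rules = {}
    for row in csv.DictReader(io.StringIO(text)):
        freq = int(row.get("Frequency") or 1)  # missing frequency defaults to 1
        rules.setdefault(row["Name"], []).extend([row["Value"]] * freq)
    return rules


csv_text = "Name,Frequency,Value\nsize,3,100\nsize,2,200\ncolor,,red\n"
rules = parse_input_rules(csv_text)

rand = random.Random(1234)
# A uniform choice over the repeated list picks "100" with probability 3/5.
size = rand.choice(rules["size"])
```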