Agilkia API Docs


class agilkia.TraceSet(traces: List[agilkia.json_traces.Trace], meta_data: Dict[str, Any] = None)

Bases: object

Represents a set of traces, either generated or recorded.

Typical usage is to create an empty TraceSet and then add traces to it one by one:

traces = agilkia.TraceSet([], meta_data={"author": "MarkU", "dataset": "Example 1"})
for i in ...:
    traces.append(...)  # build each Trace and append it here

Once all traces have been added, the TraceSet should be considered read-only (except for adding meta-data and clustering information). If you want to create subsets of the traces, it is recommended to create those as new TraceSet objects.

Invariants:
  • forall tr:self.traces (tr._parent is self) (TODO: set _parent to None when a trace is removed?)
  • self.meta_data is a dict with keys: date, source at least.
Public data fields include:
  • self.traces: List[Trace]. However, the iteration, indexing, and len(_) methods have been lifted from the trace list up to the top-level TraceSet object, so you may not need to access self.traces at all.
  • self.meta_data: MetaData. Or use get_meta(key) to get an individual meta-data value.
  • self.version: str. Version number of this TraceSet object.
  • self.cluster_labels: optional list giving a cluster number for each trace.
    That is, self.cluster_labels[i] is the number of the cluster that trace self.traces[i] (or equivalently, self[i]) belongs to.
  • self.cluster_linkage: optional hierarchical clustering (SciPy linkage matrix).
append(trace: agilkia.json_traces.Trace)

Appends the given trace into this set of traces. This also sets its parent to be this trace set.

arff_type(pandas_type: str) → Union[str, List[str]]

Maps each Pandas data type to the closest ARFF type.

create_clusters(data: pandas.core.frame.DataFrame, algorithm=None, normalizer=None, fit: bool = True) → int

Runs a clustering algorithm on the given data and remembers the clusters.

Note that clustering results are now saved into JSON files.

  • data – a Pandas DataFrame, typically from get_trace_data(), with the i’th row of the DataFrame being for the i’th trace in this set of traces.
  • algorithm – a clustering algorithm (default is MeanShift()).
  • normalizer – a normalization algorithm (default is MinMaxScaler).
  • fit – True means fit the data into clusters, False means just predict clusters assuming that the algorithm and normalizer have already been trained.

Returns:The number of clusters generated.

extend(traces: List[agilkia.json_traces.Trace])

Appends all the given traces into this set of traces. This also sets their parents to be this trace set.


Returns a sorted list (with duplicates removed) of all the keys in data.

Parameters:event_to_str (Event -> str) – an optional feature-extractor function that maps each event to a single string. The default is just to return the action name of the event. This can be used to customise the column names in the DataFrame generated by get_trace_data.
get_cluster(num: int) → List[agilkia.json_traces.Trace]

Gets a list of all the Trace objects that are in the given cluster.

get_clusters() → Optional[List[int]]

Get the list of cluster numbers for each trace.

Precondition: self.is_clustered()

classmethod get_default_meta_data() → Dict[str, Any]

Generates some basic meta-data such as date, user and command line.


Gets the event-to-char map that is used to visualise traces.

This maps each action name to a single character. If set_event_chars has not been called, this getter will calculate and cache a default mapping from action names to characters.

get_meta(key: str) → Optional[Any]

Returns requested meta data, or None if that key does not exist.

get_num_clusters() → int

Return the number of clusters. Zero means not clustered.

get_trace_data(method: Union[str, Callable[[agilkia.json_traces.Event], str]] = 'action_counts', columns: List[str] = None) → pandas.core.frame.DataFrame

Returns a Pandas table of statistics/data about each trace.

The resulting table can be used as training data for machine learning algorithms. The method specifies the feature-encoding function for each trace. Any missing data values are replaced by zeroes. The default method is Trace.action_counts, which does a bag-of-words encoding using the event action names. It is equivalent to:

lambda tr: tr.action_counts()

or (expanding out the default Event-to-string mapping of action_counts):

lambda tr: tr.action_counts(event_to_str=(lambda ev: ev.action))

As another example, if you wanted to encode pairs of events, you could do it using a method function like this:

lambda tr: Counter([f"{tr[i].action}_{tr[i+1].action}" for i in range(len(tr) - 1)])
  • method – the feature encoding method to use for each trace. This method must return a Dict[str, number] for some kind of number. The method can be specified by name, if one of the existing methods in the Trace class is what you want. The default is the action_counts() method, which corresponds to the bag-of-words algorithm, counting one string from each Event. Alternatively, method can be a user-defined function that takes a Trace and returns its features (a Dict[str, int]).
  • columns – optional list of column names. This can be used to reorder or remove or add columns. (Any added columns will be filled with zeroes).

Returns:A table of data that can be used for clustering or machine learning. If columns is not specified, the columns of the table will be in alphabetical order. The i’th row of the table is the data for the i’th trace in this set.
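The encoding described above can be illustrated without agilkia itself. The following is a minimal sketch, assuming a stand-in Event type that keeps only the action name and a hypothetical helper called encode_traces; it is not the library's implementation. It applies a feature function to each trace, sorts the column names alphabetically when no columns are given, and fills missing values with zero.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    """Stand-in for agilkia.Event, keeping only the action name (assumption)."""
    action: str


def action_counts(trace: List[Event]) -> Dict[str, int]:
    """Bag-of-words: how often each action name occurs in the trace."""
    return Counter(ev.action for ev in trace)


def pair_counts(trace: List[Event]) -> Dict[str, int]:
    """Counts adjacent pairs of actions, as in the pairs example above."""
    return Counter(f"{trace[i].action}_{trace[i + 1].action}"
                   for i in range(len(trace) - 1))


def encode_traces(traces, method=action_counts, columns=None):
    """Hypothetical helper: one row per trace, zero for missing features."""
    rows = [method(tr) for tr in traces]
    if columns is None:
        # Default column order is alphabetical.
        columns = sorted({key for row in rows for key in row})
    return columns, [[row.get(col, 0) for col in columns] for row in rows]


traces = [
    [Event("login"), Event("buy"), Event("buy")],
    [Event("login"), Event("logout")],
]
columns, table = encode_traces(traces)
pair_columns, pair_table = encode_traces(traces, method=pair_counts)
```

Passing an explicit columns list to the helper reorders, drops, or adds columns in the same way the columns parameter is described above.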

classmethod load_from_json(file: pathlib.Path) → agilkia.json_traces.TraceSet

Load traces from the given file.

This upgrades older trace sets to the current version if possible.

message(msg: str)

Print a progress message.

save_to_arff(file: pathlib.Path, name=None) → None

Save all the events in all traces into an ARFF file for machine learning.

  • file – the name of the file to save into. Should end with ‘.arff’.
  • name – optional relation name to identify this data inside the ARFF file. The default is the base name of ‘file’.
save_to_json(file: pathlib.Path) → None

Saves this TraceSet into the given file[.json] in JSON format.

The file extension is forced to be .json if it is not already that. The file includes a version number so that older data files can be updated if possible.

set_clusters(labels: List[int], linkage: numpy.ndarray = None)

Record clustering information for the traces in this TraceSet.

The set of flat clusters must be given - one cluster number for each Trace.

If hierarchical clusters are supplied (as a linkage array), then the flat clusters are typically a cut through that tree.

After this method has been called, the flat clusters will be saved in self.cluster_labels. If the linkage argument is not None, then the hierarchical clustering information will be saved in self.cluster_linkage, which records the binary clustering tree in a compact format. This SciPy linkage array is directly useful for drawing dendrograms and calculating various statistics (see the scipy.cluster.hierarchy documentation). It can be converted to an explicit tree of ClusterNode objects if needed, via scipy.cluster.hierarchy.to_tree.

  • labels (List[int]) – an array of cluster numbers (0..), containing one number for each Trace.
  • linkage (np.ndarray, optional) – an optional scipy linkage array that encodes a binary hierarchical tree. The default is None, as hierarchical clustering is optional.

Raises:Exception – if labels is not the same length as the number of traces, or the cluster label numbers are not contiguous in the range 0..n for some n, or if any arguments are malformed.
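The label checks described above can be sketched in a few lines. This is an illustration only: check_cluster_labels and traces_in_cluster are hypothetical names, traces are represented by plain strings, and the sketch raises ValueError where the documentation only says Exception.

```python
from typing import List, Sequence


def check_cluster_labels(labels: Sequence[int], num_traces: int) -> int:
    """Hypothetical validator: returns the number of clusters if labels are valid."""
    if len(labels) != num_traces:
        raise ValueError(f"expected {num_traces} labels, got {len(labels)}")
    used = set(labels)
    # Labels must be exactly 0, 1, ..., n-1 with no gaps.
    if used != set(range(len(used))):
        raise ValueError(f"cluster numbers are not contiguous from 0: {sorted(used)}")
    return len(used)


def traces_in_cluster(traces: List[str], labels: Sequence[int], num: int) -> List[str]:
    """Selects the traces whose label equals num (compare get_cluster)."""
    return [tr for tr, label in zip(traces, labels) if label == num]


names = ["t0", "t1", "t2", "t3"]   # stand-ins for Trace objects
labels = [0, 1, 0, 2]              # labels[i] is the cluster of names[i]
num_clusters = check_cluster_labels(labels, len(names))
cluster0 = traces_in_cluster(names, labels, 0)
```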




set_event_chars(given: Mapping[str, str] = None)

Sets up the event-to-char map that is used to visualise traces.

This will calculate a default mapping for any actions that are not in given. For good readability of the printed traces, it is recommended that extremely common actions should be mapped to ‘small’ characters like ‘.’ or ‘,’.

If given is None, then meta data “action_chars” will be used as a basis instead. If that is also None, then all action characters will be calculated using the global default_map_to_chars() function.

Parameters:given – optional pre-allocation of a few action names to chars.
set_meta(key: str, value: Any) → Optional[Any]

Sets the requested meta data, and returns the old value if any.

to_pandas() → pandas.core.frame.DataFrame

Converts all the traces into a single Pandas DataFrame (one event/row).

The first three columns are ‘Trace’ and ‘Event’ which give the number of the trace and the position of the event within that trace, and ‘Action’ which is the name of the action of the event.

Each named input value is recorded in a separate column. For outputs, by default there are just ‘Status’ (int) and ‘Error’ (str) columns.

visualize_clusters(algorithm=None, fit: bool = True, xlim=None, ylim=None, cmap=None, markers=None, markersize=None, filename: str = None, block: bool = True)

Visualize the clusters from create_clusters().

  • algorithm – the visualization algorithm to map data into 2D (default TSNE).
  • fit – True means fit the data, False means algorithm is pre-trained, so use it to just transform the data into 2D without fitting the data first. Note that TSNE does not support fit=False yet. If you want fit=False, use another dimension-reduction algorithm like PCA(…).
  • xlim (Pair[float,float]) – optional axis limits for the X axis.
  • ylim (Pair[float,float]) – optional axis limits for the Y axis.
  • cmap (Union[ColorMap,str]) – optional color map for the cluster colors, or the name of a color map (see the Matplotlib colormap documentation). The default is ‘brg’, which has a wide range of colors going from blue through red to green, and prints acceptably in black and white (though very non-linearly) because it does not go all the way to white.
  • markers (matplotlib.markers.MarkerStyle) – optional marker styles for clusters. If this is a string, then the i’th character in the string will be used for the i’th marker style. See the matplotlib.markers documentation for the available marker characters. Note that clusters will be drawn from 0 up to n-1, so later clusters will be on top. Also, the earlier clusters tend to have more elements. One approach to improve readability is to use line-based shapes (from “1234+x|_”) for the first few clusters (which have many points), and then filled shapes (from “.o<^>vsphPXd*”) for the later clusters (which have few points). Note also that you can use a space for the marker character of a cluster if you want to not display that cluster at all. However, if your markers string is shorter than the number of clusters, all remaining clusters will be displayed using the “o” marker.
  • markersize (float) – size of the markers in points (only when markers is a str). The default seems to be about 6 points.
  • filename (str) – optional file name to save image into, as well as displaying it.
  • block (bool) – True (the default) means wait for user to close figure before returning. False means non-blocking.
  • Limitations – if you call this multiple times with different numbers of clusters, the color map will not be exactly the same.
with_traces_grouped_by(name: str = None, key: Callable[[agilkia.json_traces.Event], str] = None, property: bool = False, allow_missing: bool = False) → agilkia.json_traces.TraceSet

Returns a new TraceSet with each trace grouped into shorter traces.

It generates a new trace for each distinct key value.

  • name – the name of an input. This is a convenience parameter that is a shorthand for key=(lambda ev: ev.inputs[name]) if property=False or for key=(lambda ev: ev.meta_data.get(name, None)) if property=True.
  • key – a function that takes an Event object and returns the groupby key string.
  • property – [deprecated] True means name is a meta-data field, not an input.
  • allow_missing – True allows key to return None, meaning that that event will be silently discarded. False means it is an error for key to give None.

Returns:a new TraceSet, usually with more traces and shorter traces.
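The grouping behaviour can be pictured with plain Python. The sketch below assumes a stand-in Event type and a hypothetical group_events helper; returning the groups in order of first appearance and raising ValueError for a missing key are assumptions of the sketch, not documented behaviour.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    """Stand-in for agilkia.Event with just an action and its inputs."""
    action: str
    inputs: Dict[str, Any]


def group_events(events: List[Event],
                 key: Callable[[Event], Optional[str]],
                 allow_missing: bool = False) -> List[List[Event]]:
    """Hypothetical helper: one list of events per distinct key value."""
    groups: Dict[str, List[Event]] = {}
    for ev in events:
        k = key(ev)
        if k is None:
            if allow_missing:
                continue  # silently discard events that have no key
            raise ValueError(f"no grouping key for event {ev}")
        groups.setdefault(k, []).append(ev)
    return list(groups.values())


events = [
    Event("login", {"user": "ann"}),
    Event("login", {"user": "bob"}),
    Event("buy", {"user": "ann"}),
    Event("ping", {}),  # has no "user" input, so its key is None
]
grouped = group_events(events, key=lambda ev: ev.inputs.get("user"),
                       allow_missing=True)
```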

with_traces_split(start_action: str = None, input_name: str = None, split: Callable[[agilkia.json_traces.Event, agilkia.json_traces.Event], bool] = None) → agilkia.json_traces.TraceSet

Returns a new TraceSet with each trace in this set split into shorter traces.

It will start a new trace whenever the split function returns True. The split function is called on each adjacent pair of events in each trace, and should return True whenever the second of those events should start a new trace.

The start_action and input_name parameters give shortcuts for common splitting criteria.

  • start_action – the name of an action that starts a new trace. This is shorthand for split=(lambda e1,e2: e2.action==start_action).
  • input_name – the name of an input. Whenever the value of this input changes, then a new trace should be started. This is shorthand for split=(lambda e1,e2: e1.inputs[input_name] != e2.inputs[input_name]).
  • split – a function that is called on each adjacent pair of events to determine if the trace should be split between those two events.

Returns:a new TraceSet, usually with more traces and shorter traces.
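The splitting rule can be sketched as a loop over adjacent pairs of events. This is a minimal illustration with a stand-in Event type and a hypothetical split_trace helper, not the library's code; the two lambdas mirror the start_action and input_name shorthands described above.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Event(NamedTuple):
    """Stand-in for agilkia.Event with just an action and its inputs."""
    action: str
    inputs: Dict[str, Any]


def split_trace(events: List[Event],
                split: Callable[[Event, Event], bool]) -> List[List[Event]]:
    """Hypothetical helper: starts a new trace whenever split(e1, e2) is True."""
    if not events:
        return []
    result = [[events[0]]]
    for prev, curr in zip(events, events[1:]):
        if split(prev, curr):
            result.append([])  # curr becomes the first event of a new trace
        result[-1].append(curr)
    return result


events = [Event(action, {"user": user}) for action, user in [
    ("login", "ann"), ("buy", "ann"),
    ("login", "bob"), ("buy", "bob"), ("buy", "bob"),
]]

# Shorthand for start_action="login".
by_action = split_trace(events, lambda e1, e2: e2.action == "login")
# Shorthand for input_name="user".
by_input = split_trace(events, lambda e1, e2: e1.inputs["user"] != e2.inputs["user"])
```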


class agilkia.Trace(events: List[agilkia.json_traces.Event], parent: Optional[agilkia.json_traces.TraceSet] = None, meta_data: Optional[Dict[str, Any]] = None, random_state=None)

Bases: object

Represents a single trace, which contains a sequence of events.

Public data fields include:
  • the list of events (List[Event]). However, the iteration, indexing, and len(_) methods have been lifted from the events list up to this Trace object, so you may not need to access the events list directly at all.
  • self.meta_data: MetaData. Or use get_meta(key) to get an individual meta-data value.
action_counts(event_to_str: Callable[[agilkia.json_traces.Event], str] = None) → Dict[str, int]

Returns a bag-of-words count of all the Events in this Trace.

Firstly, each Event is mapped to a single string using the event_to_str function (the default is just the action name of the Event), and then the resulting strings are counted into a bag-of-words dictionary showing how many times each string occurs in this trace.

Parameters:event_to_str (Event->str) – optional function for converting each Event into the string that is counted. These strings become the keys of the result dictionary. The default is (lambda ev: ev.action).
Returns:A dictionary of counts that can be used for clustering traces.
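To show how event_to_str changes the keys of the result, here is a small sketch using a stand-in Event type and a free function instead of the Trace method. The "action_status" key format in the second call is an assumption chosen for illustration.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    """Stand-in for agilkia.Event with an action and its outputs."""
    action: str
    outputs: Dict[str, Any]


def action_counts(events: List[Event],
                  event_to_str: Optional[Callable[[Event], str]] = None) -> Dict[str, int]:
    """Maps each event to a string, then counts how often each string occurs."""
    if event_to_str is None:
        def event_to_str(ev: Event) -> str:
            return ev.action  # default: just the action name
    return dict(Counter(event_to_str(ev) for ev in events))


trace = [
    Event("login", {"Status": 0}),
    Event("buy", {"Status": 0}),
    Event("buy", {"Status": 1}),
]
plain = action_counts(trace)
with_status = action_counts(trace, lambda ev: f"{ev.action}_{ev.outputs['Status']}")
```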
action_status_counts() → Dict[str, int]

Counts how many times each action-status pair occurs in this trace.

Returns:A dictionary of counts that can be used for clustering traces.
get_meta(key: str) → Optional[Any]

Returns requested meta data, or None if that key does not exist.

to_string(to_char: Dict[str, str] = None, compress: List[str] = None, color_status: bool = False)

Return a one-line summary of this trace, one character per event. See ‘trace_to_string’ for details. NOTE: throws an exception if no to_char map is given and this trace has no parent.


Returns the TraceSet that this trace is part of, or None if not known.


class agilkia.Event(action: str, inputs: Dict[str, Any], outputs: Dict[str, Any], meta_data: Optional[Dict[str, Any]] = None)

Bases: object

An Event is a dictionary-like object that records all the details of an event.

Public data fields include:
  • self.action (str): the full action name.
  • self.inputs (Dict[str,Any]): the named inputs and their values.
  • self.outputs (Dict[str,Any]): the named outputs and their values.
  • self.meta_data (Dict[str,Any]): any extra properties such as “timestamp”. Note: if self.meta_data[“timestamp”] is present, it should be in ISO 8601 format. Or use get_meta(key) to get an individual meta-data value.

Read-only error message output by this operation. If no output[‘Error’] field is available, this method always returns “”.


Read-only status of the operation, where 0 means success. If output ‘Status’ is not available or is not numeric, this method still returns 0.


Read-only status of the operation, where 0.0 usually means success.

This method is useful for applications that use non-integer status values. If no output ‘Status’ is available or it is not a valid number, NaN is returned.
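The fallback rules for the two status properties can be sketched as plain functions over an outputs dictionary. The function names are hypothetical, and the sketch treats any value that int() or float() cannot convert as not numeric.

```python
import math
from typing import Any, Dict


def status_int(outputs: Dict[str, Any]) -> int:
    """Returns 0 when 'Status' is missing or int() cannot convert it."""
    try:
        return int(outputs.get("Status", 0))
    except (TypeError, ValueError):
        return 0


def status_float(outputs: Dict[str, Any]) -> float:
    """Returns NaN when 'Status' is missing or float() cannot convert it."""
    try:
        return float(outputs["Status"])
    except (KeyError, TypeError, ValueError):
        return float("nan")


ok = status_int({"Status": "3"})
missing = status_int({})
fractional = status_float({"Status": "0.5"})
unknown = status_float({"Status": "oops"})
```

Because NaN never compares equal to itself, callers would test the float result with math.isnan rather than with ==.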


class agilkia.RandomTester(urls: Union[str, List[str]], methods_to_test: List[str] = None, input_rules: Dict[str, List[T]] = None, rand: random.Random = None, action_chars: Mapping[str, str] = None, verbose: bool = False)

Bases: object

Does random testing of a given web service.

Give it a URL to a web service (or a list of URLs if there are several web services), and it will read the WSDL specifications from those web services and generate any number of random test sequences to test the methods.

For more sophisticated (user-directed) testing you can also:
  • supply a username and password if login credentials are needed.
  • supply the subset of method names that you want to focus on testing (default is all).
  • supply a set of default input values (or generation functions) for each data type.
  • supply a set of input values (or generation functions) for each named input parameter.

add_web_service(url: str)

Add another web service using the given url.

call_method(name: str, args: Dict[str, Any] = None, meta_data: Optional[Dict[str, Any]] = None)

Call the web service name(args) and add the result to trace.

  • name (str) – the name of the method to call.
  • args (dict) – the input values for the method. If args=None, then this method uses ‘choose_input_value’ to choose appropriate values for each argument value of the method.
  • meta_data – optional meta data to add to the resulting Event.

Note: before the call, this method replaces some symbolic inputs by actual concrete values. For example, the correct password token is replaced by the real password, which avoids recording the real password in the inputs of the trace.

Returns:The whole Event object created by this method call.
choose_input_value(arg_name: str) → str

Choose an appropriate value for the input argument called ‘arg_name’. If no set of input rules is defined for ‘arg_name’, then ‘generate_input_value’ is called to generate a suitable input value. Subclasses can override this.

Parameters:arg_name (str) – the name of the input parameter.
Returns:a string if successful, or None if no suitable value was found.
decode_outputs(raw) → Dict[str, Any]

Decode the outputs from a web service/site call into a dictionary.

This adds a ‘Status’ entry in the output dictionary, to say if the operation was successful (0) or not (non-zero).

generate_input_value(arg_name: str) → Any

Can be overridden in subclasses to generate smart values for an input argument.

generate_trace(start=True, length=20, methods: List[str] = None) → agilkia.json_traces.Trace

Generates the requested length of test steps, choosing methods at random.

  • start (bool) – True means that a new trace is started (unless current one is empty).
  • length (int) – The number of steps to generate (default=20).
  • methods (List[str]) – only these methods will be chosen (None means all are allowed)

Returns:the whole of the current trace that has been generated so far.

get_methods() → Mapping[str, Mapping[str, Mapping[str, str]]]

Return the set of all method names in all the web services.

set_username(username: str, password: str = None)

Set the username and (optional) password to be used for the subsequent operations. If password is not supplied, this method will immediately interactively prompt for it.

summary(value) → str

Returns a one-line summary of the given value.


class agilkia.SmartSequenceGenerator(urls: Union[str, List[str]], method_signatures: Dict[str, Mapping[str, Mapping[str, str]]] = None, methods_to_test: List[str] = None, input_rules: Dict[str, List[T]] = None, rand: random.Random = None, action_chars: Mapping[str, str] = None, verbose: bool = False)

Bases: agilkia.random_tester.RandomTester

Generates test sequences from an ML model that suggests what actions can come next.

execute_test(trace: agilkia.json_traces.Trace, max_retry: int = 0)

Executes the given test trace and adds the resulting trace to this set.

Note that if the given trace contains events with missing input values, then suitable input values will be generated using ‘choose_input_value’. Progress messages will be printed if self.verbose is True.

  • trace – the trace to execute (with or without input values).
  • max_retry – retry failed operations up to this number of times, choosing different random input values each time.
generate_all_traces(model, length=5, action_prob=0.01, path_prob=1e-12, partial=True, event_factory=None) → List[agilkia.json_traces.Trace]

Generate all traces that satisfy the given constraints.

  • model – the trained ML model used to predict the next action.
  • length (int) – maximum length of each generated trace.
  • action_prob (float) – only do actions with at least this probability.
  • path_prob (float) – only include paths with at least this total probability.
  • partial (bool) – True means include partial traces. False gives complete traces only.
  • event_factory (string->Event) – Optional event generator, from the string predicted by the model.

Returns:A list of all the Trace objects that satisfy the given constraints.

Note that all complete traces will have len(tr)<length, whereas all partial traces will have len(tr)==length.
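One way to picture this enumeration is a depth-first search that prunes unlikely branches. The sketch below is an assumption-laden illustration, not the library's algorithm: the model is replaced by a hypothetical function that maps a prefix of action names to next-action probabilities, traces are tuples of action names, and "<end>" marks the end of a trace.

```python
from typing import Callable, Dict, List, Tuple

END = "<end>"
Prefix = Tuple[str, ...]


def all_traces(next_probs: Callable[[Prefix], Dict[str, float]],
               length: int = 5,
               action_prob: float = 0.01,
               path_prob: float = 1e-12,
               partial: bool = True) -> List[Prefix]:
    """Depth-first enumeration of action sequences, pruned by probability."""
    results: List[Prefix] = []

    def extend(prefix: Prefix, prob: float) -> None:
        if len(prefix) == length:
            if partial:
                results.append(prefix)  # ran out of steps before <end>
            return
        for action, p in sorted(next_probs(prefix).items()):
            if p < action_prob or prob * p < path_prob:
                continue  # too unlikely: prune this branch
            if action == END:
                results.append(prefix)  # complete trace
            else:
                extend(prefix + (action,), prob * p)

    extend((), 1.0)
    return results


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical model: always log in first, then either buy or stop."""
    if not prefix:
        return {"login": 1.0}
    return {"buy": 0.6, END: 0.4}


with_partial = all_traces(toy_model, length=3)
complete_only = all_traces(toy_model, length=3, partial=False)
```

In this sketch every complete trace is shorter than length and every partial trace has exactly length steps, which matches the note above.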

generate_trace_with_model(model, start=True, length=20, event_factory=None)

Generates one sequence of test steps, choosing actions using the given model. The generated trace terminates either when the model says <end> or after length steps.

  • model (Classifier) – ML model that takes an Event list and predicts next action name.
  • start (bool) – True means that a new trace is started, beginning with a “Login” call.
  • length (int) – The maximum number of steps to generate in one trace (default=20).
  • event_factory (string->Event) – Optional event generator, from the string predicted by the model.

Returns:the whole of the current trace that has been generated so far.


class agilkia.TracePrefixExtractor(event_to_str=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Encodes all the prefixes of all traces into (X,y) features for machine learning.

This feature encoder follows the standard scikit-learn Estimator conventions, so a typical usage might look like:

prefixes = TracePrefixExtractor()
X = prefixes.fit_transform(traceset)  # fit() must be called before transform()
y = prefixes.get_labels()

The default implementation uses bag-of-words to build X.

That is, each event is converted into a single string (see event_to_str parameter), and then bag-of-words is used to count the number of times each of those strings appears. This gives one row of the X matrix, while the corresponding y value (see get_labels()) is just the result of applying event_to_str to the next event.

The default event_to_str function just uses the action string for each Event, which is useful for learning to predict the next action. If ‘tr’ is the first Trace in the traceset in the above example, then the first few rows in (X,y) will effectively be:

X[i] = Counter([ev.action for ev in tr[0:i]])
y[i] = tr[i].action
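To make the prefix encoding concrete, here is a minimal sketch that works on a list of action names instead of Trace objects. The prefix_rows helper is hypothetical, and the final row labelled "<end>" is an assumption about how the end of a trace is represented.

```python
from collections import Counter
from typing import Dict, List, Tuple

END = "<end>"  # assumed label for "no next event"


def prefix_rows(actions: List[str]) -> Tuple[List[Dict[str, int]], List[str]]:
    """Builds one (features, label) row for every prefix of the trace."""
    X: List[Dict[str, int]] = []
    y: List[str] = []
    for i in range(len(actions) + 1):
        X.append(dict(Counter(actions[:i])))               # bag-of-words of the prefix
        y.append(actions[i] if i < len(actions) else END)  # the next action
    return X, y


X, y = prefix_rows(["login", "buy", "buy"])
```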

For more complex feature encoding than simple bag-of-words, you can create a subclass of this class and override the generate_prefix_features and generate_feature_names methods.

  • vocabulary (dict) – A dictionary mapping feature names to feature indices.
  • feature_names (list) – A list of length n_features containing the feature names.
fit(X: agilkia.json_traces.TraceSet, y=None)

Fit uses the given TraceSet to calculate the feature names.

It takes the union of generate_feature_names() over all the traces, sorts the resulting set of feature names, and passes that list to set_feature_names.

  • X (TraceSet) – the set of traces to be fitted.
  • y (None) – unused.


Note that fit() must be called before transform() or get_feature_names().

generate_feature_names(trace: agilkia.json_traces.Trace) → Set[str]

Generate the column names required for the given trace.

By default this just applies the event_to_str function to every Event in trace.

generate_prefix_features(events: List[agilkia.json_traces.Event], current: Optional[agilkia.json_traces.Event]) → Tuple[Dict[str, float], Any]

Encodes a sequence of events into one row of the (X,y) training data.

Subclasses can override this to change the feature encoding (X), or change what is being learned (y). If they want to change the feature names, they should also override generate_feature_names.

  • events – the prefix of the trace.
  • current – the next event in the trace. None usually means the end of the trace.

Gets the list of column names for the generated data tables.


Get the output labels (action names) corresponding to the last transform() call.

get_prefix_features(events: List[agilkia.json_traces.Event]) → Dict[str, float]

Use bag-of-words to count the various features in a sequence of Events.

set_feature_names(names: List[str])

Sets the output column names to the given list of feature names.

Also calculates the inverse mapping (names to position) for internal use.

Parameters:names – For consistent results, this list should be sorted in a consistent way.
transform(X: Union[agilkia.json_traces.TraceSet, List[agilkia.json_traces.Event]], curr: agilkia.json_traces.Event = None)

Transforms a set of traces, or an event sequence, into a Pandas DataFrame of training data. Note that the columns of the resulting DataFrame are fixed during the fit() method (which calls set_feature_names), so any new kinds of actions appearing in this X input will be ignored.

There are two different behaviors, depending upon the input type of X.
  • if X is a TraceSet, all prefixes of all traces are converted into training data, and the corresponding expected y labels (e.g. action name) for all those prefixes are available from get_labels().
  • if X is a list of events, then the result will contain just a single row, which will be the data for that whole trace. In this case, the optional parameter curr may be used to pass the partially-complete current event to the feature encoding if desired.


class agilkia.TraceEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.encoder.JSONEncoder

An internal class used by TraceSet to encode objects into JSON format.

We use a custom JSON encoder because objects from zeep could not be serialised.

Based on ideas from this blog entry by ‘The Fellow’ (Ouma Rodgers):

This does not handle XML objects, as they should be decoded via xml_decode first.


Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
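As a complete, runnable illustration of overriding default, the sketch below defines a hypothetical SketchEncoder (it is not agilkia's TraceEncoder) that converts dates, decimals, and sets into JSON-friendly values and defers everything else to the base class.

```python
import json
from datetime import date
from decimal import Decimal


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder showing the default() hook."""

    def default(self, o):
        if isinstance(o, (date, Decimal)):
            return str(o)      # e.g. ISO date text, or the decimal digits
        if isinstance(o, (set, frozenset)):
            return sorted(o)   # sets have no JSON form, so emit a sorted list
        # Let the base class raise TypeError for anything else.
        return super().default(o)


record = {"when": date(2020, 1, 31), "price": Decimal("9.50"), "tags": {"b", "a"}}
text = json.dumps(record, cls=SketchEncoder, sort_keys=True)
```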

Agilkia Helper Functions

Most of these global functions are just helper functions for the main agilkia classes. However, some may be useful as standalone functions when building other kinds of tools.

agilkia.json_traces.xml_decode(obj: xml.etree.ElementTree.Element) → Union[str, Dict[str, Any]]

Custom XML decoder that converts XML into a Python dictionary suitable for JSON encoding.

This roughly follows the ideas from:

For simple XML objects with no attributes and no children, this returns just the text string. For more complex XML objects, it returns a dictionary.

Note that the top-level tag of ‘obj’ is assumed to be handled by the caller. That is, the caller will typically do `d[tag] = xml_decode(obj)` where xml_decode will return either a simple string, or a dictionary.
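The string-or-dictionary rule can be sketched with the standard library. The decode function below is a hypothetical stand-in, not the agilkia implementation: merging attributes into the dictionary, letting repeated child tags overwrite each other, and ignoring the text of complex elements are all simplifying assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Union


def decode(elem: ET.Element) -> Union[str, Dict[str, Any]]:
    """Simple elements become their text; complex ones become a dictionary."""
    if len(elem) == 0 and not elem.attrib:
        return elem.text or ""
    result: Dict[str, Any] = dict(elem.attrib)  # assumption: attributes become keys
    for child in elem:
        result[child.tag] = decode(child)       # assumption: last duplicate tag wins
    return result


root = ET.fromstring('<order id="7"><item>book</item><qty>2</qty></order>')
# The caller handles the top-level tag, as described above.
decoded = {root.tag: decode(root)}
```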

agilkia.json_traces.default_map_to_chars(actions: Set[str], given: Dict[str, str] = None) → Dict[str, str]

Tries to guess a useful default mapping from action names to single characters.

  • actions – the names of all the actions.
  • given – optional pre-allocation of a few action names to chars. You can use this to override the default behaviour.

Returns:A map from every name in actions to a unique single character.

agilkia.json_traces.trace_to_string(trace: List[agilkia.json_traces.Event], to_char: Mapping[str, str], compress: List[str] = None, color_status: bool = False) → str

Converts a trace to a short summary string, one character per action.

  • trace – the sequence of JSON-like events, with an “action” field.
  • to_char – maps each action name to a single character. This map must include every action name that appears in the traces. A suitable map can be constructed via TraceSet.get_event_chars().
  • compress – a list of Action names. Repeated events will be compressed if in this list.
  • color_status – True means color the string red where status is non-zero. This uses ANSI escape sequences, so needs to be printed to a terminal.

Returns:a summary string.
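A rough sketch of the one-character-per-action idea follows. It works on a list of action names rather than Event objects, omits color_status, and uses a hypothetical to_summary helper. How the real function renders compressed runs is not stated above; collapsing each run to a single character is an assumption of this sketch.

```python
from typing import List, Mapping, Optional


def to_summary(actions: List[str],
               to_char: Mapping[str, str],
               compress: Optional[List[str]] = None) -> str:
    """Maps each action to its character, collapsing runs of compressed actions."""
    compressed = set(compress or [])
    chars: List[str] = []
    prev: Optional[str] = None
    for action in actions:
        if action == prev and action in compressed:
            continue  # same compressible action again: keep only the first character
        chars.append(to_char[action])
        prev = action
    return "".join(chars)


to_char = {"login": "L", "ping": ".", "buy": "b"}
actions = ["login", "ping", "ping", "ping", "buy", "buy"]
full = to_summary(actions, to_char)
short = to_summary(actions, to_char, compress=["ping"])
```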

agilkia.json_traces.traces_to_pandas(traces: List[agilkia.json_traces.Trace]) → pandas.core.frame.DataFrame

Collects all events into a single Pandas DataFrame.

Columns include the trace number, the event number, the action name, each input parameter, the result status and error message.

TODO: we could convert complex values to strings before sending to Pandas?
TODO: we could have an option to encode strings into integer properties?

agilkia.random_tester.read_input_rules(file: pathlib.Path) → Dict[str, List[str]]

Reads a CSV file of input values.

The header line of the CSV file should contain the headers Name,Frequency,Value. The Frequency column is optional, and missing frequencies default to 1.

For example if one line contains ‘size,3,100’ and another contains ‘size,2,200’, then the resulting input rules will define a 3/5 chance of size being 100, and a 2/5 chance of it being 200.
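The weighting in this example can be reproduced with the standard csv module. The sketch below uses a hypothetical parse_input_rules function that reads from any iterable of lines; representing a frequency by repeating the value in the list is an assumption suggested by the Dict[str, List[str]] return type, not a confirmed detail.

```python
import csv
import io
import random
from typing import Dict, Iterable, List


def parse_input_rules(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical parser: repeats each value according to its frequency."""
    rules: Dict[str, List[str]] = {}
    for row in csv.DictReader(lines):
        freq = int(row.get("Frequency") or 1)  # missing or empty frequency means 1
        rules.setdefault(row["Name"], []).extend([row["Value"]] * freq)
    return rules


sample = io.StringIO("Name,Frequency,Value\nsize,3,100\nsize,2,200\ncolor,,red\n")
rules = parse_input_rules(sample)

# A uniform choice over the repeated values gives 100 a 3/5 chance and 200 a 2/5 chance.
choice = random.Random(0).choice(rules["size"])
```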