GRASPy

GRASPy is the Python API for the Graphical Representation of Ancestral Sequence Predictions.

Read the GRASP paper here

Requests

Retrieves any information about a particular job or the output from a job.

PlaceInQueue

g_requests.PlaceInQueue(job_id:str)

Requests the place in queue of a submitted job.

Parameters:

job_id(str): The ID of the job

Returns:

str: {"Job": job-number, "Result":{result-JSON}}

Example


>>> g_requests.requestPlaceInQueue(job_id=19)

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

{"Job":19,"Place":0}

JobOutput

g_requests.JobResult(job_id: str)

Requests the output of a submitted job. Request will be denied if the job is not complete.

Parameters:

job_id(str): The ID of the job

Returns:

str: {"Job": job-number, "Result":{RESULT}}

Example


>>>g_requests.JobResult(19)

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

ViewQueue

g_requests.ViewQueue()

Lists all the jobs currently being performed by the server

Parameters:

job_id(str): The ID of the job

Returns:

str: {"Job": job-number, "Result":{result-JSON}}

Example


>>>g_requests.JobResult(19)

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

{'Jobs': [{'Status': 'COMPLETED','Threads': 1, 'Command': 'Recon','Priority': 0,
'Memory': 1, 'Auth': 'Guest','Job': 1, 'Place': 0},{'Status': 'COMPLETED',
'Threads': 1,'Command': 'Recon','Priority': 0,'Memory': 1, 'Auth': 'Guest',
'Job': 2,'Place': 0}]}

Closing socket...

CancelJob

g_requests.CancelJob(job_id: str)

Requests the status of a submitted job.

Parameters:

job_id(str): The ID of the job

Returns:

str: {"Job": job-number}

Example


>>>g_requests.CancelJob(19)

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

{'Job': 19}

Closing socket...

JobStatus

g_requests.JobStatus(job_id: str)

Requests the status of job as either completed or queued.

Parameters:

job_id(str): The ID of the job

Returns:

str: {'Status': 'COMPLETED', 'Job': job-number}

Example


>>>g_requests.JobStatus(19)

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

{'Status': 'COMPLETED', 'Job': job-number}

Commands

JointReconstruction

g_requests.JointReconstruction(aln: str, nwk: str, alphabet: str = None, auth: str = "Guest,
indels: str = "BEP", model: str = "JTT") -> str:

Queries the bnkit server for a joint reconstruction. Will default to standard bnkit reconstruction parameters which use BEP for indel model and JTT for the substitution model.

Parameters:

aln(str) = file name of aln file
nwk(str) = file name of nwk file
auth(str) = Authentication token, defaults to Guest
indels(str) = Indel mode, defaults to BEP
model(str) = Substitution model, defaults to JTT
alphabet(str) = Sequence type. e.g. DNA or Protein. If user does not specify, it will guess based on sequence content.

Returns:

str - {"Message":"Queued","Job": job-number}

Example

>>> JointReconstruction(aln="test_aln.aln", nwk="test_nwk.nwk")

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

{"Message":"Queued","Job": job-number}

ExtantPOGTree

g_requests.ExtantPOGTree(aln: str, nwk: str, auth: str = "Guest")

Queries the server to turn an alignment and a nwk file into the POGTree format with POGraphs for extants. This output JSON can be converted into a POGTree object via POGTreeFromJSON()

Parameters:

aln(str) = file name of aln file
nwk(str) = file name of nwk file

Returns:

Dict: POGTree and POGraphs in JSON format

Example

>>> ExtantPOGTree(aln="test_aln.aln", nwk="test_nwk.nwk")

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

{"Job":<job-number>, "Result":{result-JSON}}

LearnLatentDistributions

LearnLatentDistributions(nwk: str, states: list[str], csv_data: str, auth: str = "Guest")

Learns the distribution of an arbitrary number of discrete states. The output from the job will be a new/refined distribution.

CSV input requirements:

-CSV files containing data MUST be formatted as outlined below. The column with names of extant sequences must be named "Headers"

-The column with data points MUST be named "Data".

- Data with multiple observations must be spaced with whitespace e.g "8.8 1.2"

Headers,Data
A5ILB0,8.5
P08144,7.35 3.3
H9B4I9

Parameters:

nwk(str) = path to file name of nwk
states(list) = a list of names for states
csv_data(str) = path to csv with data
auth(str) = Authentication token, defaults to Guest

Returns:

str: {"Message":"Queued","Job": job-number}

Example

>>> LearnLatentDistributions(nwk="training.nwk", states=["A", "B"], data="train_data.csv")

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

{'Message': 'Queued', 'Job': 42}

Completed job output:

- The distributions for each Condition(e.g. "A" or "B") are ordered based on the "Condition" list.

- Lists within "Pr" contain the mean and variance in that order for each condition.

{ "Distrib":
    { "Condition":[["A"],["B"]],
      "Pr":[[3.784926135903969,0.056738891699391655],
      [2.5324588293595744,0.056738891699391655]],
      "Index":[0,1],
      "Domain":"dat.Continuous@3bd5adde"
    }
}

MarginaliseDistOnAncestor

MarginaliseDistOnAncestor(nwk: str, states: list[str], csv_data: str, distrib: dict, ancestor: int, leaves_only: bool = True, auth: str = "Guest"):

Marginalises on an ancestral node using the latent distributions determined from LearnLatentDistributions(). Although its possible, I have not added parameters for rate, seed or gamma values.

CSV input requirements:

-CSV files containing data MUST be formatted as outlined below. The column with names of extant sequences must be named "Headers"

-The column with data points MUST be named "Data".

- Data with multiple observations must be spaced with whitespace e.g "8.8 1.2"

Headers,Data
A5ILB0,8.5
P08144,7.35 3.3
H9B4I9

Parameters:

nwk(str) = path to file name of nwk
states(list) = a list of names for states
csv_data(str) = path to csv with data
distrib(dict) = a previously trained distribution from data
ancestor(int) = Specify which ancestor to marginalise on
leaves_only(bool) = ...
auth(str) = Authentication token, defaults to Guest

Returns:

str: {"Message":"Queued","Job": job-number}

Example:

>>> MarginaliseDistOnAncestor(nwk="training.nwk", states=["A", "B"], data="train_data.csv", ancetor=0)

Socket created...

Connecting to server...

Socket connected to 10.139.1.21 on IP 4072

Closing socket...

{'Message': 'Queued', 'Job': 42}

Completed job output:

- The marginalised distribution for each state are ordered in the list according to "Values".

- Values within "Pr" represent the mean and variance respectively.

{ "N0":[
    { "Pr":[0.6652270145537978,0.3347729854462022],
      "Domain":{"Size":2,"Values":["A","B"],"Datatype":"String"}},
    { "Pr":[0.649968113685095,0.350031886314905],
    "Domain":{"Size":2,"Values":["A","B"],"Datatype":"String"}}
    ]
}

Data Structures

GRASPy has a number of data structures that can be used to interact with output from the bnkit server responses.

POGTreeFromJointReconstruction

parsers.POGTreeFromJSON(nwk: Union[str, dict], POG_graphs: dict)

Creates an instance of the POGTree data structure. A nwk file OR output from JointReconstruction() can be used to create tree topology with the second option also creating POGraphs for extants.

Parameters:

nwk (str or dict): Users can input a nwk file path or can provide the output from g_requests.requestPOGTree().
POG_graphs(dict): The POGraphs for ancestors generated from output from g_requests.requestJointReconstruction().

Returns:

<POGTree>

Example:

>>> tree = POGTreeFromJointReconstruction(nwk="example.nwk", POG_graphs=graphs)

POGTree

POGTree(nBranches: int, branchpoints: dict[str, BranchPoint],
             parents: list[int], children: list[list[Union[int, None]]],
             indices: dict[str, int], distances: list[float],
             POGraphs: dict[str, POGraph])

The Partial Order Graph Tree (POGTree), is a phylogenetic tree made up of branchpoints which represent nodes on the tree. Each branchpoint is assigned an index and a BranchPoint object, allowing easy access of information via the sequence name of an extant or an ancestor. 2) A POGraph object which describes the graph of the sequence at that branchpoint.

Parameters:

idxTree: Instance of the IdxTree class
POGraphs(dict): maps sequence IDs to POGraph class
nBranches(int): number of branch points in the tree
branchpoints(dict[str, BranchPoint]): Contains BranchPoint objects
parents(list[int]): maps the index of the child to the index of the parent
children(list[list[Union[int, None]]]): maps the index of the parent to an array containing the indexes the child/children
indices(dict[str, int]): Maps the sequence ID to the index on the tree
distances(list[float]): Maps the branchpoint to the distance to its parent

writeToNwk

writeToNwk(file_name: str, root: str = "N0")

Converts the POGTree object into a nwk string and writes this to a file

Parameters:

file_name(str) : name of nwk file
root(str): Default set to N0 at the "root" ancestor but can be changed to internal nodes to create subtrees if desired.

Returns:

str: The POGTree in nwk format

Example:

>>> tree = POGTree(nwk.nwk, aln.aln)
>>> nwk = tree.writeNwk(test_nwk.nwk)
>>> print(nwk)
(XP_004050792.2:0.040380067,XP_005216113.1:0.028035396,(XP_018963554.1:0.016721581,XP_016357833.1:0.024301326)N1:0.347992941)N0:0;

writeToFasta

writeToFasta(file_name: str)

Writes all sequences of the tree to file. Sequence for ancestors are based on a joint reconstruction and each symbol is the most likely at each position.

Parameters:

file_name(str) : name of fasta file

Returns:

None

Example:

>>>POGTree("test_fasta")
>>>

BranchPoint

BranchPoint(id: str, parent: Union[str, None], dist: float,
             children: list[str], seq: Optional[Sequence] = None)

Represents a branchpoint on a phylogenetic tree. Can contain information about the parents or children of that point and how long that branch point is.

Parameters:

id(str): Sequence ID
parent(str or None): ID of parent
dist(float): Distance to parent
children(list): IDs children of current BranchPoint
seq(Sequence): The sequence based on a joint reconstruction if the BranchPoint is an ancestor otherwise it is just the sequence of an extant. Contains a Sequence object.

SymNode

SymNode(name: int, symbol: str, edges: list)

Only implemented for output from joint reconstruction. Stores the most likely character at a given sequence position and all of the edges at this position.

Parameters:

name(int): index position in sequence
symbol(str): Most likely amino acid based on joint reconstruction
edges(list): Contains all outgoing edges at this position

Edge

Edge(start: int, end: int, edgeType: Optional[str] = None, recip: Optional[bool] = None, backward: Optional[bool] = None, forward: Optional[bool] = None, weight: Optional[float] = None)

Creates instance of an edge between two positions in a sequence. Currently only implemented for bidirectional edges.

Parameters:

start(int): position of beginning of edge
end(int): position of end of edge
edgeType(str): Currently only supports bidirectional edge
recip(bool): ASK ABOUT THIS
backward(bool): Direction of edge
forward(bool): Direction of edge
weight(float): Support of the edge