Basic classes for working with genomic data (genometools.basic)
GeneSet | A gene set.
GeneSetCollection | A collection of gene sets.
-
class
genometools.basic.
GeneSet
(id, name, genes, source=None, collection=None, description=None)[source]¶ A gene set.
A gene set is just what the name implies: a set of genes. Usually, gene sets are used to group genes that share a certain property (e.g., genes that perform related functions, or genes that are frequently co-expressed). The genes in a gene set are not ordered.
GeneSet instances are hashable and should therefore be treated as immutable.
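To illustrate what "unordered, hashable, and treated as immutable" can mean in practice, here is a minimal sketch. The class `GeneSetSketch` is a hypothetical stand-in written for this example; it is not the genometools implementation, and storing the genes in a `frozenset` is an assumption about one reasonable way to get these properties.

```python
class GeneSetSketch:
    """Hypothetical stand-in for illustration; not the genometools GeneSet."""

    def __init__(self, id, name, genes, source=None, collection=None,
                 description=None):
        self._id = id
        self._name = name
        # A frozenset is unordered, drops duplicates, and cannot be modified.
        self._genes = frozenset(genes)
        self._source = source
        self._collection = collection
        self._description = description

    @property
    def genes(self):
        return self._genes

    @property
    def size(self):
        # Number of distinct genes in the set.
        return len(self._genes)

    def _key(self):
        # All fields that define the identity of the gene set.
        return (self._id, self._name, self._genes, self._source,
                self._collection, self._description)

    def __eq__(self, other):
        if not isinstance(other, GeneSetSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Consistent with __eq__, so instances work as dict keys or set members.
        return hash(self._key())


a = GeneSetSketch("GS1", "Example", ["TP53", "MYC"])
b = GeneSetSketch("GS1", "Example", ["MYC", "TP53", "MYC"])
print(a == b, a.size)
```

Because equality and hashing both ignore gene order, the two objects above compare equal and collapse to a single entry when placed in a Python `set`.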
Parameters: -
id_
¶ str – The (unique) ID of the gene set.
-
name
¶ str – The name of the gene set.
-
genes
¶ set of str – The genes in the gene set.
-
source
¶ None or str – The source / origin of the gene set (e.g., “MSigDB”).
-
collection
¶ None or str – The collection that the gene set belongs to (e.g., “c4” for gene sets from MSigDB).
-
description
¶ None or str – The description of the gene set.
-
classmethod
from_list
(l)[source]¶ Generate a GeneSet object from a list of strings.
Note: See also to_list().
Parameters: l (list or tuple of str) – A list of strings representing gene set ID, name, genes, source, collection, and description. The genes must be comma-separated.
Returns: The gene set. Return type: genometools.basic.GeneSet
-
hash
¶ MD5 hash value for the gene set.
-
size
¶ The size of the gene set (i.e., the number of genes in it).
-
to_list
()[source]¶ Converts the GeneSet object to a flat list of strings.
Note: See also from_list().
Returns: The data from the GeneSet object as a flat list. Return type: list of str
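The round trip between a gene set and a flat list of strings can be sketched as follows. This is a self-contained illustration, not the library's code: `GeneSetRecord`, `record_to_list`, and `record_from_list` are hypothetical names, the field order simply follows the order listed in the from_list() description (the actual layout used by genometools may differ), and representing missing optional fields as empty strings is an assumption.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Sequence


class GeneSetRecord(NamedTuple):
    """Hypothetical stand-in for a gene set; not the genometools class."""
    id: str
    name: str
    genes: FrozenSet[str]
    source: Optional[str] = None
    collection: Optional[str] = None
    description: Optional[str] = None


def record_to_list(gs: GeneSetRecord) -> List[str]:
    # Genes are joined with commas; sorting makes the output deterministic.
    # Assumption: optional fields that are None become empty strings.
    return [
        gs.id,
        gs.name,
        ",".join(sorted(gs.genes)),
        gs.source or "",
        gs.collection or "",
        gs.description or "",
    ]


def record_from_list(l: Sequence[str]) -> GeneSetRecord:
    # Assumed field order: ID, name, genes, source, collection, description.
    id_, name, genes, source, collection, description = l
    return GeneSetRecord(
        id=id_,
        name=name,
        genes=frozenset(g for g in genes.split(",") if g),
        source=source or None,
        collection=collection or None,
        description=description or None,
    )


record = GeneSetRecord("GS1", "Example set", frozenset({"TP53", "MYC"}),
                       source="MSigDB", collection="c4")
flat = record_to_list(record)
print(flat)
print(record_from_list(flat) == record)
```

A flat list like this is convenient for writing one gene set per row of a tab-delimited text file, which is one likely reason for requiring the genes to be comma-separated within a single field.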
-
-
class
genometools.basic.
GeneSetCollection
(gene_sets)[source]¶ A collection of gene sets.
This class holds an ordered list of gene sets and supports several ways of accessing individual gene sets (by ID or by index). Because the gene sets are ordered, each gene set has a unique position (index) in the database.
Parameters: gene_sets (list or tuple of GeneSet) – See gene_sets attribute.
-
gene_sets
¶ tuple of GeneSet – The gene sets in the database. Note that this is a read-only property.
-
get_by_id
(id_)[source]¶ Look up a gene set by its ID.
Parameters: id_ (str) – The ID of the gene set. Returns: The gene set. Return type: GeneSet Raises: ValueError – If the given ID is not in the database.
-
get_by_index
(i)[source]¶ Look up a gene set by its index.
Parameters: i (int) – The index of the gene set. Returns: The gene set. Return type: GeneSet Raises: ValueError – If the given index is out of bounds.
-
index
(id_)[source]¶ Get the index corresponding to a gene set, identified by its ID.
Parameters: id_ (str) – The ID of the gene set. Returns: The index of the gene set. Return type: int Raises: ValueError – If the given ID is not in the database.
-
n
¶ The number of gene sets in the database.
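The access methods described above (lookup by ID, lookup by index, and ID-to-index translation) can be sketched with an ordered tuple plus a dictionary. `ToyGeneSet` and `CollectionSketch` are hypothetical names used only for this example; the internal layout is an assumption, and treating negative indices as out of bounds is a choice made here, not something the documentation states.

```python
from typing import FrozenSet, NamedTuple


class ToyGeneSet(NamedTuple):
    """Hypothetical minimal gene set with just an ID and its genes."""
    id: str
    genes: FrozenSet[str]


class CollectionSketch:
    """Hypothetical stand-in; not the genometools GeneSetCollection."""

    def __init__(self, gene_sets):
        # A tuple keeps the order fixed and makes the contents read-only.
        self._gene_sets = tuple(gene_sets)
        # Map each ID to its position for constant-time lookup by ID.
        self._id_to_index = {gs.id: i for i, gs in enumerate(self._gene_sets)}

    @property
    def gene_sets(self):
        return self._gene_sets

    @property
    def n(self):
        return len(self._gene_sets)

    def index(self, id_):
        try:
            return self._id_to_index[id_]
        except KeyError:
            raise ValueError('No gene set with ID "%s".' % id_) from None

    def get_by_id(self, id_):
        return self._gene_sets[self.index(id_)]

    def get_by_index(self, i):
        # Assumption: negative indices count as out of bounds.
        if not 0 <= i < self.n:
            raise ValueError("Index %d is out of bounds." % i)
        return self._gene_sets[i]


db = CollectionSketch([
    ToyGeneSet("GS1", frozenset({"TP53", "MYC"})),
    ToyGeneSet("GS2", frozenset({"BRCA1"})),
])
print(db.n, db.index("GS2"), db.get_by_index(0).id)
```

Keeping both the tuple and the dictionary trades a little memory for fast access in either direction, which matters when a database holds thousands of gene sets.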
-
classmethod
read_msigdb_xml
(path, entrez2gene, species=None)[source]¶ Read the complete MSigDB database from an XML file.
The XML file can be downloaded from here: http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/msigdb_v5.0.xml
Parameters: Returns: The gene set database containing the MSigDB gene sets.
Return type:
-
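As a rough illustration of what reading gene sets from an MSigDB-style XML file can involve, the sketch below parses a small inline XML string with the standard library. Several things here are assumptions rather than documented behavior: the element and attribute names (`GENESET`, `STANDARD_NAME`, `SYSTEMATIC_NAME`, `ORGANISM`, `MEMBERS_EZID`) reflect my understanding of the MSigDB XML layout and should be verified against the downloaded file; `entrez2gene` is assumed to be a dict mapping Entrez ID strings to gene names; `species` is assumed to filter by organism; and `parse_toy_msigdb` is a hypothetical function, not part of genometools. The XML content itself is toy data.

```python
import xml.etree.ElementTree as ET

# Toy data for illustration only; not taken from the real MSigDB file.
TOY_XML = """<MSIGDB>
  <GENESET STANDARD_NAME="TOY_SET_A" SYSTEMATIC_NAME="M0001"
           ORGANISM="Homo sapiens" MEMBERS_EZID="7157,4609,999999"/>
  <GENESET STANDARD_NAME="TOY_SET_B" SYSTEMATIC_NAME="M0002"
           ORGANISM="Mus musculus" MEMBERS_EZID="17869"/>
</MSIGDB>"""


def parse_toy_msigdb(xml_text, entrez2gene, species=None):
    """Return a list of (id, name, genes) tuples from MSigDB-style XML.

    Hypothetical helper: the attribute names and the roles of
    `entrez2gene` and `species` are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter("GENESET"):
        # Assumed meaning of `species`: keep only sets for that organism.
        if species is not None and elem.get("ORGANISM") != species:
            continue
        entrez_ids = [e for e in elem.get("MEMBERS_EZID", "").split(",") if e]
        # Entrez IDs without an entry in the mapping are silently dropped.
        genes = frozenset(entrez2gene[e] for e in entrez_ids if e in entrez2gene)
        result.append((elem.get("SYSTEMATIC_NAME"),
                       elem.get("STANDARD_NAME"), genes))
    return result


mapping = {"7157": "TP53", "4609": "MYC", "17869": "Myc"}
for gs_id, gs_name, gs_genes in parse_toy_msigdb(TOY_XML, mapping,
                                                 species="Homo sapiens"):
    print(gs_id, gs_name, sorted(gs_genes))
```

For the full database file, iterating with `xml.etree.ElementTree.iterparse` instead of loading the whole document at once would keep memory use lower.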