Using PPI Networks in hierarchical multi-label classification trees for gene function prediction
published: Oct. 23, 2012, recorded: September 2012, views: 3800
Description
Motivation: Catalogs, such as Gene Ontology (GO) and MIPS-FUN, assume that functional classes
are organized hierarchically (general functions include more specific functions). This has recently
motivated the development of several machine learning algorithms under the assumption that instances
may belong to multiple hierarchically organized classes. Besides relationships among classes,
it is also possible to identify relationships among examples. Although such relationships have been
identified and extensively studied in the area of protein-protein interaction (PPI)
networks, they have not received much attention in hierarchical protein function prediction. The
use of such relationships between genes introduces autocorrelation and violates the assumption
that instances are independently and identically distributed, which underlies most machine
learning algorithms. While this consideration introduces additional complexity to the learning
process, we expect it would also carry substantial benefits.
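
To make the notion of network autocorrelation concrete, the sketch below computes global Moran's I, a standard autocorrelation statistic, for one binary function label over a small undirected interaction graph. The gene names, edges, and labels are invented for illustration, and Moran's I is used here only as a familiar example; the description above does not state which autocorrelation measure NHMC relies on.

```python
def morans_i(values, edges):
    """Global Moran's I of node values over an undirected, unweighted graph.

    values: dict mapping node -> numeric value (here: 1 if the gene has the
            function, 0 otherwise).
    edges:  iterable of (node_a, node_b) pairs, each undirected edge listed once.
    """
    edges = list(edges)
    n = len(values)
    mean = sum(values.values()) / n
    dev = {node: v - mean for node, v in values.items()}
    denom = sum(d * d for d in dev.values())
    if denom == 0 or not edges:
        raise ValueError("Moran's I is undefined for constant values or an empty graph")
    # Each undirected edge counts twice (w_ij and w_ji) in the symmetric weight matrix.
    cross = 2 * sum(dev[a] * dev[b] for a, b in edges)
    total_weight = 2 * len(edges)
    return (n / total_weight) * cross / denom


# Hypothetical toy network: two triangles joined by a single edge.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
             ("g4", "g5"), ("g5", "g6"), ("g4", "g6"),
             ("g3", "g4")]

# Case 1: interacting genes mostly share the label (positive autocorrelation).
clustered = {"g1": 1, "g2": 1, "g3": 1, "g4": 0, "g5": 0, "g6": 0}
# Case 2: the same labels scattered across the network (negative autocorrelation).
scattered = {"g1": 1, "g2": 0, "g3": 1, "g4": 0, "g5": 1, "g6": 0}

print(morans_i(clustered, toy_edges))
print(morans_i(scattered, toy_edges))
```

A value well above zero indicates that interacting genes tend to share the function, which is exactly the dependence between examples that breaks the i.i.d. assumption.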
Results: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation
in multi-class gene function prediction. We develop a tree-based algorithm for considering
network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). The
empirical evaluation of the proposed algorithm, called NHMC, on 24 yeast datasets using MIPS-FUN
and GO annotations and exploiting three different PPI networks, clearly shows that taking
autocorrelation into account improves performance.
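
As a rough illustration of the HMC setting, the sketch below encodes a gene's annotations as a binary class vector in which every annotated class also switches on all of its ancestors, reflecting the idea that general functions include more specific ones. The class codes, the `parents` map, and both helper functions are hypothetical and are not taken from NHMC; allowing several parents per class is an assumption made so that the same code covers a tree-shaped hierarchy and a DAG-shaped one.

```python
def expand_with_ancestors(labels, parents):
    """Return the given classes together with all of their ancestors.

    parents: dict mapping a class to a list of its direct parent classes
             (classes without an entry are treated as top-level).
    """
    expanded = set()
    stack = list(labels)
    while stack:
        cls = stack.pop()
        if cls not in expanded:
            expanded.add(cls)
            stack.extend(parents.get(cls, ()))
    return expanded


def to_class_vector(labels, class_order, parents):
    """Binary vector over class_order that respects the hierarchy constraint."""
    full = expand_with_ancestors(labels, parents)
    return [1 if cls in full else 0 for cls in class_order]


# Hypothetical toy hierarchy (not real MIPS-FUN or GO identifiers).
parents = {"A.1": ["A"], "A.2": ["A"], "A.1.1": ["A.1"], "B.1": ["B"]}
class_order = ["A", "A.1", "A.1.1", "A.2", "B", "B.1"]

# A gene annotated with one specific class and one general class.
vector = to_class_vector({"A.1.1", "B"}, class_order, parents)
print(vector)  # [1, 1, 1, 0, 1, 0]
```

A tree learner for HMC predicts such a vector as a whole rather than training one classifier per class, which is what lets a single model keep its predictions consistent with the hierarchy.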
Conclusions: Our results suggest that explicitly taking network autocorrelation into account increases
the predictive capability of the models, especially when the underlying PPI network is
dense. Furthermore, NHMC can be used as a tool to assess network data and the information it
provides with respect to gene function.