To assess the performance of PolySearch2, we first ran a speed test comparing the original PolySearch with PolySearch2 on various queries with equivalent parameters. We then performed four evaluations to compare their accuracy. Finally, three additional evaluations were conducted to assess the performance of PolySearch2 on several novel search tasks. Performance statistics, including precision, recall, F-measure, and accuracy, are presented in Table 1 for the seven evaluations. Table 1 also lists the feature differences between PolySearch and PolySearch2.
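The four statistics named above can be sketched in Python. This is a minimal illustration assuming the standard binary-classification definitions, which this page does not spell out; the function name `classification_metrics` and the counts used are hypothetical and are not taken from the evaluation datasets.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-measure and accuracy from confusion counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives (standard definitions, assumed here).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in classification_metrics(tp=80, fp=20, fn=10, tn=40).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, a system that labels every candidate as positive (no true or false negatives) has a recall of 1 and a precision equal to its accuracy. That pattern matches the original PolySearch columns of Table 1, although the page itself does not state why those values coincide.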
Table 1 summarizes the performance evaluation and feature comparison of PolySearch2 versus the original PolySearch. Evaluation #1 assesses PolySearch2’s ability to identify disease-gene associations. Evaluation #2 assesses PolySearch2’s ability to identify drug-gene/protein associations. Evaluation #3 assesses PolySearch2’s ability to identify protein-protein interactions. Evaluation #4 assesses PolySearch2’s ability to identify metabolite-gene associations. Evaluation #5 assesses PolySearch2’s ability to identify drugs with significant adverse effects, or ‘dangerous drugs’. Evaluation #6 assesses PolySearch2’s ability to identify toxin-disease associations. Finally, Evaluation #7 assesses PolySearch2’s ability to identify toxin-adverse effect associations. Analysis speed is calculated from multiple runs on a query with 10,000 relevant documents.
All evaluation datasets are available on the Downloads page.
Prediction Accuracy | PolySearch Precision | PolySearch Recall | PolySearch F-measure | PolySearch Accuracy | PolySearch2 Precision | PolySearch2 Recall | PolySearch2 F-measure | PolySearch2 Accuracy |
---|---|---|---|---|---|---|---|---|
#1 Disease/Gene | 0.6533 | 1.0000 | 0.7903 | 0.6533 | 0.8708 | 0.9091 | 0.8895 | 0.8525 |
#2 Drug/Gene | 0.7490 | 1.0000 | 0.8565 | 0.7490 | 0.9701 | 0.8351 | 0.8975 | 0.8571 |
#3 Protein/Protein | 0.8396 | 1.0000 | 0.9128 | 0.8396 | 0.9432 | 0.9326 | 0.9379 | 0.8962 |
#4 Metabolite/Gene | 0.7834 | 1.0000 | 0.8785 | 0.7834 | 0.9579 | 0.8619 | 0.9074 | 0.8614 |
#5 Drug/Adverse Effect | - | - | - | - | 0.9233 | 0.8022 | 0.8585 | 0.7737 |
#6 Toxin/Disease | - | - | - | - | 0.9054 | 0.7864 | 0.8417 | 0.7810 |
#7 Toxin/Adverse Effect | - | - | - | - | 0.8808 | 0.6822 | 0.7689 | 0.7854 |

System Features | PolySearch | PolySearch2 |
---|---|---|
Thesaurus Size | 9 categories, 57,706 terms with 353,862 synonyms | 20 categories, 1,131,328 terms with 2,848,936 synonyms |
Filter Words | 7,011 | 29,718 |
Database Numbers | 1 corpus and 6 databases | 6 corpora and 14 databases |
Num. of Search Types | 66 query combinations | 273 query combinations |
Analysis Speed | 6.5 documents per second | 165 documents per second |
Mobile Friendly? | No | Yes |
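
As a quick consistency check on Table 1, the reported F-measures can be recomputed from the reported precision and recall, and the analysis speeds can be turned into a speed-up factor. The sketch below assumes the F-measure is the usual harmonic mean of precision and recall; the variable and function names are illustrative, and the numbers are copied from the table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) for PolySearch2, copied from Table 1.
POLYSEARCH2_ROWS = {
    "#1 Disease/Gene": (0.8708, 0.9091, 0.8895),
    "#2 Drug/Gene": (0.9701, 0.8351, 0.8975),
    "#3 Protein/Protein": (0.9432, 0.9326, 0.9379),
    "#4 Metabolite/Gene": (0.9579, 0.8619, 0.9074),
    "#5 Drug/Adverse Effect": (0.9233, 0.8022, 0.8585),
    "#6 Toxin/Disease": (0.9054, 0.7864, 0.8417),
    "#7 Toxin/Adverse Effect": (0.8808, 0.6822, 0.7689),
}

# Largest gap between recomputed and reported F-measure across all rows.
max_gap = max(abs(f_measure(p, r) - f) for p, r, f in POLYSEARCH2_ROWS.values())

# Analysis speeds from the System Features rows (documents per second).
OLD_SPEED, NEW_SPEED = 6.5, 165.0
DOCS = 10_000  # query size used for the speed test
speedup = NEW_SPEED / OLD_SPEED
old_seconds = DOCS / OLD_SPEED
new_seconds = DOCS / NEW_SPEED

if __name__ == "__main__":
    for name, (p, r, f) in POLYSEARCH2_ROWS.items():
        print(f"{name}: recomputed {f_measure(p, r):.4f}, reported {f:.4f}")
    print(f"largest gap: {max_gap:.5f}")
    print(f"speed-up: {speedup:.1f}x "
          f"({old_seconds:.0f} s vs {new_seconds:.0f} s for {DOCS} documents)")
```

The recomputed values agree with the table to within rounding. The speed figures correspond to roughly a 25-fold speed-up, or about a minute instead of about 25 minutes for a 10,000-document query.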
To assess the flexibility of PolySearch2, we conducted an association test using the gold standard training dataset from BioASQ, a biomedical semantic question answering challenge (Task 3B Training Set, released March 2015). We assessed PolySearch2's performance in finding associated disease concepts when presented with free-text sentences.
Table 2: Performance evaluation using the BioASQ Task 3B (biomedical semantic QA) gold standard training dataset. The search queries are question sentences from BioASQ, and PolySearch2's disease association results are compared with the tagged disease concepts in the BioASQ 3B gold standard training dataset.
Prediction Accuracy | PolySearch Precision | PolySearch Recall | PolySearch F-measure | PolySearch Accuracy | PolySearch2 Precision | PolySearch2 Recall | PolySearch2 F-measure | PolySearch2 Accuracy |
---|---|---|---|---|---|---|---|---|
#8 BioASQ Question / Disease | - | - | - | - | 0.7284 | 0.6052 | 0.6611 | 0.7212 |
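
The comparison behind Table 2, where predicted disease concepts are matched against the concepts tagged in the gold standard, can be sketched as a per-question set comparison. This is only an illustration under stated assumptions: the exact matching and scoring protocol used for the evaluation is not described on this page, and the function name, question IDs, and disease names below are hypothetical.

```python
def micro_scores(predicted, gold):
    """Micro-averaged precision, recall and F-measure over questions.

    predicted / gold: dict mapping a question ID to a set of disease
    concept names (a hypothetical data layout, chosen for this sketch).
    """
    tp = fp = fn = 0
    for qid in predicted.keys() | gold.keys():
        pred_set = predicted.get(qid, set())
        gold_set = gold.get(qid, set())
        tp += len(pred_set & gold_set)   # concepts found and tagged
        fp += len(pred_set - gold_set)   # concepts found but not tagged
        fn += len(gold_set - pred_set)   # tagged concepts that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f = 2 * precision * recall / (precision + recall)
    else:
        f = 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Made-up example data, not taken from BioASQ.
    gold = {"q1": {"cystic fibrosis"}, "q2": {"asthma", "copd"}}
    predicted = {"q1": {"cystic fibrosis", "diabetes"}, "q2": {"asthma"}}
    print(micro_scores(predicted, gold))
```

Accuracy is left out of this sketch because it also requires counting true negatives, which depends on how the set of candidate concepts is defined.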
Liu Y., Liang Y., Wishart D.S. (2015) PolySearch 2.0: A significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins, and more. Nucleic Acids Res. 2015 Jul 1;43(Web Server Issue):W535-42.
Cheng D., Knox C., Young N., Stothard P., Damaraju S., Wishart D.S. (2008) PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res. 2008 Jul 1;36(Web Server Issue):W399-405.
This project is supported by the Canadian Institutes of Health Research (award #111062), Alberta Innovates - Health Solutions, and by The Metabolomics Innovation Centre (TMIC), a nationally-funded research and core facility that supports a wide range of cutting-edge metabolomic studies. TMIC is funded by Genome Alberta, Genome British Columbia, and Genome Canada, a not-for-profit organization that is leading Canada's national genomics strategy with $900 million in funding from the federal government.