
Projects (159)

Name Description # Annotations Author Maintainer Updated at Status
tmVarCorpus Wei C-H, Harris BR, Kao H-Y, Lu Z (2013) tmVar: A text mining approach for extracting sequence variants in biomedical literature, Bioinformatics, 29(11) 1433-1439, doi:10.1093/bioinformatics/btt156. 1,431 Chih-Hsuan Wei, Bethany R. Harris, Hung-Yu Kao and Zhiyong Lu Chih-Hsuan Wei 2015-08-06 Released
CellFinder CellFinder corpus 4,754 Mariana Neves, Alexander Damaschun, Andreas Kurtz, Ulf Leser Mariana Neves 2015-11-25 Released
bionlp-st-ge-2016-reference <p>It is the <b>benchmark reference data set</b> of the BioNLP-ST 2016 GE task. It includes Genia-style event annotations on 20 full-paper articles about NFκB proteins. The task is to develop an automatic annotation system that produces annotations as similar as possible to those in this data set.</p> <p>To evaluate the performance of a participating system, the system needs to produce annotations for the documents in the <b>benchmark test data set</b> (<a href="http://pubannotation.org/projects/bionlp-st-ge-2016-test">bionlp-st-ge-2016-test</a>).</p> <p>The GE 2016 benchmark data set is provided as multi-layer annotations, which include: <ul> <li>bionlp-st-ge-2016-reference: benchmark reference data set (this project)</li> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-test">bionlp-st-ge-2016-test</a>: benchmark test data set (annotations are blinded)</li> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-test-proteins">bionlp-st-ge-2016-test-proteins</a>: protein annotation to the benchmark test data set</li> </ul> </p> <p>The following are supporting resources: <ul> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-coref">bionlp-st-ge-2016-coref</a>: coreference annotation</li> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-uniprot">bionlp-st-ge-2016-uniprot</a>: Protein annotation with UniProt IDs.</li> <li><a href="http://pubannotation.org/projects/pmc-enju-pas">pmc-enju-pas</a>: dependency parsing result produced by Enju</li> <li><a href="http://pubannotation.org/projects/UBERON-AE">UBERON-AE</a>: annotation for anatomical entities as defined in UBERON</li> <li><a href="http://pubannotation.org/projects/ICD10">ICD10</a>: annotation for disease names as defined in ICD10</li> <li><a href="http://pubannotation.org/projects/GO-BP">GO-BP</a>: annotation for biological process names as defined in GO</li> <li><a href="http://pubannotation.org/projects/GO-CC">GO-CC</a>: annotation for cellular component names as defined in GO</li> </ul></p> <p>A SPARQL-driven search interface is provided at <a href="http://bionlp.dbcls.jp/sparql">http://bionlp.dbcls.jp/sparql</a>.</p> 14,356 DBCLS Jin-Dong Kim 2016-05-23 Released
spacy-test Random set of articles used for testing in the development of the RESTful spaCy parsing web service. Since development is now finished, they are released for the community to use. 136,597 Nico Colic Nico Colic 2016-05-25 Released
SPECIES800 SPECIES 800 (S800): an abstract-based manually annotated corpus. S800 comprises 800 PubMed abstracts in which organism mentions were identified and mapped to the corresponding NCBI Taxonomy identifiers. <br> Described in: <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0065390" target="blank">The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.</a> Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, et al. (2013). PLoS ONE, 2013, 8(6): e65390. <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0065390" target="blank">doi:10.1371/journal.pone.0065390</a> 3,708 Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen evangelos 2015-11-20 Released
NCBIDiseaseCorpus The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. 6,881 Rezarta Islamaj Doğan, Robert Leaman, Zhiyong Lu Chih-Hsuan Wei 2015-08-06 Released
2015-BEL-Sample-2 The 295 BEL statements of the sample set used for the 2015 BioCreative challenge. 11,430 Fabio Rinaldi Nico Colic 2016-05-25 Released
CyanoBase Cyanobacteria are prokaryotic organisms that have served as important model organisms for studying oxygenic photosynthesis and have played a significant role in the Earth's history as primary producers of atmospheric oxygen.<br> Publication: http://www.aclweb.org/anthology/W12-2430 1,101 Kazusa DNA Research Institute and Database Center for Life Science (DBCLS) Yue Wang 2016-05-17 Released
bionlp-st-ge-2016-test <p>It is the <b>benchmark test data set</b> of the BioNLP-ST 2016 GE task. It includes Genia-style event annotations on 14 full-paper articles about NFκB proteins. For testing purposes, however, <b>annotations are all blinded</b>, which means users cannot see the annotations in this project. Instead, <b>annotations in any other project can be compared to the hidden annotations in this project</b>, and the annotations in that project are then automatically evaluated based on the comparison.</p> <p>A participant in the GE task can get an evaluation of their automatic annotation results through the following process: <ol> <li><a href="http://www.pubannotation.org/docs/create-project/">Create a new project</a>.</li> <li><a href="http://www.pubannotation.org/docs/import-document/">Import documents</a> from the project <a href="http://pubannotation.org/projects/bionlp-st-ge-2016-test-proteins">bionlp-st-ge-2016-test-proteins</a> to your project.</li> <li><a href="http://www.pubannotation.org/docs/import-annotations/">Import annotations</a> from the project <a href="http://pubannotation.org/projects/bionlp-st-ge-2016-test-proteins">bionlp-st-ge-2016-test-proteins</a> to your project.</li> <li>At this point, you may want to <a href="http://www.pubannotation.org/docs/compare-project/">compare your project to this project</a>, the benchmark data set. It will show that the protein annotations in your project are 100% correct, but other annotations, e.g., events, are 0%.</li> <li>Produce event annotations, using your system, upon the protein annotations.</li> <li>Upload your event annotations to your project.</li> <li><a href="http://www.pubannotation.org/docs/compare-project/">Compare your project to this project</a> to get the evaluation.</li> </ol></p> <p>The GE 2016 benchmark data set is provided as multi-layer annotations, which include: <ul> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-reference">bionlp-st-ge-2016-reference</a>: <b>benchmark reference data set</b></li> <li>bionlp-st-ge-2016-test: benchmark test data set (this project)</li> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-test-proteins">bionlp-st-ge-2016-test-proteins</a>: protein annotation to the benchmark test data set</li> </ul> </p> <p>The following are supporting resources: <ul> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-coref">bionlp-st-ge-2016-coref</a>: coreference annotation</li> <li><a href="http://pubannotation.org/projects/bionlp-st-ge-2016-uniprot">bionlp-st-ge-2016-uniprot</a>: Protein annotation with UniProt IDs.</li> <li><a href="http://pubannotation.org/projects/pmc-enju-pas">pmc-enju-pas</a>: dependency parsing result produced by Enju</li> <li><a href="http://pubannotation.org/projects/UBERON-AE">UBERON-AE</a>: annotation for anatomical entities as defined in UBERON</li> <li><a href="http://pubannotation.org/projects/ICD10">ICD10</a>: annotation for disease names as defined in ICD10</li> <li><a href="http://pubannotation.org/projects/GO-BP">GO-BP</a>: annotation for biological process names as defined in GO</li> <li><a href="http://pubannotation.org/projects/GO-CC">GO-CC</a>: annotation for cellular component names as defined in GO</li> </ul></p> <p>A SPARQL-driven search interface is provided at <a href="http://bionlp.dbcls.jp/sparql">http://bionlp.dbcls.jp/sparql</a>.</p> DBCLS Jin-Dong Kim 2016-05-22 Released
bionlp-st-id-2011-training The training dataset from the infectious diseases (ID) task in the BioNLP Shared Task 2011. <br> Entity types: <br>- Genes and gene products: gene, RNA, and protein name mentions. <br>- Two-component systems: mentions of the names of two-component regulatory systems, frequently embedding the names of the two Proteins forming the system.<br>- Chemicals: mentions of chemical compounds such as "NaCL".<br>- Organisms: mentions of organism names or organism specification through specific properties (e.g. "graRS mutant").<br>- Regulons/Operons: mentions of names of specific regulons and operons. 5,609 University of Tokyo Tsujii Laboratory, NaCTeM and Biocomplexity Institute of Virginia Tech Yue Wang 2017-04-18 Released
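
The bionlp-st-ge-2016-test entry above describes comparing a participant's project against hidden reference annotations, with protein annotations scoring 100% and events 0% before any events are produced. The following is a minimal local sketch of that kind of comparison, not the PubAnnotation evaluator itself. It assumes annotations are available as dictionaries with a "denotations" list whose items carry a "span" (with "begin" and "end") and an "obj" type label, and it assumes exact span-and-type matching; the real service may apply different matching criteria. The function names `denotation_set` and `compare_by_type` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (begin offset, end offset, type label)
Denotation = Tuple[int, int, str]


def denotation_set(annotations: Dict) -> Set[Denotation]:
    """Collect denotations as hashable (begin, end, obj) triples.

    Assumes the layout {"denotations": [{"span": {"begin", "end"}, "obj"}]}.
    """
    return {
        (d["span"]["begin"], d["span"]["end"], d["obj"])
        for d in annotations.get("denotations", [])
    }


def compare_by_type(candidate: Dict, reference: Dict) -> Dict[str, Dict[str, float]]:
    """Exact-match precision, recall and F1 per annotation type."""
    cand = denotation_set(candidate)
    ref = denotation_set(reference)
    scores: Dict[str, Dict[str, float]] = {}
    for obj in sorted({t for _, _, t in cand | ref}):
        c = {d for d in cand if d[2] == obj}
        r = {d for d in ref if d[2] == obj}
        tp = len(c & r)  # identical span and type in both sets
        precision = tp / len(c) if c else 0.0
        recall = tp / len(r) if r else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[obj] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Illustrative data only: the reference holds one protein and one event,
# the candidate holds just the imported protein annotation.
reference = {
    "denotations": [
        {"span": {"begin": 0, "end": 5}, "obj": "Protein"},
        {"span": {"begin": 10, "end": 20}, "obj": "Positive_regulation"},
    ]
}
candidate = {
    "denotations": [
        {"span": {"begin": 0, "end": 5}, "obj": "Protein"},
    ]
}

result = compare_by_type(candidate, reference)
print(result)
```

With this made-up input, the protein type scores 1.0 on every measure and the event type scores 0.0, which mirrors the situation described in step 4 of the workflow.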