Projects (159)

Name Description # Ann. Author Maintainer Updated at Status
pubtator-sample Sample annotation of PubTator produced by Zhiyong Lu et al. 28 Zhiyong Lu Jin-Dong Kim 2016-01-19 Testing
Frame annotation ver1 0 Younggyun Hahm kaist_nlp 2015-11-10 Testing
PennBioIE The PennBioIE corpus (0.9) covers two domains of biomedical knowledge. One is the inhibition of the cytochrome P450 family of enzymes (CYP450 or CYP for short), and the other is the molecular genetics of cancer (oncology or onco for short). 23,881 UPenn Biomedical Information Extraction Project Yue Wang 2016-12-06 Released
bionlp-st-id-2011-training The training dataset from the infectious diseases (ID) task in the BioNLP Shared Task 2011. Entity types: (1) Genes and gene products: gene, RNA, and protein name mentions. (2) Two-component systems: mentions of the names of two-component regulatory systems, frequently embedding the names of the two proteins forming the system. (3) Chemicals: mentions of chemical compounds such as "NaCl". (4) Organisms: mentions of organism names or organism specification through specific properties (e.g. "graRS mutant"). (5) Regulons/Operons: mentions of names of specific regulons and operons. 5,609 University of Tokyo Tsujii Laboratory, NaCTeM and Biocomplexity Institute of Virginia Tech Yue Wang 2017-04-18 Released
PIR-corpus1 The Protein Information Resource (PIR) is not biased towards any particular biomedical domain, and is expected to provide more diverse protein names in a given sample size. Annotation category: protein, compound-protein, acronym. 4,443 University of Delaware and Georgetown University Medical Center Yue Wang 2016-11-14 Released
PIR-corpus2 The protein tag was used to tag proteins, or protein-associated or -related objects, such as domains, pathways, and gene expression. Annotation guideline: http://pir.georgetown.edu/pirwww/about/doc/manietal.pdf 5,521 University of Delaware and Georgetown University Medical Center Yue Wang 2017-03-07 Released
CRAFT-treebank Penn Treebank markup for each sentence of the Colorado Richly Annotated Full Text Corpus (CRAFT). 844,123 UColorado Jin-Dong Kim 2015-11-19 Beta
PubmedHPO Human phenotype annotation to PubMed abstracts, based on the HPO ontology 12,437,742 Tudor Groza tudor 2016-12-06 Beta
BioLarkPubmedHPO 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. For more info, please see Groza et al. "Automatic concept recognition using the human phenotype ontology reference and test suite corpora", 2015. 7,236 Tudor Groza simon 2017-03-28 Released
GlycoBiology-GDGDB GDGDB-based annotation to GlycoBiology abstracts 2,458 Toshihide Shikanai shikanai 2016-02-01 Testing