
Projects

Showing projects 21-40 of 182.

| Name | Description | # Ann. | Author | Maintainer | Updated_at | Status |
|---|---|---|---|---|---|---|
| SPECIES800 | SPECIES 800 (S800): an abstract-based, manually annotated corpus. S800 comprises 800 PubMed abstracts in which organism mentions were identified and mapped to the corresponding NCBI Taxonomy identifiers. Described in: The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, et al. (2013). PLoS ONE, 2013, 8(6): e65390. doi:10.1371/journal.pone.0065390 | 3.71 K | Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen | evangelos | 2015-11-20 | Released |
| BioLarkPubmedHPO | 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free-text annotation of human phenotypes. For more information, see Groza et al., "Automatic concept recognition using the human phenotype ontology reference and test suite corpora", 2015. | 7.24 K | Tudor Groza | simon | 2017-03-28 | Released |
| NCBIDiseaseCorpus | The NCBI disease corpus is fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. | 6.88 K | Rezarta Islamaj Doğan, Robert Leaman, Zhiyong Lu | Chih-Hsuan Wei | 2015-08-06 | Released |
| CellFinder | CellFinder corpus | 4.75 K | Mariana Neves, Alexander Damaschun, Andreas Kurtz, Ulf Leser | Mariana Neves | 2015-11-25 | Released |
| CoMAGC | In order to access the large amount of information in biomedical literature about genes implicated in various cancers both efficiently and accurately, the aid of text mining (TM) systems is invaluable. Current TM systems target either gene-cancer relations or biological processes involving genes and cancers, but the former type produces information not comprehensive enough to explain how a gene affects a cancer, and the latter does not provide a concise summary of gene-cancer relations. In order to support the development of TM systems that specifically target gene-cancer relations but are still able to capture complex information in biomedical sentences, we publish CoMAGC, a corpus with multi-faceted annotations of gene-cancer relations. In CoMAGC, a piece of annotation is composed of four semantically orthogonal concepts that together express 1) how a gene changes, 2) how a cancer changes and 3) the causality between the gene and the cancer. The multi-faceted annotations are shown to have high inter-annotator agreement. In addition, the annotations in CoMAGC allow us to infer the prospective roles of genes in cancers and to classify the genes into three classes according to the inferred roles. We encode the mapping between multi-faceted annotations and gene classes into 10 inference rules. The inference rules produce results with high accuracy as measured against human annotations. CoMAGC consists of 821 sentences on prostate, breast and ovarian cancers. Currently, the corpus deals with changes in gene expression levels among other types of gene changes. | 1.53 K | Lee et al | Hee-Jin Lee | 2015-02-24 | Released |
| bionlp-st-bb3-2016-training | Entity (bacteria, habitats and geographical places) annotation to the training dataset of the BioNLP-ST 2016 BB task. For more information, please refer to bionlp-st-bb3-2016-development and bionlp-st-bb3-2016-test. Bacteria: Bacteria entities are annotated as contiguous spans of text that contain a full, unambiguous prokaryote taxon name; the type label is Bacteria. The Bacteria type is a taxon, at any taxonomic level from phylum (Eubacteria) to strain. The category that the text entities have to be assigned to is the most specific and unique category of the NCBI taxonomy resource. If a given strain or group of strains is not referenced by NCBI, it is assigned the closest taxid in the taxonomy. Habitat: Habitat entities are annotated as spans of text that contain a complete mention of a potential habitat for bacteria; the type label is Habitat. Habitat entities are assigned one or several concepts from the habitat subpart of the OntoBiotope ontology. The assigned concepts are as specific as possible. OntoBiotope defines the most relevant microorganism habitats from all areas considered by microbial ecology (hosts, natural environment, anthropized environments, food, medical, etc.). Habitat entities are rarely referential entities; they are usually noun phrases including properties and modifiers. There are rare cases of habitats referred to with adjectives or verbs. The spans are generally contiguous, but some of them are discontinuous in order to cope with conjunctions. Geographical: Geographical entities are geographical and organization places denoted by official names. | 1.29 K | INRA | Yue Wang | 2017-05-22 | Released |
| bionlp-st-pc-2013-training | The training dataset from the pathway curation (PC) task in the BioNLP Shared Task 2013. The entity types defined in the PC task are simple chemical, gene or gene product, complex and cellular component. | 7.86 K | NaCTeM and KISTI | Yue Wang | 2017-08-28 | Released |
| spacy-test | Random set of articles used for testing in the development of the RESTful spaCy parsing web service. Since development is now finished, they are released for the community to use. | 137 K | Nico Colic | Nico Colic | 2016-05-25 | Released |
| PIR-corpus2 | The protein tag was used to tag proteins, or protein-associated or -related objects, such as domains, pathways and gene expression. Annotation guideline: http://pir.georgetown.edu/pirwww/about/doc/manietal.pdf | 5.52 K | University of Delaware and Georgetown University Medical Center | Yue Wang | 2017-03-07 | Released |
| CyanoBase | Cyanobacteria are prokaryotic organisms that have served as important model organisms for studying oxygenic photosynthesis and have played a significant role in the Earth's history as primary producers of atmospheric oxygen. Publication: http://www.aclweb.org/anthology/W12-2430 | 1.1 K | Kazusa DNA Research Institute and Database Center for Life Science (DBCLS) | Yue Wang | 2016-05-17 | Released |
| 2015-BEL-Sample-2 | The 295 BEL statements of the sample set used for the 2015 BioCreative challenge. | 11.4 K | Fabio Rinaldi | Nico Colic | 2016-05-25 | Released |
| DisGeNET | Disease-Gene association annotation. | 3.12 M | Nuria Queralt | Jin-Dong Kim | 2016-01-28 | Beta |
| NEUROSES | This corpus is composed of PubMed articles containing mentions of cognitive enhancers and antidepressant drugs. The selected sentences are automatically annotated using the NCBO Annotator with the Chemical Entities of Biological Interest (CHEBI) and Phenotypic Quality Ontology (PATO) ontologies; we also produced annotations using the PhenoMiner ontology via a dictionary-based tagger. | 2.15 M | | nestoralvaro | 2016-02-24 | Beta |
| PubCasesORDO | ORDO annotation in PubCases | 869 K | | Toyofumi Fujiwara | 2017-09-14 | Beta |
| PubCasesHPO | HPO annotation in PubCases | 3.2 M | | Toyofumi Fujiwara | 2017-09-06 | Beta |
| bionlp-st-ge-2016-uniprot | UniProt protein annotation to the benchmark data set of the BioNLP-ST 2016 GE task: reference data set (bionlp-st-ge-2016-reference) and test data set (bionlp-st-ge-2016-test). The annotations are produced based on a dictionary which is semi-automatically compiled for the 34 full paper articles included in the benchmark data set (20 in the reference data set + 14 in the test data set). For detailed information about the BioNLP-ST GE 2016 task data sets, please refer to the benchmark reference data set (bionlp-st-ge-2016-reference) and benchmark test data set (bionlp-st-ge-2016-test). | 16.2 K | DBCLS | Jin-Dong Kim | 2016-05-22 | Beta |
| CRAFT-treebank | Penn Treebank markup for each sentence of the Colorado Richly Annotated Full Text Corpus (CRAFT). | 844 K | UColorado | Jin-Dong Kim | 2015-11-19 | Beta |
| craft | Test bed for PubAnnotation query development. | 53 K | Kevin Bretonnel Cohen | KevinBretonnelCohen | 2015-10-13 | Beta |
| PubmedHPO | Human phenotype annotation to PubMed abstracts, based on the HPO ontology | 12.4 M | Tudor Groza | tudor | 2017-07-24 | Beta |
| QFMC_MEDLINE | Quaero French Medical Corpus: Annotation of MEDLINE titles | 5.97 K | Aurélie Névéol | Pierre Zweigenbaum | 2016-02-03 | Beta |