NIF is developing annotation standards for neuroscience-relevant data and databases to make it easier to search across and integrate data from multiple sources. For our perspective on why such standards are needed, please read our blog. Our motto in the creation of these standards is "arbitrary but defensible", i.e., we will never be able to develop a standard that everyone agrees upon, but each standard should be based on a set of clearly defined and well-reasoned criteria. As the NIF project is built upon a unified semantic framework based on the NIFSTD ontologies and the NeuroLex lexicon, all standards will be defined according to these ontologies. However, NIF is moving beyond simple mapping of content to providing a more consistent user experience for navigating among very heterogeneous data sources. The more we can do to make sources look and "behave" the same, the better we serve our users.
All sources should look and "act" the same to the extent possible. Thus, each view through the NIF, and each service behind it, should adhere to a consistent set of guidelines. In particular, the order of the columns should generally follow the same format so that users have a consistent experience when they change from source to source. Obviously, the exact number of columns will vary depending on the source.
NIF is providing both column and value mapping to enhance the semantic search and to pave the way for export of the NIF linked data graph. The purpose of the column mapping is to set the ontological domain of the entities contained within the column. Each of these domains generally corresponds to one of the NIFSTD modules: organism, anatomical entity, cell, subcellular entity, molecule, function, disease, technique, resource. We do not want to map a column at too granular a level, so as to avoid consistency problems with its contents. At this point, we are also not mapping column roles. So the fact that an organism serves as the subject of a study will not be reflected in the mapping: any column containing an organism should be mapped to organism. Similarly, even if a column contains only brain parts, we will map it to anatomical entity, so that the mapping remains valid if the source later adds parts of the spinal cord or of the peripheral nervous system. This policy may be revisited as the ontologies evolve.
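The column-mapping policy above can be sketched in a few lines of Python. This is only an illustration, not NIF's actual implementation: the domain names come from the NIFSTD module list above, but the source column names and the `map_column` helper are hypothetical.

```python
from enum import Enum


class Domain(Enum):
    """Ontological domains, one per NIFSTD module named in the text."""
    ORGANISM = "organism"
    ANATOMICAL_ENTITY = "anatomical entity"
    CELL = "cell"
    SUBCELLULAR_ENTITY = "subcellular entity"
    MOLECULE = "molecule"
    FUNCTION = "function"
    DISEASE = "disease"
    TECHNIQUE = "technique"
    RESOURCE = "resource"


# Hypothetical source column names (assumptions for illustration only).
# Columns map to a broad domain, never to a role or a narrower class:
# "study subject" and "host" are both simply organism, and "brain region"
# is anatomical entity, so a source that later adds spinal cord parts
# needs no remapping.
COLUMN_DOMAINS = {
    "species": Domain.ORGANISM,
    "host": Domain.ORGANISM,
    "study subject": Domain.ORGANISM,
    "brain region": Domain.ANATOMICAL_ENTITY,
    "spinal cord segment": Domain.ANATOMICAL_ENTITY,
    "cell type": Domain.CELL,
}


def map_column(column_name):
    """Return the domain for a source column, or None if it is unmapped."""
    return COLUMN_DOMAINS.get(column_name.strip().lower())
```

Keeping the lookup keyed on a normalized column name means that differently capitalized or padded headers from heterogeneous sources resolve to the same domain.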
NIF should facilitate interlinking of data, literature and tools wherever possible. When any entity has a unique ID, NIF should provide the ID in its views. The following practices should be implemented for all sources:
Form of Identifier: Source:ID, e.g., PMID:000000000
Literature citations: Where possible, add references as PMIDs to each record. When PMIDs are not available, use DOIs.
External database/data set reference: Add database identifiers like DatabaseShortName:identifier (e.g., GEO:GSE12345)
Organism: "Organism" is the preferred name for any column containing an organism name, except for human subjects where this is not deemed appropriate. For any other organism terms such as species, animal, or host, use organism (unless this is not sufficiently specific). Use the NCBI tax id as the identifier. If the animal is transgenic, add the identifier from the species-specific database (e.g., MGI identifier).
Brain region: For any brain region, use the NIFSTD label.
Cell: Use an existing cell type listed in NeuroLex.
Protein: For any protein, use PRO ids.
Small molecule: Use ChEBI ids.
Gene: Use Gene Symbol and Gene ID for Genes. Gene name and probe identifier should not be used through the NIF interface, although they should be searchable through the view.
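As a rough illustration of the Source:ID form described above, the following Python sketch builds and parses such identifiers and applies the "PMID first, DOI otherwise" rule for citations. The function names, the regular expression, and the `DOI` prefix are assumptions made for this example; the standard itself only specifies the general Source:ID pattern.

```python
import re

# Assumed shape of a Source:ID identifier: a short source name, a colon,
# then a non-empty local identifier without whitespace.
_IDENTIFIER = re.compile(r"^(?P<source>[A-Za-z][A-Za-z0-9_.-]*):(?P<id>\S+)$")


def format_identifier(source, local_id):
    """Build a Source:ID string, e.g. ("GEO", "GSE12345") -> "GEO:GSE12345"."""
    source = source.strip()
    local_id = str(local_id).strip()
    if not source or not local_id or ":" in source:
        raise ValueError("source and local_id must be non-empty; source may not contain ':'")
    return f"{source}:{local_id}"


def parse_identifier(text):
    """Split a Source:ID string into (source, local_id) at the first colon."""
    match = _IDENTIFIER.match(text.strip())
    if match is None:
        raise ValueError(f"not a Source:ID identifier: {text!r}")
    return match["source"], match["id"]


def citation_identifier(pmid=None, doi=None):
    """Prefer a PMID; fall back to a DOI when no PMID is available."""
    if pmid:
        return format_identifier("PMID", pmid)
    if doi:
        # The "DOI" prefix is an assumption; the text only says to use DOIs.
        return format_identifier("DOI", doi)
    return None
```

For example, `format_identifier("GEO", "GSE12345")` produces the `GEO:GSE12345` form shown above, and `citation_identifier(doi="10.1000/xyz123")` falls back to a DOI-based identifier because no PMID was supplied.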
Current NIF annotation standards are maintained in the NeuroLex wiki under NIF Annotation Standard.