Note: Please pay more attention to the text in red.  

What is a NIF Resource? 

NIF defines as a resource any data, databases, software/web-based tool, material, models, networks or information that would accelerate the pace of neuroscience research and discovery.

NIF has 4 core indexes through which resources are searched: the NIF Registry, the NIF Data Federation, Literature, and Grants. Grants and the Registry are databases within the Data Federation but also have their own tabs. Using the DISCO protocol, the Data Federation queries independently maintained databases and datasets to return relevant content, but database providers must register their data with the NIF Data Federation and specify permissions. Grants exposes funding opportunities from the grants.gov database, a central storehouse for information on over 1,000 grant programs open for application that provides access to approximately $500 billion in annual awards. Literature exposes the contents of PubMed and Open Access databases, including PubMed Central.

The NIF community is welcome and encouraged to add resources to the NIF Registry and NIF Data Federation. The goal of NIF is to bring people to your site; NIF does not maintain any resources locally. NIF has a full time curator and technical support to assist with the registration process. 

NIF Annotation Standards

NIF is developing annotation standards for neuroscience-relevant data and databases to make it easier to search across and integrate data from multiple sources. For our perspective on why such standards are needed, please read our blog. Our motto in the creation of these standards is "arbitrary but defensible", i.e., we will never be able to develop a standard that is 100% agreed upon but it should be based on a set of clearly defined and well reasoned criteria.  As the NIF project is built upon a unified semantic framework based on the NIFSTD ontologies and Neurolex lexicon, all standards will be defined according to these ontologies. However, NIF is moving beyond simple mapping of content to providing a more consistent user experience for navigating among very heterogeneous data sources. The more we can do to make sources look and "behave" the same, the better we serve our users.    

Standards NIF has developed include a standard column order, standardized column mapping; standards for age classification, differential expression, treatment paradigm, and dementia severity; as well as guidelines for entity mapping and linking data. See Annotation Standards below.

NIF Registry Resource

The NIF Registry, a core resource of NIF, is a catalog of web resources that have been selected by NIF curators, or contributed by the community, as valuable tools for researchers and students in the field of neuroscience. The NIF Registry contains a listing of a variety of resources including databases, software tools, brain atlases, granting agencies, tissue banks, and many others. This list of resources is being continuously added to and updated by NIF's staff, affiliates, and people who recommend their resources to NIF. 

The NIF Registry uses NIF vocabularies to provide high level descriptions of the nature of the resource and its contents. However, unless the resource is a database or data set and has registered with the NIF data integration tools, the NIF Registry does not search the contents of these databases directly. For example, searching for global key words such as "genes" or "tissue bank" will bring up the various resources that have those descriptors, whereas "GRM1" or "C57BL/6J-rcw3J/J" will not bring up results, as specific gene names or strain names are not tagged for each resource. The NIF Registry will provide a list of Alzheimer's disease tissue banks, but it will not tell the user which types of tissues are found in each tissue bank. This type of "drill down" search is provided for a subset of databases through NIF's Data Federation.

Community involvement is encouraged. Anyone may add a new resource or edit existing resources. All additions and edits are curated by NIF staff to comply with NIF standards and policies.

A website shall be considered an individual resource if it is maintained by a single entity and consists of one or more individual web pages that are related by a theme and HTML links. Most often the individual pages share portions of the URL; however, unrelated URLs may be incorporated into a single web resource. In the event that a subgroup of pages represents a sufficient shift in theme, it should be classified as an independent resource. For example, the department of neuroscience of a university (resource 1) may have a lab led by a researcher (resource 2).

Registering a resource to the NIF Registry is the simplest form of registration. The registration form asks for the name of the resource, URL, and some additional basic information, including resource type(s).

The resource becomes immediately available through NeuroLex (a wiki containing the resource form), where it is assigned a NIF ID. It is also included in the NIF Registry (updated weekly), where it is available through direct query (through NIF's search results) with links back to the original source.

 Anyone, whether it is the resource owner or not, may register any non-commercial (exceptions apply) neuroscience related resource. If you are the resource owner you may add the “Registered with NIF” icon to your site. 

Resource owners may also place the fully customizable NIF Navigator on their own websites to search NIF’s holdings.

 After a resource is registered as a NIF Registry resource, resource owners may desire to create a sitemap or register their resource as part of the Data Federation to provide direct access to dynamic content or structure of the content – see Sitemap and Data Federation below.

What resources are included in the NIF Registry?

The NIF Registry is not exclusive to any one type of resource. Rather, it contains a myriad of resources that are deemed valuable to the neuroscience community. Most of these are freely available on the Web, although some are restricted to a small community of users due to commercial interests, or laws governing the sharing of sensitive data. We are relying on feedback from the community as to what types of resources they would like to see. For example, would you like to see more commercial resources? Should we be including well known general resources such as GenBank and NCBI? Should we be listing journals and scientific organizations? This Wiki page (bottom of page) allows for commenting on our curation policies. The rule of thumb that NIF staff tend to have is that if the resource is conceivably useful to some neuroscientists, then it should be included. 

 Two classes of resources generally not included in the NIF Registry are electronic journals and commercial products. In general, NIF is not yet ready to include journals and commercial sites, although exceptions have been made in both cases. Journals are already searched in the Literature section, so providing additional access to them seems redundant unless there is some specific reason for doing so, such as a database of supplemental materials published in the journal. Literature searches typically do not search these types of materials adequately, and so they are valuable additions to the NIF. We have not yet created a list of commercial interests due to staffing issues, but we understand that many neuroscientists would like to have a list of commercial software tools, for example, that can be annotated and perhaps reviewed. Again, we encourage you to let us know what you need by providing feedback. 

 NIF places a high priority on resources that are recommended by their owners, so these resources are typically included in the NIF Registry relatively quickly. To register a resource, please visit our registration tool . All registered resources are reviewed by NIF curators.

What makes a good resource?    

A good resource is one that is determined to be of value to the greater neuroscience/biomedical community (including scientists, students, teachers, clinicians, etc.). As the field of neuroscience is broad and can be affected by all aspects of biomedicine, we catalog all biomedically relevant resources. NIF is only interested in "actionable" resources, i.e., resources that the public can take and do something with, such as a database, tool or service. Although we currently include departments and labs within universities, this pushes the limit of the "it must be actionable" rule. Within universities, there are numerous institutes, such as the Unilever Center for Molecular Informatics within the Department of Chemistry at Cambridge. We consider these types of groupings within universities non-actionable, and they should not be in the Registry. We are starting to move universities and departments from the resource registry into the resource ontology.

Additional criteria include:

Adding a NIF Registry Resource

To add a resource to the NIF Registry, we have created a special form that can be accessed by anyone by using the Register A Resource form. (This form can be found under "NIF Data Sharing" in the left panel of NIF's home page.)  Alternatively, users can add a resource to the NIF Registry by visiting NeuroLex and entering the name of the resource in the "Create a New Resource" box on the home page.  If using this method, please remember to first check to see if the resource is already in the system. 

 Both of these options require filling out a simple form with facts about the resource. The most important field to include when filling out the form is the URL of the resource. Additional fields are requested and we encourage the community to do the best they can to fill these out; all fields may not be applicable, e.g., a software resource may not have an associated organism. The better the record is filled out the more it benefits the user/owner as NIF processes this text and relationships  to determine relevancy and rank in the search results. Resource owners or knowledgeable patrons are encouraged to keep the resource's content up-to-date; e.g., if the database now covers a new organism, add the new organism's name to the organism field. If a new paper about the resource has been published, add the PubMed ID to the PMID field. 

 Remember that the goal of these entries is to be used for search across resources. Information, as far as possible, must be machine readable and human readable. Therefore, do not just copy terms, but curate them so that they are machine readable and human readable. Two examples will illustrate:

 Remember that all entries will be curated by NIF staff, so it is not imperative you get everything just right. 

The resource is immediately available and searchable through NeuroLex, with the tag "Curated" or "Uncurated" depending on the status of the resource. All resources will be deposited into the NIF Registry database the following Monday (provided the previous Thursday evening deadline is met).
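The deposit schedule above can be sketched as a small date calculation. This is only an illustration: the function name `next_registry_deposit` is hypothetical, and because the exact cutoff hour is not stated here, the sketch assumes that anything submitted by the end of Thursday makes the following Monday's deposit.

```python
from datetime import date, datetime, timedelta


def next_registry_deposit(submitted: datetime) -> date:
    """Return the Monday on which a submission would reach the NIF Registry.

    Assumption: the "Thursday evening deadline" is treated as end of Thursday.
    """
    day = submitted.date()
    # Monday is weekday 0; find the first Monday strictly after the submission day.
    days_to_monday = (7 - day.weekday()) % 7 or 7
    monday = day + timedelta(days=days_to_monday)
    # The deadline is the Thursday immediately before that Monday.
    deadline_day = monday - timedelta(days=4)
    if day > deadline_day:
        # Submitted Friday through Sunday: wait for the Monday after.
        monday += timedelta(days=7)
    return monday


# A Wednesday submission makes the next Monday; a Friday submission does not.
print(next_registry_deposit(datetime(2013, 9, 4, 12, 0)))  # 2013-09-09
print(next_registry_deposit(datetime(2013, 9, 6, 12, 0)))  # 2013-09-16
```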

 Generally speaking, all new resources will be curated by a NIF Curator within 7 days. 

 Other methods of resource nomination include emailing curation@neuinfo.org, or notifying NIF staff directly. 

Due to increased levels of spam, the ability to create new pages has been restricted to logged-in users. Therefore, you will need to create an account (top right) and log in to NeuroLex prior to adding a new resource. You can still edit existing pages without an account or being logged in.

 Curators should always be logged in when adding new resources/editing.

 Naming a Resource 

Generally speaking, try to name resources just as they are presented on the website e.g., PubMed would not be Pubmed, or Pub Med. Additional exceptions include:

Changing a resource's name

See Changing a resource's name below.

Requested Resource Form Fields (Basic tab)

Description

The description can often be obtained by reading/copying the "about us" section of the resource or from the home page. In the first snippet it is important to convey what the resource offers.  The following paragraphs can provide more detail for interested visitors. 

 How to write good resource descriptions

When writing a resource description, please keep in mind that the first snippet will be used in the NIF Registry listing.  With that in mind, the main thing to consider for the resource is:

****What is the primary product offered at this website?  Software tool?  A data set?  A service?**** 

While it is tempting to copy the description of the resource verbatim from the web site, please do not do this indiscriminately; rather, turn it into an informative, pithy, machine-readable resource description. These descriptions will be displayed as snippets by many tools that access the resource registry. Thus, the first line of the description should be as informative as possible.

 Example:

Let's say we wanted to add a resource that is called: Cow brain gene expression atlas 

 Good leading sentence:  Atlas detailing the three dimensional expression of 20,000 genes across major regions of the cow brain... 

Bad leading sentence:  The Cow Brain Gene Expression Atlas was developed by the University of X and aims to provide an increased understanding of ...

The following are guidelines and best practices for reviewing and writing resource descriptions:

Other names (Synonyms)

Include variations of the name that are used in the website or associated paper. Save abbreviations for the Abbreviation field (Below) 

Parent Organization

The physical institution that houses/maintains the resource; for example, the Allen Brain Atlas is housed at the Allen Brain Institute. A physical address can be associated with the parent resource.

We will provide more of a top level view rather than delve into department or lab level. E.g., the NIF's parent organization would be the University of California at San Diego; California; USA, rather than the Center for Research in Biological Systems; University of California San Diego; California; USA, or the California Institute for Telecommunications and Information Technology (Calit2). In other words, where applicable, do not use department or lab for this field, rather the institution, unless it is already entered as a Resource or it is felt it would make a good resource (See above, What makes a good resource?). 

 Top level institutions are generally in the following format: University name; State; Country. In order for this field - or any other field - to match in the NeuroLex (Semantic Media Wiki), it will have to match just as it was entered. Due to minor variability, you may just want to search for it, then copy it.  If there is no parent organization, such as the Jackson Labs, then leave the field blank. 

If the parent organization is a resource within the system, "Resource:" will need to be added to the beginning of it for the system to recognize it. E.g., Jackson Laboratory must be written Resource:Jackson Laboratory. (Note there is no space between “Resource:” and the name.)
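The prefix rule can be expressed as a small helper. This is a minimal sketch; `with_resource_prefix` is a hypothetical name and not part of NeuroLex.

```python
PREFIX = "Resource:"


def with_resource_prefix(name: str) -> str:
    """Return the name with exactly one "Resource:" prefix and no space after it."""
    name = name.strip()
    if name.startswith(PREFIX):
        # Already prefixed: drop any stray space that follows the colon.
        name = name[len(PREFIX):].strip()
    return PREFIX + name


print(with_resource_prefix("Jackson Laboratory"))  # Resource:Jackson Laboratory
print(with_resource_prefix("Resource: NITRC"))     # Resource:NITRC
```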

If the parent organization is not obvious, the URL can be of use because it is often telling. E.g., for the resource "National Long Term Care Survey," the URL http://www.nltcs.aas.duke.edu/index.htm shows the resource's association with Duke University. If it is not as obvious as this, you can try shortening the URL and seeing where it resolves. Alternatively, you can search PubMed for the resource to see if there is an associated paper and get the information from the paper. When using this last method, as collaborations are common, use the parent organization of the PI (the last author) from the most recent available paper.

Parent Organization Naming Scheme

Use the English version of any organization name if it is readily available and state the country where the organization is located, and the laboratories therein. 

 If the parent organization is a university, please follow the format listed below: 

 Universities located in the United States should be written as follows:

University Name; State; USA

Example: Cornell University; New York; USA 

 Universities located outside the United States should be written as follows:

University Name; City/State/Province; Country (1st choice=state, 2nd=province, 3rd=city)

Example: University of Oxford; Oxford; United Kingdom

Example: University of Alberta; Alberta; Canada 
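As an illustration of the naming scheme above, the following sketch builds and checks the semicolon-separated form. Both function names are hypothetical, and the check only looks at the three-part shape; it does not verify that the entry exists in NeuroLex.

```python
def format_parent_organization(university: str, region: str, country: str) -> str:
    """Join the three parts as "University Name; Region; Country"."""
    parts = [university.strip(), region.strip(), country.strip()]
    if not all(parts):
        raise ValueError("university, region and country are all required")
    return "; ".join(parts)


def looks_like_parent_organization(value: str) -> bool:
    """Return True if the value has three non-empty parts separated by semicolons."""
    parts = [part.strip() for part in value.split(";")]
    return len(parts) == 3 and all(parts)


print(format_parent_organization("Cornell University", "New York", "USA"))
print(looks_like_parent_organization("University of Oxford; Oxford; United Kingdom"))
```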

 Please discard the following formatting word selections:

Supporting Agency

Look for supporting agency(s) on the website. This will often be available at the bottom of the page or in an acknowledgements section. When this information is not found on the website, it can often be obtained from a paper(s) about the resource. Papers that describe the resource can often be obtained by searching PubMed for the name of the resource. Verify the paper is describing the resource not just mentioning it. The information can be found in the "Acknowledgements" or "Funding" section. 

 Separate multiple funders by a comma.

Funding support

Look for the grant(s) funding the resource on the website. This information will often be available at the bottom of the page or in an acknowledgements section. When this information is not found on the website, it can often be obtained from a paper(s) about the resource. Papers that describe the resource can often be obtained by searching PubMed for the name of the resource. Verify the paper is describing the resource, not just mentioning it. The information can be found in the "Acknowledgements" or "Funding" section. Contracts should be listed in this field too; just add the word “Contract” beforehand, e.g., Contract HHSN27120080035C.

Separate multiple grant numbers, or contract numbers by a comma. 

Information, as far as possible, must be machine readable and human readable. Therefore, do not just copy terms, but curate them so that they are machine readable and human readable. For example, "Contract #s N01-HD02-3343, N01-MH9-0002, N01-NS-9- 2314, -2315, -2316, -2317, -2319, -2320" was entered into the "supported by" field. Note that a search for N01-NS-9-2316 would return zero results, because that complete string never appears in the field. A human knows what that list means, but getting the computer to know requires additional programming. Curate by supplying each complete grant number.
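The difference between the copied and the curated form can be shown with a simple exact-match lookup. This is a sketch under assumptions: the sample values are shortened from the example above, the helper names are hypothetical, and NIF's actual search is not necessarily an exact match on comma-separated entries.

```python
def split_field(value: str) -> list:
    """Split a comma-separated form field into trimmed, non-empty entries."""
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def has_grant(field_value: str, grant: str) -> bool:
    """Return True if the grant number appears as a complete entry in the field."""
    return grant in split_field(field_value)


# Copied from the website: the last entries rely on the reader to expand them.
copied = "N01-HD02-3343, N01-MH9-0002, N01-NS-9- 2314, -2315, -2316"
# Curated: every grant number is written out in full.
curated = "N01-HD02-3343, N01-MH9-0002, N01-NS-9-2314, N01-NS-9-2315, N01-NS-9-2316"

print(has_grant(copied, "N01-NS-9-2316"))   # False
print(has_grant(curated, "N01-NS-9-2316"))  # True
```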

Resource Type(s)

Resource types have been created to provide a standardized method of classifying resources. Resources should be labeled with the main thing(s) that the resource offers using terms within the Resource Type Hierarchy. 

What is the primary product offered at this web address? Software tool? A data set? A service? That is, what would a user expect to take away from this site? Avoid assigning resource types for very minor functions. For example, if a site offering a database on nucleolar proteins has a discussion tab where they advertise a position for hire, do not characterize this resource as a job resource. A user should always understand why he or she was taken to a site, i.e., they shouldn't have to dig for information; it should be obvious. You may use the keywords to add additional resource descriptors if you think they are highly relevant.

The goal of resource type categorization is separate from keywords or other properties in that the resource type should inform the user as to the central purpose of the resource and not the particulars. A resource type is the “product” that is offered. For example, MGI is a database of mouse genes and is labeled as a "Database;" the Mutant Mouse Regional Resource Center accepts and distributes mutant mice and is labeled as an "organism repository;” and the Michael J. Fox Foundation for Parkinson's Research funds grants so it is labeled a "Funding resource."

To do this categorization, we created an internal consensus resource descriptor list based on the interactions with the library community and the BRO resource ontology, as well as several NIF partners. 

The NIF Resource module was created within the NIFSTD ontology as a separate module. This module fixes the set of high level categories, adding classes like "Service resource", and also attempts to harmonize with the Biomedical Resource Ontology (BRO), NITRC resource types and OBI classes. 

 All of the individual resource types currently fall into at least 1 of the 8 major categories below, and the user may search by these categories,

A complete up-to-date listing of the Resource Type Hierarchy and their definitions is available through NeuroLex. An alternate view can be found through Bioportal's NIF Resource Type Hierarchy View. 

These resource descriptors are meant to narrow search results by the type of thing that a neuroscientist is looking for. We believe that they are useful as general categories because they are in common English and tend to be understood by neuroscientists quickly. The question that is to be answered is "what is the end user looking for?" For example, if the user is looking for a transgenic mouse, they should not be bombarded with software tools that hit the same keywords or data sources that talk about the mouse.

The resource should be tagged with all applicable resource types, but not resource types that pertain to sub-resources if they will become separate NIF resources. In general, if you have to assign too many labels, you are probably better off creating separate pages for some of the tools, rather than trying to characterize everything a particular resource has to offer in total.   In general, the trend in NIF has been to use less granular resource types to simplify choices by the user. Thus, we now favor just “database” over “web-accessible database”.  Any additional characteristics can be covered by the keywords.

Assigning resource type or types can be challenging, as many websites offer multiple products and an individual product can serve multiple roles. All resources are curated by the NIF curator, so do not be concerned if you have difficulty. 

 Many times, portal sites have a lot of valuable resources that aren't apparent from the home page.  In this case, NIF has to decide whether to create a separate entry for the resource or tag the general resource with a bunch of tags.  One of our guiding principles is that the user should know why they are taken to a site.  For example, an organization claims that it has a training program for Ph.D. students, but it takes the NIF curator significant time to find out where on the site this information is listed.  That page may not be particularly useful without going through the home page of the organization.  In this case, NIF would tag the organization home page with "Graduate program", but would include in the description that such a program is offered and how users can find out about it within the resource.  In contrast, a model organism database may have an ontology that is available through their home page, but is difficult to find.  As the ontology page can be considered a self-contained resource, that is, you don't need to read the home page to understand it, NIF would list the ontology as a separate resource.

E.g., just because a resource has images, it does not mean you should tag it with image. You would only tag it with image if that was one of the main things the resource offers. Image can always be added as a keyword for these types of scenarios.  

E.g., Model organism databases such as Xenbase should only be tagged as a Database and Repository. You can add other resource types such as Organism-related portal, Data analysis service and Organism supplier as keywords.

E.g., Databases that offer data analysis services such as BLAST should only be marked as a database - not data analysis service. This can be added as a keyword.

Multiple entries are to be separated by a comma. Anytime you get more than 3 resource types begin thinking about breaking the resource up into more resources. 
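The rule of thumb above can be turned into a quick check on the comma-separated field. A minimal sketch; the function name and the constant are hypothetical, and the threshold of 3 is the guideline stated above rather than a hard limit enforced by the form.

```python
MAX_RESOURCE_TYPES = 3  # guideline from the text: more than 3 suggests splitting


def check_resource_types(field_value: str):
    """Return (resource types, should_consider_splitting) for a form field value."""
    types = [t.strip() for t in field_value.split(",") if t.strip()]
    return types, len(types) > MAX_RESOURCE_TYPES


types, split_suggested = check_resource_types("Database, Repository")
print(types, split_suggested)  # ['Database', 'Repository'] False
```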

Keywords and resource types will be treated in a special way by the NIF search systems allowing them to be ranked higher than other search results. 

Abbreviation

This is a required field. The abbreviated name will be used to identify your resource in publications and it is currently unique. Please make sure that a page with the name does not currently exist. 

URL  

Add the URL of the resource including the http:// portion of it; i.e., adding a URL such as www.neuinfo.org will not link.  You must provide the full URL, http://www.neuinfo.org/.
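Before saving, a URL can be checked for the scheme portion with Python's standard `urllib.parse`. This is a sketch only; `has_full_url` is a hypothetical helper, and accepting `https` alongside `http` is an assumption.

```python
from urllib.parse import urlparse


def has_full_url(url: str) -> bool:
    """Return True if the URL carries an http(s) scheme and a host name."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(has_full_url("www.neuinfo.org"))          # False: will not link
print(has_full_url("http://www.neuinfo.org/"))  # True
```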

Some websites have multiple URLs pointing to the same resource. Keep track of these URLs at the bottom of the page below Notes. Separate multiple alternate URLs with a break.

Notes

This page uses this default form:Resource

 Alt. URL: http://service.3dbar.org <BR /> 

Alt. URL: http://www.3dbar.org:8080/<BR /> 

 Old URLs should be dealt with in the same way, when a new one can be found. 

Notes 

This page uses this default form:Resource 

 Old URL: http://service.3dbar.org <BR /> 

   Old URL: http://www.3dbar.org:8080/<BR /> 

These alternate or old URLs may often be found in the abstract section of the resource’s associated paper.

Refer to NIF’s Link checking policy below when a new URL cannot be found for the resource. 

Occasionally, registries, etc., also host a resource with a dedicated URL for the resource. In these cases, add the URL of their website as an alternate URL at the bottom of the page below the Notes section, but include the name of this parent resource.

Notes

This page uses this default form:Resource

 Alt. NITRC URL: http://www.nitrc.org/projects/bar3d   

 In these cases, also add the name of the resource in the “Related To” field, e.g., Resource:NITRC 

 When dealing with mirrors, add "Alt. URL (Mirror): http:// " to the bottom of the page below Notes. Separate multiple alternate URLs using a break <BR>.
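The alternate, old, and mirror URL lines described above all follow the same pattern, so they can be generated rather than typed by hand. A minimal sketch; `notes_url_lines` is a hypothetical helper, and the output still has to be pasted below the Notes section manually.

```python
def notes_url_lines(urls, label="Alt. URL"):
    """Build one "<label>: <url> <BR />" line per URL for the Notes section."""
    return ["{}: {} <BR />".format(label, url.strip()) for url in urls]


for line in notes_url_lines(["http://service.3dbar.org", "http://www.3dbar.org:8080/"]):
    print(line)

# Other labels used in this guide include "Old URL" and "Alt. URL (Mirror)".
print(notes_url_lines(["http://service.3dbar.org"], label="Old URL")[0])
```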

ID

The NIF ID will be automatically generated once you Save your entry. Please use this ID when referencing your resource.

PMID

PubMed IDs from papers about the resource should be added to this field. Papers that reference the resource or only mention the resource should not be added here. We have an automated service (beta) that collects these mentions; it is available through the “Mentioned in Literature” column.

The PubMed ID can be obtained from the website, when available, or by searching PubMed for the resource. This latter method doesn't always work, especially for resources with common names. If the name is a phrase (multiple words), putting the name in quotes and/or adding [Title] next to the name of the resource often helps. This will of course only find papers with the name in the title.

Multiple IDs may be separated by a comma, but only the first entry will be linked to PubMed.
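The behavior of the PMID field can be sketched as follows. The function names are hypothetical, the sample IDs are placeholders rather than real papers about any resource, and the link pattern is the public PubMed address format, assumed here rather than taken from the NeuroLex form.

```python
def parse_pmids(field_value: str):
    """Split the PMID field on commas and check that every entry is numeric."""
    pmids = [p.strip() for p in field_value.split(",") if p.strip()]
    bad = [p for p in pmids if not p.isdigit()]
    if bad:
        raise ValueError("not a PubMed ID: {}".format(", ".join(bad)))
    return pmids


def linked_pmid_url(field_value: str):
    """Return the link for the first PMID only, mirroring the field's behavior."""
    pmids = parse_pmids(field_value)
    if not pmids:
        return None
    return "https://pubmed.ncbi.nlm.nih.gov/{}/".format(pmids[0])


# Placeholder IDs: only the first one would become a link.
print(linked_pmid_url("12345678, 23456789"))
```

Since only the first entry is linked, it makes sense to put the primary paper describing the resource first.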

Publication Link

Occasionally there is a PDF or other publication that is about the resource. You may also link directly to the paper, even if it is in PubMed. This field will only accept one URL. Please include the full URL that includes the http:// (or the like) part for the link to work. 

Related condition/disease

If the data resource concerns a disease, set of diseases, or condition, make sure that they are stated

(e.g., Parkinson’s disease, neurodegenerative disorder, Batten’s disease, Aging, Normal control, etc.)

Additional entries are to be separated by a comma. 

Related application

This field was created primarily for the biospecimen resources to state if the bioresources were to be used for research, transplantation, therapy, education etc. 

 Multiple entries are to be separated by a comma. 

Processing

This field is mainly used for biobanks and what sort of processing the biospecimens have been put through, e.g., Frozen, paraffin, slide, cryopreserved, stained, Fresh, etc.

Separate multiple entries by a comma. 

Availability

State the availability of the resource/licensing information. E.g., if the resource is a biobank, can anyone request biomaterials? Is it public, open source, BSL license, freely available but must cite, freely available to non-commercial, what is the license for the software, etc. If available, this information can be obtained either from the website or the related article. 

 When the information is available, the field should cover access of the resource: can you add to the resource, can you take from the resource / what are the terms, and is the resource still available. If the resource is no longer available, please add the following to this field as well as to the top of the description: THIS RESOURCE IS NO LONGER IN SERVICE, documented on ‘full month’ ‘day’, full year. (e.g., THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013.). If the user can add to the resource, add "The community can contribute to this resource" in the field. 
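The out-of-service notice has a fixed format (full month name, two-digit day, full year), which the following sketch reproduces. `out_of_service_notice` is a hypothetical helper; month names are spelled out in the code so the result does not depend on the system locale.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def out_of_service_notice(documented_on: date) -> str:
    """Format the standard notice for a resource that is no longer available."""
    return "THIS RESOURCE IS NO LONGER IN SERVICE, documented on {} {:02d}, {}.".format(
        MONTHS[documented_on.month - 1], documented_on.day, documented_on.year
    )


print(out_of_service_notice(date(2013, 9, 5)))
# THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013.
```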

Separate each standardized entry with a comma (the delimiter for the wiki), e.g., Open unspecified license, Acknowledgement requested, THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013., The community can contribute to this resource. For a full list see Availability values in the NeuroLex.

Organism

Add organisms represented in the resource, e.g., if the resource is a database of mouse gene expression, add Mouse to this field. Not all resources will have an associated organism; many software resources, for example, do not. Some resources are not forthcoming with this information. For instance, if the database is a clinical trial database, make sure that it is labeled human. Many resources mention the organism(s) in the description, but as some do not, it becomes very important to capture this information.

Multiple organisms may be added; just separate them by a comma.

The organism's age(s) should be classified using the NIF annotation standards for age classification, but this should be added as a keyword (e.g., Late adult human, Embryonic mouse), as it includes more information than just the organism (the age).

Related To

Resources may be related to one another. You can do that here by adding all the related resources, just separate by a comma. Resources will require the prefix "Resource:" prior to the name.

Address

The address can be added for a resource - where it lives. As the Parent Organization field is very "top level" here is a place you can include the institution, department, lab, group, etc. As this information is not always readily available, sometimes it can be obtained from the associated paper(s), but it should not state the collaborating institutions, rather where the resource is maintained- refer to the PI. (Collaborators can be added to the description field).

Keywords

Keywords should be used to supply related terms that characterize the resource that may or may not be in the ontology. Keywords should be "meaty", that is, they should convey specific information content and be likely search terms, e.g., proteomics, is a good keyword, as people are likely to search for that and we do not list it as a resource type. 

 Avoid generic keywords like "experiment", "species", "science", "discipline". Use the keywords for ancillary functions of a resource, e.g., ModelDB imports and exports models for simulation. It isn't a "simulation resource", as it is not used as a platform for simulation. It is, however, related to simulation and that would be an acceptable keyword. 

 Keywords should use the singular form for the word and, if applicable, cover things such as:

  1. Adult organism
  2. Adolescent organism
  3. Juvenile organism
  4. Newborn organism
  5. Infant organism
  6. Embryonic organism

Information, as far as possible, must be machine readable and human readable. Therefore, do not just copy terms, but curate them so that they are machine readable and human readable. E.g., I came across this example the other day where someone had copied a list of anatomical regions into the keywords from the website, including the abbreviations. Longitudinal fasciculus (LF), medulla oblongata (MO), etc.  All were red links, because the Neurolex did not recognize the string "Longitudinal fasciculus (LF)", whereas when I removed the abbreviation, it became a blue link. Also, abbreviations like LF cause more harm than good, as they are very general and non-specific.
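The abbreviation problem described above can be caught before saving by stripping a trailing parenthetical from each keyword. A minimal sketch with hypothetical function names; note that it removes any trailing parenthetical, so the result should still be reviewed by a person.

```python
import re

# Matches a parenthetical such as " (LF)" at the end of a keyword.
_TRAILING_PARENTHETICAL = re.compile(r"\s*\([^()]*\)\s*$")


def clean_keyword(keyword: str) -> str:
    """Remove a trailing "(ABBR)" and surrounding whitespace from one keyword."""
    return _TRAILING_PARENTHETICAL.sub("", keyword).strip()


def clean_keywords(field_value: str) -> str:
    """Clean every comma-separated keyword and rejoin the field."""
    cleaned = [clean_keyword(k) for k in field_value.split(",")]
    return ", ".join(k for k in cleaned if k)


print(clean_keywords("Longitudinal fasciculus (LF), medulla oblongata (MO)"))
# Longitudinal fasciculus, medulla oblongata
```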

Requested Resource Form Fields (Advanced tab) 

Example image:

You may add your resource's icon to the "Example image" field 

Comment 

For Biospecimen resources, please add the available Sample types to this field in the following format:

Sample type: Blood, DNA, Urine, Cell, etc. Please add this information as keywords too. 

Additional info: Twitter and Map

The Semantic MediaWiki can incorporate additional info such as Twitter and map information. If desired, insert the following code below the Notes section, adding the Twitter handle in the blank space:

 ==News==

| Twitter Handle: @[[Twitter::________]]

{{#Widget:Twitter

|user={{#show: {{FULLPAGENAME}} | ?Twitter}}

|count=5

|shell.background=#013C41

|tweets.background=#F6F6F6

|tweets.color=#605C4F

|tweets.links=#013C41

}} 

 e.g., 

 ==News==

| Twitter Handle: @[[Twitter::neuinfo]]

{{#Widget:Twitter

|user={{#show: {{FULLPAGENAME}} | ?Twitter}}

|count=5

|shell.background=#013C41

|tweets.background=#F6F6F6

|tweets.color=#605C4F

|tweets.links=#013C41

}} 

Scope of Curation for NIF Registry Resources

The goal of curation is to establish a set of identifiers that will help the end user find relevant resources, but not overwhelm the user. 

Here is a concrete example. A user was looking in the NIF Registry for any resources annotated with the term "Locus Ceruleus", and the CNS Forum was returned. There is no mention of "Locus Ceruleus" on any page within the CNS Forum, but one of its subcomponents is called brain explorer. This feature contains a set of images of brain regions that were pulled by curators and annotated. Thus, the main site, CNS Forum, was returned for the "Locus Ceruleus" query. In this case the added annotation is confusing rather than helpful, because to find any mention of "Locus Ceruleus" the user would need to navigate down four link levels from the main page to a list of brain structures. Most users would not do this and would simply believe that the result was an error. Therefore, the annotation should be narrow enough that it captures the main features of the site, but not information that is too deep within the site to find easily. For resources with deep structured content, consider exposing them through the NIF Data Federation.

 Another case that is difficult to assess is the case of protein or gene databases. It is often possible to obtain a so called "data dump" of the individual records from a database and one possible curation method is to take the data dump, strip the tags and place a cleaned list of terms in the registry file. Thus, a database registered in this way would always return if the end user queried for any of the proteins or genes within the data dump. Several problems arise with this strategy, including updating of information and also preferential treatment in search of databases that can dump data. 

To address the first issue, updating the information: a single data dump creates a snapshot of the data as it was when the dump occurred. This may be a good idea for relatively static web entities, such as an atlas from an individual experiment. Data will not be added to the atlas, but it is a good reference resource. However, most scientific databases are not static entities; for example, GENSAT is updated daily at 6AM EST. Therefore, to stay current with new developments, a data dump would need to be repeated as often as new data become available. Doing this manually every day is not a reasonable expectation of a human curator; the task is better suited to an automated program. So any web resource that has a significant and changing component should be annotated generally and added as a possible level 2/3 resource candidate.

The second problem that arises with a data dump model is the preferential finding of databases that allow their contents to be easily dumped. The contents of the PubChem or UniProt databases may be too vast for a human to easily parse them, so these sites tend to be left out of the data dump class, but they are more likely to contain any protein data than easily parsed databases like KARG (with only thousands of entries). Again this creates a problem in searching for data, because while PubChem is certain to have relevant data to the query, a smaller database will come up preferentially because its data has been dumped and parsed. 

 Thus, the scope of annotation should be relatively superficial for level 1 resources, and also should be consistent in scope. 

 

Incorporating Outside Registries and Accommodating Their Tags

The NIF Registry incorporates outside registries and, after adding and tagging every resource within the NIF Registry, we can expose a separate view of these through the NIF and NeuroLex. The "Related To" field of each resource within the parent registry should include the tag of the parent registry resource. E.g., for the outside registry Gene Ontology Tools, the curator would add the resource Gene Ontology Tools and then add every resource within the Gene Ontology Tools Registry, adding "Resource:Gene Ontology Tools" (no quotes) to the 'Related To' field of each one. The classification tags of the originating resource should be included in the Keywords of each participating resource. With these tags in place, we can create tables within the NeuroLex and pull this resource out in the NIF as a separate resource. View the Gene Ontology Tools database in the NIF. View tables in NeuroLex. (To view the code that creates these tables, go to the 'More' tab and select 'Edit Source'. Modify the code as required to accommodate incoming new resources.)
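The tagging rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are plain dictionaries, and the field names `related_to`, `keywords` and `source_tags` (the classification tags taken from the originating registry) are hypothetical, not the actual NeuroLex property names.

```python
def tag_with_parent(records, parent_name):
    """Return tagged copies of records imported from an outside registry.

    Adds "Resource:<parent_name>" to each record's related_to list and
    copies the originating registry's classification tags into keywords.
    The input records are left unmodified.
    """
    parent_tag = f"Resource:{parent_name}"
    tagged = []
    for record in records:
        related = list(record.get("related_to", []))
        if parent_tag not in related:
            related.append(parent_tag)
        keywords = list(record.get("keywords", []))
        for tag in record.get("source_tags", []):
            if tag not in keywords:
                keywords.append(tag)
        tagged.append({**record, "related_to": related, "keywords": keywords})
    return tagged
```

For example, calling `tag_with_parent(records, "Gene Ontology Tools")` would give every imported record the 'Related To' value used in the example above.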

We are currently in the process of adding another property to the NeuroLex to better accommodate the classification tags of the incoming Registries. With this new property we can also expose this classification through the NIF interface.

Additional NIF Staff Responsibilities 

NIF Link checking policy

NIF's website links are checked regularly and invalid links are sent to curators weekly. A set of scripts, accessible here, constitutes a pipeline that checks links weekly and produces an invalid link file. Curators have access to this file, which includes the number of weeks that the resource has been pinged and found to be invalid. The curators manually check the links and attempt to determine whether the resource is invalid, in which case the description text is updated to say "THIS RESOURCE IS NO LONGER IN SERVICE", or whether a suitable replacement link can be found, in which case the URL is updated.

Invalid URLs:

Testing

If the resource's URL is no longer functioning, the resource should be tested 3 times over the course of 3 weeks to ensure that the site's server is not just temporarily down.
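The weekly check and the three-week rule can be sketched as follows. This is not the NIF pipeline itself, whose scripts are not reproduced here; `LinkStatus`, `is_reachable`, `record_weekly_check` and `needs_manual_search` are hypothetical names, and resetting the counter after a successful check is an assumption.

```python
import urllib.request
from dataclasses import dataclass

CHECKS_BEFORE_SEARCH = 3  # policy above: test 3 times over 3 weeks


@dataclass(frozen=True)
class LinkStatus:
    url: str
    weeks_invalid: int = 0


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers an HTTP HEAD request without error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def record_weekly_check(status: LinkStatus, reachable: bool) -> LinkStatus:
    """Update the count of consecutive weeks the link was found invalid."""
    weeks = 0 if reachable else status.weeks_invalid + 1
    return LinkStatus(url=status.url, weeks_invalid=weeks)


def needs_manual_search(status: LinkStatus) -> bool:
    """True once the link has failed enough weekly checks for a curator to act."""
    return status.weeks_invalid >= CHECKS_BEFORE_SEARCH
```

Some servers reject HEAD requests, so a real checker would probably fall back to GET before counting a failure.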

 Searching 

Resources that are no longer present at the existing address are manually searched for and updated when the curator can find the web resource that is indeed the same one that has been cataloged. Frequently the entire record for such a resource will need updating.

Before tagging something as non-operational (THIS RESOURCE IS NO LONGER IN SERVICE, documented on Month (June), day (09), year (2013)), first try to find a valid URL for that resource by doing a Google (or other search engine) search for it. (Sometimes searching for content in the description is helpful.) If this does not produce a valid URL, curators can look at the invalid URL and frequently determine the general location of the resource, e.g., which department, unit, etc., it comes from, and search that site directly for the resource.

Several scenarios can arise, but frequently the cause is that the department, etc., has changed the structure of its URLs. The chunk of the old URL that represents the department will sometimes morph into the department's new URL. This new URL can be used to search for the resource within that department; sometimes searching that portion of the URL will tell you where the resource has moved to; and sometimes you will get an error message that the XYZ department is no longer available here. At that point you can search (Google, etc.) for the department and then search for the resource. Sometimes the department itself changes, is renamed, or is included within another department or center. These are all scenarios curators can take advantage of to search for the resource. Occasionally the name of the resource will change a bit, depending on the type of resource; e.g., the XYZ Tissue bank is now the XYZ Tissue and Cell Bank. The name of the resource, or parts of it, can then be searched to see if you can find the resource or its equivalent. If the equivalent resource has been renamed, at the very minimum a synonym should be added; preferably, a new page should be created with the new valid name. Both the resource's content and NIF ID will need to be redirected to the new page, and the old name added as a synonym. The content of the record will also likely need updating.

Found a valid URL

If a new URL is found, place the old URL at the bottom of the page under Notes in the following format: 

 Notes

This page uses this default form:Resource  

 Old URL:  http://dx.doi.org/10.3886/ICPSR09681.v5 

Did not find a valid URL

If you have exhausted all of the methods above, place the following label at the very top of the resource's description:

THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013. Also add this information to the Availability field. 
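To keep the label consistent across records, it can be generated rather than typed. The sketch below is an illustration only; `out_of_service_label` is a hypothetical helper, and month names are hard-coded so the output does not depend on the machine's locale.

```python
from datetime import date

_MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def out_of_service_label(documented: date) -> str:
    """Build the label in the 'Month DD, YYYY' form used above."""
    month = _MONTHS[documented.month - 1]
    return (
        "THIS RESOURCE IS NO LONGER IN SERVICE, "
        f"documented on {month} {documented.day:02d}, {documented.year}."
    )
```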

Duplicate Resources

If a resource is found to be a duplicate of another resource, it is crucial to keep the oldest record (the one with the lowest-numbered NIF ID), because the resource number forms the unique id for that resource and is used by developers. Any pertinent information that would make the description, or another field, more complete is transferred over from the record about to be redirected. Do not delete the duplicate; instead, redirect both the duplicate resource page and its id to the retained record.
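Choosing which record to keep can be expressed as a small helper. This is a sketch under an assumption: the duplicate IDs share one prefix scheme (for example, all `nlx_` IDs), so that comparing the trailing number is meaningful. The function names and the second ID in the usage note are hypothetical.

```python
import re


def id_number(nif_id: str) -> int:
    """Extract the trailing number from an ID such as 'nlx_151325'."""
    match = re.search(r"(\d+)$", nif_id)
    if match is None:
        raise ValueError(f"no numeric suffix in {nif_id!r}")
    return int(match.group(1))


def record_to_keep(duplicate_ids):
    """Return the lowest-numbered ID; the others should be redirected to it."""
    return min(duplicate_ids, key=id_number)
```

For instance, `record_to_keep(["nlx_151325", "nlx_144509"])` selects `nlx_144509` as the record to retain.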

If the URL differs, the alternate URL, or old URL, is recorded at the bottom of the page:

Notes 

This page uses this default form:Resource  

 Old URL: http://dx.doi.org/10.3886/ICPSR09681.v5

Alt. URL: http://dx.doi.org/10.3886/ICPSR09681.v5 

Redirecting pages in NeuroLex 

In NeuroLex, category pages, resource pages and id pages often need to be, or would benefit from being redirected. A duplicate resource should be redirected, along with its NIF ID to the oldest resource. Creating a redirect page for commonly abbreviated categories such as NIH is also beneficial. Below are steps on how to do this. 

Redirect to an existing page

To redirect to an existing page, choose “Edit source” from the “More” tab of the resource you want to redirect. Transfer any existing information to the main resource that you want to keep. Redirect the NIF ID before you delete it (See below). Then replace the existing contents with #REDIRECT[[:Category:Resource:XXXXX]] where XXXXX is the resource page you want to redirect it to. Save the page.
 

If you would like to redirect it to a non resource page, use the following format without “Resource:”.
#REDIRECT[[:Category:XXXXX]] 

Curate the redirected page or the system will not recognize it.
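The two redirect formats above differ only in the "Resource:" prefix, which a short helper makes explicit. This is an illustrative sketch; `redirect_line` is a hypothetical name, and the target page name is passed through unchanged.

```python
def redirect_line(target: str, is_resource: bool = True) -> str:
    """Build the wiki redirect line for a resource or non-resource page."""
    prefix = "Category:Resource:" if is_resource else "Category:"
    return f"#REDIRECT[[:{prefix}{target}]]"
```

`redirect_line("XXXXX")` yields the resource form, and `redirect_line("XXXXX", is_resource=False)` yields the non-resource form.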

Redirect a NIF ID to another page

Search for the NIF ID. (Make sure there are no spaces in front of it.) A search for nlx_151325 will show "You searched for nlx_151325 (all pages starting with "nlx_151325" | all pages that link to "nlx_151325")" above the search field. Click on the id. It will take you to the page it belongs to. At the top of the page you will see (Redirected from Nlx 151325). Click on the id. You should see the redirect page of its corresponding resource. To redirect this to another page, click on “Edit source” from the “More” tab and update the name of the resource, e.g., #REDIRECT[[:Category:Resource:BBBBB DDDDD]] ---> #REDIRECT[[:Category:Resource:XXXXX]] Then save.
 
Alternatively, go to the NIF ID’s URL, e.g., http://neurolex.org/wiki/Nlx_151325 (always in the same format; replace the ID shown with your ID). At the top of the page you will see (Redirected from Nlx 151325). Click on the id. You should see the redirect page of its corresponding resource. To redirect this to another page, click on “Edit source” from the “More” tab and update the name of the resource, e.g., #REDIRECT[[:Category:Resource:BBBBB DDDDD]] ---> #REDIRECT[[:Category:Resource:XXXXX]] Then save.
 
To redirect to a non resource page the format will be as follows without the “Resource:”
#REDIRECT[[:Category:XXXXX]] 

Redirect to a new page / Changing a resource's name

You may either create a new category / resource from the main page or, if the page you want to redirect already has the correct contents (e.g., you just want to give it a different name), create the new page using the URL:
http://neurolex.org/w/index.php?title=Category:Resource:XXXXX XXX, XXXXX XXX representing the name of your resource.
(http://neurolex.org/w/index.php?title=Category:XXXX XXX for non resource pages)

Copy the contents of the resource you want to redirect over to this new resource: on the old page, choose "Edit Source" from the More tab, select all the contents (Ctrl A), and copy them (Ctrl C). Then, from the More tab of the new page, select "Edit Source", paste in the contents (Ctrl V), and save the new page.

Redirect the id of the old page (the one you are redirecting) to the new page. (see above).

Then redirect the old page to the new page:

Click on “Edit source” from the “More” tab and add the redirect information. 
#REDIRECT[[:Category:Resource:XXXXX]] Then save.
 
To redirect to a non resource page the format will be as follows without the “Resource:”
#REDIRECT[[:Category:XXXXX]] 

Curate the redirected page or the system will not recognize it.

Maintaining Discontinued Resources

Other scenarios may arise, such as resources that no longer provide the service, data, software, etc. that they once did and that was the reason for including them in the NIF Registry. Curators typically see this with software, where a tool becomes obsolete or is replaced by other software. If such a resource was upgraded, curators need to update the resource record accordingly.

It is important for curators to NOT DELETE these resources from the registry, as it is useful to keep them. Similar action needs to be taken for all discontinued resources, not just the software kinds. Other discontinued resources can include:

Tracking of NIF Logo

If the NIF logo is displayed on resource pages add it to the NIF Stats Google doc., under the NIF Referring Sites tab. 

Contact Resource Providers  

It is the policy of NIF to contact a random subset of resource providers to accomplish several goals:

An outreach letter template has been crafted and lives in the NIF Project Private Wiki under NIF Curation and Outreach. Curators should use this template to establish contact with the resource owners. The date contacted, the date updated, date approved, resource name, NIF ID, resource owner(s) contacted, and pertinent notes, are recorded in the NIF Stats Google doc., under the Registry Contacts tabs. Notes should include information such as if the resource owner had you do the updates, etc. All correspondence should be followed up and nif-curators@neuinfo.org Cc’ed. 

 The outreach letter template should be updated as necessary. 

 Email addresses should be periodically passed on to be added to the Registry resource owner’s group mailing list.

NIF Level 2 (DISCO/BiositeMaps) Integration Policy

NIF incorporates all resources that have registered themselves to Biositemaps and DISCO, which fall within the general domain of neuroscience, are aligned with NIF's curation policies, and contain sufficient information to be found. 

 The minimum set of information includes accurate: Name, URL, contact information and a short description. 
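The minimum-information rule lends itself to an automated pre-check before a record reaches a curator. The following is a sketch only; the dictionary keys (`name`, `url`, `contact`, `description`) are assumed field names, not the actual Biositemaps or DISCO element names, and the check covers presence, not accuracy.

```python
REQUIRED_FIELDS = ("name", "url", "contact", "description")


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or blank in a harvested record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]
```

A record with an empty result passes the minimum check; accuracy still has to be confirmed by a curator.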

 NIF will curate the minimal information provided by the Biositemaps or DISCO files and include additional descriptive information, including keywords, resource type categories, useful abbreviations and institutional information.  NIF will also contact the resource provider when the information is included in the NIF Registry to allow the resource provider to add to the description, keywords, or other pertinent fields. 

 NIF will crawl the automated information at regular intervals to test for changes in the DISCO or Biositemaps files, and curators will be alerted when changes occur so they can confirm and update the public records.  However, no automated information will be released without prior approval by curation staff. 
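One simple way to test a harvested file for changes is to compare content hashes between crawls. This is a sketch of that general technique, not a description of how NIF's crawler works; `fingerprint` and `review_needed` are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from typing import Optional


def fingerprint(file_bytes: bytes) -> str:
    """Return a stable hash of a harvested DISCO or Biositemaps file."""
    return hashlib.sha256(file_bytes).hexdigest()


def review_needed(previous: Optional[str], current_bytes: bytes) -> bool:
    """True when the file is new or differs from the last approved copy.

    A True result should alert a curator; nothing is released automatically.
    """
    return previous is None or previous != fingerprint(current_bytes)
```

The stored fingerprint would be updated only after a curator approves the change, which keeps the public record tied to the last reviewed version.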

 NIF maintains the authority to remove any resource that is not deemed to be appropriate for NIF, no longer functional or no longer aligned with NIF's curation policies. 

NIF Data Federation Resource (Deeply integrated resources)

Using the DISCO protocol, the NIF Data Federation provides the ability to drill down into individually hosted databases and data sets and return relevant content.  This type of content, part of the so called “hidden Web,” is typically not indexed by existing web search engines. 

 In order for NIF to directly query these independently maintained databases and datasets, database providers must register their database or dataset with the NIF Data Federation and specify permissions. Several interoperability capabilities are offered.

NIF integrated virtual databases integrate related data from multiple databases and combine them into one view for easier browsing. The following integrated databases are available for similar data types:

NIF simultaneously queries all the federated databases and datasets through its search interface. The results are displayed under the Data tab and are categorized by data type and nervous system level. In this way, users can easily step through the content of multiple resources, all from the same interface. 

Each federated resource individually displays its query results with links back to the relevant datasets within the host resource. This allows users to take advantage of additional views on the data and tools that are available through the host database. The NIF site provides tutorials for each resource, indicated by the "Professor Icon", showing users how to navigate the results page once directed there through the NIF. Additionally, query results may be exported as an Excel document.

NIF's full listing of federated data can be found here, http://neurolex.org/wiki/Category:Resource:NIF_Data_Federation or here, https://neuinfo.org/mynif/databaseList.php

Note: NIF is not responsible for the availability or content of these external sites, nor does NIF endorse, warrant or guarantee the products, services or information described or offered at these external sites. 

Registering a database

If you are a data provider and would like to make your database or dataset available through the NIF, we are happy to work with you.  The NIF data federation tools are designed to be easy to use and require minimal effort from the data provider.  NIF can work with many types of data resources, regardless of underlying technology.  If you have a data set that you would like to see registered, please email curation@neuinfo.org.

This DISCO integration capability utilizes a data integration framework to knit independently maintained databases or datasets into a virtual data federation through registration of schema information and database views with the NIF mediator. A concept mapping tool is available to map tables, fields and values to the NIFSTD ontology. Resource providers do not need to change their resource in any way and may control the content that is exposed to the NIF database mediator.

Registration with the NIF mediator will require technical knowledge of the database capabilities and network settings of the resource using a specialized tool available through the NIF. The mapping of content to the NIF vocabularies is performed by a domain expert in consultation with a database administrator and is accomplished using the NIF Concept Mapping tool.

Advantages of exposing your data through the Data Federation include:

  1. Mapping to the NIF vocabularies provides the means to provide a standardized terminology and also to search through the relationships contained in the NIF ontologies.

  2. Data within a source database can be combined with that from other databases by defining an integrated view across databases.

  3. Through aggregation of many resources, creating a large, dynamic virtual source increases the exposure of the content of individual database resources.

  4. LinkOut  your database content to Entrez Databases (PubMed, etc.)

  5. Users are able to query across distributed databases as if they were a single database.

View additional Benefits of Data Federation  

To expose your data in the Data Federation, begin by registering your resource and creating a sitemap; then set up a consultation with NIF by contacting the NIF interoperability team.

We are actively looking for Data Federation partners to work with to continue to develop the Data Federation tools. For more information, contact Anita Bandrowski at curation@neuinfo.org. 

Eligibility for Federation resources:

1) Availability: The resource has already been registered with the NIF Registry.

2) Database / dataset format:  The resource should contain database or datasets that can be exported as tables.

3) Public:  The data must be open to the public and accessible by the community without extra requirements.

4) Voluntary: The resource owners agree to make their data public through the NIF Data Federation.
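The four eligibility criteria can be written down as a checklist. The sketch below is only illustrative; the `Candidate` class and its attribute names are hypothetical, and each flag would be set by a curator after reviewing the resource.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    in_registry: bool           # 1) Availability
    exportable_as_tables: bool  # 2) Database / dataset format
    publicly_accessible: bool   # 3) Public
    owner_consents: bool        # 4) Voluntary


def unmet_criteria(candidate: Candidate) -> list[str]:
    """Return the names of the eligibility criteria that are not satisfied."""
    checks = {
        "Availability": candidate.in_registry,
        "Database / dataset format": candidate.exportable_as_tables,
        "Public": candidate.publicly_accessible,
        "Voluntary": candidate.owner_consents,
    }
    return [name for name, satisfied in checks.items() if not satisfied]
```

A resource is eligible when `unmet_criteria` returns an empty list.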

NIF Data Federation Registration Workflow

To register your resource at a deeper level (NIF Data Federation):

NIF Interoperability Best Practices 

NIF is working to establish a set of best practices for resource providers to enhance the ability of a resource to interoperate with the NIF and with each other. We are using our experiences with the Level 2-3 integration tools and literature indexing to highlight known issues. (Useful background reading.)

1) Stable identifiers 

We have noted that some databases and vocabularies use identifiers that get regenerated every time the resource is updated. This practice makes it very difficult for NIF to maintain appropriate indices and links. We recommend that identifiers be stable; if they are to be removed, they should be made obsolete rather than deleted.
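The "obsolete rather than delete" recommendation can be illustrated with a small in-memory registry. This is a sketch of the idea, not any resource's actual implementation; the class and method names are hypothetical.

```python
class IdentifierRegistry:
    """Keeps every identifier ever issued; retired ones are flagged, not removed."""

    def __init__(self):
        self._records = {}

    def add(self, identifier: str, label: str) -> None:
        # Identifiers are never reused, so existing links cannot change meaning.
        if identifier in self._records:
            raise ValueError(f"{identifier} already exists and cannot be reused")
        self._records[identifier] = {"label": label, "obsolete": False}

    def retire(self, identifier: str) -> None:
        """Mark an identifier obsolete while keeping it resolvable."""
        self._records[identifier]["obsolete"] = True

    def lookup(self, identifier: str) -> dict:
        """Return a copy of the record, including its obsolete flag."""
        return dict(self._records[identifier])
```

Because a retired identifier still resolves, an index built against an earlier release can detect that the entry is obsolete instead of hitting a dead link.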

2) Using common terminologies 

Using a shared terminology solves many problems, particularly if we follow the OBO recommended practice of re-using existing terminologies (and their identifiers) rather than creating new ones, which forces us to maintain mappings all over the place.

3) Providing clear and consistent machine- and human-understandable definitions of concepts 

For example, if a resource groups data according to cortex, a user should be able to find the definition of cortex and a machine should be able to use that definition in a call.

4) Keeping track of versions in a consistent and clear manner 

The issue of how to handle versions comes up all the time in the ontology world. The general recommendation appears to be to have one URI for the current version that always points to the latest release, while earlier versions remain available at URIs that include the version number, so that anyone who requires a particular version can get to it.

5) Data Integration Best Practices 

NIF currently has 2 methods of data integration. The first method, known as Level 2.5, involves using a series of mechanisms that allow connecting the resource with the NIF Mediator. The other method, known as Level 3, does not require these mechanisms since the database connects directly to the NIF Mediator. In fact, some relational databases could be shared this way even without an associated Web presence.

A. Level 2.5 Best Practices

Essential elements:

  1. Query interface: A current Web interface that allows querying of the database. (Best design practices for these interfaces will be elaborated below)

  2. Interface Definition Language: Protocol specification (e.g.: disco.nif.interop.3.1) to expose the database schema, and mappings of their elements to the interface described above. This file will provide the information necessary for automated Mediator registration.

  3. Metadata mappings: Protocol specification (e.g.: disco.nif.lexicon.3.1) that exposes a list of local database terminologies (Lexicon) with mappings to standard terms (e.g.: NIFSTD). This file will provide the information necessary for TIS mappings.

Best design practices for Querying Web Interfaces for Level-2.5 

Transport

Design: It is NIF's intention to reuse and enhance the Mediator technology to integrate structured data on the Web that is not accessible via standard RDBMS objects. For the data on the Web to be efficiently "relationalized" the following points must be met:

B. Level 3 Best Practices

This is based on our current experience registering these types of resources.

• Allow domain level access rather than IP level access through the firewall to a level-3 Site, for example allow neuinfo.org rather than 198.202.95.10. 

• Allow domain level access rather than IP level access through the firewall to a level-3 Database, for example allow neuinfo.org rather than 198.202.95.10.

• Database access privileges should not be dropped during a database maintenance.

• Make the database drivers with the right version available along with the documentation and sample code.

• Provide technical and non-technical points of contact.

• Create a view on the tables you want to make available for access.

• Create a user interface view that you would like to show to the users when records from your database are returned in the search result.

• For general information, results and data should be accessible using a static (i.e., stateless, not session-based) URL.

• If you are developing new databases
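The recommendation above to create a view on the tables you want to share can be illustrated with an in-memory SQLite database. This is a sketch only: the table, column and view names are hypothetical, and a production database would use its own engine and access controls.

```python
import sqlite3


def build_demo_database() -> sqlite3.Connection:
    """Create a sample table plus a view exposing only the shareable columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE image (
            id INTEGER PRIMARY KEY,
            brain_region TEXT,
            cell_name TEXT,
            internal_notes TEXT
        );
        INSERT INTO image (brain_region, cell_name, internal_notes)
        VALUES ('hippocampus', 'pyramidal cell', 'not for release');
        -- The view leaves out internal_notes, so it is never exposed.
        CREATE VIEW shared_image AS
            SELECT id, brain_region, cell_name FROM image;
        """
    )
    return conn


def shared_columns(conn: sqlite3.Connection) -> list[str]:
    """Return the column names visible through the shared view."""
    cursor = conn.execute("SELECT * FROM shared_image")
    return [column[0] for column in cursor.description]
```

Granting the external account access to the view rather than the base table lets the provider control exactly which fields are exposed.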

6) Database design and Use  
7) Level-3 Documentation Meeting Notes 

In order to create documentation for level 3 resource registration the following questions need to be answered precisely.

What is the presentation of the data that is going to be understandable to a novice user?

  1. Define presentation view for level 3 resources
    1. Put in the registry the information the provider wants to expose
    2. Some exposed data objects should lead to products/focus/...other actionable information within the resource
  2. Relating individual data objects from each source to the category properly
    1. How to map conceptual information to schema information to facilitate conceptual queries (e.g., CCDB, gene network, and the SenseLab cell properties database all give data on molecules in brain regions)
    2. Hop Skip & Jump (HSJ) query => how do I navigate from our result to other sources??
  3. Term mapping 
    1. It is imperative to let NIF know about the metadata tags within each database. For example, CCDB is a database of images and several columns fill out the metadata for each image including cell type and brain region. It is important that NIF knows that each image, which is stored in column 5 is a cell with a name that is stored in column 4 and brain region that is stored in column 3. This way NIF can give precise results, such as showing the image of a pyramidal cell if the query is "pyramidal cell", or show many cells if the query is "cell substantia nigra". 
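The column-role example above can be made concrete with a small mapping. This is a hedged sketch: the column positions follow the example in the text (brain region in column 3, cell name in column 4, image in column 5), but the row layout, sample values and function names are hypothetical and do not reflect the real CCDB schema.

```python
# 1-based column positions, following the example in the text.
COLUMN_ROLES = {3: "brain_region", 4: "cell_name", 5: "image"}


def annotate(row):
    """Attach a role to each mapped column of a raw database row."""
    return {role: row[index - 1] for index, role in COLUMN_ROLES.items()}


def find_images(rows, cell_name=None, brain_region=None):
    """Return the images whose metadata matches the requested concepts."""
    results = []
    for row in rows:
        record = annotate(row)
        if cell_name is not None and record["cell_name"] != cell_name:
            continue
        if brain_region is not None and record["brain_region"] != brain_region:
            continue
        results.append(record["image"])
    return results
```

With the roles declared, a query for "pyramidal cell" can be matched against the cell name column only, and a query for cells in the substantia nigra against the brain region column.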

Specific comments for BrainInfo - Expose the preferred label and where is it filed.

8) Intellectual property issues 
9) Guidelines for entity mapping and linking data.

NIF should facilitate interlinking of data, literature and tools wherever possible.  When any entity has a unique ID, NIF should provide the ID in its views.  The following practices should be implemented for all sources:

  1. Adult organism
  2. Adolescent organism
  3. Juvenile organism
  4. Newborn organism
  5. Infant organism
  6. Embryonic organism

  1. Increased expression
  2. Decreased expression
  3. No change in expression

What is DISCO?

DISCO is an information integration approach designed to facilitate interoperation among Internet resources. DISCO was initially developed by Dr. Luis Marenco at Yale University for the Neuroscience Database Gateway and is currently being extended in the context of NIF. DISCO consists of a set of tools and services that allows resource providers who maintain information to share it with automated systems such as NIF. NIF is then able to “harvest” the information and keep those sets of information up-to-date. DISCO facilitates the automated maintenance of several distinct capabilities using a collection of files 1) that are maintained locally by the developers of participating neuroscience resources and 2) that are "harvested" on a regular basis by a central DISCO server. This approach allows central NIF capabilities to be updated as each resource's content changes over time. DISCO currently supports the following capabilities: 1) resource descriptions, aka sitemaps, 2) "LinkOut" to a resource's data items from NCBI Entrez resources such as PubMed, 3) Web-based interoperation with a resource, 4) sharing a resource's lexicon and ontology, 5) sharing a resource's database schema, and 6) participation by the resource in neuroscience-related RSS news dissemination.

Resource Description (Sitemap)  - 2 functions 

This DISCO capability allows you to create and manage your own sitemap. An XML-based script provides a wrapper around a website that allows NIF to search for key details about the web site and some information about dynamic content. An advantage of a sitemap is that the content is dynamically updated from the source file,  ensuring that all content is up to date.  NIF provides a user-friendly tool for generating the necessary XML files.

The benefits of a sitemap are that it will keep your NIF Registry description up-to-date and will inform search engines about your resource. Many formats can be ingested, such as the native "disco.rd", the NIF "disco.rd.nif.rdf" and the National Center for Biomedical Computing's Biositemap formats. 

If you do not want to manage your own sitemap, DISCO will manage it for you. Just click on the "Click here to generate a sitemap" link and you are done.

Generating a sitemap is also the second step of registering your resource at a deeper level to be included into the NIF Data Federation. Clicking on the "Click here to generate a sitemap" link (see image below) will generate an entry into the DISCO Dashboard. See "Create an Interop File to share your data through NIF's Data Federation" for the next step.

Create a DISCO Resource Description (Sitemap) File
  1. Register Your Resource with the NIF Registry (if it is not already)
  2. Generate DISCO Resource Description (Sitemap) file
    1. Once the resource is curated (within 7 days or email curation@neuinfo.org for faster service), search for your resource in NeuroLex, click on your resource's link, and create an account / login (upper right).  

    2. Below the main information (surrounded by a box), and just below the “Curated” tag, you will see the "For Resource Owners" section. If you are indeed the resource owner, click on the "Click here to generate sitemap" link.

    3. Follow steps 1 - 4 to complete the process. If you add your Technical Contact information, step 2 will be generated on the following page.

 You may edit or append these files at any time from the DISCO Dashboard. Just click on your resource's name and then the [Edit] button. Alternatively, you may go through the process above again and simply replace your old files with the new ones.


Create an Interop File to share your data through NIF's Data Federation

The Data Federation provides the ability to drill down into individual databases and data sets and return relevant content. This type of content, part of the so-called "hidden web," is typically not indexed by existing web search engines. NIF will work with your resource in its current form using tools to integrate it as thoroughly as we can with little to no hassle on your part. The process is quite easy and you won't have to change anything to participate.

The types of resources that we are currently working with:

File format for Interoperation

DISCO describes the interoperation with a resource and supports any type of interface description language (IDL). For NIF, specific DISCO web interoperation and database schema formats have been defined and continue to be revised to accommodate new interoperation scenarios.

Web Interoperation: This DISCO capability provides information necessary to systematically extract portions of data in semi-structured web resources. 

 Database Schema: This capability provides information on your database's schema and identifies the specific fields in the database to be shared. 

 Click on the links below to view samples of files:

 Integration of your data with NIF's federated data is made possible by placing a copy of the information you provide in central NIF Mediator servers - or indexing your database or web service directly. Federated data can then be searched by NIF queries and presented to NIF users with links back to the originating site for additional information.  

What is a LinkOut file? 

The National Center for Biotechnology Information (NCBI) has implemented a capability called "LinkOut" that allows users of NCBI Entrez (who might for example be looking at an article in PubMed) to link to related information in resources external to the NCBI.

Entrez LinkOut is a DISCO capability that provides a mechanism to collect a resource's data links related to Entrez objects and forward them to NCBI. NCBI users will find these links when using the Entrez LinkOut feature. 

 These LinkOut files provide links between PubMed, or other NCBI databases available for linking (e.g., Gene, Protein, Nucleotide, etc.), and your data when you register with the NIF Data Federation through the LinkOut Broker. To enable this feature, your data must include PubMed IDs (e.g., 12345, not Bob et al, 2010).
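The PubMed ID requirement can be pre-checked with a few lines of Python before registering. This is a minimal sketch: the function name and the sample values are hypothetical, and it only tests that an identifier is a plain number, not that it actually exists in PubMed.

```python
def is_pubmed_id(value):
    """Return True if value looks like a numeric PubMed ID (e.g. "12345")."""
    text = str(value).strip()
    # ASCII digits only: author-year citations and prefixed IDs are rejected.
    return text.isascii() and text.isdigit()


# Hypothetical sample values: one usable ID and one free-text citation.
candidates = ["12345", "Bob et al, 2010"]
usable = [c for c in candidates if is_pubmed_id(c)]
```

Records whose reference field fails this check would need a numeric PubMed ID looked up before they can take part in LinkOut.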

Create a LinkOut file

There are three ways to generate linkout files:

The linkout files format

The DISCO LinkOut format is encoded in XML format, as described below.

The root node should contain the following information:

<disco format="disco.linkout" format-version="1.0">

... the first node contains brief site information (e.g.:)

<site-info site-name="CCDB" />

... the second node contains the technical contact information (e.g.:)

<technical_contact name="Willy WaiHo Wong" email="wawong@sdsc.edu" />

... the next node is the required data container node:

<linkout_list>

... and inside this node one or more child "<linkout>" nodes are used to describe one LinkOut item at a time. The attribute values in the XML sample below show the type of content that can be provided from a resource.

<linkout

  db="PubMed"

  oid="15988042"

  linkcategory="Electron microscopy product"

  linkname="Development of a model for microphysiological simulations: ... electron tomography"

  linkurl="http://ccdb.ucsd.edu/sand/main?event=displayAllProjectProds&amp;mpid=48&amp;ptype=sproject"

  />

The attributes of the <linkout> node above may contain the following information:

 In addition to this LinkOut file, your resource needs a main "disco.xml" file to identify its location. That file and its purpose are described on the DISCO dashboard page (http://disco.med.yale.edu/webportal/discoDashboardShow.do). For more information or help regarding this format, please contact the NIF Interoperability team. 
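The nodes described above can also be assembled programmatically. The following Python sketch uses the standard library's xml.etree.ElementTree. It assumes that the site-info, technical_contact and linkout_list nodes are all direct children of the root disco node and that every linkout carries all five attributes from the sample; neither point is confirmed by the format description, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the <linkout> sample above.
LINKOUT_ATTRS = ("db", "oid", "linkcategory", "linkname", "linkurl")


def build_linkout_xml(site_name, contact_name, contact_email, items):
    """Assemble a disco.linkout document from a list of attribute dicts."""
    root = ET.Element("disco", {"format": "disco.linkout", "format-version": "1.0"})
    ET.SubElement(root, "site-info", {"site-name": site_name})
    ET.SubElement(root, "technical_contact",
                  {"name": contact_name, "email": contact_email})
    linkout_list = ET.SubElement(root, "linkout_list")
    for item in items:
        # A missing attribute raises KeyError, so incomplete items are caught early.
        ET.SubElement(linkout_list, "linkout",
                      {name: item[name] for name in LINKOUT_ATTRS})
    # ElementTree escapes "&" in attribute values as "&amp;" when serializing.
    return ET.tostring(root, encoding="unicode")


example_xml = build_linkout_xml(
    "CCDB", "Willy WaiHo Wong", "wawong@sdsc.edu",
    [{
        "db": "PubMed",
        "oid": "15988042",
        "linkcategory": "Electron microscopy product",
        "linkname": "Development of a model for microphysiological simulations: ... electron tomography",
        "linkurl": "http://ccdb.ucsd.edu/sand/main?event=displayAllProjectProds&mpid=48&ptype=sproject",
    }],
)
```

Building the file through an XML library rather than string concatenation means characters such as "&" in URLs are escaped automatically, as in the linkurl of the sample.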

Implementing DISCO Web Interoperation with data that includes LinkOut information. For resources registered with DISCO Web Interoperation (NIF level 2.5 integration), NIF developers may be able to create data views to extract LinkOut data for that resource.

Linkcategory types

Per National Library Service request, the categories that NIF uses have been standardized to the following types:

Resource: Registry
Resource: Software
Reagent: Plasmid
Reagent: Antibodies
Data: Clinical Trials
Data: Gene Expression
Data: Taxonomy
Data: Images
Data: Animal Model
Data: Microarray
Data: Brain connectivity
Data: Volumetric observation
Data: Activation Foci
Data: Neuronal properties
Data: Neuronal reconstruction
Data: Chemosensory receptor
Data: Electrophysiology
Data: Computational model

Please set your linkcategory to one of the types above, based on the characteristics of your data. If none is suitable for your database, please inform us and we can discuss a possible solution together.
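A linkcategory value can be checked against the standardized list before a LinkOut file is submitted. This is a small Python sketch; the function name is hypothetical, and exact, case-sensitive matching is an assumption.

```python
# The standardized linkcategory types listed above.
NIF_LINKCATEGORIES = frozenset([
    "Resource: Registry", "Resource: Software",
    "Reagent: Plasmid", "Reagent: Antibodies",
    "Data: Clinical Trials", "Data: Gene Expression", "Data: Taxonomy",
    "Data: Images", "Data: Animal Model", "Data: Microarray",
    "Data: Brain connectivity", "Data: Volumetric observation",
    "Data: Activation Foci", "Data: Neuronal properties",
    "Data: Neuronal reconstruction", "Data: Chemosensory receptor",
    "Data: Electrophysiology", "Data: Computational model",
])


def check_linkcategory(value):
    """Return the category unchanged if it is one of the standardized types."""
    if value not in NIF_LINKCATEGORIES:
        raise ValueError(f"Unsupported linkcategory: {value!r}")
    return value
```

Note that the category in the earlier XML sample, "Electron microscopy product", is not in the standardized list and would be rejected by this check.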

Advertise your terminology or ontological information.

This capability facilitates semantic data integration with the resource. A list of terms used by the resource (with mappings to standardized terms) is defined for use by NIF central servers. Without this functionality term mappings have to be made manually by a resource curator or knowledge integrator. 

DISCO terminology allows free use of any format (XML, RDF, OWL, etc) to provide mapping between terms and meta/data elements in a resource.

Check the DISCO dashboard Terminology summary page for current resource examples of these formats.

Please contact the NIF interoperability team for more information.

Share your resource's news with the NIF community.

This capability coordinates the reporting of important changes in the resource to interested users through RSS feeds. This can also be done through the RSS wiki page.

Newsfeeds are traditionally represented in RSS and Atom formats. DISCO provides a mechanism to refer to news in a resource using the main DISCO file. View the SenseLab news.

Also, see the DISCO dashboard News summary page for current examples of these formats.

Please contact the NIF interoperability team for more information.

DISCO Dashboard

The Dashboard is the place for resource owners interoperating with NIF to manage their resource. This includes editing or appending files and setting the crawl frequency.  Additionally, the dashboard provides general information about the resource such as the status of the resource (parsed), where the DISCO files are stored (locally at NIF or remotely at site), and which services each resource is participating in. Clicking on a particular service, e.g., Resource Description (AKA Sitemap), retrieves a summary of participating resources. 

Managing resources in DISCO 

Data updating

The update frequency depends on how often new data come in and how often the original resource is updated. There are currently three update frequencies: weekly, monthly, and every three months.

Concept mapping tool

The Concept Mapping tool is used by NIF curators to manage Data Federation resources, including setting up the table and view, as well as the column definitions and mapping. The tool also allows for the database column contents to be exported to Google Refine for the mapping of the concepts and provides back-end support services. 

 The concept mapping tool is accessible at http://grefine-dev.neuinfo.org:8080/cm/ 

Registering a new source and its view

Open the URL http://grefine-dev.neuinfo.org:8080/cm/, enter your user name and password.

 Click the “Add” button in the main menu of the tool, choose “New Relational Source” to add a new resource into the Data Federation. Choose “New Table or View” to add a new table or view into the NIF Data Federation. 

Defining a new source

If you are adding a new source, you will see the screen below. Enter the new source or table / view name (See naming rules below), NIF ID, connection information, description, etc. The source type will always be Relational DB.

The description of the resource should be relatively consistent with that of the NIF Registry; however, it may need to be shortened until the NIF Registry snippets can be implemented. For the most part, the description should be less than 2 lines long and should link to the main resource.

The connection information includes the data location URL, user name and password. 

Creating an eligible source name
  1. The view name should be consistent with the resource name/database name.
  2. No dashes, commas, or other punctuation or special characters are allowed.
  3. The view name should be less than 30 characters (including spaces). If it is too long, use an abbreviation instead. For example, a resource called “Avian Brain Circuitry Database” should be named “ABCD.”
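The naming rules above can be expressed as a short check. This Python sketch makes two assumptions: only ASCII letters, digits and spaces count as allowed characters, and an abbreviation is formed from the first letter of each word, as in the “ABCD” example. The function names are hypothetical.

```python
import re

MAX_NAME_LENGTH = 30  # names must be shorter than this, spaces included


def is_valid_source_name(name):
    """Rules 2 and 3: letters, digits and spaces only, under 30 characters."""
    return bool(re.fullmatch(r"[A-Za-z0-9 ]+", name)) and len(name) < MAX_NAME_LENGTH


def abbreviate(name):
    """Build an abbreviation from the first letter of each word."""
    return "".join(word[0].upper() for word in name.split())


def source_name_for(resource_name):
    """Use the resource name if it is eligible, otherwise fall back to initials."""
    if is_valid_source_name(resource_name):
        return resource_name
    return abbreviate(resource_name)
```

“Avian Brain Circuitry Database” is exactly 30 characters long, so it fails the under-30 check and source_name_for falls back to “ABCD”.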

 When finished editing, click “Save.”

Defining the view
Basic Information

Begin by entering the Basic Information of the new view, including the name, alias, source, schema name, description and category (See below). 

 The name here refers to the index name of the view, which should not be the same as the source name. The index name briefly describes the characteristics of the source; e.g., the view name for BAMS is “BrainRegions”, while the source name of BAMS is “BAMS”. Therefore, the full display name of the resource seen in the Data Federation will be “BAMS:BrainRegions”, which is the combination of the source name and the index name. 

 The “alias” is always the same as the name. Select “owner source” from the pull-down menu. 

 Schema name is the acronym of the source name in most cases. 

 Description is a paragraph briefly stating the information of the view, such as the purpose of the view, the content of the view, etc.; it will be displayed on the NIF website.

 Set the resource’s “Indexable” radio button to “Yes” if this is a view.

Under Categories, click “Add New,” and fill in the child category and parent category for the particular view separately (See available categories below). A resource can be assigned to multiple categories - up to three. The category has to precisely match the content of the data. 

NIF Data Federation categories (Parent categories are bolded):

Type of Data

Animals

Annotation

Antibodies

Atlas

Biospecimen

Brain Activation Foci

Clinical Trials

Connectivity

Dataset

Disease

Drugs

Expression

Grants

Images

MRI

Microarray

Models

Multimedia

Pathways

People

Phenotype

Plasmids

Registries


System Level

Gross Anatomy

Cell

Gene

Molecule

Function  


When finished editing, click “Save.” 

Create the view definition

Click “View Definition” on the left menu, and then write SQL to crawl the data from a specific resource.

SQL is a standardized query language for requesting information from a database. Here are some examples:

  select
         e_uid,
         e_id,
         trim(trailing '"' from e_name) as e_name,
         e_definition,
         round(value_mean) as value_mean,
         round(value_sd) as value_sd,
         n_id,
         n_name,
         nelx_id,
         num_articles
  from l2_nlx_151885_data_summary

  select
         -- Note: e_uid should always be included in a view
         a.e_uid,
         a.eudract_number as id,
         a.sponsor_protocol_number,
         a.full_title as title,
         '' as official_title,
         'Country: ' || a.country as recruitment,
         a.medical_condition as conditions,
         '' as intervention,
         a.sponsor_name as sponsored_by,
         a.gender,
         a.population_age as age_groups,
         '' as phases,
         '' as study_type,
         '' as brief_summary,
         case when b.level is null then ''
              when b.level like 'LLT' then 'MedDRA levels: Lowest Level Term'
              when b.level like 'HLGT' then 'MedDRA levels: High Level Group Term'
              when b.level like 'HLT' then 'MedDRA levels: High Level Term'
              when b.level like 'PT' then 'MedDRA levels: Preferred Term'
              else '' end as detailed_description,
         cast(a.start_date as varchar) as date,
         'nlx_151313' as other_ids,
         'https://www.clinicaltrialsregister.eu/ctr-search/search?query=' || a.eudract_number as url
  from l2_nlx_151313_clinicaltrial_summary a, l2_nlx_151313_clinicaltrial_summary_disease b
  where a.eudract_number like b.eudract_number

 You can always check the script by clicking “Check View & Update Columns” on the right hand side.
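Before pasting a query into the tool, it can help to confirm locally that its output includes e_uid. The Python sketch below uses an in-memory SQLite database purely as a stand-in; the table, its columns and the function names are hypothetical, and SQLite does not support every function used in the examples above, so this checks only the column list, not the full query.

```python
import sqlite3


def view_column_names(conn, view_sql):
    """Run the view query and return the names of its output columns."""
    cursor = conn.execute(view_sql)
    return [column[0] for column in cursor.description]


def has_e_uid(conn, view_sql):
    """True if the view exposes the e_uid column."""
    return "e_uid" in view_column_names(conn, view_sql)


# In-memory stand-in for a summary table (hypothetical name and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_summary (e_uid TEXT, e_name TEXT, value_mean REAL)")

good_view = "SELECT e_uid, e_name, round(value_mean) AS value_mean FROM data_summary"
bad_view = "SELECT e_name FROM data_summary"
```

Here has_e_uid(conn, good_view) is true and has_e_uid(conn, bad_view) is false, so the second query would need e_uid added before it is saved as a view.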

     

When finished editing, click “Save.” 

Table mappings 

Keywords identify terms or concepts that should be included in NeuroLex, and they help keep track of the keywords that are being used by curators. Click “Mapping” from the left menu to add/map keywords for the view.

Rules for making keywords

1. Mandatory keywords: the resource name, abbreviation, data type, resource ID and view ID, as strings.

2. Optional keywords: the organization name.

3. Every concept in the table is about the particular term.  (See annotation Standards below.)

4. Add concept ID for the Ontology terms.

When finished editing, click “Save.”

Define the display

Click the “Display” option on the left menu to set the display of column headers and their order. Under “Display Information”, the column display name and the column name in the SQL are filled in under “Name” and “Primary Column” respectively, and the value of the column is assigned under “Template”; e.g., Name: Gene, Primary column: gene_name, Template: ${gene_name}. (Note the naming conventions below.) 

 Under “Display Templates”, the order of the columns may be modified using the “First”, “Last”, “Up”, and “Down” buttons. (Note the column order conventions below.)

The “Show Columns” function on the left helps the curator check that the template works well and shows which columns are related to the defined column.

When finished defining, click “Save.”  
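To illustrate how a display template such as ${gene_name} turns a row into a displayed cell, the Python sketch below uses the standard library's string.Template, whose placeholder syntax happens to match. It is only an illustration of the substitution idea, not the tool's actual implementation; the function name and the row values are hypothetical.

```python
from string import Template


def render_cell(template_text, row):
    """Fill ${column} placeholders in a display template from one result row."""
    # substitute() raises KeyError if the template names a column the row lacks.
    return Template(template_text).substitute(row)


row = {"gene_name": "BDNF", "gene_id": "627"}  # hypothetical row values
cell = render_cell("${gene_name}", row)
combined = render_cell("${gene_name} (${gene_id})", row)
```

A template can reference more than one column, as in the second call, which is why it is useful to check which columns a displayed column depends on.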

 Column header naming convention  

1. Use a uniform header for similar content across resources/views, such as gene/gene symbol.

2. Examples of uniform headers are listed below:

 Database

Title

ID

Image

Description

Notes

Comments

Reference

Mentions 

3. No numbers, special characters or punctuation are allowed in column headers! If you include one, the view will not be generated.

4. Column headers must be unique for each column within a single view.
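Rules 3 and 4 can be checked automatically before a view is saved. This Python sketch assumes that only ASCII letters and spaces are acceptable in a header and that uniqueness is compared case-insensitively; both are interpretations of the rules rather than documented behavior, and the function name is hypothetical.

```python
import re


def header_problems(headers):
    """Return a list of human-readable problems found in a view's column headers."""
    problems = []
    seen = set()
    for header in headers:
        # Letters and spaces only: digits, punctuation and special characters fail.
        if not re.fullmatch(r"[A-Za-z ]+", header):
            problems.append(f"invalid characters in {header!r}")
        key = header.strip().lower()
        if key in seen:
            problems.append(f"duplicate header {header!r}")
        seen.add(key)
    return problems
```

An empty list means the headers pass both checks; for example, header_problems(["Database", "Title", "ID"]) returns no problems.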

Note about html 

 

If you're going to use characters in a text portion of your template that are specific to HTML markup (in this case just '<' and '>'), please use the proper HTML escape sequences for them ("&lt;" and "&gt;", respectively). The data doesn't need to change and any actual HTML markup in templates should be left as is. For instance use:

<em>${x} &lt; ${y}</em>

instead of:

<em>${x} < ${y}</em>
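The escaping of literal text can be done with Python's standard html.escape, as sketched below. The helper name is hypothetical, and string.Template is used only to show the final rendered cell. Note that html.escape also converts "&" to "&amp;", which goes slightly beyond the two characters named in the note.

```python
from html import escape
from string import Template


def literal(text):
    """Escape literal template text so '<' and '>' are not read as markup."""
    return escape(text, quote=False)


# Real markup (<em>) is written as-is; only the literal comparison sign is escaped.
template_text = "<em>${x}" + literal(" < ") + "${y}</em>"
rendered = Template(template_text).substitute(x="3", y="5")
```

The resulting template_text is the recommended form shown above, `<em>${x} &lt; ${y}</em>`.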


Column order

All sources should look and "act" the same to the extent possible. Thus, every view through NIF should adhere to a consistent set of guidelines. The order of the columns should generally follow the same format across all services and views, so that users have a consistent experience when they change from source to source. The exact number of columns will vary depending on the source.

 Column number

The number of columns should be fewer than 10 in order to achieve the best display effects. 

 Table formatting 

Select the “Table/View Browser” tab along the bottom, and click each row to view the Table/view display details.

 To make modifications, click on “Update”.

 

 When finished editing, click “Save.” 

 Columns formatting 

From the "Column Browser" tab along the bottom, select from the available rows for the column details. Click “Update” to add/edit the Name, Alias, Data Type, Weight, Indexable, Facet, Key, and column mappings.


   
 

  1. Index: All columns, except e_uid, should be marked indexable.
  2. Export: Both exposed and unexposed columns should be marked exportable; e_uid should not be exported.
  3. Weight: There are 5 levels of weight setting, ranging from 0.5 to 4.0. The database name, the most important entity, is weighted as 4.0. The second most important entity and various IDs are weighted as 2.0. The description, notes or comments are weighted as 1.0. URLs or the least important content are weighted as 0.5.
  4. Facet: Faceted columns should contain values that are repeated many times, such as disease, phenotype inheritance, gene name, etc.; URL and free-text columns cannot be faceted. 
  5. Is key: In most cases, the key variable should be ‘e_uid’. Therefore, the e_uid column should be set to “Yes” for the “Is Key” value. 
  6. Column level mapping: NIF is providing both column and value mapping to enhance the semantic search and to pave the way for export of the NIF linked data graph.  The purpose of the column mapping is to set the ontological domain of the entities contained within.  Each of these domains generally corresponds to one of the NIFSTD modules:  organism, anatomical entity, cell, subcellular entity, molecule, function, disease, technique, resource.  We do not want to map the column at too granular a level, so as to avoid consistency problems with the contents.  At this point, we are also not mapping column roles.  So the fact that an organism serves as the subject of a study will not be reflected in the mapping:  any column containing an organism should be mapped to organism.  Similarly, even if a column contains brain parts, we will map to anatomical entity, to ensure that if the source later adds parts of the spinal cord or parts of the peripheral nervous system, we will not be in conflict.  This policy may be revisited as the ontologies evolve.
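Rules 1, 2, 3 and 5 above can be summarized as a small lookup. In this Python sketch the role labels and the function name are hypothetical, and only the four weight values that the text names explicitly are included.

```python
# Weights named in rule 3, keyed by a hypothetical role label.
ROLE_WEIGHTS = {
    "database_name": 4.0,     # the most important entity
    "secondary_entity": 2.0,  # the second most important entity
    "identifier": 2.0,        # various IDs
    "free_text": 1.0,         # description, notes, comments
    "url": 0.5,               # URLs and the least important content
}


def column_settings(name, role):
    """Suggest Column Browser settings for one column (rules 1, 2, 3 and 5)."""
    is_key_column = name == "e_uid"
    return {
        "name": name,
        "indexable": not is_key_column,   # rule 1: everything except e_uid
        "exportable": not is_key_column,  # rule 2: e_uid is not exported
        "weight": ROLE_WEIGHTS[role],     # rule 3
        "is_key": is_key_column,          # rule 5
    }
```

For example, column_settings("e_uid", "identifier") marks the column as the key and leaves it neither indexable nor exportable, while a description column gets a weight of 1.0.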

The current column concepts include:

Anatomy

Antibody

Catalog Number

Cell

Coordinate

Disease

Environment

Full Text

Function

Gene

Gene-target Reagent

Genomic Locus

Genomic locus variant

Genotype

Identifier

Interaction

Interaction Type

Molecular Domain

Molecule

Organism

Pathway

Phenotype

Protocol

Publication

Quality

Resource

Sequence

Specimen

Stage

Strain

Sub-cellular Anatomy

Mapping rules

Overall rules:

  1. One column can be mapped to multiple terms (e.g. publication and identifier).
  2. One column cannot be mapped to two of the same term (e.g. organism and organism)

Specific rules for individual mapping terms:

  1. Description, notes and comments should be mapped to ‘Full Text’.
  2. Protein and chemical substance should be mapped to ‘Molecule’.
  3. Gene allele should be mapped to ‘Genomic locus variant’.
  4. IDs such as e_uid, gene ID, NIF IDs, PubMed IDs and accession numbers should be mapped to ‘Identifier’.
  5. Reference and PubMed IDs should also be mapped to ‘Publication’.
  6. Gene symbol and gene name should be mapped to ‘Gene’.
  7. Brain structure should be mapped to ‘Anatomy’.
  8. Organism and species should be mapped to ‘Organism’.
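The mapping rules above lend themselves to a lookup table plus a duplicate check. In this Python sketch the keys describing column content are hypothetical labels chosen for illustration; the concept names come from the list of current column concepts.

```python
# Specific rules: kind of column content -> concept(s) it should be mapped to.
SPECIFIC_RULES = {
    "description": ["Full Text"],
    "notes": ["Full Text"],
    "comments": ["Full Text"],
    "protein": ["Molecule"],
    "chemical substance": ["Molecule"],
    "gene allele": ["Genomic locus variant"],
    "e_uid": ["Identifier"],
    "gene id": ["Identifier"],
    "nif id": ["Identifier"],
    "accession number": ["Identifier"],
    "pubmed id": ["Identifier", "Publication"],
    "reference": ["Publication"],
    "gene symbol": ["Gene"],
    "gene name": ["Gene"],
    "brain structure": ["Anatomy"],
    "organism": ["Organism"],
    "species": ["Organism"],
}


def check_mapping(concepts):
    """Overall rules: several different concepts are fine, repeats are not."""
    if len(concepts) != len(set(concepts)):
        raise ValueError(f"a column cannot be mapped to the same concept twice: {concepts}")
    return list(concepts)


def concepts_for(content_kind):
    """Look up the concepts for one kind of column content."""
    return check_mapping(SPECIFIC_RULES[content_kind.lower()])
```

A PubMed ID column illustrates the first overall rule: concepts_for("PubMed id") yields both ‘Identifier’ and ‘Publication’.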

When finished mapping, click “Save.” 

Entity mapping using Google Refine

The Google Refine tool is used to create concept mappings of individual columns, and it is accessible at http://code.google.com/p/google-refine/

Common issues are listed below. If you need more information, go to https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users for the full user and developer documentation made by Refine. (Warning: not everything is listed in Refine’s documentation, such as all the components needed to add a new web server and make it work.)

In what format can Google Refine export and how can we import that data back into Google Refine if we need to update it?

The formats currently supported (in version 2.0) include:

  1. TSV, CSV, or values separated by a custom separator you specify

  2. Excel (.xls, .xlsx)

  3. XML, RDF as XML

  4. JSON

  5. Google Spreadsheets

SIZE of Data:

Refine has been tested mainly with data of around 10 to 50 MB, for which all processes work quickly and well. It has also been tested with data from 100 MB up to 321 MB (the maximum tested so far). At those sizes, problems arise with the speed and processing of simple commands such as Splitting Columns, Copying Columns, Reconciliation, etc.

If in need of using bigger files than 321MB:

321 MB is the approximate maximum amount of data that can be used. A 500 MB data file was tested, but the error "java.lang.OutOfMemoryError: Java heap space" appeared. This can reportedly be fixed (not tested so far, but Refine claims it is possible). Refine states that:

 “There is no hardcoded limit in how much data Refine can load... however, the underlying Java virtual machine (JVM) that runs refine starts with a fixed memory size limit. When the JVM runs out of memory (and yes, Refine eats *LOTS* of memory), weird things can happen and we're not exactly awesome in reporting errors from the server side of refine to the client running in your browser so what the client side might perceive as the load being finished, it's really the server side hitting the memory limit and giving up.”

To add more memory to the JVM, follow the instructions here: http://code.google.com/p/google-refine/wiki/FaqAllocateMoreMemory

To keep all data after the reconciliation process, clone the column that is going to be reconciled in the Excel file and then run the reconciliation on only one of the two columns. This is not so important for images, but it is needed as a future reference for knowing what the data was and what it became.

Adding a Web-Server: 

By connecting your data with other databases, you get more value out of your data (Reconciliation)

The SERVER

<whatever this is>({"schemaSpace":"http://ontology.neuinfo.org/NIF.id","name":"NIFSTD Reconciliation Service","identifierSpace":"http://ontology.neuinfo.org/NIF","view":{"url":"http://neurolex.org/wiki/{{id}}"}})
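The line above appears to be a JSONP-style response: a JSON object of service metadata wrapped in a function call. A minimal Python sketch for unwrapping such a response is shown below; "callback" is a hypothetical function name standing in for the wrapper, and the helper name is likewise an assumption.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}) and return the parsed JSON."""
    match = re.fullmatch(r"\s*[^(]*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# "callback" is a hypothetical wrapper name; the payload is the one shown above.
response = (
    'callback({"schemaSpace":"http://ontology.neuinfo.org/NIF.id",'
    '"name":"NIFSTD Reconciliation Service",'
    '"identifierSpace":"http://ontology.neuinfo.org/NIF",'
    '"view":{"url":"http://neurolex.org/wiki/{{id}}"}})'
)
metadata = parse_jsonp(response)
```

After parsing, metadata["name"] holds the service name and metadata["view"]["url"] holds the URL pattern in which {{id}} is replaced by a term identifier.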

If having a problem:

Post on the Issue site and one of the developers will get back to you as soon as possible.

http://code.google.com/p/google-refine/issues/list

Extra info:

“If you have data in a very peculiar text format, just import it without splitting lines into columns, and then once it's imported, do your own custom column splitting.” 

 “Once imported, the data is stored in Google Refine's own format, and your original data file is left undisturbed.”

“You can also point Google Refine at a URL to a data file or a Google Spreadsheet. The mime-type of that URL tells Google Refine which format the data is in. Currently only Google Spreadsheets which are published publicly are supported.”

“Fetching URLs From Web Services - grabbing from the Web more data related to the data you already have”

Tutorial: 

1. Generate a data file, in csv, xls or format, by following Pavel’s instructions;

2. Open the data file in Google Refine (figure 1);

  

Figure 1: open a data file in Google Refine.

3. (Optional) Once we choose a column to refine, copy the column to a new one. Click the arrow next to the column header and select “Edit column” -> “Add column based on this column” (figure 2);

  

Figure 2: copy column

 

In the pop-up dialog, enter a new column name, choose “keep original” or “Set to blank” for “On error”, then click OK (figure 3);

  

Figure 3: copy column (2)

 

4. For the column we want to refine, click its column header, select “Reconcile -> Start reconciling” from the menu (figure 4);

  

Figure 4: Select “Start reconciling…” from menu

 

5.  Add NIFSTD Reconciliation Service if we haven’t done so. Click the “Add Standard Service” button at the bottom of the pop-up window. Enter NIFSTD Reconciliation Service URL (http://nif-services.neuinfo.org/ontoquest/reconcile) in the dialog (figure 5); If the NIFSTD Reconciliation Service already exists in the left panel, just select it.

  

Figure 5: Add Ontoquest Reconciliation Service

6. Select “Reconcile against no particular type”, then click “Start reconciling”. If you want to reconcile against a particular type, select the type from the list of “Reconcile each cell to an entity of one of these types:” (figure 6);

Tips: If the type list is not available, do “Reconcile against no particular type” first, then clear the reconciliation result, and reconcile against a particular type now. Usually, the types become available at this time (figure 7).

  

Figure 6: Reconcile against no particular type

  

Figure 7: Reconcile against a particular type

7. View results (figure 8);

  

Figure 8: After reconciliation

8. Export data by clicking “Export” on the upper-left corner, then choose the desired format (figure 9).

  

Figure 9: Export data

9. Import the results back to concept mapping tool (Figure 10).

Go to the main page of concept mapping tool, click the import button on the top menu, and choose ‘load reconciled  CSV’. 

(Figure 10: Import data to concept mapping tool) 

 Then select the file that you downloaded from Google Refine and click ‘OK’. The newly mapped CSV file will then be imported into the concept mapping tool. 

Vocabulary Source Creation


Vocabulary sources, like other sources, require a view to be created (see view creation above); the difference is that a vocabulary source should have a defined set of columns.


When the source has been selected, several columns need to be defined.

Provider: This is the ontology or vocabulary source. For NIF, Ontoquest in this example serves the NIFSTD ontology, so NIFSTD is the provider of the data.

Category: This column can be filled in if the source view has only one type of term; for example, NCBI Gene as a vocabulary source has only genes, and the Registry has only resource names. It can be left blank and defined by the Category Column instead if the category of each term is specified in the view; for example, NIFSTD has both organisms and biological processes from two ontologies, and these categories are specified in a column.

Synonym delimiter: The character that separates the synonyms, if multiple synonyms exist in the data.

ID Column: The column that specifies the identifier of the term.

Term Column: The column that specifies the label or preferred label of the term.

Definition Column: The column that holds the definition of the term.

Category Column: The column that holds the category assigned to each term in the vocabulary source. For example, if the vocabulary source contains organisms and diseases, this is the column that specifies which is which.

Inferred Column: The column that specifies whether the term is inferred or standard. Inferred terms expand to not only synonyms, but also other relationships. For example GABAergic neuron is defined as any neuron that releases GABA, a class of neuron which changes if the neurotransmitter GABA is found to be released by a new cell type.

Abbreviation Column: The column that contains the abbreviation of the term.

Acronym Column: The column that contains the acronym of the term.

Synonym Column: The column that contains the synonyms of the term.
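The column definitions above can be pictured as a small configuration record. The Python sketch below is an illustration only: which fields are optional, the default delimiter, and the sample row are all assumptions, and the class is not part of DISCO or the Concept Mapping tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VocabularySource:
    provider: str
    id_column: str
    term_column: str
    definition_column: Optional[str] = None
    category: Optional[str] = None          # fixed category for the whole view
    category_column: Optional[str] = None   # or a per-term category column
    inferred_column: Optional[str] = None
    abbreviation_column: Optional[str] = None
    acronym_column: Optional[str] = None
    synonym_column: Optional[str] = None
    synonym_delimiter: str = ","            # assumed default

    def synonyms(self, row):
        """Split the synonym cell of one row on the configured delimiter."""
        if not self.synonym_column:
            return []
        cell = row.get(self.synonym_column) or ""
        return [s.strip() for s in cell.split(self.synonym_delimiter) if s.strip()]

    def category_of(self, row):
        """Prefer the fixed category; otherwise read the per-term category column."""
        if self.category:
            return self.category
        return row.get(self.category_column) if self.category_column else None


# Hypothetical definition and row, for illustration only.
source = VocabularySource(provider="NIFSTD", id_column="id", term_column="label",
                          category_column="category", synonym_column="synonyms",
                          synonym_delimiter=";")
row = {"id": "example_id", "label": "GABAergic neuron", "category": "Cell",
       "synonyms": "GABA neuron; GABAergic cell"}
```

With this definition, source.synonyms(row) splits the synonym cell on ";" and source.category_of(row) reads the category from the per-term column, because no fixed Category was set.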


Make sure to save your mappings!


*Note, DISCO currently does not control the indexing process of vocabulary sources, so the definition has to end with an email to the systems team to start indexing manually.

Annotation Standards

Current NIF annotation standards are maintained in the Neurolex Wiki under NIF Annotation Standard. They should be used when mapping in Google Refine. 

 NIF annotation standard for age classification

  1. Adult organism
  2. Adolescent organism
  3. Juvenile organism
  4. Newborn organism
  5. Infant organism
  6. Embryonic organism

  NIF annotation standard for expression level

  1. Increased expression
  2. Decreased expression
  3. No change in expression

   NIF annotation standard for treatment paradigm

Additional NIF Curation Policies

 

NITRC Integration Policy
DISCO and BiositeMaps Integration Policy
Level-3 Database Licenses

Curation How-To's

Redirecting pages in NeuroLex

Known Curation Issues:

Link to curation issues page

Keywords, that are not in NIFSTD:

Link to Key Words, that are not in NIFSTD

Resource Hierarchy: