  • Note: Please pay more attention to the text in red.  
  • What is a SciCrunch Resource? 
  • SciCrunch Annotation Standards
  • SciCrunch Registry Resource
  • Additional SciCrunch Curation Policies
  • Curation How-To's
  • Known Curation Issues:
  • Keywords, that are not in NIFSTD:
  • Resource Hierarchy:

    What is a SciCrunch Resource? 

    SciCrunch defines as a resource any data, databases, software/web-based tool, material, models, networks or information that would accelerate the pace of scientific research and discovery.

    SciCrunch has 4 core indexes through which resources are searched: the SciCrunch Registry, the SciCrunch Data Federation, Literature, and Grants. Grants and the Registry are databases within the Data Federation but also have their own tab. Using the DISCO protocol, the Data Federation queries independently maintained databases and datasets to return relevant content but database providers must register their data with the SciCrunch Data Federation and specify permissions. Grants exposes funding opportunities from the database, a central storehouse for information on over 1,000 grant programs open for application and provides access to approximately $500 billion in annual awards. Literature exposes the contents of PubMed and Open Access databases, including PubMed Central.  

    The SciCrunch community is welcome and encouraged to add resources to the SciCrunch Registry and SciCrunch Data Federation. The goal of SciCrunch is to bring people to your site; SciCrunch does not maintain any resources locally. SciCrunch has a full time curator and technical support to assist with the registration process. 

    SciCrunch Annotation Standards

    SciCrunch is developing annotation standards for neuroscience-relevant data and databases to make it easier to search across and integrate data from multiple sources. For our perspective on why such standards are needed, please read our blog. Our motto in the creation of these standards is "arbitrary but defensible", i.e., we will never be able to develop a standard that is 100% agreed upon but it should be based on a set of clearly defined and well reasoned criteria.  As the SciCrunch project is built upon a unified semantic framework based on the NIFSTD ontologies and Neurolex lexicon, all standards will be defined according to these ontologies. However, SciCrunch is moving beyond simple mapping of content to providing a more consistent user experience for navigating among very heterogeneous data sources. The more we can do to make sources look and "behave" the same, the better we serve our users.    

    Standards SciCrunch has developed include a standard column order, standardized column mapping; standards for age classification, differential expression, treatment paradigm, and dementia severity; as well as guidelines for entity mapping and linking data. See Annotation Standards below.

    SciCrunch Registry Resource

    The SciCrunch Registry, a core resource of SciCrunch, is a catalog of web resources that have been selected by SciCrunch curators, or contributed by the community, as valuable tools for researchers and students in the field of neuroscience. The SciCrunch Registry contains a listing of a variety of resources including databases, software tools, brain atlases, granting agencies, tissue banks, and many others. This list of resources is being continuously added to and updated by SciCrunch's staff, affiliates, and people who recommend their resources to SciCrunch. 

    The SciCrunch Registry uses SciCrunch vocabularies to provide high level descriptions of the nature of the resource and its contents.  However, unless the resource is a database or data set and has registered with the SciCrunch data integration tools, the SciCrunch Registry does not search the contents of these databases directly. For example, searching for global keywords such as "genes" or "tissue bank" will bring up the various resources that have those descriptors, whereas "GRM1" or "C57BL/6J-rcw3J/J" will not bring up results, as the specific gene name or strain names are not tagged for each resource. The SciCrunch Registry is a place where there is a list of Alzheimer's disease tissue banks, but it will not tell the user which types of tissues are found in each tissue bank. This type of "drill down" search is provided for a subset of databases through SciCrunch's Data Federation.

    Community involvement is encouraged. Anyone may add a new resource or edit existing resources. All additions and edits are curated by SciCrunch staff to comply with SciCrunch standards and policies.

    It shall be considered an individual resource if it is maintained by a single entity, and has the properties of one or more individual web pages that are related by a theme and HTML links. Most often the individual pages share portions of the URL, however, unrelated URLs may be incorporated into a single web resource. In the event that a subgroup of pages represents a sufficient shift in theme, it should be classified as an independent resource. For example, the department of neuroscience of a university (resource 1) may have a lab led by a researcher (resource 2).  

    Registering a resource to the SciCrunch Registry is the simplest form of registration. The registration form asks for the name of the resource, URL, and some additional basic information, including resource type(s).

    The resource becomes immediately available through NeuroLex (a wiki containing the resource form) where it is assigned a SciCrunch ID. It is also included in the SciCrunch Registry (updated weekly), where it is available through direct query (through SciCrunch's search results) with links back to the original source.

    Anyone, whether it is the resource owner or not, may register any non-commercial (exceptions apply) neuroscience related resource. If you are the resource owner you may add the “Registered with SciCrunch” icon to your site. 

    Resource owners may also place the fully customizable SciCrunch Navigator on their own website to search SciCrunch’s holdings.

    After a resource is registered as a SciCrunch Registry resource, resource owners may desire to create a sitemap or register their resource as part of the Data Federation to provide direct access to dynamic content or structure of the content – see Sitemap and Data Federation below.

    What resources are included in the SciCrunch Registry?

    The SciCrunch Registry is not exclusive to any one type of resource. Rather, it contains a myriad of resources that are deemed valuable to the neuroscience community. Most of these are freely available on the Web, although some are restricted to a small community of users due to commercial interests, or laws governing the sharing of sensitive data. We are relying on feedback from the community as to what types of resources they would like to see. For example, would you like to see more commercial resources? Should we be including well known general resources such as GenBank and NCBI? Should we be listing journals and scientific organizations? This Wiki page (bottom of page) allows for commenting on our curation policies. The rule of thumb that SciCrunch staff tend to have is that if the resource is conceivably useful to some neuroscientists, then it should be included. 

    Two classes of resources generally not included in the SciCrunch Registry are electronic journals and commercial products. In general, SciCrunch is not yet ready to include journals and commercial sites, although exceptions have been made in both cases. Journals are already searched in the Literature section, so providing additional access to them seems redundant unless there is some specific reason for doing so, such as a database of supplemental materials published in the journal. Literature searches typically do not search these types of materials adequately, and so they are valuable additions to SciCrunch. We have not yet created a list of commercial interests due to staffing issues, but we understand that many neuroscientists would like to have a list of commercial software tools, for example, that can be annotated and perhaps reviewed. Again, we encourage you to let us know what you need by providing feedback.

    SciCrunch places a high priority on resources that are recommended by their owners, so these resources are typically included in the SciCrunch Registry relatively quickly. To register a resource, please visit our registration tool. All registered resources are reviewed by SciCrunch curators.

    What makes a good resource?    

    A good resource is one that is determined to be of value to the greater neuroscience/biomedical community (including scientists, students, teachers, clinicians, etc.). As the field of neuroscience is broad and can be affected by all aspects of biomedicine, we catalog all biomedically relevant resources. SciCrunch is only interested in "actionable" resources, i.e., resources that the public can take and do something with, such as a database, tool or service. Although we currently include departments and labs within universities, this rather pushes the limit of the "it must be actionable" rule.  Within universities, there are numerous Institutes, such as the Unilever Center for Molecular Informatics within the Department of Chemistry at Cambridge. We consider these types of groupings within universities non-actionable, and they should not be in the Registry.  We are starting to move Universities and departments from the resource registry into the resource ontology.

    Additional criteria include:

    • Non-profit: SciCrunch is not yet prepared to catalog commercial entities as a rule, however, there are many exceptions that have been made and others can be made for resources that are of high value to the neuroscience community. In the case that the resource is commercial, SciCrunch needs to view the resource very carefully since the goals of SciCrunch and the commercial resource may not be aligned. Generally speaking SciCrunch's focus is not to provide free advertising to commercial companies, but rather to promote open source resources. 

    •  Access:  Is the resource accessible by the community at large or only individuals who are at a particular institution?  At this time, we will not be cataloging institution specific resources, but only those that may be used by the community at large. 

    •  Richness of resource:  The resource needs to be functional, i.e., we will not be listing resources that are still developing and have no product available.  These sites may be listed in the registry, and tagged for future curation. 

    •  SciCrunch is not a patient targeted resource: SciCrunch should not concern itself with making sure that all clinical trials are registered, as these are in Clinical, nor should it add blogs written from the perspective of a patient or a patient's relative. SciCrunch should include early phase clinical trials as SciCrunch resources if they have published methods sections, since these provide information to other neuroscientists. They should be tagged as Experimental protocol, the word clinical should be included in the keywords, and Organism=human should be set. These resources should not be tagged as a clinical knowledge base if they do not provide data.

    • Journals: should not be added to the Registry as we have a literature section.

    Adding a SciCrunch Registry Resource

    To add a resource to the SciCrunch Registry, we have created a special form that can be accessed by anyone by using the Register A Resource form. (This form can be found under "NIF Data Sharing" in the left panel of NIF's home page.)  Alternatively, users can add a resource to the SciCrunch Registry by visiting NeuroLex and entering the name of the resource in the "Create a New Resource" box on the home page.  If using this method, please remember to first check to see if the resource is already in the system. 

    Both of these options require filling out a simple form with facts about the resource. The most important field to include when filling out the form is the URL of the resource. Additional fields are requested and we encourage the community to do the best they can to fill these out; all fields may not be applicable, e.g., a software resource may not have an associated organism. The better the record is filled out the more it benefits the user/owner as SciCrunch processes this text and relationships  to determine relevancy and rank in the search results. Resource owners or knowledgeable patrons are encouraged to keep the resource's content up-to-date; e.g., if the database now covers a new organism, add the new organism's name to the organism field. If a new paper about the resource has been published, add the PubMed ID to the PMID field. 

    Remember that the goal of these entries is to be used for search across resources. Information, as far as possible, must be machine readable and human readable. Therefore, do not just copy terms, but curate them so that they are machine readable and human readable. Two examples will illustrate:

    • "Contract #s N01-HD02-3343, N01-MH9-0002, N01-NS-9- 2314, -2315, -2316, -2317, -2319, -2320" was entered into the "supported by" field. Note, that if I were looking for N01-NS-9-2316, I would get zero results. A human knows what that list means but to get the computer to know requires additional programming. Curate by supplying the complete grant number.

    • Keywords: Longitudinal fasciculus (LF), medulla oblongata (MO), etc. I came across this example the other day where someone had copied a list of anatomical regions into the keywords from the website, including the abbreviations. All were red links, because the Neurolex did not recognize the string "Longitudinal fasciculus (LF)", whereas when I removed the abbreviation, it became a blue link. Also, abbreviations like LF cause more harm than good, as they are very general and non-specific.

    Remember that all entries will be curated by SciCrunch staff, so it is not imperative you get everything just right. 

    The resource is immediately available and searchable through NeuroLex, with the tag "Curated" or "Uncurated" depending on the status of the resource. All resources will be deposited into the SciCrunch Registry database the following Monday (provided the previous Thursday evening deadline is met).
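
    The weekly deposit schedule described above can be sketched as a small date calculation. This is only an illustration: the function name is hypothetical, and it assumes that a submission made at any time on a Thursday still meets that week's deadline, since the exact cutoff time is not stated here.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() counts Monday as 0


def registry_deposit_date(submitted: date) -> date:
    """Estimate the Monday on which a newly registered resource is deposited.

    Assumption: a submission made on a Thursday still meets that week's deadline.
    """
    days_until_deadline = (THURSDAY - submitted.weekday()) % 7
    deadline = submitted + timedelta(days=days_until_deadline)
    return deadline + timedelta(days=4)  # Thursday + 4 days = the following Monday


print(registry_deposit_date(date(2024, 1, 1)))  # submitted on a Monday
print(registry_deposit_date(date(2024, 1, 5)))  # submitted on a Friday
```

    A resource submitted on a Friday has missed that week's Thursday deadline, so under this reading it waits for the Monday after the next deadline.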

    Generally speaking, all new resources will be curated by a SciCrunch Curator within 7 days. 

    Other methods of resource nomination include emailing, or notifying SciCrunch staff directly. 

    Due to increased levels of spam, the ability to create new pages has been restricted to logged-in users. Therefore you will need to create an account (top right) or log in to NeuroLex prior to adding a new resource. You can still edit existing pages without an account or being logged in.

    Curators should always be logged in when adding new resources/editing.

    Naming a Resource 

    Generally speaking, try to name resources just as they are presented on the website e.g., PubMed would not be Pubmed, or Pub Med. Additional exceptions include:

    • If a resource begins with "The" drop it when naming the resource unless the abbreviation incorporates the T.

    • Do not use commas in the resource name, parent organization, or supporting agency, as the wiki sees this as a break.

    • For resources with common/general names e.g., "Alzheimer's Disease Center" include the associated University: "Boston University Alzheimer's Disease Center" in the title. E.g., Gene --> NCBI Gene. Do not name resources "Department of Pharmacology" as many universities have one. Likewise, do not have a generic abbreviation, e.g., DOP; rather, use the one provided along with the school name abbrev.

    • Do not make up names, synonyms or abbreviations unless you feel these alternate names will be searched for to find the resource.

    • For all university department sites here is the template: University; School; Department, e.g., University of California San Diego; School of Medicine; Department of Pharmacology. (We have somewhat deviated from the use of the semicolon and try to represent the resource as it is presented on the website, making sure the university it is affiliated with is at the beginning of the name.)

    • Do not use apostrophes, quotation marks, @, brackets or ampersands in a resource name, as these create problems; for example, such resources cannot be tagged "curated". Synonyms and redirect pages can be added to circumvent this problem. (@ and brackets ([ ]) cannot be used at all in any of the form's fields.)

    • Avoid special characters like "/" and ":", as no one will query for them.

    • Do not use version numbers, as these will change over time.

    • Avoid sloppy looking names with extraneous information, e.g., Neuroanatomy Lab Resource Appendices: Sectional Atlas. "Lab Resource Appendices" isn't the name of the resource. If there isn't an official, acceptable name, then create one: Sectional atlas of human brain.

    • Avoid extra description in the title. Remember, these will be used for search and for alphabetizing. The name of the resource should be the common name:

      • Yes: Flybrain

      • No: FlyBrain - An Online Atlas and Database of the Drosophila Nervous System.
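
    Several of the naming rules above are mechanical enough to check automatically before a curator reviews the name. The following is a minimal sketch, not an official SciCrunch tool: the function name, the character list and the version-number pattern are illustrative assumptions, and rules that need judgment (such as keeping "The" when the abbreviation incorporates the T) are left to the curator.

```python
import re

# Characters the guidelines above say to keep out of a resource name.
# Assumption: only straight quotes are checked; curly quotes would need adding.
DISCOURAGED_CHARS = set(",'\"@[]&/:")


def check_resource_name(name: str) -> list[str]:
    """Return guideline warnings for a proposed resource name (empty list if none)."""
    warnings = []
    if name.lower().startswith("the "):
        warnings.append('starts with "The"')
    found = sorted(ch for ch in DISCOURAGED_CHARS if ch in name)
    if found:
        warnings.append("contains discouraged characters: " + " ".join(found))
    # Rough heuristic for version numbers such as "v2", "version 3" or "2.0".
    if re.search(r"\b(?:v|version\s*)\d+(?:\.\d+)*\b|\b\d+\.\d+\b", name, re.IGNORECASE):
        warnings.append("looks like it includes a version number")
    if " - " in name:
        warnings.append("may include extra description after the name")
    return warnings


print(check_resource_name("Flybrain"))
print(check_resource_name("FlyBrain - An Online Atlas and Database of the Drosophila Nervous System."))
```

    An empty list only means that none of these simple checks fired; it does not replace curation.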

    Changing a resource's name

    See Changing a resource's name below.

    Requested Resource Form Fields (Basic tab)


    The description can often be obtained by reading/copying the "about us" section of the resource or from the home page. In the first snippet it is important to convey what the resource offers.  The following paragraphs can provide more detail for interested visitors. 

    How to write good resource descriptions

    When writing a resource description, please keep in mind that the first snippet will be used in the SciCrunch Registry listing.  With that in mind, the main thing to consider for the resource is:

    ****What is the primary product offered at this website?  Software tool?  A data set?  A service?**** 

     While it is tempting to copy the description of the resource verbatim from the web site, please do not do this indiscriminately; rather, turn it into an informative, pithy, machine-readable resource description. These descriptions will be displayed as snippets by many tools that access the resource registry. Thus, the first line of the description should be as informative as possible.


    Let's say we wanted to add a resource that is called: Cow brain gene expression atlas 

    Good leading sentence:  Atlas detailing the three dimensional expression of 20,000 genes across major regions of the cow brain... 

    Bad leading sentence:  The Cow Brain Gene Expression Atlas was developed by the University of X and aims to provide an increased understanding of ...

    • Resource type(s) (below) selected for the resource, should be addressed/explained in the description

    The following are guidelines and best practices for reviewing and writing resource descriptions:

    • Do not repeat the name of the resource in the first sentence

    • Avoid using the personal pronoun when describing the resource, as SciCrunch's descriptions should not be from the resource providers point of view, e.g., do not say "We offer...".

    • The first word(s) should generally be a statement of the classification used by the SciCrunch curator for resource type.  If the resource is classified as a data set, the first line should read "Data set that...".  If it is a software application, it should read "Software application..."  These may be changed slightly for grammatical or readability issues, but it is good practice that the human readable definition and the machine readable definition should be the same.

    • Please remove any brackets or @ that are in the description, e.g., for references or email addresses, as these are interpreted by the Wiki and lead to errors in formatting (@ can be replaced with (at) or _at_).

    • Make sure to eliminate any breaks or spaces that might be introduced by copying HTML.

    • Avoid using terms that have a time component, e.g., "new", "recently", because these can become out of date and misleading pretty quickly.

    • Remove any qualifiers used in the resource description that are subjective and not-informative, e.g., cutting-edge, most comprehensive available, global leader.  Pay particular attention to this policy if you are curating a commercial site.  Their descriptions have been written by public relations experts, not scientists.
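
    Some of the description guidelines above can be screened for automatically. The sketch below is illustrative only: the function name and the word lists are assumptions seeded with the examples given in the guidelines, and a curator would extend them.

```python
import re

# Word lists seeded from the examples in the guidelines; extend as needed.
FIRST_PERSON = {"we", "our"}
TIME_WORDS = {"new", "recently"}
SUBJECTIVE_PHRASES = ("cutting-edge", "most comprehensive available", "global leader")


def check_description(name: str, description: str) -> list[str]:
    """Return guideline warnings for a resource description (empty list if none)."""
    warnings = []
    lowered = description.lower()
    first_sentence = lowered.split(".")[0]
    if name.lower() in first_sentence:
        warnings.append("first sentence repeats the resource name")
    words = set(re.findall(r"[a-z]+", lowered))
    if words & FIRST_PERSON:
        warnings.append("uses first-person wording")
    if words & TIME_WORDS:
        warnings.append("uses time-dependent wording")
    if any(phrase in lowered for phrase in SUBJECTIVE_PHRASES):
        warnings.append("uses subjective qualifiers")
    if any(ch in description for ch in "[]@"):
        warnings.append("contains brackets or @")
    return warnings


name = "Cow brain gene expression atlas"
good = "Atlas detailing the three dimensional expression of 20,000 genes across major regions of the cow brain."
bad = "The Cow Brain Gene Expression Atlas was developed by the University of X and aims to provide an increased understanding of"
print(check_description(name, good))
print(check_description(name, bad))
```

    The check does not verify that the first words state the resource type; that still has to be compared against the types selected on the form.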

    Resource URL

    Add the URL of the resource including the http:// portion of it; i.e., adding a URL such as will not link.  You must provide the full URL,

    The URL should only include the necessary portion required to function, e.g., use rather than Use rather than That said, please verify that the trivial portion of the URL is indeed not needed. Surprisingly, some URLs will not work without these.
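
    As a rough illustration of the two URL rules above, the sketch below checks that a URL carries its http:// or https:// portion and lists progressively shorter candidates to try. The function names and the example.org address are placeholders, not taken from the guidelines. The code does not test whether any candidate actually resolves, so each one must still be verified by hand.

```python
from urllib.parse import urlsplit


def has_full_url(url: str) -> bool:
    """True if the URL includes an http(s) scheme and a host name."""
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def shorter_candidates(url: str) -> list[str]:
    """List shorter versions of a URL, dropping one trailing path segment at a time."""
    parts = urlsplit(url.strip())
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    for keep in range(len(segments) - 1, -1, -1):
        candidates.append(base + "/" + "/".join(segments[:keep]))
    return candidates


print(has_full_url("www.example.org"))
print(shorter_candidates("http://www.example.org/a/b/index.html"))
```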

    Some websites have multiple URLs that point to the same resource. Keep track of these URLs at the bottom of the page below Notes. Separate multiple alternate URLs with a break. These alternate or old URLs may often be found in the abstract section of the resource’s associated paper.

    Refer to SciCrunch’s Link checking policy below when a new URL cannot be found for the resource. 

    Occasionally, registries, etc., also host a resource with a dedicated URL for the resource. In these cases, add the URL of their website as an alternate URL at the bottom of the page below the Notes section, but include the name of this parent resource.



     Alt. NITRC URL:   

     In these cases, also add the name of the resource in the “Related To” field, e.g., NITRC 

    When dealing with mirrors, add "Alt. URL (Mirror): http:// " to the bottom of the page below Notes. Separate multiple alternate URLs using a break <BR>.


    Keywords should be used to supply related terms that characterize the resource that may or may not be in the ontology. Keywords should be "meaty", that is, they should convey specific information content and be likely search terms, e.g., proteomics, is a good keyword, as people are likely to search for that and we do not list it as a resource type. 

     Avoid generic keywords like "experiment", "species", "science", "discipline". Use the keywords for ancillary functions of a resource, e.g., ModelDB imports and exports models for simulation. It isn't a "simulation resource", as it is not used as a platform for simulation. It is, however, related to simulation and that would be an acceptable keyword. 

     Keywords should use the singular form for the word and, if applicable, cover things such as:

      • Technique: microscopy type, assessment test, behavioral data, electrophysiology etc.

      • Structures covered: sub-cellular components, cells, brain regions, body structures, whole animals etc.

      • Topic covered: psychology, neurology, neuroscience, physiology, behavior, etc.

      • Functional level: embryonic, young, adult, aging


    Information, as far as possible, must be machine readable and human readable. Therefore, do not just copy terms, but curate them so that they are machine readable and human readable. E.g., I came across this example the other day where someone had copied a list of anatomical regions into the keywords from the website, including the abbreviations. Longitudinal fasciculus (LF), medulla oblongata (MO), etc.  All were red links, because the Neurolex did not recognize the string "Longitudinal fasciculus (LF)", whereas when I removed the abbreviation, it became a blue link. Also, abbreviations like LF cause more harm than good, as they are very general and non-specific.
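
    The keyword cleanup described above (dropping copied abbreviations so that the term matches an existing entry) might look like the following. This is a minimal sketch with a hypothetical function name; it only strips a trailing parenthetical and a stray "etc.", and it does not convert plurals to the singular form.

```python
import re


def clean_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string and drop trailing "(ABBR)" parts."""
    cleaned = []
    for item in raw.split(","):
        # Remove a trailing parenthetical abbreviation such as "(LF)".
        term = re.sub(r"\s*\([^)]*\)\s*$", "", item.strip())
        if term and term.lower().rstrip(".") != "etc":
            cleaned.append(term)
    return cleaned


print(clean_keywords("Longitudinal fasciculus (LF), medulla oblongata (MO), etc."))
```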

    Defining Citation

    PubMed IDs from papers about the resource should be added to this field. Papers that only reference or mention the resource should not be added here. We have an automated service (beta) that collects these mentions that is available through the “Mentioned in Literature” column.

    The PubMed ID can be obtained from the website, when available, or by searching PubMed for the resource. This latter method doesn't always work, especially for resources with common names. If the name is a phrase (multiple words), putting the name in quotes often helps and/or adding [Title] next to the name of the resource. This will of course only find papers with the name in the title.

    Multiple IDs may be separated by a comma but only the first entry will be linked to PubMed.

    If the resource does not have a PMID number, use the DOI. Separate each paper entry by a comma. Format example: PMID 25018728, PMID 23195120 or DOI 10.1111/iwj.12345
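
    The format above can be produced from a list of papers as in the sketch below. The function name and the dictionary keys are hypothetical; the identifiers in the usage line are the ones from the format example above.

```python
def format_citation_field(papers: list[dict]) -> str:
    """Build the Defining Citation field: PMID when available, otherwise DOI."""
    entries = []
    for paper in papers:
        pmid = str(paper.get("pmid") or "").strip()
        doi = str(paper.get("doi") or "").strip()
        if pmid:
            if not pmid.isdigit():
                raise ValueError(f"PMID should contain digits only: {pmid!r}")
            entries.append(f"PMID {pmid}")
        elif doi:
            entries.append(f"DOI {doi}")
    return ", ".join(entries)


print(format_citation_field([
    {"pmid": "25018728"},
    {"pmid": "23195120"},
    {"doi": "10.1111/iwj.12345"},
]))
```

    Because only the first entry is linked to PubMed, it may be worth placing the most important paper first in the list.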

    Related to

    Resources within SciCrunch may be related to one another. If a resource within SciCrunch is related to the presently curated resource, place the name of the related resource in this field. Separate each related resource by a comma. Related resources must be included in the SciCrunch database. If resources are related but not yet included in SciCrunch, add them. If resources are related through commercial industries, you only have to include those larger industries which offer more products that could be added to SciCrunch. 

    Ex: Resource added to SciCrunch is Parallel Computing Toolbox, a software application by MATLAB. MATLAB would be a related resource because it could be a resource in SciCrunch.

    Ex: Resource added to SciCrunch is Microsoft Academic Search, a software application by Windows. Windows would NOT be a related resource because it is not a resource that could be listed by SciCrunch.

    Specific "Related To" categories are: Is Used By, Uses, Is Listed By, Lists, Is Recommended By, Recommends, Is Affiliated With, Affiliates, Is Parent Organization Of, Has Parent Organization, and Duplicate Of. For categories Is Used By, Uses, Is Listed By, Lists, Is Recommended By, Recommends, Is Parent Organization Of, and Has Parent Organization, refer to their additional sections below. For Is Affiliated With, Affiliates, and Duplicate Of, see next paragraph.

     Is Affiliated With - a resource is associated with, or involved in, another resource. This category is typically used to connect consortium members to their respective consortiums.

    Ex: University of California at San Diego; California; USA is affiliated with International AMD Genomics Consortium.

    Affiliate - a resource has members or affiliates with other resources. This category is typically used to refer to the members of a consortium or large project.

    Ex: International AMD Genomics Consortium has University of California at San Diego; California; USA as an affiliate.

    Duplicate Of - a resource is a duplicate of another listed resource. The Duplicate box is only marked for the duplicate resource with younger/most recent Original ID.

    If no specific categories fit a resource relationship, put the relationship as Is Related To or Related To.

    Parent Organization

    The institution that houses/maintains the resource; for example, the Allen Brain Atlas is housed at the Allen Brain Institute. A physical address can be associated with the parent resource. A parent organization must be included as a resource in SciCrunch and is considered a type of relationship.

    We will provide more of a top level view rather than delve into department or lab level. E.g., SciCrunch's parent organization would be the University of California at San Diego; California; USA, rather than the Center for Research in Biological Systems; University of California San Diego; California; USA, or the California Institute for Telecommunications and Information Technology (Calit2). In other words, where applicable, do not use the department or lab for this field but rather the institution, unless the department or lab is already entered as a Resource or it is felt it would make a good resource (See above, What makes a good resource?).

    Top level institutions are generally in the following format: University name; State; Country. In order for this field - or any other field - to match in the NeuroLex (Semantic Media Wiki), it will have to match just as it was entered. Due to minor variability, you may just want to search for it, then copy it.  If there is no parent organization, such as the Jackson Labs, then leave the field blank. 

    If the parent organization is not obvious, the URL can be of use because it is often telling. E.g., For the resource, "National Long Term Care Survey," you can see the URL, shows the resource's association with Duke University. If it is not as blatant as this, you can try shortening the URL and seeing where it resolves. Alternatively, you can search PubMed for the resource to see if there is an associated paper and get the information from the paper. When using this last method, as collaborations are common, use the PI's (last author's) parent organization using the most recent available paper.

    Parent Organization Naming Scheme

    Use the English version of any organization name(s) if it is readily available and state the country where the organization(s) is/are located, and the laboratories therein. 

    If a parent organization is a university, please follow the format listed below: 

     Universities located in the United States should be written as follows:

    University Name; State; USA

    Example: Cornell University; New York; USA 

     Universities located outside the United States should be written as follows:

    University Name; City/State/Province; Country (1st choice=state, 2nd=province, 3rd=city)

    Example: University of Oxford; Oxford; United Kingdom

    Example: University of Alberta; Alberta; Canada 

    Note: Resources can have more than one parent organization. Enter each organization separately as a parent organization.

    Relationship box of curation dashboard should read: Monarch Initiative - This resource has parent organization Oregon Health and Science University; Oregon; USA. This resource has parent organization University of California at San Diego; California; USA.

    Please leave the following out of the parent organization name, and keep the notes below in mind:

    • "The" - found in front of the University Name

    • Commas, @, &, apostrophes, and quotation marks

    • Do not use commas in the resource name or parent organization as the wiki sees this as a break. You will not be able to link to other resources or categories

    • Universities, Institutions, Institutes, and Hospitals aren't considered resources and therefore do not use the Resource form.
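
    The naming scheme above can be applied consistently with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and replacing "&" with "and" is an assumption based on the "Oregon Health and Science University" example above, not a stated rule. The caller chooses the region according to the guidelines.

```python
def format_parent_organization(name: str, region: str, country: str) -> str:
    """Format a parent organization as "Name; Region; Country"."""

    def clean(part: str) -> str:
        part = part.strip()
        if part.lower().startswith("the "):
            part = part[4:]  # drop a leading "The"
        part = part.replace("&", " and ")  # assumption: spell out ampersands
        for ch in ",@'\"":
            part = part.replace(ch, "")
        return " ".join(part.split())  # collapse any doubled spaces

    return "; ".join(clean(p) for p in (name, region, country))


print(format_parent_organization("Cornell University", "New York", "USA"))
print(format_parent_organization("The University of Oxford", "Oxford", "United Kingdom"))
```

    Since the wiki only links a value that matches exactly as it was entered, it is still safer to search for an existing entry and copy it when one exists.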


    The abbreviated name will be used to identify your resource in publications and it is currently unique. Please make sure that a page with the name does not currently exist. 


     Include variations of the name that are used in the website or associated paper. Save abbreviations for the Abbreviation field.

    Funding Information

    Include information related to the resource's funding (supporting agency and funding support). Separate multiple grants by a comma. Format: Supporting Agency + space + funding support number, e.g., Office of the Director NIH 000000000, Contract HHSN27120080035C

    Supporting Agency

    Look for supporting agency(s) on the website. This will often be available at the bottom of the page or in an acknowledgements section. When this information is not found on the website, it can often be obtained from a paper(s) about the resource. Papers that describe the resource can often be obtained by searching PubMed for the name of the resource. Verify the paper is describing the resource not just mentioning it. The information can be found in the "Acknowledgements" or "Funding" section. 

    Funding support

    Look for the grant(s) funding the resource on the website. This information will often be available at the bottom of the page or in an acknowledgements section. When this information is not found on the website, it can often be obtained from a paper(s) about the resource. Papers that describe the resource can often be obtained by searching PubMed for the name of the resource. Verify the paper is describing the resource, not just mentioning it. The information can be found in the "Acknowledgements" or "Funding" section. Contracts should be listed in this field too; just add the word “Contract” beforehand.

    Information, as far as possible, must be both machine readable and human readable. Therefore, do not just copy terms; curate them so that they can be read by both. For example, "Contract #s N01-HD02-3343, N01-MH9-0002, N01-NS-9- 2314, -2315, -2316, -2317, -2319, -2320" was entered into the "supported by" field. Note that a search for N01-NS-9-2316 would return zero results. A human knows what that list means, but getting the computer to know requires additional programming. Curate by supplying the complete grant number for every entry.
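
    As an illustration of what "supplying the complete grant number" means, the sketch below expands the shorthand from the example into full numbers. The function name expand_grant_numbers is hypothetical, and the rule it applies (a suffix such as "-2315" inherits everything before the last hyphen of the preceding full number) is an assumption based on that one example, not a general grant-number parser.

```python
import re


def expand_grant_numbers(raw: str) -> list[str]:
    """Expand shorthand such as 'N01-NS-9-2314, -2315' into full numbers."""
    expanded = []
    prefix = None
    for item in raw.split(","):
        # Remove stray spaces inside a number, e.g. 'N01-NS-9- 2314'.
        token = re.sub(r"\s+", "", item)
        if not token:
            continue
        if token.startswith("-"):
            # A bare suffix reuses the prefix of the last full number.
            if prefix is None:
                raise ValueError(f"suffix {token!r} has no preceding grant number")
            expanded.append(prefix + token)
        else:
            expanded.append(token)
            # Everything before the final hyphen becomes the shared prefix.
            prefix = token.rsplit("-", 1)[0]
    return expanded


numbers = expand_grant_numbers(
    "N01-HD02-3343, N01-MH9-0002, N01-NS-9- 2314, -2315, -2316"
)
print(", ".join(numbers))
```

    The expanded list should be checked against the source before it is entered in the field.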


    This field is for the user to submit any comments about the submitted resource.

    For Biospecimen resources, please add the available Sample types to this field in the following format: Sample type: Blood, DNA, Urine, Cell, etc. Please add this information as keywords too. 

    Editorial Note

    This field is for the curator to make any additional comments about this resource.

    Additional Resource Types

    Additional resource types have been created to provide a standardized method of classifying resources. Resources should be labeled with the main thing(s) that the resource offers using terms within the Resource Type Hierarchy. 

    What is the primary product offered at this web address? A software tool? A data set? A service? That is, what would a user expect to take away from this site? Avoid assigning resource types for very minor functions. For example, if a site offering a database on nucleolar proteins has a discussion tab where they advertise a position for hire, do not characterize this resource as a job resource. A user should always understand why he or she was taken to a site, i.e., they shouldn't have to dig for information; it should be obvious. You may use the keywords to add additional resource descriptors if you think they are highly relevant. 

    The goal of resource type categorization is separate from keywords or other properties in that the resource type should inform the user as to the central purpose of the resource and not the particulars. A resource type is the “product” that is offered. For example, MGI is a database of mouse genes and is labeled as a "Database;" the Mutant Mouse Regional Resource Center accepts and distributes mutant mice and is labeled as an "organism repository;” and the Michael J. Fox Foundation for Parkinson's Research funds grants so it is labeled a "Funding resource."

    To do this categorization, we created an internal consensus resource descriptor list based on the interactions with the library community and the BRO resource ontology, as well as several NIF partners. 

    The NIF Resource module was created within the NIFSTD ontology as a separate module. This module fixes the set of high level categories, adding classes like "Service resource", and also attempts to harmonize with the Biomedical Resource Ontology (BRO), NITRC resource types and OBI classes. 

     All of the individual resource types currently fall into at least 1 of the 8 major categories below, and the user may search by these categories:

    • Data or information resource: A resource that describes data or information, which may be in different forms

    • Software resource: A resource that provides software such as software programs or source code

    • Material resource: A resource that provides supplies or equipment such as reagents, instruments, tissue samples or organisms

    • Service resource: A resource that provides a type of service, e.g., making antibodies, storing data, computing services

    • Funding resource: A resource that provides funding opportunities in the form of grants or contracts

    • Training resource: A resource that provides training, educational opportunities or educational materials, such as courses, workshops, seminar materials or graduate programs.

    • Job resource: A resource that provides listings of employment opportunities

    • People resource: A resource that provides information on individual people, for example, on expertise or affiliation

    A complete up-to-date listing of the Resource Type Hierarchy and their definitions is available through NeuroLex. An alternate view can be found through Bioportal's NIF Resource Type Hierarchy View. 

    These resource descriptors are meant to narrow search results by the type of thing that a neuroscientist is looking for. We believe that they are useful as general categories because they are in common English and tend to be understood by neuroscientists quickly. The question that is to be answered is "what is the end user looking for?" For example, if the user is looking for a transgenic mouse, they should not be bombarded with software tools that hit the same keywords or data sources that talk about the mouse.

    The resource should be tagged with all applicable resource types, but not resource types that pertain to sub-resources if they will become separate SciCrunch resources. In general, if you have to assign too many labels, you are probably better off creating separate pages for some of the tools, rather than trying to characterize everything a particular resource has to offer in total.   In general, the trend in SciCrunch has been to use less granular resource types to simplify choices by the user. Thus, we now favor just “database” over “web-accessible database”.  Any additional characteristics can be covered by the keywords.

    Assigning resource type or types can be challenging, as many websites offer multiple products and an individual product can serve multiple roles. All resources are curated by the SciCrunch curator, so do not be concerned if you have difficulty. 

    Many times, portal sites have a lot of valuable resources that aren't apparent from the home page.  In this case, SciCrunch has to decide whether to create a separate entry for the resource or tag the general resource with a bunch of tags.  One of our guiding principles is that the user should know why they are taken to a site.  For example, an organization claims that it has a training program for Ph.D. students, but it takes the SciCrunch curator significant time to find out where on the site this information is listed.  That page may not be particularly useful without going through the home page of the organization.  In this case, SciCrunch would tag the organization home page with "Graduate program", but would include in the description that such a program is offered and how users can find out about it within the resource.  In contrast, a model organism database may have an ontology that is available through their home page, but is difficult to find.  As the ontology page can be considered a self-contained resource, that is, you don't need to read the home page to understand it, SciCrunch would list the ontology as a separate resource.

    E.g., just because a resource has images, it does not mean you should tag it with image. You would only tag it with image if that was one of the main things the resource offers. Image can always be added as a keyword for these types of scenarios.  

    E.g., Model organism databases such as Xenbase should only be tagged as a Database and Repository. You can add other resource types such as Organism-related portal, Data analysis service and Organism supplier as keywords.

    E.g., Databases that offer data analysis services such as BLAST should only be marked as a database - not data analysis service. This can be added as a keyword.

    Multiple entries are to be separated by a comma. Anytime you get more than 3 resource types begin thinking about breaking the resource up into more resources. 

    • Do not tag Faculty within university departments as People resource.
    • Department level institutions or organizations - and above (Universities, governments) - should be categorized "organizational portal." (departmental portal where applicable)

    • Lab level institutions or organizations - and below - should be categorized as "Laboratory portal" or "topical portal."

    • The "Medical school program resource" tag should only be used when a class or classes towards a medical degree are provided in a medical school setting.

    • If a resource type is selected for a resource, it should be addressed/explained in the description.

    • Software tools are categorized by function, e.g., simulation software is software that is used to perform simulation. Do not use a specific function term unless the software is designed for that function. If the site has a database of models that may be derived from or used for simulation, it is not simulation software. In this case, use the term "simulation" as a keyword.

    • A database of data sets should be marked Database when it serves the individual data sets as elements.

    Keywords and resource types will be treated in a special way by the SciCrunch search systems allowing them to be ranked higher than other search results. 

    Is Listed by

    A resource is listed by another resource. Must be in the SciCrunch database. 

    Ex.) CiteULike is a digital tool listed by Connected Researchers.


    A resource lists another resource. Must be in the SciCrunch database. 

    Ex.) Connected Researchers lists CiteULike as a digital tool.

    Is Used by

    A resource is used by another resource. Data or information may be pulled from this resource. Must be in the SciCrunch database.

    Ex.) KEGG and its pathways are used by METLIN.


    A resource uses another resource to fulfill its purpose. Typically in reference to tools, but can also be a category for information used by another resource. Must be in the SciCrunch database.

    Ex.) Oncogenomic Database of Hepatocellular Carcinoma uses information from Ensembl. 

    Is Recommended by

    A resource is recommended by another resource. Must be in the SciCrunch database. 


    A resource recommends another resource. Must be in the SciCrunch database. 


    State the availability of the resource/licensing information. E.g., if the resource is a biobank, can anyone request biomaterials? Is it public, open source, BSL license, freely available but must cite, freely available to non-commercial, what is the license for the software, etc. If available, this information can be obtained either from the website or the related article. 

    When the information is available, the field should cover access of the resource: can you add to the resource, can you take from the resource / what are the terms, and is the resource still available. If the resource is no longer available, please add the following to this field as well as to the top of the description: THIS RESOURCE IS NO LONGER IN SERVICE, documented on ‘full month’ ‘day’, full year. (e.g., THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013.). If the user can add to the resource, add "The community can contribute to this resource" in the field. 
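
    The label format described above (full month, two-digit day, full year) can be generated rather than typed by hand. A minimal sketch follows; the function name out_of_service_label is hypothetical, and month names are spelled out in a tuple so the result does not depend on the machine's locale.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def out_of_service_label(documented_on: date) -> str:
    """Build the out-of-service label for the given documentation date."""
    month = MONTHS[documented_on.month - 1]
    return (
        "THIS RESOURCE IS NO LONGER IN SERVICE, documented on "
        f"{month} {documented_on.day:02d}, {documented_on.year}."
    )


print(out_of_service_label(date(2013, 9, 5)))
```

    The same string goes both in this field and at the top of the description.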

    Separate each standardized entry with a comma (the delimiter for the wiki), e.g., Open unspecified license, Acknowledgement requested, THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013., The community can contribute to this resource. For a full list see Availability values in the NeuroLex. 

    Terms of Use URLs

    The URL(s) where the resource posts under which conditions you may use the resource. This can include other titles, e.g., copyright page, citation policy, policy, terms and conditions, etc., (comma separated for multiple entries). This field should not include the url of the actual license. (The actual license has its own entry with associated url (s)).

    Alternative URL

    Other/Alternate URLs that retrieve the source (comma separated for multiple entries).

    Old URLs

    URLs that used to retrieve the source but no longer work (comma separated for multiple entries).

    Alternative IDs

    This field does not need to be altered.

    Related Application

    This field was created primarily for the biospecimen resources to state if the bioresources were to be used for research, transplantation, therapy, education etc. 

    Multiple entries are to be separated by a comma. 

    Related Disease

     If the data resource concerns a disease, set of diseases, or condition, make sure that they are stated (e.g., Parkinson’s disease, neurodegenerative disorder, Batten’s disease, Aging, Normal control, etc.)

     Additional entries are to be separated by a comma. 

    Located in

    The physical location that the resource is located in, if known.


    This field is mainly used for biobanks and what sort of processing the biospecimens have been put through, e.g., Frozen, paraffin, slide, cryopreserved, stained, fresh, etc.

    Separate multiple entries by a comma. 


    Add organisms represented in the resource, e.g., if the resource is a database of mouse gene expression, add Mouse to this field. Not all resources will have an associated organism, e.g., many software resources. Some resources are not forthcoming with this information. For instance, if the database is a clinical trial database, make sure that it is labeled human. Many resources mention the organism(s) in the description, but as some do not, it becomes very important to capture this information.

    Multiple organisms may be added; just separate them by a comma.

    The organism's age(s) should be classified using the NIF annotation standards for age classification, but this should be added as a keyword (e.g., Late adult human, Embryonic mouse), as it includes more information than just the organism (the age).


    This is the primary resource type. Entries will either be a resource, commercial, an institution or a university.

    Occasionally there is a PDF or other publication that is about the resource. You may also link directly to the paper, even if it is in PubMed. This field will only accept one URL. Please include the full URL that includes the http:// (or the like) part for the link to work. 

    Social URL

    Add the social URL for this resource, for example Facebook, Google+ and WordPress.


    This field is dedicated to the terms under which the content is made available. Place a short description about the licenses used.

    Twitter Handle

    Only the twitter handle should be used for this field. For example, for @neuinfo, the twitter handle “neuinfo” should be used.

    Scope of Curation for SciCrunch Registry Resources

    The goal of curation is to establish a set of identifiers that will help the end user find relevant resources, but not overwhelm the user. 

    Here is a concrete example: a user was looking in the SciCrunch Registry for any resources that were annotated with the term "Locus Ceruleus" and the CNS Forum was returned. There is no mention of "Locus Ceruleus" on any page within the CNS Forum, but one of its subcomponents is called brain explorer. This feature contains a set of images of brain regions that were pulled by curators and annotated. Thus, the main site CNS Forum was returned for the "Locus Ceruleus" query. In this case the added annotation is confusing rather than helpful, because to find any mention of the "Locus Ceruleus" the user would need to navigate down four link levels from the main page to a list of brain structures. Most users would not do this and would simply believe that the result was an error. Therefore, the annotation should be narrow enough that it captures the main features of the site, but not information that is too deep within the site to easily find. For resources with deep structured content, consider exposing them through the SciCrunch data federation.

     Another case that is difficult to assess is the case of protein or gene databases. It is often possible to obtain a so called "data dump" of the individual records from a database and one possible curation method is to take the data dump, strip the tags and place a cleaned list of terms in the registry file. Thus, a database registered in this way would always return if the end user queried for any of the proteins or genes within the data dump. Several problems arise with this strategy, including updating of information and also preferential treatment in search of databases that can dump data. 

    To address the first issue of updating the information, a single data dump will create a snapshot of the data as it was when the data dump occurred. This may be a good idea for relatively static web entities, such as an atlas from an individual experiment. Data will not be added to the atlas, but it is a good reference resource. However, most scientific databases are not static entities, for example GENSAT is updated daily at 6AM EST. Therefore to stay current with new developments any data dump would need to be done with a frequency of the newly available data. The ability to accomplish this task manually on a daily frequency is not a reasonable expectation of a human curator, rather it is more amenable to an automated program. So any web resource that has a significant and changing component should be annotated generally and added as a possible level 2/3 resource candidate. 

    The second problem that arises with a data dump model is the preferential finding of databases that allow their contents to be easily dumped. The contents of the PubChem or UniProt databases may be too vast for a human to easily parse them, so these sites tend to be left out of the data dump class, but they are more likely to contain any protein data than easily parsed databases like KARG (with only thousands of entries). Again this creates a problem in searching for data, because while PubChem is certain to have relevant data to the query, a smaller database will come up preferentially because its data has been dumped and parsed. 

     Thus, the scope of annotation should be relatively superficial for level 1 resources, and also should be consistent in scope. 


    Incorporating Outside Registries and Accommodating Their Tags

    The SciCrunch Registry incorporates outside registries and, after adding and tagging every resource within the SciCrunch Registry, we can expose a separate view of these through the SciCrunch and NeuroLex. The "Related To" field of each resource within the parent registry should include the tag of the parent registry resource. E.g., for the outside registry Gene Ontology Tools, the curator would add the resource Gene Ontology Tools and then add every resource within the Gene Ontology Tools Registry, adding "Gene Ontology Tools" (no quotes) to the 'Related To' field of each one. The classification tags of the originating resource should be included in the Keywords of each participating resource. With these tags in place, we can create tables within the NeuroLex and pull this resource out in the SciCrunch as a separate resource. View the Gene Ontology Tools database in the SciCrunch. View tables in NeuroLex. (To view the code to create these tables, go to the 'More' tab and select 'Edit Source'. Modify the code as required to accommodate incoming new resources.)

    We are currently in the process of adding another property to the NeuroLex to better accommodate the classification tags of the incoming Registries. With this new property we can also expose this classification through the SciCrunch interface.

    Additional SciCrunch Staff Responsibilities 

    Verifying an Account

    Verify that the user is a valid user, e.g., represented by a university or other valid entity, by searching the Web. (They are instructed to send their user name and an institutional / organizational affiliation that can be verified, and/or specifics for what they would like to add.) If they check out, click on 'Special pages' at the bottom of the NeuroLex home page.

    Select 'User rights management' (under 'Users and Rights')

    Add the username where it indicates, and click on 'Edit User Groups'.


    Select the 'trusted' box, then 'Save user groups'.

    Send an email letting them know that they have been verified, e.g.,

    Dear Isabel,

    Thank you for creating a NeuroLex account. You have been verified as a trusted user and now have permission to create new pages.
    Please let me know if I can help with anything else.


    Cc the group so that the rest of the team knows they have been taken care of and we do not replicate efforts.

    SciCrunch Link checking policy

    SciCrunch's website links are checked regularly and invalid links are sent to curators weekly. A set of scripts, accessible here, constitutes a pipeline that checks links weekly, producing an invalid link file. Curators have access to this file, which includes the number of weeks that the resource has been pinged and found to be invalid. The curators manually check the links and attempt to determine if the resource is invalid, in which case the description text is updated to say "THIS RESOURCE IS NO LONGER IN SERVICE", or, if a suitable replacement link can be found, the URL is updated. 

    Invalid URLs:


    If the resource's URL is no longer functioning, the resource should be tested 3 times over the course of 3 weeks to make sure that the site's server is not just temporarily down. 
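
    The weekly check and the "3 times over 3 weeks" rule can be sketched as follows. This is not the actual SciCrunch pipeline: the function names, the in-memory dictionary standing in for the invalid link file, and the choice to reset the count after one successful check are all assumptions made for illustration.

```python
import http.client
import urllib.request

FAILURES_BEFORE_REVIEW = 3  # three failed weekly checks, per the policy above


def url_responds(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL can be opened without an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def record_weekly_check(weeks_invalid: dict[str, int], url: str, ok: bool) -> int:
    """Update the consecutive-failure count for one URL and return it."""
    weeks_invalid[url] = 0 if ok else weeks_invalid.get(url, 0) + 1
    return weeks_invalid[url]


def needs_manual_review(weeks_invalid: dict[str, int], url: str) -> bool:
    """True once a URL has failed enough weekly checks in a row."""
    return weeks_invalid.get(url, 0) >= FAILURES_BEFORE_REVIEW
```

    In a weekly run, a script would call record_weekly_check(counts, url, url_responds(url)) for each registered URL and hand the URLs flagged by needs_manual_review to a curator, who still makes the final decision.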


    Resources that are no longer present at the existing address are manually searched for and updated when the curator can find the web resource that is indeed the same one that has been cataloged. Frequently these resources' entire records will need updating. 

     Before tagging a resource as non-operational (THIS RESOURCE IS NO LONGER IN SERVICE, documented on month, day, year, e.g., June 09, 2013), first try to find a valid URL for that resource by doing a Google (or other search engine) search for that resource. (Sometimes searching for content in the description is helpful.) If this does not produce a valid URL, curators can look at the invalid URL and frequently determine the general location of the resource, e.g., which department, unit, etc., it comes from, and search it directly for the resource. 

     Several scenarios can arise from doing this, but frequently the department, etc., changing the structure of their URL is to blame. The chunk of the old URL that represents the department will sometimes morph into the new department’s URL. This new URL can be used to search for the resource within that department; sometimes searching that portion of the URL will tell you where the new location has moved to; and sometimes you will get an error message that the XYZ department is no longer available here. At that point you can search (Google, etc) for the department and then search for the resource. Sometimes, the department itself changes, is renamed or included within another department or center. These are all scenarios curators can take advantage of to search for the resource. Occasionally the name of the resource will change a bit, depending on the type of resource; e.g., the XYZ Tissue bank is now the XYZ Tissue and Cell Bank. The name of the resource or parts of the name of the resource can then be searched to see if you can find the resource or its equivalent. If the equivalent resource has been re-named, at the very minimum, a synonym should be added; but preferably, a new page created with the new valid name. Both the resource's content and SciCrunch id will need to be redirected to the new page and the old name added as a synonym. The content of the record will also likely need updating. 

    Found a valid URL

    If a new URL is found, place the old URL at the bottom of the page under Notes in the following format: 


    This page uses this default form:Resource  

     Old URL: 

    Did not find a valid URL

    If you have exhausted all of the methods above, place the following label at the very top of the resource's description:

    THIS RESOURCE IS NO LONGER IN SERVICE, documented on September 05, 2013. Also add this information to the Availability field. 

    Duplicate Resources

    If a resource is found to be a duplicate of another resource, it is crucial to keep the oldest record of any resource (lowest number SciCrunch-ID of resource) because the resource number forms the unique id for that resource and is used by developers. Any pertinent information to make a description, or other field, more complete is transferred over from the one about to be redirected. It is important to not delete but redirect both the resource and the id to the new resource. 

    If the URL differs, the alternate URL, or old URL, is recorded at the bottom of the page in the following format:


    This page uses this default form:Resource  

     Old URL: <BR />

    Alt. URL: 

    Redirecting pages in NeuroLex 

    In NeuroLex, category pages, resource pages and id pages often need to be, or would benefit from being redirected. A duplicate resource should be redirected, along with its SciCrunch ID to the oldest resource. Creating a redirect page for commonly abbreviated categories such as NIH is also beneficial. Below are steps on how to do this. 

    Redirect to an existing page

    To redirect to an existing page, choose “Edit source” from the “More” tab of the resource you want to redirect. Transfer any existing information to the main resource that you want to keep. Redirect the SciCrunch ID before you delete it (See below). Then replace the existing contents with #REDIRECT[[:Category:Resource:XXXXX]] where XXXXX is the resource page you want to redirect it to. Save the page.

    If you would like to redirect it to a non resource page, use the following format without “Resource:”.

    Curate the redirected page or the system will not recognize it.
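
    For reference, the redirect line described above can be assembled with a small helper. The function name redirect_source is hypothetical, and the non-resource form is an assumption inferred only from the instruction to use the same format without “Resource:”.

```python
def redirect_source(target: str, is_resource: bool = True) -> str:
    """Build the wiki redirect line for a target page name."""
    # Resource pages live under Category:Resource:. The non-resource
    # form (plain Category:) is an assumption, not taken from the wiki.
    namespace = "Category:Resource:" if is_resource else "Category:"
    return f"#REDIRECT[[:{namespace}{target}]]"


print(redirect_source("XXXXX"))
```

    The resulting line replaces the entire contents of the page being redirected.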

    Redirect a SciCrunch ID to another page

    Search for the SciCrunch ID. (Make sure there are no spaces in front of it.) A search for nlx_151325 will show,  "You searched for nlx_151325 (all pages starting with "nlx_151325" | all pages that link to "nlx_151325""
    above the search field. Click on the id. It will take you to the page it belongs to. At the top of the page you will see (Redirected from Nlx 151325). Click on the id. You should see the redirect page of its corresponding resource.  To redirect this to another page, click on “Edit source” from the “More” tab and update the name of the resource, e.g., #REDIRECT[[:Category:Resource:BBBBB DDDDD]] ---> #REDIRECT[[:Category:Resource:XXXXX]] Then save.
    Alternatively, go to the SciCrunch ID’s URL, e.g., (always in the same format / replace your id with the ID shown). At the top of the page you will see (Redirected from Nlx 151325). Click on the id. You should see the redirect page of its corresponding resource.  To redirect this to another page, click on “Edit source” from the “More” tab and update the name of the resource, e.g., #REDIRECT[[:Category:Resource:BBBBB DDDDD]] ---> #REDIRECT[[:Category:Resource:XXXXX]] Then save.
    To redirect to a non resource page the format will be as follows without the “Resource:”

    Redirect to a new page / Changing a resource's name

    You may either create a new category / resource from the main page, or, if the contents of the page you want to redirect is the correct contents, e.g., you just want to give it a different name, create the new page using the URL: XXX, XXXXX XXX representing the name of your resource.
    ( XXX for non resource pages)

    Copy the contents to the resource you want to redirect over to this new resource: From the More tab, hit "Edit Source", Select all the contents (Ctrl A), Copy it (Ctrl C), then paste it on the new page (Ctrl V): From the More tab of the new page, select "Edit Source" and paste in the contents. Save the new page with the pasted in contents of the old page. 

    Redirect the id of the old page (the one you are redirecting) to the new page. (see above).

    Then redirect the old page to the new page:

    Click on “Edit source” from the “More” tab and add the redirect information. 
    #REDIRECT[[:Category:Resource:XXXXX]] Then save.
    To redirect to a non resource page the format will be as follows without the “Resource:”

    Curate the redirected page or the system will not recognize it.

    Maintaining Discontinued Resources

    Other scenarios may arise such as resources that no longer provide the service, data, software, etc. that they once did, including the reason they are in the SciCrunch Registry. Curators typically see discontinued resources for software where certain software resources become obsolete or replaced by another software. If such a resource was upgraded, curators need to update the resource accordingly.

    It is important for curators to NOT DELETE these resources from the registry, as it is useful to keep them. Similar action needs to be taken for all discontinued resources, not just software. Other discontinued resources can include:

    • resources that are no longer operational

    • resources that have been replaced by something new or merged with some other resource (typically seen for databases)

    If the NIF logo is displayed on resource pages add it to the NIF Stats Google doc., under the NIF Referring Sites tab. 

    Contact Resource Providers  

    It is the policy of SciCrunch to contact a random subset of resource providers to accomplish several goals:

    • To establish communication leading to higher levels of data integration

    • To let providers know about SciCrunch

    • To establish interaction with providers that have an interest in co-development with SciCrunch.  For example, using Neurolex and other tools developed for SciCrunch, using SciCrunch's registry, metadata infrastructure etc.

    • To establish a clear two way communication with ontology-related projects.

    An outreach letter template has been crafted and lives in the NIF Project Private Wiki under SciCrunch Curation and Outreach. Curators should use this template to establish contact with the resource owners. The date contacted, the date updated, date approved, resource name, SciCrunch ID, resource owner(s) contacted, and pertinent notes, are recorded in the NIF Stats Google doc., under the Registry Contacts tabs. Notes should include information such as if the resource owner had you do the updates, etc. All correspondence should be followed up and Cc’ed. 

     The outreach letter template should be updated as necessary. 

     Email addresses should be periodically passed on to be added to the Registry resource owner’s group mailing list.

    SciCrunch Level 2 (DISCO/BiositeMaps) Integration Policy

    SciCrunch incorporates all resources that have registered themselves to Biositemaps and DISCO, which fall within the general domain of neuroscience, are aligned with SciCrunch's curation policies, and contain sufficient information to be found. 

     The minimum set of information includes accurate: Name, URL, contact information and a short description. 
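
    A check for this minimum set of information might look like the sketch below. The dictionary keys (name, url, contact, description) and the function name are hypothetical; the actual Biositemaps and DISCO field names may differ. The check only detects missing values and cannot judge accuracy, which remains a curator's task.

```python
# Hypothetical keys standing in for the four required pieces of information.
REQUIRED_FIELDS = ("name", "url", "contact", "description")


def missing_required_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


record = {"name": "Example Atlas", "url": "http://example.org", "contact": ""}
print(missing_required_fields(record))
```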

     SciCrunch will curate the minimal information provided by the Biositemaps or DISCO files and include additional descriptive information, including keywords, resource type categories, useful abbreviations and institutional information.  SciCrunch will also contact the resource provider when the information is included in the SciCrunch Registry to allow the resource provider to add to the description, keywords, or other pertinent fields. 

     SciCrunch will crawl the automated information at regular intervals to test for changes in the DISCO or Biositemaps files, and curators will be alerted when changes occur so they can confirm and update the public records.  However, no automated information will be released without prior approval by curation staff. 
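
    One way to detect changes between crawls is to compare a hash of each fetched file with the hash stored at the last approved version. The sketch below assumes this approach; the function names and the in-memory dictionaries are hypothetical and do not describe the actual SciCrunch crawler. Changed files are only queued for review, and nothing is published automatically.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def review_queue(approved: dict[str, str], fetched: dict[str, bytes]) -> list[str]:
    """List resources whose fetched file differs from the approved version.

    `approved` maps a resource name to the digest of its last approved
    file; `fetched` maps a resource name to the newly crawled content.
    New resources (no approved digest yet) are queued as well.
    """
    return sorted(
        name
        for name, content in fetched.items()
        if approved.get(name) != fingerprint(content)
    )
```

    After a curator approves a change, the stored digest for that resource would be replaced with the new one.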

     SciCrunch maintains the authority to remove any resource that is not deemed to be appropriate for SciCrunch, no longer functional or no longer aligned with SciCrunch's curation policies. 

    SciCrunch Data Federation Resource (Deeply integrated resources)

    Using the DISCO protocol, the SciCrunch Data Federation provides the ability to drill down into individually hosted databases and data sets and return relevant content.  This type of content, part of the so called “hidden Web,” is typically not indexed by existing web search engines. 

     In order for SciCrunch to directly query these independently maintained databases and datasets, database providers must register their database or dataset with the SciCrunch Data Federation and specify permissions. Several interoperability capabilities are offered.

    SciCrunch integrated virtual databases combine related data from multiple databases into one view for easier browsing. The following integrated databases are available for similar data types: 

    • Integrated Animal View : Indexes data from Zebrafish International Research Center, Strain reports from Rat Genome Database, Caenorhabditis Genetics Center, and International Mouse Strain Resource. [Full record]
    • Integrated Brain Gene Expression View : Indexes data from Gene Expression Nervous System Atlas, Allen Mouse Brain Atlas, and Mouse Genome Institute. [Full record]

    • Integrated Disease View : Indexes data from NINDS Disorder List and PubMed Health. [Full record]

    • Integrated Nervous System Connectivity View : Indexes data from Brain Architecture Management System (BAMS), Collations of Connectivity data on the Macaque brain (CoCoMac), BrainMaps, Connectome Wiki, the Hippocampal-Parahippocampal table of, UCLA Multimodal Connectivity Database, and Avian Brain Circuitry Database. [Full record]

    • Integrated Podcasts View : Indexes data from Brain Science Podcast , Nature Podcast, NeuroPod, Science Podcast, American Journal of Psychiatry Podcast, 60-Second Mind, Science Talk, Gray Matters, This Week in Science, Neurology Podcast, Montreal Neurological Institute (MNI) Podcast, Biointeractive, National Academy of Sciences Podcast, BrainPod, Science Talk, Royal College of Psychiatrists Podcasts (AKA- Let Wisdom Guide), The Guardian: Science Weekly, All in the Mind. [Full record]

    • Integrated Software View : Indexes data from Neuroimaging Informatics Tools and Resources Clearinghouse, Visiome Platform, Cerebellar Platform, Brain Machine Interface Platform, and Genetic Analysis Software. [Full record]

    • Integrated Video View : Indexes data from NIH VideoCasting and Podcasting, JoVE: Journal of Visualized Experiments, The Guardian: Science Videos, and Biointeractive. [Full record]

    • Integrated Jobs : Indexes data from Naturejobs, Monster, Indeed, Hays, New Scientist Jobs, Science Careers, ScienceBlogs: Jobs, and It Takes 30. [Full record]

    • Integrated Blogs : Indexes data from Nature Network Blogs, Wired Science Blogs, The Guardian: Science, It Takes 30, Scientific American Cross-Check, Scientific American Bering in Mind, Research Blogging, CENtral Science, ScienceBlogs: Medicine and Health, ScienceBlogs: Brain and Behavior, ScienceBlogs: Life Science, Scientific American Guest Blog, Scientific American Observations, LabSpaces, PLoS Blogs, Daring Nucleic Adventures - genegeek, H2SO4Hurts - Brian Krueger PhD, Sciblogs, New York Times - Well, Wired Science, and Genomes Unzipped. [Full record]

    SciCrunch simultaneously queries all the federated databases and datasets through its search interface. The results are displayed under the Data tab and are categorized by data type and nervous system level. In this way, users can easily step through the content of multiple resources, all from the same interface. 

    Each federated resource individually displays its query results with links back to the relevant datasets within the host resource. This allows users to take advantage of additional views on the data and tools that are available through the host database. The SciCrunch site provides tutorials for each resource, indicated by the "Professor Icon", showing users how to navigate the results page once directed there through SciCrunch. Additionally, query results may be exported as an Excel document. 

    SciCrunch's full listing of federated data can be found here, or here.

    Note: SciCrunch is not responsible for the availability or content of these external sites, nor does SciCrunch endorse, warrant or guarantee the products, services or information described or offered at these external sites. 

    Data Ingestion Workflow

    The process to add a new resource to Data Federation requires the following sequence of steps.

    1. Registering a database

    The new resource must first be added to the SciCrunch Registry.

    2. Import data

    Data from the new resource is added through DISCO.

    3. Curation or view building

    The new data is curated into a view table using the concept mapper tool.

    4. Review

    After the view is built, it is deployed onto the beta website for review. An email should be sent out informing other curators, and if applicable, the data owner, of the new view and providing the specific link to the beta website. The curator should wait about 3 to 4 days for other people to provide feedback and make necessary changes.

    5. Release

    Each newly curated view is approved for production by a curator. The process of deploying a new view to production (which is done by IT personnel) occurs on Friday evenings, so the new data should be available on Monday mornings.

    6. Posting to Social Media

    After a new resource is released on SciCrunch, updates are posted to social media accounts, including Twitter, Google+, Facebook, YouTube, a weblog, and a live feed.
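
    The timing implied by steps 4 and 5 (a feedback window of about 3 to 4 days, deployment on Friday evenings, availability on Monday mornings) can be estimated with a small date calculation. This is a rough sketch under stated assumptions: the feedback window is counted in calendar days, defaults to 4, and a review that ends on a Friday is assumed to make that evening's deployment.

```python
from datetime import date, timedelta


def expected_availability(review_start, feedback_days=4):
    """Return (deployment Friday, availability Monday) for a view under review."""
    review_end = review_start + timedelta(days=feedback_days)
    # date.weekday(): Monday is 0, Friday is 4.
    days_to_friday = (4 - review_end.weekday()) % 7
    deploy_friday = review_end + timedelta(days=days_to_friday)
    return deploy_friday, deploy_friday + timedelta(days=3)


# Review opened on Monday 2024-01-08 (an arbitrary example date).
friday, monday = expected_availability(date(2024, 1, 8))
print(friday.isoformat(), monday.isoformat())
```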

    Registering a database

    If you are a data provider and would like to make your database or dataset available through SciCrunch, we are happy to work with you.  The SciCrunch data federation tools are designed to be easy to use and require minimal effort from the data provider.  SciCrunch can work with many types of data resources, regardless of underlying technology.  If you have a data set that you would like to see registered, please email

    This DISCO integration capability utilizes a data integration framework to knit independently maintained databases or datasets into a virtual data federation through registration of schema information and database views with the SciCrunch mediator. A concept mapping tool is available to map tables, fields and values to the NIFSTD ontology. Resource providers do not need to change their resource in any way and may control the content that is exposed to the NIF database mediator.

    Registration with the SciCrunch mediator requires technical knowledge of the database capabilities and network settings of the resource and is done with a specialized tool available through SciCrunch. The mapping of content to the NIF vocabularies is performed by a domain expert in consultation with a database administrator and is accomplished using the Concept Mapping tool.

    Advantages of exposing your data through the Data Federation include:

    1. Mapping to the NIF vocabularies provides a standardized terminology and lets users search through the relationships contained in the NIF ontologies.

    2. Data within a source database can be combined with that from other databases by defining an integrated view across databases.

    3. Through aggregation of many resources, creating a large, dynamic virtual source increases the exposure of the content of individual database resources.

    4. LinkOut  your database content to Entrez Databases (PubMed, etc.)

    5. Users are able to query across distributed databases as if they were a single database.

    View additional Benefits of Data Federation  

    To expose your data in the Data Federation, begin by registering your resource and creating a sitemap; then set up a consultation with SciCrunch by contacting the SciCrunch interoperability team.

    We are actively looking for Data Federation partners to work with to continue to develop the Data Federation tools. For more information, contact Anita Bandrowski at 

    Eligibility for Federation resources:

    1) Availability: The resource is already registered with the SciCrunch Registry.

    2) Database / dataset format:  The resource should contain a database or datasets that can be exported as tables.

    3) Public:  The data must be open to the public and accessible by the community without extra requirements.

    4) Voluntary: The resource owners agree to make their data public through the SciCrunch Data Federation.

    SciCrunch Data Federation Registration Workflow

    To register your resource at a deeper level (SciCrunch Data Federation):

    SciCrunch Interoperability Best Practices 

    SciCrunch is working to establish a set of best practices for resource providers to enhance the ability of resources to interoperate with SciCrunch and with each other. We are using our experiences with the Level 2-3 integration tools and literature indexing to highlight known issues. (Useful background reading.)

    1) Stable identifiers 

    We have noted that some databases and vocabularies use identifiers that get regenerated every time the resource is updated. This practice makes it very difficult for SciCrunch to maintain appropriate indices and links. We recommend that identifiers be stable; if they are to be removed, they should be made obsolete rather than deleted.

    • URIs: SciCrunch has been following the discussions on Uniform Resource Identifiers (URIs) that have been going on in the Semantic Web and other communities. SciCrunch will act in an advisory capacity to the Common Naming Project of NeuroCommons, which seeks to provide a URI method for online data resources. Should this standard receive support from the community, SciCrunch will adopt it.

    • Related to this is the use of sessions to retrieve data pages instead of stable URIs. Under this practice the application allows a user to access data only in a linear manner, i.e., the main page showing the cerebellum must be accessed before any of its subcomponents. Each session generates a temporary pointer or 'session identifier', which makes it difficult for a system such as SciCrunch to make use of many of the specific data elements inside resources like Brain Info.
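
    The recommendation that identifiers be made obsolete rather than deleted can be sketched as a small registry in which retiring an identifier keeps its record and optionally points to a replacement. The class, its methods, and the example identifiers are hypothetical and only illustrate the principle.

```python
class IdentifierRegistry:
    """Keeps every identifier ever issued; retired ones are flagged, not removed."""

    def __init__(self):
        self._records = {}

    def add(self, identifier, label):
        # Reusing an identifier would break links held by external indexes.
        if identifier in self._records:
            raise ValueError(f"{identifier} is already assigned and cannot be reused")
        self._records[identifier] = {
            "label": label,
            "obsolete": False,
            "replaced_by": None,
        }

    def retire(self, identifier, replaced_by=None):
        """Mark an identifier obsolete while keeping it resolvable."""
        record = self._records[identifier]
        record["obsolete"] = True
        record["replaced_by"] = replaced_by

    def resolve(self, identifier):
        return self._records[identifier]


registry = IdentifierRegistry()
registry.add("EX:0001", "cerebellar cortex (old record)")
registry.add("EX:0002", "cerebellar cortex")
registry.retire("EX:0001", replaced_by="EX:0002")
print(registry.resolve("EX:0001"))
```

    Because the retired record still resolves, an index that stored EX:0001 can follow replaced_by instead of hitting a dead link.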

    2) Using common terminologies 

    Using a shared terminology avoids many integration problems, particularly when resources follow the OBO recommended practice of re-using existing terminologies (and their identifiers) rather than creating new ones, which forces mappings to be maintained in many places.

    3) Providing clear and consistent machine- and human-understandable definitions of concepts 

    For example, if a resource groups data according to cortex, a user should be able to find the definition of cortex and a machine should be able to use that definition in a call.

    4) Keeping track of versions in a consistent and clear manner 

    The issue of how to handle versions comes up all the time in the ontology world. The common recommendation is to have one URI for the current version that always points to the latest release, while earlier versions remain available at URIs that include the version number, so that anyone who requires a particular version can get to it.
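
    The versioning recommendation above can be illustrated with a lookup table in which the unversioned URI always maps to the latest release and each release also has its own URI. This is a sketch only: the base URI is a placeholder and placing the version number as a trailing path segment is an assumption, not a SciCrunch convention.

```python
def build_uri_table(base_uri, releases):
    """Map the unversioned URI to the latest release and add one URI per release.

    `releases` is a list of version strings in release order, oldest first.
    """
    if not releases:
        raise ValueError("at least one release is required")
    table = {f"{base_uri}/{version}": version for version in releases}
    table[base_uri] = releases[-1]
    return table


# Placeholder URI and version numbers for illustration.
table = build_uri_table("http://example.org/ontology", ["1.0", "1.1", "2.0"])
print(table["http://example.org/ontology"])
print(table["http://example.org/ontology/1.1"])
```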

    5) Data Integration Best Practices 

    SciCrunch currently has 2 methods of data integration. The first method, known as Level 2.5, uses a series of mechanisms that connect the resource with the SciCrunch Mediator. The other method, known as Level 3, does not require these mechanisms since the database connects directly to the SciCrunch Mediator. In fact, some relational databases could be shared this way even without an associated Web presence.

    A. Level 2.5 Best Practices

    Essential elements:

    1. Query interface: A current Web interface that allows querying of the database. (Best design practices for these interfaces will be elaborated below)

    2. Interface Definition Language: Protocol specification (e.g.: disco.nif.interop.3.1) to expose the database schema, and mappings of their elements to the interface described above. This file will provide the information necessary for automated Mediator registration.

    3. Metadata mappings: Protocol specification (e.g.: disco.nif.lexicon.3.1) that exposes the list of local database terminologies (Lexicon) with mappings to standard terms (e.g.: NIFSTD). This file will provide the information necessary for TIS mappings.

    Best design practices for Querying Web Interfaces for Level-2.5 


    • Should accept queries using HTTP/XML GET or POST

    • Designed as Web Services

    Design: It is SciCrunch's intention to reuse and enhance the Mediator technology to integrate structured data on the Web that is not accessible via standard RDBMS objects. For the data on the Web to be efficiently "relationalized", the following points must be met:

    • The query interface should allow multiple parameters, not just a single search box like Google's.

    • Allow Boolean operators. This allows SciCrunch systems to construct a specific query when an end user formulates a multi-keyword question.

    • Define output parameters. This minimizes the bandwidth and data filtering required by the Mediator.

    • Provide results in easy to parse formats (XML, delimited text, XHTML, well formed HTML, ...)
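
    From the client side, an interface that meets the points above might be queried as in the following sketch. The endpoint, the parameter names (q, fields), and the XML layout of the response are all assumptions made for illustration; a real resource would define its own in its interoperation file.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_query_url(endpoint, terms, operator="AND", fields=None):
    """Build a GET URL that joins terms with a Boolean operator and
    optionally restricts the output to the named fields."""
    params = {"q": f" {operator} ".join(terms)}
    if fields:
        params["fields"] = ",".join(fields)
    return endpoint + "?" + urlencode(params)


def parse_results(xml_text):
    """Read each <record> element of an XML response into a dictionary."""
    root = ET.fromstring(xml_text)
    return [dict(record.attrib) for record in root.iter("record")]


# Hypothetical endpoint and a hand-written response, not real service output.
url = build_query_url(
    "http://example.org/search",
    ["purkinje cell", "cerebellum"],
    fields=["name", "region"],
)
sample = '<results><record name="Purkinje cell" region="Cerebellum"/></results>'
print(url)
print(parse_results(sample))
```

    Restricting the output fields in the request keeps the response small, which is the bandwidth point made in the list above.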

    B) Level 3 Best Practices 

    This is based on our current experience registering these types of resources.

    • Allow domain level access rather than IP level access through the firewall to a level-3 Site, for example allow rather than 

    • Allow domain level access rather than IP level access through the firewall to a level-3 Database, for example allow rather than

    • Database access privileges should not be dropped during a database maintenance.

    • Make the database drivers with the right version available along with the documentation and sample code.

    • Provide technical and non-technical points of contact.

    • Create a view on the tables you want to make available for access.

    • Create a user interface view that you would like to show to the users when records from your database are returned in the search result.

    • For general information, results and data should be accessible using a static (i.e., non-session-based or stateless) URL.

    • If you are developing new databases:

    • Make sure you use standard terminology, for example use Male and Female for sex rather than M and F or 1 and 2.

    • Make sure you use standard NeuroLex terminology to represent neuroscience terms in the database. For example, use Cerebellum rather than CBM.

    • Use proper integrity constraints between database tables.
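
    Several of the Level 3 points above (a view over the tables to be shared, standard terminology, integrity constraints) can be tried out in a few lines with an in-memory SQLite database. The table, column, and view names are invented for illustration, and SQLite is used only because it ships with Python; a production resource would apply the same ideas in its own RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE subject (
        id INTEGER PRIMARY KEY,
        -- Standard terminology: spell out the value instead of M/F or 1/2.
        sex TEXT NOT NULL CHECK (sex IN ('Male', 'Female')),
        internal_note TEXT
    );
    CREATE TABLE image (
        id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES subject(id),
        brain_region TEXT NOT NULL
    );
    -- The view exposes only the columns the provider wants to share.
    CREATE VIEW shared_images AS
        SELECT image.id AS image_id, subject.sex, image.brain_region
        FROM image JOIN subject ON subject.id = image.subject_id;
    """
)
conn.execute("INSERT INTO subject (id, sex, internal_note) VALUES (1, 'Female', 'not shared')")
conn.execute("INSERT INTO image (id, subject_id, brain_region) VALUES (10, 1, 'Cerebellum')")
rows = conn.execute("SELECT * FROM shared_images").fetchall()
print(rows)
```

    Granting access to the view rather than to the underlying tables lets the provider keep columns such as internal_note private.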

    6) Database design and Use  
    7) Level-3 Documentation Meeting Notes 

    In order to create documentation for level 3 resource registration the following questions need to be answered precisely.

    What is the presentation of the data that is going to be understandable to a novice user?

    1. Define presentation view for level 3 resources
      1. Put in the registry the information the provider wants to expose
      2. Some exposed data objects should lead to products/focus/...other actionable information within the resource
    2. Relating individual data objects from each source to the category properly
      1. how to map conceptual information to schema information to facilitate conceptual queries *CCDB, gene network, sense lab cell properties database all give data on molecule in brain region
      2. Hop Skip & Jump (HSJ) query => how do I navigate from our result to other sources??
    3. Term mapping 
      1. It is imperative to let SciCrunch know about the metadata tags within each database. For example, CCDB is a database of images and several columns fill out the metadata for each image including cell type and brain region. It is important that SciCrunch knows that each image, which is stored in column 5 is a cell with a name that is stored in column 4 and brain region that is stored in column 3. This way SciCrunch can give precise results, such as showing the image of a pyramidal cell if the query is "pyramidal cell", or show many cells if the query is "cell substantia nigra". 
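
    The term-mapping point above can be made concrete with a small sketch that declares what each column means and then answers concept queries against it. The column positions follow the CCDB example in the text (column 3 is the brain region, column 4 the cell name, column 5 the image); the sample rows and function names are invented for illustration.

```python
# 1-based column positions, as in the CCDB example; roles are declared once.
COLUMN_ROLES = {3: "brain_region", 4: "cell", 5: "image"}


def annotate(row):
    """Turn a positional row into a dict keyed by the declared role of each column."""
    return {role: row[position - 1] for position, role in COLUMN_ROLES.items()}


def search(rows, **criteria):
    """Return the images whose annotated fields match every criterion (case-insensitive)."""
    hits = []
    for row in rows:
        record = annotate(row)
        if all(record[key].lower() == value.lower() for key, value in criteria.items()):
            hits.append(record["image"])
    return hits


# Invented sample rows: (id, date, brain region, cell name, image file).
rows = [
    (1, "2009-01-01", "Hippocampus", "pyramidal cell", "img_001.tif"),
    (2, "2009-01-02", "Substantia nigra", "dopaminergic cell", "img_002.tif"),
    (3, "2009-01-03", "Substantia nigra", "GABAergic cell", "img_003.tif"),
]
print(search(rows, cell="pyramidal cell"))
print(search(rows, brain_region="substantia nigra"))
```

    Without the declared roles, a keyword search could only match raw strings; with them, a query for a brain region returns every cell image recorded in that region.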

    Specific comments for BrainInfo - Expose the preferred label and where is it filed.

    8) Intellectual property issues 
    9) Guidelines for entity mapping and linking data.

    SciCrunch should facilitate interlinking of data, literature and tools wherever possible.  When any entity has a unique ID, SciCrunch should provide the ID in its views.  The following practices should be implemented for all sources:

    • Rule #1:  All references to identifiers need to be bidirectional!

    • Rule #2:  Use of identifiers for external references

    • Form of Identifier:  Source:ID, e.g., PMID:000000000

    • Literature citations:  Where possible, add references as PMIDs to each record. When PMIDs are not available, use DOIs.

    • External database/data set reference:  Add database identifiers like DatabaseShortName:identifier (e.g., GEO:GSE12345)

    • Organism:  "Organism" is the preferred name for any column containing organism name, with the exception of human subjects where not deemed appropriate.  For any other organism terms such as species, animal, host, use organism (unless this is not sufficiently specific). Use the NCBI tax id as identifier. If the animal is transgenic, add the identifier from the species-specific database (e.g., MGI identifier).

    • Brain region:  For any brain region, use the NIFSTD label.

    • Cell:  use an existing cell type listed in the NeuroLex

    • Protein:  For any protein, use PRO ids.

    • Small molecule: use ChEBI ids.

    • Gene:   Use Gene Symbol and Gene ID for Genes.  Gene name and probe identifier should not be used through the SciCrunch interface, although they should be searchable through the view.

      • The following SciCrunch annotation standards should be used for age, expression level and treatment paradigm:

    Age:

    1.    Adult organism

    2.    Adolescent organism

    3.    Juvenile organism

    4.    Newborn organism

    5.    Infant organism

    6.    Embryonic organism

    Expression level:

    1.    Increased expression

    2.    Decreased expression

    3.    No change in expression
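
    The Source:ID form required by the identifier rules above (e.g., PMID:000000000 or GEO:GSE12345) lends itself to a simple validation step when building a view. The sketch below is illustrative: the function name is hypothetical and the only source-specific check, that a PMID is all digits, is an assumption rather than an official definition.

```python
import re

# Deliberately loose, assumed patterns; sources without an entry are not checked.
ID_PATTERNS = {
    "PMID": re.compile(r"^\d+$"),
}


def parse_identifier(text):
    """Split 'Source:ID' into (source, local id), rejecting malformed values."""
    source, separator, local_id = text.partition(":")
    if not separator or not source or not local_id:
        raise ValueError(f"expected Source:ID, got {text!r}")
    pattern = ID_PATTERNS.get(source)
    if pattern is not None and not pattern.match(local_id):
        raise ValueError(f"{local_id!r} is not a valid {source} identifier")
    return source, local_id


print(parse_identifier("PMID:15988042"))
print(parse_identifier("GEO:GSE12345"))
```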

    What is DISCO?

    DISCO is an information integration approach designed to facilitate interoperation among Internet resources. DISCO was initially developed by Dr. Luis Marenco at Yale University for the Neuroscience Database Gateway and is currently being extended in the context of SciCrunch. DISCO consists of a set of tools and services that allows resource providers who maintain information to share it with automated systems such as SciCrunch. SciCrunch is then able to “harvest” the information and keep those sets of information up-to-date. DISCO facilitates the automated maintenance of several distinct capabilities using a collection of files 1) that are maintained locally by the developers of participating neuroscience resources and 2) that are "harvested" on a regular basis by a central DISCO server. This approach allows central SciCrunch capabilities to be updated as each resource's content changes over time. DISCO currently supports the following capabilities: 1) resource descriptions, aka sitemaps, 2) "LinkOut" to a resource's data items from NCBI Entrez resources such as PubMed, 3) Web-based interoperation with a resource, 4) sharing a resource's lexicon and ontology, 5) sharing a resource's database schema, and 6) participation by the resource in neuroscience-related RSS news dissemination.

    Resource Description (Sitemap)  - 2 functions 

    This DISCO capability allows you to create and manage your own sitemap. An XML-based script provides a wrapper around a website that allows SciCrunch to search for key details about the web site and some information about dynamic content. An advantage of a sitemap is that the content is dynamically updated from the source file,  ensuring that all content is up to date.  SciCrunch provides a user-friendly tool for generating the necessary XML files.

    The benefits of a sitemap are that it will keep your SciCrunch Registry description up-to-date and will inform search engines about your resource. Many formats can be ingested, such as the native "disco.rd", the SciCrunch "disco.rd.nif.rdf" and the National Center for Biomedical Computing's Biositemap formats. 

    If you do not want to manage your own sitemap, DISCO will manage it for you. Just click on the "Click here to generate a sitemap" link and you are done.

    Generating a sitemap is also the second step of registering your resource at a deeper level to be included into the SciCrunch Data Federation. Clicking on the "Click here to generate a sitemap" link (see image below) will generate an entry into the DISCO Dashboard. See "Create an Interop File to share your data through SciCrunch's Data Federation" for the next step.

    Create a DISCO Resource Description (Sitemap) File
    1.  Register Your Resource with the SciCrunch Registry (if it is not already)
    2. Generate DISCO Resource Description (Sitemap) file
      1. Once the resource is curated (within 7 days or email for faster service), search for your resource in NeuroLex, click on your resource's link, and create an account / login (upper right).  

      2. Below the main information (surrounded by a box), and just below the “Curated” tag, you will see the "For Resource Owners" section. If you are indeed the resource owner, click on the "Click here to generate sitemap" link.

      3. Follow steps 1 - 4 to complete the process. If you add your Technical Contact information, step 2 will be generated on the following page.

     You may edit or append these files at any time from the DISCO Dashboard. Just click on your resource's name and then the [Edit] button. Alternatively, you may go through the process above again and simply replace your old files with the new ones.

    Create an Interop File to share your data through SciCrunch's Data Federation

    The Data Federation provides the ability to drill down into individual databases and data sets and return relevant content. This type of content, part of the so-called "hidden web," is typically not indexed by existing web search engines. SciCrunch will work with your resource in its current form using tools to integrate it as thoroughly as we can with little to no hassle on your part. The process is quite easy and you won't have to change anything to participate.

    The types of resources that we are currently working with:

    • Database with query API

    • Database with web service

    • Database dump

    • XML data

    • Structured web pages without API (e.g., HTML)

    • Unstructured data files in several formats (Excel, PDF, etc.)

    • RDF 

    File format for Interoperation

    DISCO describes the interoperation with a resource and supports any type of interface description language (IDL). For SciCrunch, specific DISCO web interoperation and database schema formats have been defined and continue to be revised to accommodate new interoperation scenarios.

    Web Interoperation: This DISCO capability provides information necessary to systematically extract portions of data in semi-structured web resources. 

     Database Schema: This capability provides information on your database's schema and identifies the specific fields in the database to be shared. 

     Click on the links below to view samples of files:

     Integration of your data with SciCrunch's federated data is made possible by placing a copy of the information you provide in central SciCrunch Mediator servers - or indexing your database or web service directly. Federated data can then be searched by SciCrunch queries and presented to SciCrunch users with links back to the originating site for additional information.  

    Upload interop File to DISCO

    A DISCO account is required for uploading any interop files.

    1. Log in to your DISCO account.
    2. Go to resource detail page. (e.g.
    3. Click the "edit" button at the middle right of the page.
    4. Click the "Add New Service" button at the bottom.
    5. Choose the "interoperation" option from the pull-down menu.
    6. Choose the appropriate format for your interop file from the pull-down menu.
    7. Choose the desired upload method from the DISCO Information section at the bottom.
    8. Click the "Process" button at the bottom right corner.
    9. Go back to the resource detail page. (e.g.
    10. Click the "update" button beside "Local Version".

    What is a LinkOut file? 

    The National Center for Biotechnology Information (NCBI) has implemented a capability called "LinkOut" that allows users of NCBI Entrez (who might for example be looking at an article in PubMed) to link to related information in resources external to the NCBI.

    Entrez LinkOut is a DISCO capability that provides a mechanism to collect a resource's data links related to Entrez objects and forward them to NCBI. NCBI users will find these links when using the Entrez LinkOut feature. 

     These LinkOut files provide links between PubMed, or other NCBI databases available for linking (e.g., Gene, Protein, Nucleotide, etc.), and your data when you register with the SciCrunch Data Federation through the LinkOut Broker. To enable this feature, your data must include PubMed ids (e.g., 12345, not Bob et al, 2010).

    Create a LinkOut file

    There are three ways to generate linkout files:

    • Using the LinkOut's data generation tool. This tool allows the data to be entered into an Excel spreadsheet and then automatically converted to DISCO files. Instructions are provided in the tool's help page. The latest version of this tool can be downloaded at

    • Generating LinkOut files by hand or by programs:

    • Go to DISCO production server
      • Login as admin
      • Locate resource in dashboard
      • Go to resource’s page, by following its short name’s link
      • Select [EDIT] disco information
      • Click on [ADD New Service]. For the new DISCO Service, under Type select “Entrez LinkOut”, under URL type an XML file name (e.g.: ”linkoutsql.xml”), and under Format select “disco.linkout.sql”. In the lower bar to the right of DISCO Information, select the “Update DISCO local” radio button.
      • Save the information by pressing [Process], and you’ll get the confirmation
      • Press [Done] and you’ll be back on the resource’s dashboard page.
      • At this point the new DISCO information exists only in the local site. To update the information on the server, press the [Update] button beside the Local Version; the next screen appears.
      • At this point the disco.linkout.sql file is empty. Press the [View] button beside it.
      • Then press the [Edit] button.
      • Copy and paste this initial code template (see following description for linkout xml file format)
      • Back on the LinkOut dashboard page, verify that the data can be retrieved before it is submitted to NCBI: check “Import” and “Include LinkName to Entrez URL name”, click [Process], and confirm.
      • A confirmation screen appears and an email is sent out.
      • Then verify the information and links by clicking Show in Html.
      • If everything looks correct, check Transfer to NCBI and press [Process] again. You may want to uncheck Import so that the data is not imported again. A confirmation email is sent. Data will be available at NCBI in 48 hours.
      • If there are problems with the data import or the LinkOut file, check the import only status to disable transferring links to NCBI.
    The LinkOut file format

    The DISCO LinkOut format is encoded in XML format, as described below.

    The root node should contain the following information:

    <disco format="disco.linkout" format-version="1.0">

    ... the first node contains brief site information (e.g.:)

    <site-info site-name="CCDB" />

    ... the second node contains technical contact information (e.g.:)

    <technical_contact name="Willy WaiHo Wong" email="" />

    ... the next node is the required data container node:


    ... and inside this node one or more child "<linkout>" nodes are used to describe one LinkOut item at a time. The content in bold in the XML sample below shows the type of content that can be provided from a resource.




      linkcategory="Electron microscopy product"

      linkname="Development of a model for microphysiological simulations: ... electron tomography"



    The contents of the attributes of the <oid> node above may contain the following information:

    • db: An NCBI database name (e.g.: "PubMed", "Protein", "Nucleotide", etc). For the complete list of these databases, refer to the databases available for linking list on the NCBI's LinkOut documentation Web site.

    • oid: An Entrez object ID of an item in that database (e.g.: "15988042").

    • linkcategory: A short phrase used to categorize the link. This phrase is used to group similar links from different resources. More details about how to choose an appropriate linkcategory are given below. (This integration capability is only available within the LinkOut server's Gateway tool, and not present on Entrez.)

    • linkname: The description of the link. This name is used by Entrez to describe the data linked from a resource. It should be concise, and NCBI does not allow its object names to be used in this text.

    • linkourl: The URL in your resource associated with this information.

     In addition to this LinkOut format, your resource needs a main "disco.xml" to identify  its location. That file and its purpose are described on the DISCO dashboard page ( For more information or help regarding this format please contact the NIF Interoperability team. 

    Implementing DISCO Web Interoperation with data that includes LinkOut information. For resources registered with DISCO Web Interoperation (SciCrunch level 2.5 integration), SciCrunch developers may be able to create data views to extract LinkOut data for that resource.

    Linkcategory types

    Per National Library Service request, the categories that SciCrunch uses have been standardized to the following types:


    Resource: Registry

    Resource: Software

    Reagent: Adenovirus

    Reagent: Antibodies

    Reagent: Cell Line

    Reagent: Plasmid

    Data: Activation Foci

    Data: Animal Model

    Data: Brain Anatomy

    Data: Brain connectivity

    Data: Cell Model

    Data: Clinical Trials

    Data: Chemical

    Data: Chemosensory receptor

    Data: Computational model

    Data: Disease Annotation

    Data: Electrophysiology

    Data: Gene Annotation

    Data: Gene Expression

    Data: Images

    Data: Interactions

    Data: Microarray

    Data: Neuronal properties

    Data: Neuronal reconstruction

    Data: Protein Expression

    Data: Taxonomy

    Data: Transgenes

    Data: Value observation

    Data: Volumetric observation

    Please change your linkcategory to one of the types listed above, based on the characteristics of your data. If none is suitable for your database, please inform us and we can discuss a possible solution together.
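
    A resource preparing LinkOut files could check its linkcategory values against the standardized list above before submitting them. The following is a minimal sketch; the function name and the case- and whitespace-insensitive matching are assumptions for convenience, while the category strings are copied from the list above.

```python
LINK_CATEGORIES = [
    "Resource: Registry", "Resource: Software",
    "Reagent: Adenovirus", "Reagent: Antibodies", "Reagent: Cell Line", "Reagent: Plasmid",
    "Data: Activation Foci", "Data: Animal Model", "Data: Brain Anatomy",
    "Data: Brain connectivity", "Data: Cell Model", "Data: Clinical Trials",
    "Data: Chemical", "Data: Chemosensory receptor", "Data: Computational model",
    "Data: Disease Annotation", "Data: Electrophysiology", "Data: Gene Annotation",
    "Data: Gene Expression", "Data: Images", "Data: Interactions", "Data: Microarray",
    "Data: Neuronal properties", "Data: Neuronal reconstruction",
    "Data: Protein Expression", "Data: Taxonomy", "Data: Transgenes",
    "Data: Value observation", "Data: Volumetric observation",
]

# Lookup keyed by lower-cased text so that capitalization differences are tolerated.
_BY_LOWER = {category.lower(): category for category in LINK_CATEGORIES}


def canonical_linkcategory(value):
    """Return the standardized spelling of a linkcategory, or None if it is not listed."""
    key = " ".join(value.split()).lower()
    return _BY_LOWER.get(key)


print(canonical_linkcategory("data:  gene expression"))
print(canonical_linkcategory("Electron microscopy product"))
```

    A None result signals a category that needs to be remapped or discussed with the curators, as described above.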

    Generating LinkOut files for PubMed Central Europe
    • Open pgAdmin
    • Connect to DISCO_CRAWLER database
    • Open new query window
    • Paste the following query and change the information highlighted in yellow accordingly

    select oid, resource_id, 'Cell Centered Database', 'MED', entrez_oid, link_name, link_category, link_url

    from disco_entrez_object where resource_id='nif-0000-00007'

    • Click the “Execute query, write results to file” button. Name the file after the resource ID (e.g., ‘nif-0000-00007.csv’).
      Note: uncheck the “Column names” checkbox.

    • Open the attached Python script. Change the filename as highlighted below and execute it.
      This will transform the CSV into the LinkOut Europe XML format.

    • Open the XML file and add the encoding line (the XML declaration) at the top.


    Once done copy the XML files to PubMed Central Europe FTP site.

    Follow their special instructions carefully. You may need to compress some of these files. 

    # -*- coding: utf-8 -*-
    """
    Created on Mon Oct 27 21:50:00 2014
    @author: lmarenco
    Converts a (linkout-resource.csv) file into the (linkout-resource.xml) PubMed XML format.
    Written for Python 3; requires the third-party lxml package.
    """
    import csv
    import sys

    from lxml import etree

    # Resource id: the input is <filename>.csv and the output is <filename>.xml
    filename = 'nif-0000-00007'

    root = etree.Element("links")
    with open(filename + '.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        try:
            for row in reader:
                # Columns follow the query above: oid, resource_id, resource name,
                # 'MED', entrez_oid, link_name, link_category, link_url
                link = etree.SubElement(root, "link")
                link.attrib["providerId"] = "1474"
                resource = etree.SubElement(link, "resource")
                title = etree.SubElement(resource, "title")
                s = row[5] + ' '
                title.text = row[2] + " - " + row[6] + ": " + s + " [provided by SciCrunch]"
                url = etree.SubElement(resource, "url")
                url.text = row[7]
                record = etree.SubElement(link, "record")
                source = etree.SubElement(record, "source")
                source.text = "MED"
                record_id = etree.SubElement(record, "id")
                record_id.text = row[4]
        except csv.Error as e:
            sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

    # etree.tostring returns bytes, so the output file is opened in binary mode
    with open(filename + '.xml', 'wb') as fw:
        fw.write(etree.tostring(root, pretty_print=True))
    # print(etree.tostring(root, pretty_print=True).decode('ascii'))

    Advertise your terminology or ontological information.

    This capability facilitates semantic data integration with the resource. A list of terms used by the resource (with mappings to standardized terms) is defined for use by SciCrunch central servers. Without this functionality term mappings have to be made manually by a resource curator or knowledge integrator. 

    DISCO terminology allows free use of any format (XML, RDF, OWL, etc) to provide mapping between terms and meta/data elements in a resource.

    Check the DISCO dashboard Terminology summary page for current resource examples of these formats.

    Please contact the NIF interoperability team for more information.

    Share your resource's news with the SciCrunch community.

    Coordinates reporting of important changes in the resource to interested users through RSS feeds. This can also be done through the RSS wiki page.

    Newsfeeds are traditionally represented in RSS and atom formats. DISCO provides a mechanism to refer to news in a resource using the main DISCO file. View the SenseLab news.

    Also, see the DISCO dashboard News summary page for current examples of these formats.

    Please contact the NIF interoperability team for more information.

    DISCO Dashboard

    The Dashboard is the place for resource owners interoperating with SciCrunch to manage their resource. This includes editing or appending files and setting the crawl frequency.  Additionally, the dashboard provides general information about the resource such as the status of the resource (parsed), where the DISCO files are stored (locally at SciCrunch or remotely at site), and which services each resource is participating in. Clicking on a particular service, e.g., Resource Description (AKA Sitemap), retrieves a summary of participating resources. 

    Managing resources in DISCO 

    Data updating

    The update frequency depends on how often new data come in and how often the original resource is updated. Three update frequencies are currently available: weekly, monthly, and every three months.

    Setting Flags in DISCO


    1. NOCRAWL: Turning the NOCRAWL flag on or off for an entire resource.

    For NeuronDB, as shown below in the DISCO data source dashboard, a NOCRAWL flag appears as a red VCR-style stop icon.




    Open the interop DISCO file and, in the “interfaces” node, add the optional attribute “data-crawl-status” containing any text that explains why the data should not be crawled.

    See below



    Save the interop file. The resource status will change when the file is saved or, if the file is remote, the next time the system tries to recrawl.

    DISCO should report that an attempt to import data from a NOCRAWL site was made and stopped.

    2. ARCHIVE: Specifying an entire resource as an archived one

    ResearchCrossroads, an archival resource, shows a grey file cabinet icon.


    To make that change, open the interop DISCO file and, in the “interfaces” node, add the optional attribute and text data-update-type=”archive”.

    See below. Save the interop file
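
    Both flags are plain attributes on the “interfaces” node, so they can also be set programmatically. The sketch below uses Python's standard xml.etree.ElementTree; the SAMPLE document is a hypothetical, minimal stand-in, since the real layout of an interop DISCO file is not shown here, and the function name is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal stand-in for an interop DISCO file.
SAMPLE = "<disco><interfaces></interfaces></disco>"


def set_interfaces_flag(xml_text, attribute, value):
    """Return xml_text with attribute=value set on the 'interfaces' node."""
    root = ET.fromstring(xml_text)
    node = root if root.tag == "interfaces" else root.find(".//interfaces")
    if node is None:
        raise ValueError("no 'interfaces' node found")
    node.set(attribute, value)
    return ET.tostring(root, encoding="unicode")


# NOCRAWL: any explanatory text may be used as the attribute value.
nocrawl = set_interfaces_flag(
    SAMPLE, "data-crawl-status", "Resource owner asked not to be crawled")

# ARCHIVE: the attribute value is the literal string "archive".
archived = set_interfaces_flag(SAMPLE, "data-update-type", "archive")
```

    Editing the file by hand, as described above, has the same effect; the sketch only shows which node and attributes are involved.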


    Concept mapping tool

    The Concept Mapping tool is used by SciCrunch curators to manage Data Federation resources, including setting up the table and view, as well as the column definitions and mapping. The tool also allows for the database column contents to be exported to Google Refine for the mapping of the concepts and provides back-end support services. 

     The concept mapping tool is accessible at

    Registering a new source

    Open the URL, enter your user name and password.


    On the sources list page, sources can be filtered by NIF ID, source name, update time, etc. Click the “New” button in the bottom-left corner to add a new resource to the SciCrunch Data Federation.


    Defining a new source

    If you are adding a new source, you will see the screen below. Enter the new source name (See naming rules below), SciCrunch ID, connection information, description, etc. The source type will always be Relational DB.

    The description of the resource should be consistent with that in the SciCrunch Registry; however, it may need to be shortened until the SciCrunch Registry snippets can be implemented. For the most part, the description should be less than 2 lines long and should link to the main resource.

    The connection information includes the data location URL, user name, and password. 

    Creating an eligible source name
    1. The source name should be consistent with the resource name/database name.
    2. No dashes, commas, or other punctuation or special characters.
    3. The source name should be less than 30 characters (including spaces). If it is too long, use the abbreviation instead. For example, a resource called “Avian Brain Circuitry Database” should be named “ABCD.”
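
    Rules 2 and 3 above are mechanical and can be checked before a name is entered. This is a minimal sketch, assuming that letters, digits, and spaces are the only permitted characters (the rules do not say whether digits are allowed); the function name is hypothetical.

```python
import re

MAX_LENGTH = 30  # rule 3: fewer than 30 characters, spaces included

# Assumption: only ASCII letters, digits and spaces count as "no punctuation".
_ALLOWED = re.compile(r"[A-Za-z0-9 ]+")


def source_name_problems(name):
    """Return a list of rule violations for a proposed source name."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("too long; use the abbreviation instead")
    if not _ALLOWED.fullmatch(name):
        problems.append("contains punctuation or special characters")
    return problems
```

    “Avian Brain Circuitry Database” is exactly 30 characters long, so it fails rule 3, while “ABCD” passes both checks.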

     When finished editing, click “Done”.


    Creating a new view

    There are two options for creating a new view.

    Option 1: Click the “New” button in the bottom-left corner to add a new view for an existing source: select the view source, then define the view name and schema name. Click 'create' when you are done.

    Option 2: Copy the information from a similar existing view. Click "from Template", select the template source and view, define the new view name, and select the new source. Click 'OK' when you are done.


    Defining the view
    Basic Information

    Begin by entering the Basic Information of the new view, including the name, alias, source, schema name, description and category (See below). 

     The name here refers to the index name of the view, which should not be the same as the source name. The index name briefly describes the characteristics of the source; e.g., the view name for BAMS is “BrainRegions”, while the source name of BAMS is ‘BAMS’. Therefore, the full display name of the resource in the Data Federation will be “BAMS:BrainRegions”, the combination of the source name and the index name. 

     The “alias” is always the same as the name. Select “owner source” from the pull-down menu. 

     Schema name is the acronym of the source name in most cases. 

      The description is a paragraph briefly describing the view, such as its purpose and content; it will be displayed on the SciCrunch website.

     Set the resource’s “Indexable” radio button to “Yes” if this is a view.

    At the bottom of this page, click “Add Category,” and fill in the child category and parent category for the particular view separately (See available categories below). A resource can be assigned to multiple categories - up to three. The category has to precisely match the content of the data. 

    SciCrunch Data Federation categories (Parent categories are bolded):

    Type of Data






    Brain Activation Foci

    Clinical Trials

















    System Level

    Gross Anatomy





    When finished editing, click “Done”.

    Create the view definition

    Click “View Definition” on the left menu, and then write the SQL to crawl the data from a specific resource.

    SQL is a standardized query language for requesting information from a database. Here are some examples:

    “select
           trim(trailing '"' from e_name) as e_name,
           round(value_mean) as value_mean,
           round(value_sd) as value_sd,
           from l2_nlx_151885_data_summary”

    “select
    a.e_uid, -- Note: e_uid should always be included in a view
    a.eudract_number as id,
    a.full_title as title,
    '' as official_title,
    'Country: '|| as recruitment,
    a.medical_condition as conditions,
    '' as intervention,
    a.sponsor_name as sponsored_by,
    a.gender,a.population_age as age_groups,
    '' as phases,
    '' as study_type,
    '' as brief_summary,
    case when b.level is null then ''
    when  b.level like 'LLT' then 'MedDRA levels: Lowest Level Term'
    when b.level like 'HLGT' then 'MedDRA levels: High Level Group Term'
    when b.level like 'HLT' then 'MedDRA levels: High Level Term'
    when b.level like 'PT' then 'MedDRA levels: Preferred Term'
    else '' end  as detailed_description,
    cast (a.start_date as varchar) as date,
    'nlx_151313' as other_ids,
    ''|| a.eudract_number as url 
     from l2_nlx_151313_clinicaltrial_summary a, l2_nlx_151313_clinicaltrial_summary_disease b
    where a.eudract_number like b.eudract_number” 

     You can always check the script by clicking “Check View & Update Columns” on the right hand side.    

    Helpful hints for writing good views:

     * Make sure the total number of rows is correct when joining tables: did you mean an outer join or an inner join? Use an outer join unless there is a good reason to do otherwise. Remember that the data will update, so your join may fail later even if it works now.

     * Make a reasonable attempt to standardize data to the NIF annotation standards: "Increased expression", not "more staining"; "Adult", not "4wks".

     * Make a reasonable attempt to replace numbers or single letters that denote categories with category labels. Bad: 1, f, fem, symbol. Good: Female.

     * Add spaces where there are none in the data; this will help the search. Bad author names: "Cathy Mendelsohn;Xiantong Xin". Good author names: "Cathy Mendelsohn; Xiantong Xin".
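
    The hints above can be tried out on toy data before writing the real view. The sketch below uses Python's standard sqlite3 module with two made-up tables to show how inner and outer joins return different row counts, plus two small helpers for category labels and author spacing. All table names, values, and function names are hypothetical illustrations, not part of the concept mapping tool.

```python
import re
import sqlite3

# Codes taken from the "Bad" examples above, mapped to the "Good" label.
SEX_LABELS = {"1": "Female", "f": "Female", "fem": "Female"}


def label_for(code, labels):
    """Replace a category code with its label; leave unknown values as they are."""
    return labels.get(code.strip().lower(), code)


def space_after_semicolons(text):
    """Put exactly one space after each ';' so names are separate search tokens."""
    return re.sub(r";\s*", "; ", text).strip()


def join_row_counts():
    """Return (inner, outer) row counts for a toy join with one unmatched row."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table trial (id text);
        create table disease (id text, level text);
        insert into trial values ('t1'), ('t2'), ('t3');
        insert into disease values ('t1', 'PT'), ('t2', 'LLT');
    """)
    inner = con.execute(
        "select count(*) from trial a join disease b on a.id = b.id"
    ).fetchone()[0]
    outer = con.execute(
        "select count(*) from trial a left join disease b on a.id = b.id"
    ).fetchone()[0]
    con.close()
    return inner, outer
```

    Here the inner join silently drops the trial that has no disease row, which is the kind of row-count mismatch the first hint warns about.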

    When finished editing, click “Done”.

    Table mappings 

    Keywords identify terms or concepts that should be included in NeuroLex and help keep track of the keywords that curators are using. Click “Table Mapping” from the top menu bar to add/map keywords for the view.

    Rules for making keywords

    1. Mandatory keywords: The name of the resource, abbreviation, data type, resource ID, and view ID, as strings. If applicable, add terms as keywords if they are not specified in a column. E.g., if a view is human MRI and neither of these is specified in the columns, they should be added as keywords. If the whole view relates to Diabetes Type 1 and there is no column for this, it should be added as a keyword.

    2. Optional keywords: the organization name.

    3. Every concept in the table is about the particular term.  (See annotation Standards below.)

    4. Add the concept ID for ontology terms. Examples of ontology terms include, but are not limited to, organism, technique, anatomical region/structure, related condition, and functional level. 

    When finished editing, click “Done”.

    Define the display

    Click the “Displays” tab on the top menu bar to set the display of column headers and their order. Click the "populate" button under "Display Templates"; all the columns defined in the SQL will be populated here. Under “Display Info”, the column display name and the column name in the SQL are filled in under “Name” and “Primary Column” respectively, and the value of the column is assigned under “Template”; e.g., Name: Gene, Primary Column: gene_name, Template: ${gene_name}. (Note the naming conventions below.) 

    The “Columns” function on the right helps the curator check whether the template works well and shows which columns are related to the defined column. By clicking "delete" under "Display Templates", one or more columns can be deleted; select multiple columns by holding the "Ctrl" key.
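
    The ${column} placeholder syntax used in display templates happens to match Python's standard string.Template, which makes it easy to preview what a template will produce for a given row. This is only an illustration: the template engine actually used by the concept mapping tool is not documented here, and the row values are made up.

```python
from string import Template


def render_display(template_text, row):
    """Fill ${column} placeholders from a row; unknown placeholders are kept."""
    return Template(template_text).safe_substitute(row)


# Hypothetical row from a view with gene_name and gene_id columns.
row = {"gene_name": "BDNF", "gene_id": "627"}
preview = render_display("${gene_name} (${gene_id})", row)
```

    safe_substitute is used so that a typo in a column name shows up as an unreplaced placeholder in the preview instead of raising an error.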


     Column header naming convention  

    1. Use uniform headers for similar content across resources/views, such as gene/gene symbol.

    2. Examples of uniform headers are listed below:










    3. No numbers, special characters, or punctuation are allowed in column headers! If you include one, the view will not be generated.

    4. Column headers must be unique within a single view. 
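
    Rules 3 and 4 can be checked automatically before a view is generated. The sketch below assumes that only letters and spaces are acceptable in a header and that uniqueness is judged case-insensitively; both are interpretations of the rules above, and the function name is hypothetical.

```python
import re

# Assumption: a valid header consists of ASCII letters and spaces only.
_HEADER_OK = re.compile(r"[A-Za-z ]+")


def header_problems(headers):
    """Return a list of messages for headers that break rule 3 or rule 4."""
    problems = []
    seen = set()
    for header in headers:
        if not _HEADER_OK.fullmatch(header):
            problems.append(f"{header!r}: only letters and spaces are allowed")
        key = header.strip().lower()
        if key in seen:
            problems.append(f"{header!r}: duplicate header")
        seen.add(key)
    return problems
```

    An empty result means the headers pass both checks; any message points at the header to rename.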

    Note about html 


    If you're going to use characters in a text portion of your template that are specific to HTML markup (in this case just '<' and '>'), please use the proper HTML escape sequences for them ("&lt;" and "&gt;", respectively). The data doesn't need to change, and any actual HTML markup in templates should be left as is. For instance, use:


    <em>${x} &lt; ${y}</em>


    instead of:


    <em>${x} < ${y}</em>
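
    If templates are assembled by a script, the literal text can be escaped with Python's standard html.escape while the markup and placeholders are left alone. A minimal sketch, with a hypothetical helper name:

```python
from html import escape


def emphasized_comparison(left, operator, right):
    """Build '<em>${left} OP ${right}</em>' with the operator HTML-escaped.

    Only the literal operator text is escaped; the <em> markup and the
    ${...} placeholders are emitted unchanged.
    """
    return "<em>${%s} %s ${%s}</em>" % (left, escape(operator), right)
```

    Calling emphasized_comparison("x", "<", "y") yields the first, correct form shown above.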


    Column order

    Click “Reorder”; the order of the columns can then be modified by dragging any column up or down. (Note the column order conventions below.) When finished defining, click “Done”.


    All sources should look and "act" the same to the extent possible. Thus, each view in SciCrunch should adhere to a consistent set of guidelines. The order of the columns should generally follow the same format across all services and views so that users have a consistent experience when they change from source to source. Obviously, the exact number of columns will vary depending on the source.

    • Column 1:  Source database and link to source database if appropriate

    • Column 2:  Most important entity, i.e., what the database is offering or about:  gene, anatomical structure

    • Column 3:  Additional entities in order of descending granularity, e.g., anatomical structure > cell > subcellular structure

    • Column 4:  Phenotypes and other measures

    • Column 5:  Descriptions and comments

    • Column 6:  Literature citations or other external references
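
    The ordering convention above amounts to sorting columns by their role. The following sketch shows one way to express that; the role names are labels invented for this example, and each column's role still has to be assigned by the curator.

```python
# Roles in the conventional display order described above (labels are
# hypothetical shorthand for the six column positions).
COLUMN_ROLE_ORDER = [
    "source",
    "primary entity",
    "additional entity",
    "phenotype or measure",
    "description",
    "citation",
]


def order_columns(columns):
    """Sort (header, role) pairs into the conventional display order.

    sorted() is stable, so columns sharing a role keep their given order,
    e.g. additional entities already listed by descending granularity.
    """
    rank = {role: i for i, role in enumerate(COLUMN_ROLE_ORDER)}
    return [header for header, role in sorted(columns, key=lambda c: rank[c[1]])]
```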

     Column number

    The number of columns should be less than 10 in order to achieve the best display effects. 

     Columns formatting 

    From the "Columns" tab along the top, double click any column to add/edit the Name, Alias, Data Type, Weight, Indexable, Facet, Key, and column mappings.


    1. Index: All columns, except e_uid, should be marked indexable.
    2. Export: Both exposed and unexposed columns should be marked exportable; e_uid should not be exported.
    3. Weight: there are 5 levels of weight setting ranging from 0.5 to 4.0. The database name, the most important entity, is weighted as 4.0. The second important entity and various ids are weighted as 2.0. The description, notes or comments are weighted as 1.0. The URLs or least important content are weighted as 0.5.
    4. Facet: Faceted columns should contain values that are repeated many times, such as disease, phenotype inheritance, gene name, etc.; URL and free-text columns cannot be faceted. 
    5. Is key: In most cases, the key variable should be ‘e_uid’. Therefore, the e_uid column should be set to “Yes” for the “Is Key” value. 
    6. Column level mapping - SciCrunch is providing both column and value mapping to enhance the semantic search and to pave the way for export of the SciCrunch linked data graph.  The purpose of the column mapping is to set the ontological domain of the entities contained within.  Each of these domains generally corresponds to one of the NIFSTD modules:  organism, anatomical entity, cell, subcellular entity, molecule, function, disease, technique, resource.  We do not want to map the column at too granular level so as to avoid consistency problems with the contents.  At this point, we are also not mapping column roles.  So the fact that an organism serves as the subject of a study will not be reflected in the mapping:  any column containing an organism should be mapped to organism.  Similarly, even if a column contains brain parts, we will map to anatomical entity, to ensure that if the source later adds parts of the spinal cord or parts of the peripheral nervous system, that we will not be in conflict.  This policy may be revisited as the ontologies evolve.
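
    Items 1, 2, 3, and 5 above can be summarized as a small lookup. This is a sketch for illustration only: the role names are invented here, and only the four weights stated in item 3 are included.

```python
# Weights stated in item 3; the role names are labels chosen for this sketch.
WEIGHTS = {
    "primary": 4.0,      # database name / most important entity
    "secondary": 2.0,    # second most important entity and various ids
    "description": 1.0,  # description, notes or comments
    "minor": 0.5,        # URLs or least important content
}


def column_settings(name, role):
    """Return the formatting settings for a column under the rules above."""
    is_e_uid = name == "e_uid"
    return {
        "name": name,
        "weight": WEIGHTS[role],
        "indexable": not is_e_uid,   # item 1: everything except e_uid
        "exportable": not is_e_uid,  # item 2: e_uid is not exported
        "is_key": is_e_uid,          # item 5: e_uid is the key
    }
```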

    The current column concepts include:



    Catalog Number





    Full Text



    Gene-target Reagent

    Genomic Locus

    Genomic locus variant




    Interaction Type

    Molecular Domain













    Sub-cellular Anatomy

    Mapping rules

    Overall rules:

    1. One column can be mapped to multiple terms (e.g. publication and identifier).
    2. One column cannot be mapped to two of the same term (e.g. organism and organism)

    Specific rules for individual mapping terms:

    1. Description, notes and comments should be mapped to ‘Full Text’.
    2. Protein and Chemical substance should be mapped to ‘Molecule’.
    3. Gene allele should be mapped to ‘Genomic locus variant’.
    4. Ids such as e_uid, gene id, SciCrunch ids, PubMed ids and accession numbers should be mapped to ‘Identifiers’.
    5. Reference and  PubMed ids should also be mapped to ‘Publication’.
    6. Gene symbol and gene name should be mapped to ‘Gene’.
    7. Brain structure should be mapped to ‘Anatomy’.
    8. Organism and species should be mapped to ‘Organism’.
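
    The mapping rules above can be written down as a lookup table, which also makes the "no duplicate terms" rule easy to enforce. A minimal sketch; the keys are informal descriptions of column content chosen for this example, and the function names are hypothetical.

```python
# Specific rules 1-8 above, keyed by an informal description of the content.
SPECIFIC_RULES = {
    "description": ("Full Text",),
    "notes": ("Full Text",),
    "comments": ("Full Text",),
    "protein": ("Molecule",),
    "chemical substance": ("Molecule",),
    "gene allele": ("Genomic locus variant",),
    "e_uid": ("Identifiers",),
    "gene id": ("Identifiers",),
    "accession number": ("Identifiers",),
    "pubmed id": ("Identifiers", "Publication"),
    "reference": ("Publication",),
    "gene symbol": ("Gene",),
    "gene name": ("Gene",),
    "brain structure": ("Anatomy",),
    "organism": ("Organism",),
    "species": ("Organism",),
}


def concepts_for(content_kinds):
    """Collect the mapping terms for a column, never listing a term twice."""
    concepts = []
    for kind in content_kinds:
        for concept in SPECIFIC_RULES[kind.lower()]:
            if concept not in concepts:
                concepts.append(concept)
    return concepts


def check_column_mapping(concepts):
    """Raise ValueError if a column is mapped to the same term twice."""
    if len(set(concepts)) != len(concepts):
        raise ValueError("a column cannot be mapped to the same term twice")
```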

    When finished mapping, click “Done”.

    SciCrunch Data Ingestion Quality Control

    Prior to releasing a new data view to production it is important to scrutinize it and all its components as a user would. A Data Ingestion Checklist has been assembled to assist with this and now includes a Quality Control section. Once all the checks are complete, should be notified.

    Entity mapping using Google Refine

    The Google Refine tool is used to create concept mappings of individual columns, and it is accessible at

    Common issues are listed below. If you need more information, see the full user and developer documentation provided by Refine. (Warning: not everything is covered in Refine’s documentation, such as all the components needed to add a new web server and make it work.)

    In what format can Google Refine export and how can we import that data back into Google Refine if we need to update it?

    • Data can be exported from Refine as TSV, CSV, Excel, or an HTML table. Excel is the most convenient, since it can easily be imported back into Refine if the data need updating. However, if the data set is as large as the HOMOLOGENE data set (321MB), it cannot be opened in Excel because of its size.

    • If you need to export in a format that is not listed above, you can use Google Refine’s Templating Exporter to add your own exporter.

    The formats currently supported (in version 2.0) include:

    1. TSV, CSV, or values separated by a custom separator you specify

    2. Excel (.xls, xlsx)

    3. XML, RDF as XML

    4. JSON

    5. Google Spreadsheets

    SIZE of Data:

    Refine has been tested mainly with data of around 10 to 50 MB, for which all processes work fast and well. However, Refine was also tested with data ranging from 100MB up to 321MB (the maximum tested so far). The problems that arise at these sizes concern the speed and execution of simple commands such as splitting columns, copying columns, and reconciliation.

    If in need of using bigger files than 321MB:

    321MB is the approximate maximum data size that can be used. A data file of 500MB was tested, but the error “java.lang.OutOfMemoryError: Java heap space” appeared. This can be fixed (not tested so far, but Refine claims it is possible). Refine states that:

     “There is no hardcoded limit in how much data Refine can load... however, the underlying Java virtual machine (JVM) that runs refine starts with a fixed memory size limit. When the JVM runs out of memory (and yes, Refine eats *LOTS* of memory), weird things can happen and we're not exactly awesome in reporting errors from the server side of refine to the client running in your browser so what the client side might perceive as the load being finished, it's really the server side hitting the memory limit and giving …”


            Add more memory to the JVM: (follow these instructions here):  


    To keep all data after the reconciliation process: clone the column that is going to be reconciled in the Excel file, and then run the reconciliation process on one of the two columns. This is not so important for images, but it is needed for future reference, to know what the data were and what they became.

    • To make an exact clone of a column go to the chosen Column and press “Edit Column” and then the “add column based on that column” command. Give the Column a name and an exact copy will appear next to the original Column.

    Adding a Web-Server: 

    By connecting your data with other databases, you get more value out of your data (Reconciliation)

    • Have Firebug installed.

    • If you tried to plug in a bad server, Refine does no validation, so it shows nothing and no other server can be added. You must open Firebug, go to the console, and type “ReconciliationManager.standardServices = [ ]”. Then add a correct web server.

    The SERVER

    • must properly support JSONP

    • must use the whole callback URL parameter, namely "jsonp1311617874279", not just "jsonp"

    • jsonp1311617874279 is just something generated by Refine; it is random and different every time. In other words, when your server sees <whatever this is>, it must reply with

    <whatever this is>({"schemaSpace":"","name":"NIFSTD Reconciliation Service","identifierSpace":"","view":{"url":"{{id}}"}})
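
    On the server side, the JSONP requirement means wrapping the JSON metadata in a call to whatever callback name Refine sent. The sketch below shows only that wrapping step; the function name and the callback-name check are assumptions, the metadata values are copied from the reply above, and the web framework that would receive the request is left out.

```python
import json
import re

# Service metadata, copied from the example reply above.
SERVICE_METADATA = {
    "schemaSpace": "",
    "name": "NIFSTD Reconciliation Service",
    "identifierSpace": "",
    "view": {"url": "{{id}}"},
}

# Assumption: accept only identifier-like callback names, as a safety check.
_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def jsonp_body(callback, payload):
    """Wrap payload as '<callback>(<json>)', using the full callback name."""
    if not _CALLBACK.fullmatch(callback):
        raise ValueError("unexpected callback name: %r" % callback)
    return "%s(%s)" % (callback, json.dumps(payload, separators=(",", ":")))
```

    The callback name must be echoed back exactly as received, since Refine generates a different one for every request.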

    If having a problem:

    Post on the Issue site and one of the developers will get back to you as soon as possible.

    Extra info:

    “If you have data in a very peculiar text format, just import it without splitting lines into columns, and then once it's imported, do your own custom column splitting.” 

     “Once imported, the data is stored in Google Refine's own format, and your original data file is left undisturbed.”

    “You can also point Google Refine at a URL to a data file or a Google Spreadsheet. The mime-type of that URL tells Google Refine which format the data is in. Currently only Google Spreadsheets which are published publicly are supported.”

    “Fetching URLs From Web Services - grabbing from the Web more data related to the data you already have”


    1. Generate a data file, in csv, xls or format, by following Pavel’s instructions;

    2. Open the data file in Google Refine (figure 1);


    Figure 1: open a data file in Google Refine.

    3. (Optional) Once we choose a column to refine, copy the column to a new one. Click arrow next to the column header, select “Edit column”->”Add column based on this column” (figure 2);


    Figure 2: copy column


    In the pop-up dialog, enter a new column name, choose “keep original” or “Set to blank” for “On error”, then click OK (figure 3);


    Figure 3: copy column (2)


    4. For the column we want to refine, click its column header, select “Reconcile -> Start reconciling” from the menu (figure 4);


    Figure 4: Select “Start reconciling…” from menu


    5.  Add the NIFSTD Reconciliation Service if we haven’t done so. Click the “Add Standard Service” button at the bottom of the pop-up window and enter the NIFSTD Reconciliation Service URL in the dialog (figure 5). If the NIFSTD Reconciliation Service already exists in the left panel, just select it.


    Figure 5: Add Ontoquest Reconciliation Service

    6. Select “Reconcile against no particular type”, then click “Start reconciling”. If you want to reconcile against a particular type, select the type from the list of “Reconcile each cell to an entity of one of these types:” (figure 6);

    Tips: If the type list is not available, do “Reconcile against no particular type” first, then clear the reconciliation result, and reconcile against a particular type now. Usually, the types become available at this time (figure 7).


    Figure 6: Reconcile against no particular type


    Figure 7: Reconcile against a particular type

    7. View results (figure 8);


    Figure 8: After reconciliation

    8. Export data by clicking “Export” on the upper-left corner, then choose the desired format (figure 9).


    Figure 9: Export data

    9. Import the results back to concept mapping tool (Figure 10).

    Go to the main page of concept mapping tool, click the import button on the top menu, and choose ‘load reconciled  CSV’. 

    (Figure 10: Import data to concept mapping tool) 

     Then select the file that you downloaded from Google Refine and click ‘OK’. The newly mapped CSV file will now be imported into the concept mapping tool. 

    Vocabulary Source Creation

    Vocabulary sources, like other sources, require a view to be created (see view creation above); the difference is that a vocabulary source must have a defined set of columns.

    When the source has been selected, several columns need to be defined.

    Provider: This is the ontology or vocabulary source. For NIF, Ontoquest in this example serves the NIFSTD ontology, so NIFSTD is the provider of the data.

    Category: This column can be filled in if the source view has only one type of term; for example, NCBI Gene as a vocabulary source has only genes, and the registry has only resource names. It can be left blank and defined by the Category Column if the category of each term is specified in the view; for example, NIFSTD has both organisms and biological processes from two ontologies, and these categories are specified in a column.

    Synonym delimiter: The character that separates the synonyms, if multiple synonyms exist in the data.

    ID Column: The column that specifies the identifier of the term.

    Term Column: The column that specifies the label or preferred label of the term.

    Definition Column: The column that holds the definition of the term.

    Category Column: This is a category assigned to the terms in this part of the vocabulary source. For example, if the vocabulary source contains organisms and diseases the column that specifies these.

    Inferred Column: The column that specifies whether the term is inferred or standard. Inferred terms expand not only to synonyms but also to other relationships. For example, GABAergic neuron is defined as any neuron that releases GABA, a class of neurons that changes if the neurotransmitter GABA is found to be released by a new cell type.

    Abbreviation Column: The column that contains the abbreviation of the term.

    Acronym Column: The column that contains the acronym of the term.

    Synonym Column: The column that contains the synonyms of the term.
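
    To make the column definitions above concrete, the sketch below turns one row of a vocabulary view into a term record, splitting the synonym cell on the synonym delimiter. The row keys, the "|" delimiter, and the function names are hypothetical; the real column names are whatever the view defines.

```python
def split_synonyms(raw, delimiter):
    """Split a synonym cell on the delimiter, dropping empty entries."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def term_record(row, provider, synonym_delimiter, category=None):
    """Build a term record from a row of a vocabulary view.

    If category is given (source with a single type of term) it applies to
    every term; otherwise the row's own category column is used.
    """
    return {
        "provider": provider,
        "id": row["id"],
        "term": row["term"],
        "category": category if category is not None else row.get("category"),
        "synonyms": split_synonyms(row.get("synonyms", ""), synonym_delimiter),
    }


# Hypothetical row; the identifier and synonyms are made up for illustration.
example = term_record(
    {"id": "example_1", "term": "GABAergic neuron", "category": "cell",
     "synonyms": "GABA neuron| GABA-releasing neuron"},
    provider="NIFSTD",
    synonym_delimiter="|",
)
```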

    Make sure to save your mappings!

    *Note, DISCO currently does not control the indexing process of vocabulary sources, so the definition has to end with an email to the systems team to start indexing manually.

    Annotation Standards

    Current NIF annotation standards are maintained in the Neurolex Wiki under NIF Annotation Standard. They should be used when mapping in Google Refine. 

     NIF annotation standard for age classification

    1. Adult organism
    2. Adolescent organism
    3. Juvenile organism
    4. Newborn organism
    5. Infant organism
    6. Embryonic organism

      NIF annotation standard for expression level

    1. Increased expression
    2. Decreased expression
    3. No change in expression

       NIF annotation standard for treatment paradigm

    Additional SciCrunch Curation Policies


    NITRC Integration Policy
    DISCO and BiositeMaps Integration Policy
    Level-3 Database Licenses

    Curation How-To's

    Redirecting pages in NeuroLex

    Known Curation Issues:

    Link to curation issues page

    Keywords, that are not in NIFSTD:

    Link to Key Words, that are not in NIFSTD

    Resource Hierarchy:




    1 Comment

    1. Comments from Xwiki

      aarnaud says:
      Should we add de-identification software?
      -2009-02-20 16:53:43.0

      AnitaBandrowski says:

      I have added the last sentence, is this appropriate??

      Clinical Medicine Knowledge Base - A collection of clinical biomedical subject data - individual records or reduced information - arranged according to a structured semantic framework. Includes diagnosis, treatment, and/or outcomes information. Note, studies that are ongoing and/or in the beginning phases thus still recruiting patients are also included in this resource type.
      -2009-02-23 17:05:35.0

      aarnaud says:
      What about this for Clinical Research (Translational) Knowledge Base? I didn't know what to do with the reference to synonym, so I left it out.

      Clinical Research (Translational) Knowledge Base - A collection of clinical biomedical subject data - individual records or reduced information - arranged according to a structured semantic framework that harnesses knowledge generated during basic research in the laboratory, and in preclinical studies, and is applied to the development of trials and studies in humans. The adoption of best practices in the community, cost-effectiveness of prevention and treatment strategies are also an important part of translational science and are a part of this collection.
      -2009-02-27 12:37:08.0

      AnitaBandrowski says:

      I removed the word medicine from the term "Clinical knowledge base" because it was redundant.
        -2009-03-03 15:19:42.0

      cfn says:

      I like the changes noted above. I'm reviewing the list as a whole, and will make a few suggestions today to see if I should make the changes on line.
        -2009-03-04 09:32:28.0

      AnitaBandrowski says:

      -2009-03-12 09:54:52.0

      AnitaBandrowski says:

      I have edited the biomaterial resource categories that follow: I added the word bank to tissue, as tissue is a different object from a tissue bank and we really do mean tissue bank in the context of the ontological structure here. Again Organism was changed to organism repository for the same reason and reagent to reagent provider... These changes should make more explicit the meaning of the terms that we use when used out of context of the ontological structure.
        o Tissue bank
        o Organism repository
      • Reagent provider
      • Instrument provider
        -2009-03-12 09:57:48.0

      aarnaud says:
      Consider the title, Translational Research Knowledge Base, and making it a child of Clinical Knowledge base.
      -2009-03-13 10:20:32.0

      aarnaud says:

      I think the example for reagent provider, under material resource, should directly sell reagents.
        -2009-04-02 15:40:22.464

      memartone says:

      I changed "Neurolex" to "NIFSTD" because we will be including these terms in Neurolex (indeed, they should all be there), but they may not yet be in the NIFSTD ontologies except in the "unclassified" module.
        -2009-04-16 01:42:41.336

      AnitaBandrowski says:
      We need a couple more training materials:

      1. quizzes - used as educational aids.
        2. instructional software (this may belong in the software category, we should discuss) - there are pieces of software written that have an extensive teaching with this software section. These should be included.

      -2009-05-25 14:59:30.29