Blog from May, 2012

As of May 28, 2012 - 5:40 pm PDT
  • EMAIL
    • Delays of up to 12 hours are still being seen on some messages
  • CAMERA
    • Oracle down
    • Portal is up, but some services may be unavailable
  • CCDB
    • Online and fully operational to the best of our knowledge
  • Cell Image Library
    • Online and fully operational to the best of our knowledge
  • NIF
    • Portal is up, but some services may be unavailable
    • load balancer is bypassed and both neuinfo.org and www.neuinfo.org have been redirected to nif-apps1
    • blog is down and the blog box has been removed from the NIF front webpage temporarily to speed loading of the page
  • NITRC
    • Butch emailed that noble was down.  It was rebooted and NITRC is online.
  • WBC
    • Dave Little reported dev and stage servers are down
  • DEV servers (general)
    • many of the DEV servers at Holly remain down due to problems with their back-end storage
  • Scopes
    • Gatan 3View, RTS 2000, 3200 EF still down 
As of May 28, 2012 - 1:30 pm PDT
  • EMAIL
    • Delays of up to 12 hours are still being seen on some messages
  • CAMERA
    • Oracle down
    • Portal down
  • CCDB 
    • Oracle down  => ORACLE FIXED
    • iRODS down because Oracle is down => iRODS working
  • NIF
    • nif-apps[1-2] are down
    • blog and website are down  =>  websites (neuinfo.org and www.neuinfo.org) have been redirected to a maintenance page
  • NITRC
    • emailed Butch asking for status, no reply as yet
  • WBC
    • Dave Little reported servers are down
As of May 28, 2012 - 12:00 pm PDT
  • No significant change in status in the last 13 hours
  • EMAIL
    • Delays of up to 12 hours are still being seen on some messages
  • CAMERA
    • Oracle down
    • Portal down
  • CCDB 
    • Oracle down
    • iRODS down because Oracle is down
  • NIF
    • nif-apps[1-2] are down
    • blog and website are down
  • NITRC
    • emailed Butch asking for status, no reply as yet
  • WBC
    • Dave Little reported servers are down
UCSD Campus Power Outage
As of May 29, 2012 - 12:15 am PDT
  • NIF
    • nif-apps[1-2] are up
    • blog and website are up

 

As of May 27, 2012 - 11:00 pm PDT
  • All of our servers have been powered back on, but we are still working on problems with many of the services running on them. 
  • The network issues are mostly resolved, except for one that may be affecting CAMERA.
  • iRODS remains down
  • Oracle databases are down
     If you notice a service down, please submit a Jira ticket.
As of May 27, 2012 - 1:00 pm PDT
  • Please be advised that the CRBS SysOps Team is short on staff due to the holiday weekend.  As a result, it may be several hours before everything is back online and functioning properly.
  • We are currently investigating the possibility that there is a problem with one of the switches in one of our racks at SDSC
  • We are aware that many systems are still offline or unusable.  Please be patient until we post here that we have everything back online and need help with end-to-end testing
As of May 27, 2012 - 10:00 am PDT
  • Campus network issues affecting CRBS appear to have been resolved.  Email is being delivered, though it will take some time for it to catch up with all the messages that were queued up.  No mail should have been lost, but time will tell.
  • We are resuming our recovery efforts.  Many systems that depended on servers at NCMIR or Holly are still offline and we are working on getting them back online.
As of May 27, 2012 - 7:30 am PDT
  • There are campus network routing issues that are preventing access to many of our systems - for example, our CRBS Status Page, NCMIR email, and CAMERA systems
  • When the network issues are resolved, we will resume our recovery efforts
    Many thanks to Sean Penticoff, Edmond Negado, and Brandon Carl for working through the night and early this morning to get us this far. 
As of May 27, 2012 - 5:00 am PDT
  • WORKING - some CRBS systems are now back online - for example, this wiki and our support page
As of May 27, 2012 - 1:30 am PDT
  • Power restored
As of May 26, 2012 - ~11:45 am PDT
  • Most of UCSD Main Campus loses electrical power

Additional information on the outage itself can be found at:

     UCSD Status Page

     SDSC Status Page

As of 4:39pm PDT:

  • Power was restored and deemed stable
  • We began bringing systems back online

As of 3:20pm PDT:

  • Most websites have been redirected to a page indicating that we are down for emergency maintenance.
  • The power has been shut down.

As of 3:00pm PDT:

  • mail server, email list, docushare and web servers are all down
  • backup storage is down
  • a 10-minute delay has been requested to finish bringing down a few stragglers

As of 2:05pm PDT:

  • We just received word that CalIT2 will have the power shut down again at 3pm today.  We are scrambling to shut down equipment and prepare for this unexpected/unplanned event.

=============

FM experienced a problem after the maintenance work this morning and need to shut power down again by around 3 pm.
Please shutdown all equipment ASAP, as necessary.  At this time we don't have an estimate of the window that FM requires.
=Tad
Tad Reynales, Manager
Technology Infrastructure
CALIT2 @ UC San Diego

=============


As of 12:00pm PDT:

  • All systems are powered on
  • website DNS entries are still being restored
  • We expect that everything will be back online and ready for end-user validation at approximately 1:30pm PDT

As of 10:45am PDT:

  • stage.nitrc is up
  • docushare is up
  • mail server is up, mail should start coming in
  • websites are coming back up

As of 10:30am PDT:

Power has been restored as of about 10 AM; FM reported successful completion of their work and have left Atkinson Hall as of ~10:30 AM.
=Tad
Tad Reynales, Manager
Technology Infrastructure
CALIT2 @ UC San Diego

As of 10:10am PDT:

  • Power has been restored at CalIT2 and we are starting to bring systems back online

As of 9:51 am PDT:

As of 9:43 this morning:

  • Mail to <username>@ncmir.ucsd.edu is being delayed
  • Websites are down, but we are attempting to redirect them to a maintenance page
  • All CAMERA resources have been shut down
  • All CRBS resources hosted at CalIT2 have been shut down
  • NITRC stage has been shut down.