As of May 29, 2012 - 12:15 am PDT
- NIF
- nif-apps[1-2] are up
- blog and website are up
As of May 28, 2012 - 5:40 pm PDT
- EMAIL
- Delays of up to 12 hours are still being seen on some messages
- CAMERA
- Oracle down
- Portal is up, but some services may be unavailable
- CCDB
- Online and fully operational to the best of our knowledge
- Cell Image Library
- Online and fully operational to the best of our knowledge
- NIF
- Portal is up, but some services may be unavailable
- load balancer is bypassed and both neuinfo.org and www.neuinfo.org have been redirected to nif-apps1 (a quick way to verify the redirect is sketched after this update)
- blog is down and the blog box has been removed from the NIF front webpage temporarily to speed loading of the page
- NITRC
- Butch emailed that noble was down. It was rebooted and NITRC is online.
- WBC
- Dave Little reported dev and stage servers are down
- DEV servers (general)
- many of the DEV servers at Holly remain down due to problems with their back-end storage
- Scopes
- Gatan 3View, RTS 2000, 3200 EF still down
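Since the NIF names are temporarily pointed straight at nif-apps1, a quick resolution check can confirm the redirect took effect. This is a minimal sketch, not part of the update above; the fully qualified name used for nif-apps1 is an assumption and should be adjusted to the real host name.

    # Minimal sketch: confirm both public NIF names resolve to the nif-apps1 host.
    # NIF_APPS1 is an assumed fully qualified name, not taken from the update above.
    import socket

    NIF_APPS1 = "nif-apps1.neuinfo.org"  # assumption; replace with the real FQDN of nif-apps1
    expected_ip = socket.gethostbyname(NIF_APPS1)

    for name in ("neuinfo.org", "www.neuinfo.org"):
        ip = socket.gethostbyname(name)
        status = "OK" if ip == expected_ip else "MISMATCH"
        print(f"{name} -> {ip} [{status}]")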
As of May 28, 2012 - 1:30 pm PDT
- EMAIL
- Delays of up to 12 hours are still being seen on some messages
- CAMERA
- Oracle down
- Portal down
- CCDB
- Oracle down => ORACLE FIXED
- iRODS down because Oracle is down => iRODS working
- NIF
- nif-apps[1-2] are down
- blog and website are down => websites (neuinfo.org and www.neuinfo.org) have been redirected to a maintenance page
- NITRC
- emailed Butch asking for status, no reply as yet
- WBC
- Dave Little reported servers are down
As of May 28, 2012 - 12:00 pm PDT
- No significant change in status in the last 13 hours
- EMAIL
- Delays of up to 12 hours are still being seen on some messages
- CAMERA
- Oracle down
- Portal down
- CCDB
- Oracle down
- iRODS down because Oracle is down
- NIF
- nif-apps[1-2] are down
- blog and website are down
- NITRC
- emailed Butch asking for status, no reply as yet
- WBC
- Dave Little reported servers are down
As of May 27, 2012 - 11:00 pm PDT
- All of our servers have been powered back on, but we are still working on problems with many of the services running on them.
- The network issues are mostly resolved, except for one that may be affecting CAMERA.
- iRODS remains down
- Oracle dbs are down
As of May 27, 2012 - 1:00 pm PDT
- Please be advised that the CRBS SysOps Team is short on staff due to the holiday weekend. As a result, it may be several hours before everything is back online and functioning properly
- We are currently investigating the possibility that there is a problem with one of the switches in one of our racks at SDSC
- We are aware that many systems are still offline or unusable. Please be patient until we post here that we have everything back online and need help with end-to-end testing
As of May 27, 2012 - 10:00 am PDT
- Campus network issues affecting CRBS appear to have been resolved. Email is being delivered, though it will take some time for it to catch up with all the messages that were queued up. No mail should have been lost, but time will tell.
- We are resuming our recovery efforts. Many systems that depended on servers at NCMIR or Holly are still offline and we are working on getting them back online.
As of May 27, 2012 - 7:30 am PDT
- There are campus network routing issues that are preventing access to many of our systems - for example, our CRBS Status Page, NCMIR email, CAMERA systems
- When the network issues are resolved, we will resume our recovery efforts
Many thanks to Sean Penticoff, Edmond Negado, and Brandon Carl for working through the night and early this morning to get us this far.
As of May 27, 2012 - 5:00 am PDT
- WORKING - some CRBS systems are now back online - for example, this wiki and our support page
As of May 27, 2012 - 1:30 am PDT
- Power restored
As of May 26, 2012 - ~11:45 am PDT
- Most of UCSD Main Campus loses electrical power
Additional Info on the outage itself can be found at:
As of 4:39pm PDT:
- Power was restored and deemed stable
- We began bringing systems back online
As of 3:20pm PDT:
- Most websites have been redirected to a page indicating that we are down for emergency maintenance.
- The power has been shut down.
As of 3:00pm PDT:
- mail server, email list, DocuShare, and web servers are all down
- backup storage is down
- a 10-minute delay has been requested to finish bringing down a few stragglers
As of 2:05pm PDT:
- We just received word that CalIT2 will have its power shut down again at 3pm today. We are scrambling to shut down equipment and prepare for this unexpected, unplanned event.
As of 12:00pm PDT:
- All systems are powered on
- website DNS entries are still being restored
- We expect that everything will be back online and ready for end-user validation at approximately 1:30pm PDT
As of 10:45am PDT:
As of 10:30am PDT:
As of 10:10am PDT:
- Power has been restored at CalIT2 and we are starting to bring systems back online
As of 9:51 am PDT:
- Websites hosted at CalIT2 are being redirected to http://maintenance.crbs.ucsd.edu
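For anyone spot-checking the redirect, the hedged sketch below requests a site and reports where it ends up; the two site URLs are illustrative examples, and only the maintenance URL comes from the note above.

    # Hedged sketch: verify that a site currently lands on the maintenance page.
    # The site list below is illustrative; maintenance.crbs.ucsd.edu comes from the note above.
    import urllib.request

    MAINTENANCE = "http://maintenance.crbs.ucsd.edu"

    for site in ("http://www.crbs.ucsd.edu", "http://www.neuinfo.org"):  # example hosts only
        try:
            with urllib.request.urlopen(site, timeout=10) as resp:
                final_url = resp.geturl()  # urlopen follows redirects
                ok = "yes" if final_url.startswith(MAINTENANCE) else "no"
                print(f"{site} -> {final_url} (maintenance redirect: {ok})")
        except OSError as exc:
            print(f"{site} -> error: {exc}")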
As of 9:43 am PDT:
- Mail to <username>@ncmir.ucsd.edu is being delayed
- Websites are down, but we are attempting to redirect them to a maintenance page
- All CAMERA resources have been shut down
- All CRBS resources hosted at CalIT2 have been shut down
- NITRC stage has been shut down.
At 05:52:54 AM PDT, a power event occurred affecting half of campus.
It is confirmed that CalIT2 was affected, and it is suspected that Holly was as well.
Power has since been restored, but a few systems (listed below) did not come back up after power was restored.
Our mail server is included in this outage.
Sat, Nov 05 7:30 AM mail.ncmir.ucsd.edu.
Sat, Nov 05 5:57 AM dev-web.crbs.ucsd.edu.
Sat, Nov 05 5:57 AM tom.crbs.ucsd.edu
Sat, Nov 05 5:57 AM vm-dev-8.crbs.ucsd.edu.
Sat, Nov 05 5:56 AM stitch.crbs.ucsd.edu
Sat, Nov 05 5:56 AM drlittle.crbs.ucsd.edu.
Sat, Nov 05 5:55 AM lilo.crbs.ucsd.edu
Sat, Nov 05 5:55 AM tom.crbs.ucsd.edu.
Sat, Nov 05 5:55 AM 132.239.132.214
Sat, Nov 05 5:54 AM dolphin.crbs.ucsd.edu
Sat, Nov 05 5:54 AM vm0-apps.camera.calit2.net.
Sat, Nov 05 5:54 AM featherie.ucsd.edu.
Sat, Nov 05 5:54 AM navi.crbs.ucsd.edu
Sat, Nov 05 5:54 AM vihar.crbs.ucsd.edu.
Sat, Nov 05 5:54 AM vihar.crbs.ucsd.edu
Sat, Nov 05 5:54 AM compute-0-8-0
Sat, Nov 05 5:54 AM portal-dev.camera.calit2.net
Sat, Nov 05 5:54 AM vm0-apps.camera.calit2.net.
Sat, Nov 05 5:53 AM compute-0-8-0
Sat, Nov 05 5:53 AM leibniz.ucsd.edu.
Sat, Nov 05 5:53 AM portal.camera.calit2.net (JCVI a
Sat, Nov 05 5:53 AM www.wholebrainproject.org
Sat, Nov 05 5:53 AM stage-nitrc.crbs.ucsd.edu
Sat, Nov 05 5:53 AM www.wholebraincatalog.org
Sat, Nov 05 5:53 AM tom.crbs.ucsd.edu
Sat, Nov 05 5:53 AM bacula.crbs.ucsd.edu
Sat, Nov 05 5:53 AM 132.239.132.247
Sat, Nov 05 5:53 AM lilo.crbs.ucsd.edu
Sat, Nov 05 5:53 AM stitch.crbs.ucsd.edu
Sat, Nov 05 5:53 AM lobster.crbs.ucsd.edu
Sat, Nov 05 5:52 AM porpoise.crbs.ucsd.edu
Sat, Nov 05 5:52 AM seabass.crbs.ucsd.edu
Thank you for your patience while we work on bringing these few systems back up.
If you notice anything UP but not working properly, please submit a ticket via our support website
Sincerely,
CRBS SysOps
As of 1:30pm PDT, if you notice anything not working properly, please submit a ticket via our support website.
There are a few dev and stage systems that still need to be brought online.
Additionally, we are working on getting the CAMERA cylume Rocks cluster and its associated systems, including the GAMA server, back online.
Thank you for your patience through this extraordinary event.
Sincerely,
CRBS SysOps
Please see our status web page for additional details.
The following servers/services should be operating nominally:
- Everything
Delivery of the switch hardware we need was delayed. It did not arrive until 11:30am. As a result, there has been a corresponding slip in our schedule.
List of affected Virtual Machines (VMs)
A list of affected Virtual Machines (VMs) can be found here
Project Information
CAMERA
Intermittent network interruptions while the network is upgraded this morning.
Oracle databases and Oracle database servers unavailable during NetApp move this afternoon.
The victory and constellation Oracle servers have been relocated.
CCDB
Intermittent network interruptions while the network is upgraded.
While the maunaloa storage system is being moved, the following data stores will be unavailable:
- CellImageLibrary
- HarvardData
- Image Server "scratch" space
NIF
Intermittent network interruptions while the network is upgraded.
- NIF1, NIF2, NIF4 and nif-crawler servers have been patched, updated and relocated.
NITRC
Intermittent network interruptions while the network is upgraded.
Approximately 30-minute outage while the nitrc.org bare metal server is relocated.
Work to be Done
switch hardware upgrade
Delivery of the switch hardware we need was delayed. It is due by noon today, via FedEx.
"Bare Metal" servers
Servers that are not virtualized will be moved this morning while we are installing the new switch hardware. This will impact:
- nitrc.org
- braininfo
- the Oracle 3-node RAC system and databases hosted there.
- maunaloa data storage
- the SVN data repository
- the CVS data repository
VM migration status
We are hoping the switches will arrive early enough to allow us to migrate the VMs over 10Gb, suspend them over 10Gb while we relocate the NetApp, and then resume them.
Oracle RAC move
Daniel Wei will be assisting us with the move of this equipment.
NetApp move
The NetApp will be one of the last pieces of equipment to be relocated. During this time, all VMs that use shared NetApp storage to facilitate disaster recovery will be suspended. CAMERA Oracle databases will be unavailable. WBC data stored on the NetApp will also be unavailable.
Work completed
VM migration
Four of our VM hosts have now been moved to the new location and a number of VMs have been migrated to these machines.
General Improvements
- BIRN Portal and GAMA servers have been virtualized
- NIF server system software updated
NIF status
- NIF1, NIF2, NIF4, and NIF5 have all been moved to the new location. Patches have been applied and the network has been reconfigured. The network has been improved by adding redundant/failover links for all of the machines, for both the management and public networks. The management network has been upgraded to 1Gb.
- Currently I have NIF's website pointing to a different webserver so that when the production nodes go down, we will still have a status page up for NIF.
We have upgraded power and all the necessary network drops in the new location. Progress is being impeded by the lack of redundant 10Gb layer 3 switches for the new location. We've done everything we can think of to expedite the procurement of this equipment, including requesting Saturday morning delivery via FedEx, if necessary.
In addition, we have moved the first VM host to the new location and plan to start migrating VMs to it tomorrow. Unfortunately, without the switches, the process of live-migrating the VMs will take much longer than originally planned.
Vicky
What is Going On
The Oracle RAC at SDSC did not come up properly after the power outage yesterday.
System Affected
oracle-rac1
oracle-rac2
oracle-rac3
Project(s) - Software/Application(s) Affected
CCDB - production iRODS (iCAT database is offline)
Steps Being Taken to Rectify the Situation
Daniel Wei has been contacted (left message) and his assistance has been requested. We are waiting for him to get in touch with us.
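While we wait to hear back, a basic reachability check against the listener port on each RAC node helps narrow down whether the database tier is answering at all. This is a hedged sketch: the host names simply mirror the node names listed above, and the standard Oracle listener port 1521 is assumed.

    # Hedged sketch: check whether the Oracle listener port answers on each RAC node.
    # Host names mirror the node names above (assumed resolvable); 1521 is the default listener port.
    import socket

    NODES = ["oracle-rac1", "oracle-rac2", "oracle-rac3"]
    PORT = 1521

    for node in NODES:
        try:
            with socket.create_connection((node, PORT), timeout=5):
                print(f"{node}:{PORT} reachable")
        except OSError as exc:
            print(f"{node}:{PORT} unreachable ({exc})")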
ETA (if known)
We upgraded the server and the applications for Jira, Confluence, Crowd, and FishEye/Crucible. This should improve performance, fix bugs, and improve the account management interface.
subversion.crbs.ucsd.edu is online
Use your Crowd login; you must be a member of the crbs-sysops group to write to the systems repo.
The Puppet code has been migrated to SVN. It was checked in at https://subversion.crbs.ucsd.edu/systems/apps/puppet, and the CVS repo for Puppet should no longer be used.
If you're interested in using our SVN repository, contact Kennon Kwok at kkwok_at_ncmir_dot_ucsd_dot_edu.
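For anyone picking up the Puppet code from its new home, a checkout along the following lines should work. This is a sketch, not an official procedure: it assumes the standard svn client is on your PATH and that your Crowd account is in the crbs-sysops group; the local directory name is arbitrary.

    # Sketch: check out the migrated Puppet code using the standard svn command-line client.
    # Assumes svn is installed and your Crowd credentials are accepted; "puppet" is an arbitrary target directory.
    import subprocess

    REPO = "https://subversion.crbs.ucsd.edu/systems/apps/puppet"
    subprocess.run(["svn", "checkout", REPO, "puppet"], check=True)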
I've installed the Gliffy plugin, which allows you to create diagrams, flowcharts, floorplans, network diagrams, etc. from within Confluence. I put a quick, easy example here. For better examples, see http://www.gliffy.com/examples/flow-charts/
I've relocated the installation server from my desktop (dev-234) to a production VM (cobbler.crbs.ucsd.edu). Cobbler services have been turned off on dev-234. The ISO location has been updated in Confluence to http://cobbler.crbs.ucsd.edu/cobbler-ks.iso
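If you have scripts or kickstart workflows that still pull the ISO from dev-234, the short sketch below fetches it from the new location instead; the local file name is arbitrary.

    # Sketch: download the kickstart ISO from its new Cobbler home.
    # The URL comes from the note above; the local file name is arbitrary.
    import urllib.request

    ISO_URL = "http://cobbler.crbs.ucsd.edu/cobbler-ks.iso"
    urllib.request.urlretrieve(ISO_URL, "cobbler-ks.iso")
    print("downloaded cobbler-ks.iso")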
The Bamboo build test server, bamboo.crbs.ucsd.edu, has been relocated to an upgraded server at SDSC. Back on the "old" server, both the bamboo and bamboo-dev services were disabled.