Categories: Exadata, Oracle

Exadata failing RAID HBA card

This is more of an informative story about what happened to us recently.

In the past we had an issue with one of the Exadata cell nodes where the RAID HBA card failed. After we worked through it with support, they decided to replace the card and update the card firmware at the same time.

That issue is described in detail in this note:

Exadata X5/X6 reports “Disk controller was hung. Cell was power cycled to stop the hang.” and SAS HBA logs report correctable errors on SW images prior to 12.1.2.3.2 (Doc ID 2176276.1)

This time one of the database nodes went down and, surprise surprise, it was the RAID HBA card again!

The only problem was that the replacement part was 24 hours away and we had some single-node databases running on this node, so we had to find a feasible workaround.

Our first idea was to take the card from one of the cell nodes. We run ASM high redundancy, so we thought we could survive 24 hours over a weekend with one cell node down.

If you are not familiar with ASM redundancy on Exadata, here is a good short summary of it: http://blog.umairmansoob.com/tag/using-high-vs-normal-redundancy-in-exadata/

Once on site, we came up with the idea of using a similar card from our Oracle Platinum Gateway server instead. It is still a component under Oracle's support, and we wouldn't need to touch the cell node, which would have involved more risk.

Oracle's Field Engineer replaced the part and we got our database node up! One issue remained: the InfiniBand link did not work, because the InfiniBand port had been automatically disabled during the node crash.

After we enabled the port on the InfiniBand switch, everything was running properly again.

If you have a similar critical issue, it's worth remembering that there might be an alternative if the replacement part is not near you. Of course the part should be available, but that's a different issue.

The RAID card model is Oracle Storage 12 Gb SAS PCIe RAID HBA.

Simo
