Categories: cloud, Exadata, OCI, Oracle

Exadata Cloud Service scaling (adding nodes) hangs

This will be a short post, mainly to show where to look if you ever run into this issue.

Recently I was tasked with adding a database node to an Exadata Cloud Service X8M. The new X8M has dynamic deployment options, so if you need more storage or compute capacity, you can simply add it.

Sve has written good posts earlier on both provisioning and scaling X8M, so read the details there!

My issue was that after the scaling event started, it basically hung! Nothing happened and the work requests seemed to stall. I had done the same operation on another X8M just a few days earlier, so I thought something was odd.

Since debugging options are fairly limited (and at that point I wasn't sure which log file to look at), I created an SR to have this looked into. Around the same time, we received an email from OCI:

But it couldn't be the security lists, since I had gone through them multiple times! Or could it? Going through them once more, I noticed a typo in a CIDR block, which I then corrected. The network requirements are defined in the OCI documentation, which I always use as a reference.

In general, there are rules for ICMP, the Service Gateway, and ports 22, 6200 and 1521. But there's also a note about X8M and scaling:

For X8M systems, Oracle recommends that all ports on the client subnet need to be open for ingress and egress traffic. This is a requirement for adding additional database servers to the system.

This is generally what I've seen done in many implementations anyway: size the Exadata subnet according to the network requirements and then open all traffic within the subnet. This time, however, the typo had caused the scaling to fail. The only problem was that nothing happened after I opened the ports, and the work request continued to hang!
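Since the root cause here was a single mistyped CIDR block, a quick sanity check before scaling can help. The sketch below assumes you have copied the client subnet CIDR and the security list rules by hand (for example from the console) into a simplified, hypothetical `SecurityRule` structure; it is not the OCI SDK model and does not call any OCI API. It only checks that there is an all-protocol rule in each direction whose CIDR covers the whole client subnet, which is how the X8M note above reads.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class SecurityRule:
    """Hypothetical, simplified security list rule (NOT the OCI SDK model)."""
    direction: str  # "ingress" or "egress"
    cidr: str       # source CIDR for ingress, destination CIDR for egress
    protocol: str   # "all" = every protocol and port; other values are ignored here


def missing_all_ports_rules(client_subnet_cidr: str, rules: List[SecurityRule]) -> List[str]:
    """Return the directions that lack an all-protocol rule covering the client subnet."""
    subnet = ipaddress.ip_network(client_subnet_cidr)
    missing = []
    for direction in ("ingress", "egress"):
        covered = False
        for rule in rules:
            if rule.direction != direction or rule.protocol != "all":
                continue
            try:
                rule_net = ipaddress.ip_network(rule.cidr)
            except ValueError:
                # Malformed CIDR (e.g. host bits set by a typo): treat as not covering.
                continue
            # A network counts as a subnet of itself, so an exact match is covered too.
            if rule_net.version == subnet.version and subnet.subnet_of(rule_net):
                covered = True
                break
        if not covered:
            missing.append(direction)
    return missing


if __name__ == "__main__":
    # Example: the egress rule has a typo (10.0.11.0/24 instead of 10.0.1.0/24).
    rules = [
        SecurityRule("ingress", "10.0.1.0/24", "all"),
        SecurityRule("egress", "10.0.11.0/24", "all"),
    ]
    print(missing_all_ports_rules("10.0.1.0/24", rules))  # prints ['egress']
```

The subnet CIDRs in the example are made up. A broader rule such as 0.0.0.0/0 also passes the check, since it covers the subnet; whether that is acceptable is a separate security decision.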

We had a rather long conversation with support, as they didn't believe me. Luckily, you can find the log file addNodeActions*.log under /u01/app/oraInventory/logs, which showed the error AND also showed that nothing was running at that moment.
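To find that log quickly, a small helper like the one below can pick the most recently modified addNodeActions*.log and print lines that look like failures. This is a minimal sketch under stated assumptions: it is run on node 1 by a user that can read /u01/app/oraInventory/logs, and matching on the word "error" is only a guess at how failures are worded in the log, so adjust `keywords` to what you actually see. It does not tell you whether a workflow is still running; you still need to read the log for that.

```python
import glob
import os
from typing import List, Optional

# Directory named in this post; assumption: run on node 1 with read access.
LOG_DIR = "/u01/app/oraInventory/logs"


def latest_add_node_log(log_dir: str = LOG_DIR) -> Optional[str]:
    """Return the most recently modified addNodeActions*.log in log_dir, or None."""
    candidates = glob.glob(os.path.join(log_dir, "addNodeActions*.log"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def matching_lines(path: str, keywords=("error",)) -> List[str]:
    """Return lines containing any keyword, case-insensitively.

    The default keyword is an assumption about how failures are worded in the log.
    """
    hits = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            lowered = line.lower()
            if any(k.lower() in lowered for k in keywords):
                hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    log = latest_add_node_log()
    if log is None:
        print(f"No addNodeActions*.log found under {LOG_DIR}")
    else:
        print(f"Latest log: {log}")
        for line in matching_lines(log):
            print(line)
```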

Once that was confirmed, support restarted the workflow and everything completed smoothly within the normal timeframe.

Summary

A small mistake, but it took some time to resolve: always double-check the network rules before scaling! Also, additional log files for the scaling event itself can be found on node 1 under /u01/app/oraInventory/logs.

Apart from that, it was a positive experience, both with the email we received about the scaling and with how fast adding a node is overall: it takes only 4-5 hours!

Simo
