CRS Not Starting after Removing OS User: How to Work Around It and How to Solve It!

Hello all!
It turns out that a few days ago a client reached out to me because his CRSD was simply not starting. Like this:

[root@proddb proddb]$ ./crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'proddb'
CRS-2676: Start of 'ora.crsd' on 'proddb' succeeded

[root@proddb proddb]$ ps -ef |grep crsd
root 19217 13424 0 11:53 pts/0 00:00:00 grep crsd

After some investigation, I found the following:

2017-01-24 14:00:06.859: [ CRSSEC][1690195712]{1:51052:2} Exception: OwnerEntry construction failed to retrieve user id by name with ACL string: owner:jacknobody:rwx and error: 1
2017-01-24 14:00:06.912: [ CRSSEC][1690195712]{1:51052:2} Exception: ACL entry creation failed for: owner:jacknobody:rwx

Hmmm, it seems some CRS resources are owned by “Jack Nobody”… It turns out that this user was removed from the OS:

[root@proddb proddb]$ cat /etc/passwd |grep jacknobody
[root@proddb proddb]$ 
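
If the log is long, the owner names can be pulled out of it mechanically. The following is only a sketch: it assumes the ACL strings always appear as `owner:<name>:<perms>`, as in the excerpt above, and the helper names `acl_owners` and `missing_owners` are made up for this post (they are not Oracle tools). It uses `id` rather than a grep on /etc/passwd, so users coming from a directory service are resolved as well.

```shell
# Sketch: list the owner names found in CRSD log text read from stdin.
# Assumes ACL strings look like "owner:<name>:<perms>".
acl_owners() {
    grep -o 'owner:[^:, ][^:, ]*:' | cut -d: -f2 | sort -u
}

# Print only the owners that the OS cannot resolve (id fails for them).
missing_owners() {
    acl_owners | while read -r u; do
        id -u "$u" >/dev/null 2>&1 || echo "$u"
    done
}

# Usage (the log path is a placeholder, use your own crsd log):
#   missing_owners < /path/to/crsd.log
```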

What to do now?

First, I’d recommend reviewing these MOS references:
CRSD Daemon Fails to start with Exception Error during ACL Entry Lookup for a resource (Doc ID 1491367.1)
ora.crsd Resource No Longer Starts on Either Node All Other Grid Infrastructure Initialization Resources Starting Fine (Doc ID 2179122.1)

The workaround is pretty simple: just re-add the user to the OS and everything will work as expected!
The password doesn’t need to be the same; the user only has to exist again.

After recreating the user, the CRS was successfully started. 🙂
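
As a rough sketch of that workaround, the user can be re-created only when it is really missing. The helper name `ensure_os_user` is hypothetical, and the plain `useradd` call with default uid and group is an assumption; adjust it to your site standards and run it as root.

```shell
# Sketch: re-create an OS user only if the OS cannot resolve it.
# ensure_os_user is a hypothetical helper, not an Oracle tool.
ensure_os_user() {
    user=$1
    if id -u "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
    else
        # Default uid/group are assumed to be good enough here.
        useradd "$user" && echo "user $user created"
    fi
}

# Usage (as root):
#   ensure_os_user jacknobody
```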

But what if I really need to remove the OS user?

Here is the quick step-by-step:

1. Check resources related to jacknobody.

– Connect to server proddb as root and take an OLR and an OCR dump to verify which resources are owned by this user:

    $ ocrdump -local /tmp/olr.log
    $ ocrdump /tmp/ocr.log
    $ grep jacknobody /tmp/olr.log /tmp/ocr.log

2. Change Ownership of resources related to jacknobody.
** For each resource found in 1 **

– Change the ownership of the respective resource type to the new user (oranew, for example), as root user.
Examples:

    $ crsctl setperm type ora.cluster_resource.type -o oranew
    $ crsctl setperm type ora.database.type -o oranew
    $ crsctl setperm type ora.listener.type -o oranew

– Verify the change for affected types:
Examples:

    $ crsctl getperm type ora.cluster_resource.type
    Name: ora.cluster_resource.type
    owner:oranew:rwx,pgrp:oinstall:rwx,other::r--
    
    $ crsctl getperm type ora.database.type
    Name: ora.database.type
    owner:oranew:rwx,pgrp:oinstall:rwx,other::r--
    
    $ crsctl getperm type ora.listener.type
    Name: ora.listener.type
    owner:oranew:rwx,pgrp:oinstall:rwx,other::r--
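
When several types are affected, the setperm/getperm pair above can be wrapped in a loop. This is a minimal sketch, not a tested procedure: `NEW_OWNER`, `TYPES`, `DRY_RUN`, `run` and `chown_types` are names invented for this example, and by default it only prints the commands instead of running them.

```shell
# Sketch: change the owner of several resource types, then show the result.
# NEW_OWNER and TYPES are placeholders; DRY_RUN=1 (the default) only prints.
NEW_OWNER=${NEW_OWNER:-oranew}
TYPES=${TYPES:-"ora.cluster_resource.type ora.database.type ora.listener.type"}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

chown_types() {
    for t in $TYPES; do
        run crsctl setperm type "$t" -o "$NEW_OWNER" || return 1
        run crsctl getperm type "$t"
    done
}

# Usage (as root): review the dry-run output first, then
#   DRY_RUN=0; chown_types
```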

3. Check that no resources are still related to jacknobody.
– Check that all resources were correctly changed:

    $ ocrdump -local /tmp/olr_2.log
    $ ocrdump /tmp/ocr_2.log
    $ grep jacknobody /tmp/olr_2.log /tmp/ocr_2.log
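
To turn this check into a yes/no answer before deleting anything, a small helper can rely on the exit status of grep. It is a sketch under the assumption that the user name appears literally in the dump text; `still_referenced` is a name made up for this example.

```shell
# Sketch: print the dump files that still mention a user.
# Returns 0 if at least one file matches, non-zero otherwise.
still_referenced() {
    user=$1; shift
    # -l: file names only, -w: whole word, -F: fixed string (no regex)
    grep -l -w -F -- "$user" "$@"
}

# Usage:
#   if still_referenced jacknobody /tmp/olr_2.log /tmp/ocr_2.log; then
#       echo "jacknobody is still referenced, do not delete the user yet"
#   fi
```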

– If any database resources are still owned by jacknobody, remove them from the OCR and add them back after deleting the OS user:

    $ srvctl config database -d  # To get the database configuration
    $ srvctl remove database -d  # To remove the database
    $ srvctl add database -d  # To re-add the database. Include the desired options, using the configuration from the first command.

4. Remove user jacknobody
Remove user from OS:

    $ userdel jacknobody

This is also described in the MOS note “Oracle Restart is not coming up due to non-existing owner” (Doc ID 1931046.1).

And this is it for today!
See you next week!
