Oracle 9i Real Application Clusters :: Log (Raw Devices)

OCPdba.Net

02/07/2003 mappingDBCA.cfg Disk Layout dbca pic 1

   Created the Cluster!!
   ------------------------

   a) Preliminary Steps:
   ---------------------

   0 ) => Ensure SRVM_SHARED_CONFIG is set to the correct raw device on both nodes (/dev/raw/raw2 in this case)
   1 ) => Start the Watchdog Timer and the Cluster Manager Software on both nodes
   2 ) => Start the listeners on both nodes
   3 ) => Start the Global Services Daemon

   
   b) DBCA Oracle Installer:
   -------------------------

   4 ) => Ensure DBCA_RAW_CONFIG points to your mappingDBCA.cfg file 
   5 ) => Start the Database Configuration Assistant
   6 ) => Choose Create a Database
   7 ) => Cluster Configuration
   8 ) => Select both nodes

   9 ) => Most of the file specifications were correctly picked up from the DBCA_RAW_CONFIG file
          The ones that did not get correctly picked up were the tablespaces that I had added
          such as USER_DATA and USER_NDX.

   10) => Go through the installation process checking that the raw devices are set correctly.

   11) => Save the scripts, Save the Configuration as a template, Do NOT check Create Database as yet!
          For some reason the installer had grief with the redo logs and gave an error saying that they
          were set to zero. The next step (I do not know why...) bypassed the problem.

   12) => At this point the dbca exits since 'Create Database' was not checked
   13) => Restart the dbca
   14) => Choose Create Database, and use your newly created template, double check the settings
   15) => At the very end, check only 'Create Database'
   16) => And go!
   17) => About half an hour later my RAC was built and running!!
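
   The environment checks in steps 0 and 4 above could be scripted before launching the dbca. This is
   only a minimal sketch: require_var and preflight are hypothetical helper names of my own, not part of
   any Oracle tool, and the check only confirms that the variables are set, not that their values are right.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: report whether a required variable has a value.
# Usage: require_var NAME VALUE
require_var() {
    name=$1
    value=$2
    if [ -z "$value" ]; then
        echo "MISSING: $name is not set"
        return 1
    fi
    echo "OK: $name=$value"
}

# Check the two variables this log relies on; values come from the environment.
preflight() {
    status=0
    require_var SRVM_SHARED_CONFIG "${SRVM_SHARED_CONFIG:-}" || status=1
    require_var DBCA_RAW_CONFIG "${DBCA_RAW_CONFIG:-}" || status=1
    return $status
}
```

   Run "preflight" as the oracle user on each node; a non-zero return means at least one variable is missing.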

02/06/2003

   a) LVM Gotchas to look out for in a shared storage configuration:
   -----------------------------------------------------------------

   Problem:
   --------
   In my configuration BOX2 already had two Volume Groups, VGSYSTEM and VGINTERNAL,
   before VGRAC was added.

   BOX1 had no Volume Groups on it.

   When adding the additional Logical Volumes needed for the cluster database I did them using
   lvcreate commands on BOX1. This created a problem as lvcreate on BOX1 used the next available
   minor device number for logical volumes on BOX1.

   This caused BOX1 to reuse a minor device number that was already in use on BOX2. When BOX2 then ran a vgscan
   a conflict was encountered and the VGRAC volume numbers stepped on the volume numbers used by VGSYSTEM.

   As a result BOX2 could not mount its root file systems!! 


   Solution:
   ---------
   I dropped the VGRAC volume group and rebuilt it from scratch, from BOX2!

   This is important, since BOX2's minor device number sequence will be ahead of BOX1's. BOX1
   would have no problems with this when a vgscan was done.

   The rule then is to build the Shared Volume Group (VGRAC) from the machine with the most logical volumes.
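
   The rule can be written down as a tiny helper. This is only a sketch under my own assumptions:
   pick_build_host is a hypothetical name, and the logical volume counts fed to it have to be gathered
   by hand on each machine (the count of 7 for box2 below is a made-up illustration, not a figure from my setup).

```shell
#!/bin/sh
# Hypothetical helper: read lines of "<host> <number of logical volumes>"
# on stdin and print the host with the highest count.
pick_build_host() {
    awk 'NR == 1 || $2 > max { max = $2; host = $1 } END { print host }'
}

# Illustrative input only: box1 had no volume groups, box2 had several volumes.
printf 'box1 0\nbox2 7\n' | pick_build_host
```

   The host it prints is the one to run the vgcreate and lvcreate commands for the shared group from.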
   

02/05/2003 mappingDBCA.cfg listener.ora tnsnames.ora

  a) Listener Pre Configuration
  -----------------------------

  - The listeners on box1 and box2 were both configured, have a look at the
    sample configuration in the listener.ora file for box2 above.

  - The tnsnames.ora file also shows the configuration for that file.
  
02/04/2003 createdb_lvols.sh removedb_lvols.sh rac-mkraw

  a) Additional Logical Volumes Scripts
  -------------------------------------
  The two scripts createdb_lvols.sh and removedb_lvols.sh can be used once each to either
  add or remove the needed logical volumes.

  b) Raw devices
  --------------
  Also look at the revised rac-mkraw script which has had devices added to support the
  added logical volumes.

  Under SuSE 8.1 additional raw device nodes were needed from raw16 upwards. To create them the following commands were used:

  cd /dev/raw
  mknod raw16 c 162 16

  162 is the major device number for a raw device character file. The minor number in this case is 16; increment
  it as needed for each further device.
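
  When several nodes are needed, the mknod lines can be generated rather than typed. A minimal sketch,
  assuming the same major number 162 and the /dev/raw/ location used above; gen_raw_mknod is a
  hypothetical helper name, and the range 16 to 20 is just an example. It only prints the commands.

```shell
#!/bin/sh
# Print (do not run) the mknod commands for raw devices FIRST..LAST.
# The minor number is the same as the device number, as in the log entry above.
gen_raw_mknod() {
    first=$1
    last=$2
    n=$first
    while [ "$n" -le "$last" ]; do
        echo "mknod /dev/raw/raw$n c 162 $n"
        n=$((n + 1))
    done
}

gen_raw_mknod 16 20
```

  Review the printed lines first, then pipe them to sh as root if they look right.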
  
02/03/2003 Disk Layout vgdisplay -v vgrac

  a) Proposed Disk Layout
  -----------------------

  I created logical volumes on the vgrac volume group to fulfill the needs of the various aspects of the database
  such as:

  - Data Tablespaces
  - Index Tablespaces
  - Undo Tablespaces
  - Redo Logs
  - System Tablespace
  - spfile

  Look at the "Disk Layout" link above for details.


  b) vgdisplay -v vgrac
  ---------------------

  Look at the link above for a listing of "vgdisplay -v vgrac"

02/02/2003 rac-core rdevtest.conf

  Couple of things done:
  ----------------------

  a) rac-core script
  ------------------

  Added /etc/init.d/rac-core 
  This is a script that allows me to start,stop,restart and check the status of:
  - Watchdog Daemon Timer
  - Oracle Cluster Manager
  - Global Services Daemon


  b) Files, Permissions, Symbolic Links, etc
  ------------------------------------------

  - Make sure the raw files /dev/raw/raw? are owned by oracle, as they need to be read/write for the Global Services Daemon to use them
  - Created a symbolic link from /var/opt/oracle/srvConfig.loc to /u01/oracle/OraHome1/srvm/config/srvConfig.loc on all nodes


  c) Start the Global Services Daemon:
  ------------------------------------

  - A preliminary step done from box1 for first time usage was: srvconfig -init
  - gsdctl start

    
  d) To verify that the raw devices work on each node, the following test file was created:
  -----------------------------------------------------------------------------------------

  cat $ORACLE_HOME/bin/rdevtest.conf

  intbox1:/dev/raw/raw1
  intbox2:/dev/raw/raw1
  intbox1:/dev/raw/raw2
  intbox2:/dev/raw/raw2
  intbox1:/dev/raw/raw3
  intbox2:/dev/raw/raw3

  Then I ran $ORACLE_HOME/oracm/bin/rdevtest rdevtest.conf on both nodes, after starting the Global Services Daemon:

  oracle@box1:/u01/oracle/OraHome1/oracm/bin> ./rdevtest rdevtest.conf
  Ok

  oracle@box2:/u01/oracle/OraHome1/oracm/bin> ./rdevtest rdevtest.conf
  Ok
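
  The test file follows a simple pattern (every private node name paired with every raw device), so it
  can be generated. A minimal sketch; gen_rdevtest_conf is a hypothetical helper name of mine and is
  not shipped with rdevtest.

```shell
#!/bin/sh
# Print one "node:device" line for every node/device pair,
# grouped by device as in the rdevtest.conf listing above.
# $1: space-separated private node names
# $2: space-separated raw device paths
gen_rdevtest_conf() {
    nodes=$1
    devices=$2
    for dev in $devices; do
        for node in $nodes; do
            echo "$node:$dev"
        done
    done
}

gen_rdevtest_conf "intbox1 intbox2" "/dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3"
```

  Redirect the output into rdevtest.conf and extend the device list as more raw devices are added.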


  e) You can use the rac-core script as follows to test:
  ------------------------------------------------------

  box1:/etc/init.d/rac-core status
  Watchdog Daemon is running
  Cluster Manager is running
  Global Services Daemon is running
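
  For readers without the rac-core script, a status check along the same lines can be sketched from
  "ps -ef" output. This is not the actual rac-core code: rac_status is a hypothetical function, and it
  only covers the two daemons whose process names (watchdogd and oracm) appear in this log. The Global
  Services Daemon is left out because I have not recorded its process name here.

```shell
#!/bin/sh
# Read "ps -ef" style output on stdin and report whether watchdogd and oracm
# appear as commands. The leading space in each pattern keeps strings such as
# "egrep watchdog|oracm" from being counted as a match.
rac_status() {
    awk '
        / watchdogd( |$)/ { wd = 1 }
        / oracm( |$)/     { cm = 1 }
        END {
            print "Watchdog Daemon is " (wd ? "running" : "NOT running")
            print "Cluster Manager is " (cm ? "running" : "NOT running")
        }'
}
```

  Typical use would be: ps -ef | rac_status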





02/01/2003

  Back to work, I apologize for the delay, we have a new baby boy!

  I will get started on the cluster again.


11/27/2002 Backup Scripts Link

  I have been very busy with some personal activities and probably will be for another two to three weeks.

  For those who are following the log I apologise for the delay.

  I added a script, pretty simple stuff, to back up various directories and files of importance. Just grab them above or visit the link.

11/21/2002 ocmargs.ora Oracle Architecture Notes

  
  Starting Oracle Cluster Manager:
  --------------------------------

  i)   Edit $ORACLE_HOME/oracm/admin/ocmargs.ora as follows:


       watchdogd -g dba -l 0 -d /dev/null
       oranm
       oracm /a:0
       norestart 1800

       The watchdog arguments "-l 0 -d /dev/null" are for testing only, when done testing
       remove them so the watchdog line is as follows:

       watchdogd -g dba
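
       Removing the testing arguments can be done with a one-line filter instead of by hand. A minimal
       sketch; strip_watchdog_test_args is a hypothetical name, and it assumes the testing arguments
       appear at the end of the watchdogd line exactly as shown above.

```shell
#!/bin/sh
# Filter ocmargs.ora content on stdin: drop the trailing testing arguments
# "-l 0 -d /dev/null" from the watchdogd line, leave every other line alone.
strip_watchdog_test_args() {
    sed 's|^\(watchdogd .*\) -l 0 -d /dev/null *$|\1|'
}

printf 'watchdogd -g dba -l 0 -d /dev/null\noranm\noracm /a:0\nnorestart 1800\n' |
    strip_watchdog_test_args
```

       To apply it, write the filtered output to a new file, compare it with the original, and only
       then copy it over $ORACLE_HOME/oracm/admin/ocmargs.ora.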


  ii)  Start Oracle Cluster Manager and watchdogd (as root):

       su - oracle
       cd $ORACLE_HOME/oracm/bin
       su root 
       ./ocmstart.sh
       

       Here's what the processes were after I executed the script:

       ==================================================================================
       #ps -ef|egrep "watchdog|oracm"
       root      5052     1  0 14:06 ?        00:00:00 watchdogd -g dba -l 0 -d /dev/null
       root      5053     1  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5055  5053  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5056  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5057  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5058  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5059  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5060  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5061  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5062  5055  0 14:06 pts/1    00:00:00 oracm /a:0
       root      5070  4868  0 14:09 pts/1    00:00:00 egrep watchdog|oracm 
       ==================================================================================


   Installing Oracle 9iR2 RAC option
   ---------------------------------

   After starting OCM I started to look for the srvctl and gsd commands, to no avail...

   I then decided to re-run the installer and two things happened: 
   (bear in mind that the OCM was started as laid out above)

   a) I was presented with a dialog for the cluster node selection and both box1 and box2 were
      available within it!

   b) From the Available Products screen I selected "Oracle 9i Database 9.2.0.1" [Next]

      then selected "custom" from the Installation Types screen [Next]

      Two options were automatically highlighted under the Available Product Components screen:
      1) Oracle 9i Real Application Clusters 9.2.0.1 (New Install)
      2) Legato Networker Single Server 6.1.0.0.0 (New Install)

      I checked the 9i RAC option for install only

      The reason the RAC software was unavailable the first time I did the install was 
      because the Cluster Manager was not started, so a cluster was not detected.

      When installing the software on box2 I disabled the OCM software and the RAC option was unavailable.
      After starting the OCM software ($ORACLE_HOME/oracm/bin/ocmstart.sh) the RAC option was available and the
      cluster nodes were presented.


      The software prompts to run $ORACLE_HOME/root.sh 

      As root issue the following commands before running the script:

      mkdir -p /var/opt/oracle
      touch /var/opt/oracle/srvConfig.loc

      then run $ORACLE_HOME/root.sh

      this places the following entry in the srvConfig.loc file

      srvconfig_loc=/dev/raw/raw2

      which corresponds to the Server Config logical volume /dev/vgrac/svrcfglv
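
      After root.sh has run, the entry can be cross-checked against SRVM_SHARED_CONFIG. A minimal
      sketch; srvconfig_device and check_srvconfig are hypothetical helper names, and the default path
      is simply the file created above.

```shell
#!/bin/sh
# Print the value of srvconfig_loc from the given file
# (defaults to the location used in this log).
srvconfig_device() {
    file=${1:-/var/opt/oracle/srvConfig.loc}
    sed -n 's/^srvconfig_loc=//p' "$file"
}

# Compare the file entry with the SRVM_SHARED_CONFIG environment variable.
check_srvconfig() {
    file=${1:-/var/opt/oracle/srvConfig.loc}
    dev=$(srvconfig_device "$file")
    if [ -n "$dev" ] && [ "$dev" = "${SRVM_SHARED_CONFIG:-}" ]; then
        echo "OK: srvconfig_loc and SRVM_SHARED_CONFIG both point to $dev"
    else
        echo "MISMATCH: srvconfig_loc='$dev' SRVM_SHARED_CONFIG='${SRVM_SHARED_CONFIG:-}'"
        return 1
    fi
}
```

      Run check_srvconfig as the oracle user on each node; both should report the same raw device.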


   srvctl and gsd (RAC specific commands) were now available for use.


   Oracle Architecture Notes
   -------------------------
   Follow the Oracle Architecture Notes above. Several of the OCM parameters have been made obsolete
   from 9iR1 to 9iR2. Also the oranm (Node Monitor) process is obsolete, and nmcfg.ora is not needed.

   rcp rlogin rsh
   --------------
   The IBM paper recommends that the above be configured between box1 and box2, as some of the Oracle configuration scripts
   use these commands to copy config files between the machines. I am leery of this as it is a security hole. A colleague
   has suggested that I symlink rcp to scp.... a good suggestion that I will look into later. For now I'm following the
   recommendation.


   More tomorrow!!!

11/20/2002

  Changes to /etc/lilo.conf
  ------------------------

  i)   Edit your lilo.conf and include an entry for:

       append = "CONFIG_WATCHDOG_NOWAYOUT=Y"

  ii)  Run /sbin/lilo -v 

  iii) reboot

11/18/2002 Oracle Environment

  Installed Oracle's Cluster Manager software for Linux. Under 9iR2 this is a separate option you choose at install time.

  Recall that /etc/init.d/rac-mkraw maps the logical volumes in vgrac to raw devices as follows:

  raw /dev/raw/raw1 /dev/vgrac/cmlv
  raw /dev/raw/raw2 /dev/vgrac/svrcfglv
  raw /dev/raw/raw3 /dev/vgrac/quorumlv


  At least one shared raw device has to be created as an information repository for the database server configuration.
  In this case /dev/vgrac/svrcfglv is used which maps to /dev/raw/raw2

  This is known as the Server Management (SRVM) Configuration device

  Note the environment variable (see the link above) SRVM_SHARED_CONFIG is set to this raw device


  Installation Steps:
  -------------------

  1) Start installer and select "Oracle Cluster Manager 9.2.0.1.0" [ Next ]

  2) The first dialog prompts for "Public Node Information", specify:

     Public Node 1: box1
     Public Node 2: box2          [ Next ]

  3) The next dialog prompts for "Private Node Information", specify:

     Private Node 1: intbox1
     Private Node 2: intbox2      [ Next ]


  4) The next dialog prompts for "WatchDog Parameter Information"

     Accept the default of 60000 milliseconds (60 seconds) [ Next ]


  5) You are now prompted for the quorum disk information.
     Specify /dev/raw/raw3 which maps to /dev/vgrac/quorumlv [ Next ]

  6) [ INSTALL ]


  Wait for the install to complete and repeat the process on box2... tomorrow I'll try to test this...

11/17/2002 rac-synclvm rac-mkraw rac-kernel

  Several scripts were added, please customize them to suit your environment...

  Added files to allow:

  i)   Synchronization of the VGRAC volume group for shared storage on boot

       The commands are:

       vgchange -a n vgrac
       vgscan
       vgchange -a y vgrac
       vgdisplay -v vgrac

       These:
       - deactivate the vgrac group
       - rebuild /etc/lvmtab
       - reactivate the vgrac group
       - display information on the vgrac group

       This must be done at each boot so I created a file /etc/init.d/rac-synclvm and sym linked it to run level 5 as follows:

       ln -sf /etc/init.d/rac-synclvm /etc/init.d/rc5.d/S25rac-synclvm
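
       For illustration, the body of such a script could look like the sketch below. This is not the
       actual rac-synclvm file from the link: the run and sync_vg helpers and the DRY_RUN switch are
       my own hypothetical additions, so that the sequence can be previewed without touching LVM.

```shell
#!/bin/sh
# Sketch of a volume group resync. Set DRY_RUN=1 to print the commands
# instead of running them.
VG=${VG:-vgrac}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_vg() {
    run vgchange -a n "$VG"   # deactivate the volume group
    run vgscan                # rebuild /etc/lvmtab
    run vgchange -a y "$VG"   # reactivate the volume group
    run vgdisplay -v "$VG"    # display information on the volume group
}
```

       Calling "DRY_RUN=1 sync_vg" lists the four commands; calling sync_vg as root runs them.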


  ii)  Creation of the raw devices - initialize the raw devices on boot

       The commands (at this point) are:

       raw /dev/raw/raw1 /dev/vgrac/cmlv
       raw /dev/raw/raw2 /dev/vgrac/svrcfglv
       raw /dev/raw/raw3 /dev/vgrac/quorumlv

       The IBM pdf on the main page of the site lists the devices as /dev/raw1 /dev/raw2 /dev/raw3. From SuSE 8.0 up, raw devices
       are grouped under /dev/raw/

       Note that this script will be extended when the database is built later... so check it often for changes. I will note
       in the log when it changes.

       This must be done at each boot so I created a file /etc/init.d/rac-mkraw and sym linked it to run level 5 as follows:

       ln -sf /etc/init.d/rac-mkraw /etc/init.d/rc5.d/S26rac-mkraw
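
       Since the list of bindings will grow, it may be easier to keep it as a small table and generate
       the raw commands from it. A minimal sketch, not the actual rac-mkraw script; gen_raw_bindings
       and the "number:volume" table format are hypothetical choices of mine. It only prints the commands.

```shell
#!/bin/sh
# Print one "raw" bind command per "<raw number>:<logical volume>" pair.
# $1: space-separated pairs, e.g. "1:cmlv 2:svrcfglv"
gen_raw_bindings() {
    for pair in $1; do
        n=${pair%%:*}     # part before the colon: raw device number
        lv=${pair#*:}     # part after the colon: logical volume name
        echo "raw /dev/raw/raw$n /dev/vgrac/$lv"
    done
}

gen_raw_bindings "1:cmlv 2:svrcfglv 3:quorumlv"
```

       Adding a logical volume later then means adding one pair to the table.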


  iii) Setting kernel parameters for semaphores and shared memory segments

       To set Semaphore Parameters the file /proc/sys/kernel/sem is modified as follows:

       echo "Setting SEMMSL SEMMNS SEMOPM SEMMNI in /proc/sys/kernel/sem"
       echo 250 256000 100 1024 > /proc/sys/kernel/sem

       This must be done at each boot so I created a file /etc/init.d/rac-kernel and sym linked it to run level 5 as follows:

       ln -sf /etc/init.d/rac-kernel /etc/init.d/rc5.d/S27rac-kernel
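
       To confirm the values took effect, the four fields of /proc/sys/kernel/sem can be labelled in
       the same order as the echo above (SEMMSL SEMMNS SEMOPM SEMMNI). A minimal sketch; describe_sem
       is a hypothetical helper name.

```shell
#!/bin/sh
# Label the four whitespace-separated fields of /proc/sys/kernel/sem.
# $1: the file contents, e.g. "250 256000 100 1024"
describe_sem() {
    set -- $1     # split the single argument into four positional fields
    echo "SEMMSL=$1 SEMMNS=$2 SEMOPM=$3 SEMMNI=$4"
}

describe_sem "250 256000 100 1024"
```

       On a live system it would be called as: describe_sem "$(cat /proc/sys/kernel/sem)"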


  I wish the day had more hours in it.....  Still reading on RAC setup..

11/17/2002

  i)  Added a FAQ section - getting emailed the same questions
  ii) Still reading on RAC setup..

11/16/2002 Volume Group Log Box1 Adaptec Utility Screen Box2 Adaptec Utility Screen Equipment pic 1 Equipment pic 2 Equipment pic 3
  It has been a very busy night, but it was all worth it. All the hardware configuration is done, tested and works!!


  1) The Shared SCSI disk sub-system:

     a) I actually ended up using different SCSI IDs than in my diagram...

        Box1 SCSI ID Layout:
        --------------------
        0 - The SCSI Adapter itself
        2 - An internal SCSI Quantum Atlas IV, this is box1's only drive with the OS on it

        Box2 SCSI ID Layout:
        --------------------
        15 - The SCSI Adapter itself

        
        Shared SCSI Storage Array
        -------------------------
        4 - QUANTUM XP34550W           
        7 - SEAGATE ST34371W

        I opened up the Adaptec Utility on both boxes. Notice that each sees all the drives along the chain, but not
        the SCSI card in the opposite machine... see the links at the bottom for actual screen shots

 
        The SCSI Storage Array had to be powered up first. I tested first with only box1 and the shared storage powered on,
        then with only box2 and the shared storage, then finally with box1, box2 and the shared storage together. It worked great.


     b) LVM Setup: Use box1 or box2 to do this. I used box2, then informed box1 of the changes using vgscan.
        On box2 the disks are seen as /dev/sda and /dev/sdb; use whatever your environment has.

        i)   Using fdisk partition the disks to use the whole disk with a single partition of type 8e (LVM).

        ii)  Initialise the disks for use:
             pvcreate /dev/sda1
             pvcreate /dev/sdb1

        iii) Create the volume group VGRAC for use:
             vgcreate vgrac /dev/sda1 /dev/sdb1


        iv)  Create two logical volumes: (May need more later as I read on, can always remove these, read the LVM pdf)
             lvcreate -i1 -L 100M -n cmlv vgrac
             lvcreate -i1 -L 100M -n svrcfglv vgrac

        
        v)   Verify:
             vgdisplay -v vgrac


        vi)  Notify box1 of the new volume group:
             1) Inform box1 of the new volume group and activate it
                vgscan
                vgchange -ay vgrac

             2) Verify on box1 by issuing:
                vgdisplay -v vgrac

             Look at the vglog.txt link for the actual output...

             Notice on box1 the drives are /dev/sdb and /dev/sdc, but this does not affect things as the VG information
             is stored on the disk, so the vgscan builds the vgroup in /etc/lvmtab correctly... nice.
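
        Steps ii) to v) above can be collected into one plan that is printed before anything is run.
        A minimal sketch; lvm_plan is a hypothetical helper name, the two 100M volumes are the ones
        created in step iv), and the partition names are those seen on box2.

```shell
#!/bin/sh
# Print (do not run) the LVM commands for building the shared volume group.
# $1: volume group name; remaining arguments: partitions of type 8e
lvm_plan() {
    vg=$1
    shift
    for part in "$@"; do
        echo "pvcreate $part"
    done
    echo "vgcreate $vg $*"
    for lv in cmlv svrcfglv; do
        echo "lvcreate -i1 -L 100M -n $lv $vg"
    done
    echo "vgdisplay -v $vg"
}

lvm_plan vgrac /dev/sda1 /dev/sdb1
```

        Once the printed commands have been checked against the local device names they can be run as root on the building node.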


  2) The Interconnect: NetGear FA310TX and LinkSys 10/100 NICs, plus a 10/100 crossover patch cable 

     a) I found a NetGear FA310 NIC and a LinkSys NIC for the interconnect. I used YaST2 to configure these. 
        I then used a crossover cable between the two. This is fast... later if this is not fast enough I 
        may consider bumping the interconnect up to Gigabit Cards.. (have to price it first)

     b) Setup was simple....

        i)  Used YaST2 to add and configure the cards with IP addresses 1.1.1.1 & 1.1.1.2 on box1 and box2 respectively,
            subnet mask 255.255.255.0  I didn't have to do anything special.. that was it.

        ii) Edited /etc/hosts on both machines to include the following:
            
            1.1.1.1    intbox1
            1.1.1.2    intbox2

        iii) I was able to ping and ssh both ways between the boxes
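
        The /etc/hosts edit in step ii) has to be repeated on every node, so a helper that adds an
        entry only when it is missing can save some typing. A minimal sketch; add_host_entry is a
        hypothetical name, and it takes the file as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Append "<ip>    <name>" to a hosts-style file unless the name is already mapped.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Skip if some line already ends with, or contains as a whole word, this name.
    if grep -Eq "[[:space:]]$name([[:space:]]|\$)" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s    %s\n' "$ip" "$name" >> "$file"
}
```

        For example: add_host_entry /etc/hosts 1.1.1.1 intbox1 followed by add_host_entry /etc/hosts 1.1.1.2 intbox2, run as root on both boxes.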
     
  
  Tomorrow it's onto RAC!! Lots of reading in store for me....

11/14/2002 Adaptec 2940UW Flash BIOS Version 2.2
  1) Built the shared SCSI tower tonight...

  2) The Adaptec Cards I have had older versions of the BIOS on them... 
     The newer BIOS for Adaptec's AHA-2940UW may be obtained from the link above

  3) I ran a test with the SCSI disk sub system connected to Box2 only, LVM works great
     I also liked that SuSE does a vgscan on boot so the lvmtab is rebuilt automatically
     This allows the drives to be moved around without worry about their SCSI id's changing.. wonderful!

  4) Tomorrow it's onto the dual SCSI connect...

11/12/2002
  Starting to build hardware setup....

  During Setup of SuSE 8.1 ... problems with Adaptec 2940 Cards and aic7xxx.o driver

  SuSE 8.1 has a problematic Adaptec driver included

  i)  Edited /etc/lilo.conf and added:   append="acpi=off"
  ii) Ran /sbin/lilo -v

