Scheduled start and stop of an AWS EC2 instance hosting a Flask web application

I have a Flask web application running on an AWS EC2 instance. To reduce the bill, I want to stop the EC2 instance at 10:30 PM ET and start it again at 6:30 AM ET the next day, automatically, every day. So I set out to research and implement a solution to achieve this goal.

First of all, my application uses the nginx web server, and I found I needed to run the following command to make sure nginx is restarted after the EC2 server reboots:

chkconfig nginx on
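Note: this assumes a SysV-init style setup (hence chkconfig). On a systemd-based distribution the equivalent would presumably be:

systemctl enable nginx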

Secondly, I figured out that I can use cron to automatically start my application after the EC2 instance reboots. I put all the commands needed to start my application in a shell script, then added it to the crontab:

$> crontab -l 
@reboot /u01/app/mysql/start_myapp.sh  > /tmp/start_myapp.out 2>&1
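For reference, start_myapp.sh is just a plain shell script. Its real contents are specific to my application, so the sketch below is only an illustration; the directory, virtualenv, and gunicorn module names in it are made up:

#!/bin/bash
# minimal sketch of /u01/app/mysql/start_myapp.sh (hypothetical details)
export PATH=/usr/local/bin:/usr/bin:/bin
cd /u01/app/mysql/myapp            # hypothetical app directory
source venv/bin/activate           # hypothetical virtualenv
# serve the Flask app behind nginx; the module:callable name is a placeholder
nohup gunicorn -w 2 -b 127.0.0.1:8000 app:app >> /tmp/myapp.log 2>&1 &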

Lastly, I had to find a way to start/stop the EC2 instance on a schedule. I googled and found this URL:
https://aws.amazon.com/answers/infrastructure-management/ec2-scheduler/

It stated:

In 2016, the EC2 Scheduler was launched to help AWS customers easily configure 
custom start and stop schedules for their Amazon Elastic Compute Cloud 
(Amazon EC2) instances. In 2018, AWS launched the AWS Instance Scheduler, 
a new and improved scheduling solution that enables customers to schedule Amazon 
EC2 instances, Amazon Relational Database Service (Amazon RDS) instances, and 
more. We encourage customers to migrate to AWS Instance Scheduler for future 
updates and new features.

After quickly checking the AWS Instance Scheduler, I could tell it is overkill for my simple case, as I only have one EC2 instance with this need at the moment. So I decided to just use “aws ec2 stop-instances” and “aws ec2 start-instances”. My Windows desktop PC at home typically runs 24 hours a day, so I put the commands in bat files and used the Windows Task Scheduler to call them.

The bat files look like this:

-- start_ec2.bat
set AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXX
set AWS_SECRET_ACCESS_KEY=YxxxxxxxxxxxxxXXXXXLDs
rem no spaces around "=" in a batch "set"; the AWS CLI reads the region
rem from the AWS_DEFAULT_REGION environment variable
set AWS_DEFAULT_REGION=us-east-1

c:\AWSCLI\aws.exe ec2 start-instances --instance-ids i-0xxxxxxxxxxxx4

-- stop_ec2.bat
rem same credential and region settings as in start_ec2.bat
c:\AWSCLI\aws.exe ec2 stop-instances --instance-ids i-0xxxxxxxxxxxx4
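To confirm that the instance actually changed state after a scheduled run, a quick check can be done from the same machine (the instance ID below is the same placeholder as above):

c:\AWSCLI\aws.exe ec2 describe-instances --instance-ids i-0xxxxxxxxxxxx4 --query "Reservations[].Instances[].State.Name"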

The screenshot below shows the Stop EC2 task running successfully over the past three days. I verified that this solution works for my case.

[Screenshot: stopRC2_taskscheduler – Task Scheduler history of the Stop EC2 task]

How to drop a RAC database manually

The easiest way to create or drop a RAC database is probably to use the DBCA GUI. However, sometimes we prefer to do it on the command line. In that case we can use the DBCA -silent option with a response file in the 12.2 version of the Oracle RDBMS. Recently I tested the syntax for both creating and dropping a RAC db, i.e.:

dbca -silent -createDatabase -responseFile /path/to/dbca.rsp -ignorePrereqFailure
dbca -deleteDatabase -sourceDB db1tst -silent -responseFile /path/to/dbca.rsp

When we use the DBCA GUI to create a database, we can save a response file; this file can then be used as a template to create other databases in silent mode.
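For reference, a saved 12.2 dbca.rsp is just a key=value text file. The fragment below is a sketch from memory with placeholder values (the disk group and node names are taken from the environment shown later in this post); the exact parameter names should be double-checked against the template shipped under $ORACLE_HOME/assistants/dbca/dbca.rsp:

gdbName=db1tst
sid=db1tst
templateName=General_Purpose.dbc
sysPassword=********
systemPassword=********
storageType=ASM
diskGroupName=+DATA_DB1TST
nodelist=racnode01,racnode02,racnode03,racnode04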

However, today I encountered a situation where I was unable to drop a test db because I could not enter the correct SYS password:

$ dbca -deleteDatabase -sourceDB db1tst -silent -responseFile /opt/oracle/stage/dbca.rsp
Enter SYS user password:

Connecting to database
9% complete
14% complete
DBCA Operation failed.

Look at the log file "/opt/oracle/product/cfgtoollogs/dbca/db1tst2.log" for further details.

In the log, it can be seen:

[ 2018-02-10 22:24:43.093 GMT ] ORA-01017: invalid username/password; logon denied

[ 2018-02-10 22:24:43.095 GMT ] Connection to the database failed, verify the user name and password and try again.
[ 2018-02-10 22:24:43.199 GMT ] DBCA_PROGRESS : DBCA Operation failed.

I was unable to get the password right, so I had to drop the db manually. Below I describe the steps I went through, for future reference:

1. Check the db configuration and status

$ srvctl config database -d db1tst

Database unique name: db1tst
Database name: db1tst
Oracle home: /opt/oracle/product/12.2.0/db_1
Oracle user: oracle
Spfile: +DATA_DB1TST/DB1TST/PARAMETERFILE/spfile.321.967756369
Password file: +DATA_DB1TST/DB1TST/PASSWORD/pwddb1tst.295.967753731
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA_DB1TST,FRA_DB1TST
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group:
Database instances: db1tst1,db1tst2,db1tst3,db1tst4
Configured nodes: racnode01,racnode04,racnode03,racnode02
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services:
Database is administrator managed

$ srvctl status database -d db1tst

Instance db1tst1 is running on node racnode01
Instance db1tst2 is running on node racnode04
Instance db1tst3 is running on node racnode03
Instance db1tst4 is running on node racnode02

2. Stop the db and start the instance on node 1 in mount mode

srvctl stop database -d db1tst

srvctl start instance -d db1tst -n racnode01 -o mount

3. Try to drop database, receiving “ORA-01586”

SQL> drop database;
drop database
*
ERROR at line 1:
ORA-01586: database must be mounted EXCLUSIVE and not open for this operation

4. Set cluster_database=false, restart the db, and drop again

SQL> alter system set cluster_database=false scope=spfile;

System altered.

SQL> shutdown immediate;
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.

SQL> startup mount restrict;
ORACLE instance started.

Total System Global Area 8388608000 bytes
Fixed Size 8622144 bytes
Variable Size 3439333312 bytes
Database Buffers 4513071104 bytes
Redo Buffers 427581440 bytes

Database mounted.
SQL>
SQL>
SQL> drop database;

Database dropped.

5. Remove the db from GI

$ srvctl remove database -d db1tst
Remove the database db1tst? (y/[n]) y

$ srvctl config database -d db1tst
PRCD-1120 : The resource for database db1tst could not be found.
PRCR-1001 : Resource ora.db1tst.db does not exist

6. Delete the directory from the ASMCMD command line

ASMCMD> pwd
+DATA_DB1TST/db1tst
ASMCMD> cd ..
ASMCMD> rm -rf db1tst
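Since the database was configured with two disk groups (DATA_DB1TST and FRA_DB1TST, per the srvctl config output in step 1), the leftover directory under the FRA disk group can presumably be cleaned up the same way:

ASMCMD> cd +FRA_DB1TST
ASMCMD> rm -rf db1tst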

Change the running nodes of Oracle RAC instances

Oracle RAC, i.e. Real Application Clusters, is a rather unique database infrastructure. I have seen the term 'cluster' in other database worlds, but there it usually refers to a cluster of both servers and storage (for example, a MariaDB Galera cluster). In Oracle, a database is defined as a collection of physical operating system files or disks. An instance is a set of Oracle background processes or threads running on the same computer (aka node or server) and sharing a common memory area (the SGA, System Global Area). Not so long ago, we could have at most one instance per database. We can think of a database as an island (with lots of treasure on it), and an instance as a bridge, with its own traffic control system, from the mainland to this island. An Oracle RAC database can then be thought of as multiple bridges built to connect to the same island.

At a client site, we created and configured dozens of such 4-node (bridge) RAC databases. Normally each “bridge” (node) has a number in its name, i.e. node 1, 2, 3, 4, and the “traffic control systems” (instances) also have a number in their names, i.e. 1, 2, 3, 4. Naturally we want instance 1 running on node 1, and so on. However, in many cases we see instance 3 running on node 4 for some unknown reason (I have not been able to trace the origin of that so far). This can be very inconvenient and error-prone, i.e. it may cause mismatches or confusion in configuration files that contain database instance info, for example the oratab file.

SRVCTL is the command-line tool that can be used to manage the configuration information of a RAC. We can use SRVCTL commands to add, remove, start, stop, modify, enable, and disable a number of entities, such as databases, instances, listeners, SCAN listeners, services, grid naming service (GNS), and Oracle ASM.

Below I summarize what we’ve done with SRVCTL commands to re-configure the RAC so that the node number and instance number become aligned.

-- get the current database status
$> srvctl status database -d oradb

Instance oradb1 is running on node oranode01
Instance oradb2 is running on node oranode02
Instance oradb3 is running on node oranode04
Instance oradb4 is running on node oranode03

-- remove instances 3 and 4
-- the -f (force) option is needed, otherwise the command fails
$> srvctl remove instance -d oradb -i oradb3 -f
$> srvctl remove instance -d oradb -i oradb4 -f

-- add the instances back on the desired nodes
$> srvctl add instance -d oradb -i oradb3 -n oranode03
$> srvctl add instance -d oradb -i oradb4 -n oranode04

-- start the instances
$> srvctl start instance -d oradb -i oradb3
$> srvctl start instance -d oradb -i oradb4

-- verify the database status

$> srvctl status database -d oradb
Instance oradb1 is running on node oranode01
Instance oradb2 is running on node oranode02
Instance oradb3 is running on node oranode03
Instance oradb4 is running on node oranode04
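To double-check, the stored configuration can be inspected as well; judging by the srvctl config output shown earlier in this post, the “Configured nodes” list follows the instance order, so it should now read oranode01 through oranode04 in sequence:

$> srvctl config database -d oradb | grep -E "instances|nodes"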

My experience of applying Oracle Jan 2018 PSU Patch 27100009

For a client, after installing the GI 12.2 and RDBMS 12.1/12.2 software and creating several RAC databases, I needed to apply the latest PSU patch, which is “Patch 27100009 – Grid Infrastructure Jan 2018 Release Update (GI RU) 12.2.0.1.180116”. During this process, I encountered several issues and learned some tricks worth noting down for future reference.

First of all, we really should download the latest released OPatch utility. Although the patch readme says “You must use the OPatch utility version 12.2.0.1.6 or later to apply this patch for all platforms”, that does not mean OPatch 12.2.0.1.6 won’t give you trouble. With OPatch 12.2.0.1.6, I got the following error and the patch failed:
OPATCHAUTO-68021: The following argument(s) are required: [-wallet]

Based on the MOS doc “Creation of opatchauto wallet in 12.2 in 12.2.0.1.8 (Doc ID 2270185.1)”, it turned out that opatchauto in 12.2 requires the creation of a wallet file, with a password, for the owner of the grid software on all nodes. However, the latest versions of OPatch (12.2.0.1.9 or higher) do not require the wallet as a mandatory parameter for opatchauto. So, to save the trouble of creating the wallet, I went ahead and downloaded the latest OPatch, which is version 12.2.0.1.10.

Secondly, it appeared that opatchauto depends on the inventory.xml file, so we’d better make sure it contains only the homes we want to apply the patch to. In my case, those are the GI 12.2 home and the RDBMS 12.2 home. This file is located in the [inventory_loc]/ContentsXML directory, where [inventory_loc] can be found in the file /var/opt/oracle/oraInst.loc (Solaris) or /etc/oraInst.loc (Linux). Below are two examples of removing unwanted HOMEs:
/grid/app/12.2.0/grid/oui/bin/runInstaller -silent -detachHome ORACLE_HOME="/opt/oracle/oraclex/product/12.2.0/db_1"
/grid/app/12.2.0/grid/oui/bin/runInstaller -silent -detachHome ORACLE_HOME="/opt/oracle/agent/agent_13.2.0.0.0"
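As a quick sanity check (a sketch, assuming a Linux host, so the pointer file is /etc/oraInst.loc), the homes still registered in the central inventory can be listed directly:

$ cat /etc/oraInst.loc                                    # shows inventory_loc=...
$ grep "HOME NAME" <inventory_loc>/ContentsXML/inventory.xml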

It is also necessary to make sure that the node list for a HOME is correct in the inventory.xml file, i.e., the node list section should not be empty. The syntax to update the node list is:

/grid/app/12.2.0/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/grid/app/12.2.0/grid "CLUSTER_NODES= "
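The node values are left out above; for illustration only, with hypothetical node names node1 through node4, the command would look something like:

/grid/app/12.2.0/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/grid/app/12.2.0/grid "CLUSTER_NODES={node1,node2,node3,node4}"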

Lastly, we may have to take care of the ACFS filesystems, depending on the environment. In my environment, the GI and RDBMS homes are not shared, but there are ACFS filesystems. Based on “Supplemental Readme – Patch Installation and Deinstallation for 12.1.0.x.x GI PSU and Database Proactive Bundle Patch (Doc ID 1591616.1)”, I had to unmount the ACFS filesystems first, then start opatchauto to patch the GI and RDBMS homes together, one node at a time.
To unmount the ACFS filesystems, we can proceed with the following steps:
1. Execute the following command to find the names of the CRS-managed ACFS file system resources:
# crsctl stat res -w "TYPE = ora.acfs.type" -p | grep VOLUME
2. Execute the following command to stop the CRS-managed ACFS file system resource, passing the volume device and node found in Step 1. As the root user, execute:
# srvctl stop filesystem -d <volume_device> -n <node_name>
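For illustration only (the volume device and node name below are made up; on Linux, ACFS volume devices typically appear under /dev/asm/):

# srvctl stop filesystem -d /dev/asm/acfsvol1-123 -n node1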

What would happen if I had not stopped the ACFS filesystems? I would get error messages like the ones below:

2018/01/31 22:39:42 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2018/01/31 22:40:27 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.
2018/01/31 22:40:32 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'
2018/01/31 22:40:44 CLSRSC-205: Failed to uninstall ADVM/ACFS

After fixing the cause of failure Run opatchauto resume

OPATCHAUTO-68061: The orchestration engine failed.
OPATCHAUTO-68061: The orchestration engine failed with return code 1
OPATCHAUTO-68061: Check the log for more details.
OPatchAuto failed.

OPatchauto session completed at Wed Jan 31 22:40:44 2018
Time taken to complete the session 27 minutes, 3 seconds

opatchauto failed with error code 42

To fix this, I went through the following steps:
1. As root, disabled CRS: crsctl disable crs.
2. Rebooted the node to clear the device-busy condition.
3. Once the node came back, enabled CRS: crsctl enable crs.
4. Performed “opatchauto resume”.
5. Checked CRS, found “ora.mgmtdb” offline, and ran “srvctl start mgmtdb” to bring it up.
(Note to myself: the Management Database is administered through the “srvctl start/stop/status mgmtdb” commands.)
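For the record, the checks in step 5 boil down to:

$ srvctl status mgmtdb       # reports whether the Management Database (GIMR) is running and on which node
$ srvctl start mgmtdb        # start it if it is offline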
It was a good journey to be exposed to the latest PSU patching!