Wednesday, July 27, 2022

Gitlab 500 internal server error while accessing repos

Came across this issue when restoring GitLab 8.16.4-ee to a DR site.

ERROR: 500 Internal Server Error while accessing a repo
tail -f /var/log/gitlab/gitlab-rails/production.log

OpenSSL::Cipher::CipherError (bad decrypt):
  app/models/project.rb:519:in `import_url'
  app/models/project.rb:555:in `external_import?'
  app/models/project.rb:547:in `import?'
  app/models/project.rb:563:in `import_in_progress?'
  app/controllers/projects_controller.rb:95:in `show'
  lib/gitlab/middleware/multipart.rb:93:in `call'
  lib/gitlab/request_profiler/middleware.rb:15:in `call'
  lib/gitlab/middleware/go.rb:16:in `call'
  lib/gitlab/middleware/readonly_geo.rb:29:in `call'


SOLUTION
The "OpenSSL bad decrypt" error makes it pretty obvious that this is related to the GitLab secrets, i.e. the keys in /etc/gitlab/gitlab-secrets.json that are used to encrypt values such as import_url in the database.

#1. copy /etc/gitlab/gitlab-secrets.json from the old instance to the new one
#2. drop any import data that can no longer be decrypted:
sudo gitlab-rails runner "Project.where.not(import_url: nil).each { |p| p.import_data.destroy if p.import_data }"
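If you need to pull the secrets over the wire, something like the below works as a rough sketch (OLD_GITLAB_HOST is a placeholder; the reconfigure/restart afterwards is my assumption for getting the running services to pick up the new secrets):

# copy the secrets file from the old instance (OLD_GITLAB_HOST is a placeholder)
sudo scp root@OLD_GITLAB_HOST:/etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab-secrets.json
# assumption: reconfigure/restart so the services pick up the new secrets
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart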


Gitlab Backups:
## gitlab-rake gitlab:backup:create

You may run into this error:
Dumping PostgreSQL database gitlab_prod ... pg_dump: server version: 9.5.2; pg_dump version: 9.2.18

Just make sure the pg_dump and psql symlinks under /opt/gitlab/embedded/bin point to the newer bundled version (see the symlink fix sketch after the listing below):

/opt/gitlab/embedded/bin/pg_dump --version
pg_dump (PostgreSQL) 9.6.1

lrwxrwxrwx 1 root root   49 Jan 31 17:10 pg_dump -> /opt/gitlab/embedded/postgresql/9.6.1/bin/pg_dump

lrwxrwxrwx 1 root root       46 Jan 31 17:42 psql -> /opt/gitlab/embedded/postgresql/9.6.1/bin/psql
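If the symlinks are still pointing at the old binaries, recreating them is a one-liner each (a sketch; adjust the 9.6.1 path to whatever version ships under /opt/gitlab/embedded/postgresql on your install):

cd /opt/gitlab/embedded/bin
# point the wrappers at the newer bundled PostgreSQL binaries
sudo ln -sfn /opt/gitlab/embedded/postgresql/9.6.1/bin/pg_dump pg_dump
sudo ln -sfn /opt/gitlab/embedded/postgresql/9.6.1/bin/psql psql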


## make sure you back up /etc/gitlab/gitlab-secrets.json and /etc/gitlab/gitlab.rb separately
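A simple tar of the two files is enough; this is just a sketch and the target path is only an example:

# back up the config and secrets separately from the regular gitlab backup
sudo tar -czf /root/gitlab-config-$(date +%F).tar.gz /etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab.rb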

Wednesday, December 1, 2021

AWS SysOps certification story

 My AWS SysOps certification story:

10/10/21 AWS Certified Solutions Architect - Associate (SAA-C02) - cleared thru PSI
The last time I did this certification was way back in March 2017.

10/12/21 AWS Certified Cloud Practitioner (CLF-C01) - cleared thru PSI
I actually wanted to take the SysOps Administrator - Associate exam.
I requested a voucher for PSI and then figured out that PSI doesn't offer the SysOps Associate test.
I believe that's the only AWS test PSI doesn't offer.

10/18/21 AWS Certified SysOps Administrator - Associate (SOA-C02) - failed thru Pearson Vue
This was very interesting. I got 55 questions; I was done with all of them and still had more than 70 minutes left. I was thinking maybe this was the easiest exam comparatively. I took my time to revise all the questions twice. With about 40 minutes remaining I decided to submit and close that section. Once I submitted, it said I had 3 labs of 20 minutes each and that I could not go back to a lab once submitted. I was not prepared for these labs and had no idea that this test had labs.
I somehow started with the first lab but couldn't follow all the steps, and I had only 20 minutes left for the next two labs.
Luckily the 2nd and 3rd labs were easy enough; I still had 9 minutes left at the end but couldn't go back to the first one to fix it.

So there goes my first failed attempt, and now I had to wait 2 weeks before I could attempt this test again. Apparently this is the only AWS test that has labs as of now.

Exam labs make up 20% of your total score. A passing score is 720 out of 1,000.

Another thing about this test, or rather the frustrating part, is that it doesn't tell you whether you passed or failed right away. It takes a couple of days; as per the official statement, it may take up to 5 business days. Most likely this is because of the labs, since someone has to review the lab steps and results.

11/18/21 AWS Certified SysOps Administrator - Associate (SOA-C02) - no show thru Pearson Vue
I had to reschedule my test multiple times due to conflicts with other meetings and things to do.
It was an 8am test; I logged in to start the test and found there was nothing for me to start. I panicked and started calling the Pearson support number (866-207-9983).
They told me this test was scheduled for 11/12/21 and I didn't show up :-)
I had rescheduled my 11/12/21 test to 11/18/21, but the reschedule didn't go through, or maybe I never clicked submit to finalize it.
Luckily I had captured a screenshot of the last page showing the exam time.
I created a ticket with Pearson Vue (Case: 07312581), and after a few email exchanges they cancelled that exam, sent me a refund, and restored my voucher.
Always make sure you have the confirmation email for the exam, and log in once before the exam to check that the schedule is updated.


11/19/21 AWS Certified SysOps Administrator - Associate (SOA-C02) - failed/void thru Pearson Vue
This test went well until I reached Lab 2.
In Lab 2 I couldn't edit values in the template and wasn't able to select the words properly to edit them.
I couldn't type numbers in the notepad, among many other issues.
I created a support case with AWS (CASE 9232500091); they asked me to create a case with Pearson Vue.
Created a case with Pearson (Case ID: 07321951).

The response from Pearson after a few days of review:
"First and foremost, we want to sincerely apologize for any inconvenience you may have faced during your exam.
A case has been submitted on your behalf to AWS, and you can expect to receive an update from AWS shortly.
Our team would like to fix this by offering you a customer service voucher code that can be used to schedule
your next exam at no additional charge to you. "

Once they void the exam, the 2-week wait period doesn't apply.


11/30/21 AWS Certified SysOps Administrator - Associate (SOA-C02) - cleared thru Pearson Vue
Finally, I was able to reschedule my 12/3/21 exam to 11/30/21 and finish it.

Bottom line: it creates a huge bump in planning ahead if you fail the test or do not pay enough attention to scheduling or test-pattern details. The other thing is that this can be a very difficult exam if you are NOT hands-on. The labs are not very difficult, but they have their own performance and layout challenges when you are working through the restricted browser. And finally, I would recommend tutorialsdojo.com for labs.

Hope you will not make the mistakes I made.

Tuesday, April 27, 2021

ansible error with python boto3

 Ansible Error:

TASK [provision-ai-vpc : Create VPC] *************************************
task path: /Users/pankajgautam/infra/ansible/roles/provision-ai-vpc/tasks/main.yml:14
redirecting (type: modules) ansible.builtin.ec2_vpc_net to amazon.aws.ec2_vpc_net
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to import the required Python library (botocore or boto3) on Veevas-MacBook-Pro.local's Python /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}

PLAY RECAP *********************************************************************
localhost : ok=1    changed=0    unreachable=0    failed=1    skipped=3    rescued=0    ignored=0


It seems like macOS comes with Python 2.7, which cannot be touched or removed:

pankajgautam@Veevas-MBP ~ % /usr/bin/python --version

Python 2.7.16


If you install the AWS CLI, it comes with its own Python:

pankajgautam@Veevas-MBP ~ % /usr/local/aws-cli/aws --version

aws-cli/2.1.37 Python/3.8.8 Darwin/20.4.0 exe/x86_64 prompt/off


Packages you install with pip or pip3 for the system Pythons generally land under /Library/Python:

pankajgautam@Veevas-MBP ~ % ls  /Library/Python/ 

2.7 3.8


You can also install Python using brew, which gets installed under /usr/local/Cellar:

pankajgautam@Veevas-MBP ~ % ls -l /usr/local/Cellar/ | grep python

drwxr-xr-x  3 pankajgautam  admin  96 Apr 21 12:58 python@3.9
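To see which interpreter Ansible itself is picking up by default, ansible --version prints it (just a quick check, run from wherever you launch the playbooks):

ansible --version | grep -i "python version"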




You can specify which Python Ansible should use with ansible_python_interpreter,

and also make sure the boto3 module is installed for that Python version.

 

AWS_PROFILE=aiprod ansible-playbook -e "aws_region=us-east-1 release_type=gr ansible_python_interpreter=/usr/local/Cellar/python@3.9/3.9.4/bin/python3" -i ansible/inventory/prod ansible/provision-ai-vpc-pb.yml -vv 2>&1 | tee provision.log 
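Instead of passing it on the command line every time, the interpreter can also be pinned in the inventory or in ansible.cfg. A sketch below, reusing the brew Python path from this machine; the inventory line is only an example of the syntax, not my actual prod inventory:

# ansible/inventory/prod (example entry)
localhost ansible_connection=local ansible_python_interpreter=/usr/local/Cellar/python@3.9/3.9.4/bin/python3

# or in ansible.cfg, under [defaults]
# interpreter_python = /usr/local/Cellar/python@3.9/3.9.4/bin/python3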


pankajgautam@Veevas-MBP % sudo pip3 install boto3 

WARNING: The directory '/Users/pankajgautam/Library/Caches/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting boto3
  Downloading boto3-1.17.54-py2.py3-none-any.whl (131 kB)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.9/site-packages (from boto3) (0.10.0)
Collecting botocore<1.21.0,>=1.20.54
  Downloading botocore-1.20.55-py2.py3-none-any.whl (7.4 MB)
Collecting s3transfer<0.5.0,>=0.4.0
  Downloading s3transfer-0.4.1-py2.py3-none-any.whl (79 kB)
Collecting urllib3<1.27,>=1.25.4
  Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
Collecting python-dateutil<3.0.0,>=2.1
  Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting six>=1.5
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: six, urllib3, python-dateutil, botocore, s3transfer, boto3
Successfully installed boto3-1.17.54 botocore-1.20.55 python-dateutil-2.8.1 s3transfer-0.4.1 six-1.15.0 urllib3-1.26.4


pankajgautam@Veevas-MBP infra % /usr/local/Cellar/python@3.9/3.9.4/bin/python3

Python 3.9.4 (default, Apr  5 2021, 01:49:30)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>>
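Rather than sudo pip3, it is usually cleaner to install the modules against the exact interpreter Ansible will use (a sketch using the brew Python path from above):

# install boto3/botocore for the interpreter referenced by ansible_python_interpreter
/usr/local/Cellar/python@3.9/3.9.4/bin/python3 -m pip install --user boto3 botocore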


Thursday, February 6, 2020

migrating machine type from m4 to m5 (aws nitro based system)


Tested the conversion with RHEL 8 and Amazon Linux 2,
from m4 => m5 and r4 => r5.


As of now, the following instance families are based on the Nitro system:
A1, C5, C5d, C5n, G4, I3en, Inf1, M5, M5a, M5ad, M5d, M5dn, M5n, p3dn.24xlarge, R5, R5a, R5ad, R5d, R5dn, R5n, T3, T3a, and z1d

Before changing your instance to a Nitro-based instance type, make sure:
1. The Elastic Network Adapter (ENA) driver is installed and enabled for the instance.
2. The NVMe driver is installed on the instance and is loaded in the initramfs image of the instance.
3. The file systems in /etc/fstab are mounted using UUID or label, not device names.

EBS volumes
[root@pod ~]# lsblk
NAME    MAJ:MIN  RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0     0  200G  0 disk
├─xvda1 202:1     0    1M  0 part
└─xvda2 202:2     0  200G  0 part /
xvdd    202:48    0  500G  0 disk /data
xvde    202:80    0  500G  0 disk /data/logs
xvdf    202:160   0  500G  0 disk /data/active

NVMe EBS volume
[root@pod ~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:8    0  200G  0 disk
├─nvme0n1p1 259:9    0    1M  0 part
└─nvme0n1p2 259:10   0  200G  0 part /
nvme1n1     259:3    0  500G  0 disk /data
nvme2n1     259:4    0  500G  0 disk /data/logs
nvme3n1     259:2    0  500G  0 disk /data/active

NVMe map command:
# /sbin/ebsnvme-id /dev/nvme1n1
Volume ID: vol-0fb8db1fddd1f5834
xvde
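To map every attached NVMe device in one go, a small loop around the same ebsnvme-id tool does the trick (a sketch; run as root on the instance):

# print volume id and original device name for each NVMe disk
for d in /dev/nvme[0-9]*n1; do
  echo "== $d"
  /sbin/ebsnvme-id "$d"
done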

Suggested /etc/fstab
UUID=35e495d3-a1df-40d9-bb6c-e37b81aec11f /data xfs noatime 0 0
UUID=075bd9b0-370e-48b1-9de2-4d061d16ca5b /data/logs xfs noatime 0 0
UUID=5e5fde64-1346-45e0-9eeb-6c68be8b81bd /data/active xfs noatime 0 0
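The UUIDs for those fstab entries come straight from blkid (your devices and UUIDs will obviously differ):

# list UUIDs for the data volumes
blkid /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
# or grab just the UUID of a single device
blkid -s UUID -o value /dev/nvme1n1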

Commands to check whether the instance is ready to be changed to a Nitro-based system (see the instance type change sketch after these commands):

lsinitrd /boot/initramfs-$(uname -r).img|grep nvme
modinfo nvme
modinfo ena
dracut -f -v
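Once the checks look good, the actual switch is just stop, change the instance type, and start (a sketch with the AWS CLI; the instance ID and target type are placeholders):

aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type Value=m5.xlarge
aws ec2 start-instances --instance-ids i-0123456789abcdef0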



Lessons learned:

Currently AWS takes care of this volume device name change (going from 4-series instances to 5-series Nitro-based instances) on Amazon Linux 2 only. The same steps should also work on Red Hat hosts if we apply the udev rule and the Python script ourselves.

On Amazon Linux 2, the magic that creates the /dev/xvdb or /dev/xvdc symlinks pointing at the /dev/nvme1n1 or /dev/nvme2n1 devices is a udev rule

in the file "/etc/udev/rules.d/70-ec2-nvme-devices.rules",

more specifically this line:
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c"

Reading that rule: when a device is attached to the system whose kernel device name matches the given pattern, run this program and create a symlink named with the output of that program.

The program is a Python script (/sbin/ebsnvme-id) that looks up the original device name for the underlying volume, which is then used to create the symlink.


One thing to keep in mind here is that this only works if the devices presented to the Amazon Linux 2 host are in /dev/xvdX format and NOT /dev/sdb or /dev/sdc, which is the old format. In that case you may have to tweak the rules to get the results you desire.


Output from the c5_m5_checks_script:
OK     NVMe Module is installed and available on your instance
OK     ENA Module with version 2.0.2K is installed and available on your instance


The corrected /etc/fstab file:
# /etc/fstab
UUID=a727b695-0c21-404a-b42b-3075c8deb6ab /                       xfs     defaults        0 0
UUID=587cde86-c167-4e73-92cb-b67739d9991d /data/logs xfs  noatime 0 0
UUID=6410c3de-3714-4198-b31b-afeca374ef43 /data/active xfs  noatime 0 0