Monday 8 May 2023

What did I learn today?

Welcome to the "What did I learn today?" series. The intention of this blog is to collect the things I learn on a day-to-day basis and journal them here. Please follow the series and show your support.


See you in the series.

Sunday 27 March 2022

Minimum Number of cars

A group of friends is going on holiday together. They have to come to a meeting point (the start of the journey) using N cars. There are P[K] people and S[K] seats in the K-th car for K in range [0 .. N-1]. Some of the seats in the cars may be free, so it is possible for some of the friends to change the car they are in. The friends have decided that, in order to be ecological, they will leave some cars parked at the meeting point and travel with as few cars as possible.


Write a function that, given two arrays P and S consisting of N integers each, returns the minimum number of cars needed to take all of the friends on holiday.


Example:


1. Given P = [1,4,1] and S = [1, 5, 1], the function should return 2. A person from car number 0 can travel in car number 1 instead. This way, car number 0 can be left parked at the meeting point.


2. Given P = [4,4,2,4] and S = [5,5,2,5], the function should return 3. One person from car number 2 can travel in car number 0 and the other person from car number 2 can travel in car number 3.


3. Given P = [2,3,4,2] and S = [2,5,7,2], the function should return 2. Passengers from car number 0 can travel in car number 1 and passengers from car number 3 can travel in car number 2.


Write an efficient algorithm for the following assumptions:


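import java.util.Arrays;

class Solution {
  public int solution(int[] P, int[] S) {
    // Total number of people who need a seat.
    int sum = 0;
    for (int people : P) sum += people;

    // Sort a copy of the seat counts (ascending) so the caller's array is untouched.
    int[] seats = S.clone();
    Arrays.sort(seats);

    // Greedy: take the cars with the most seats first until everyone fits.
    int seatSum = 0;
    int cars = 0;
    for (int i = seats.length - 1; i >= 0 && seatSum < sum; i--) {
      seatSum += seats[i];
      cars++;
    }
    return cars;
  }
}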
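
The greedy idea behind this solution (count the people, then fill the biggest cars first) can be sanity-checked against the three examples. Below is a minimal sketch in shell; the min_cars function name and the space-separated argument format are my own choices for illustration, not part of the original task.

```sh
# min_cars "P values" "S values": both arguments are space-separated lists.
# The body runs in a subshell, so its variables do not leak into the caller.
min_cars() (
  people=0
  for p in $1; do people=$((people + p)); done

  seats=0
  cars=0
  # Visit the cars from most seats to fewest and stop once everyone fits.
  for s in $(printf '%s\n' $2 | sort -nr); do
    [ "$seats" -ge "$people" ] && break
    seats=$((seats + s))
    cars=$((cars + 1))
  done
  echo "$cars"
)

min_cars "1 4 1" "1 5 1"         # prints 2
min_cars "4 4 2 4" "5 5 2 5"     # prints 3
min_cars "2 3 4 2" "2 5 7 2"     # prints 2
```

Filling the largest cars first works because people can move freely between cars, so only the total number of seats in the chosen cars matters.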


Thursday 28 November 2019

Postgresql PUB/SUB

Note: This post is copied over (for my own reference) from https://citizen428.net/blog/asynchronous-notifications-in-postgres/


I’m fascinated by Postgres: the more I learn about it, the more I realize how much I still don’t know. Recently I discovered its asynchronous communication capabilities, which apparently have been around for a long time ¯\_(ツ)_/¯
Let’s look at the two most interesting commands related to this topic, NOTIFY and LISTEN. Here’s what the documentation has to say on them:
NOTIFY provides a simple interprocess communication mechanism for a collection of processes accessing the same PostgreSQL database. A payload string can be sent along with the notification, and higher-level mechanisms for passing structured data can be built by using tables in the database to pass additional data from notifier to listener(s).
Whenever the command NOTIFY channel is invoked, either by this session or another one connected to the same database, all the sessions currently listening on that notification channel are notified, and each will in turn notify its connected client application.
LISTEN registers the current session as a listener on the notification channel named channel. If the current session is already registered as a listener for this notification channel, nothing is done.
Sounds like publish-subscribe on the database level, interesting! I learn best by trying things out and writing some code, so let’s dive in.

Setting up Postgres for notifications

For testing purposes, let’s create an overly simplified orders table, that except for the primary key also contains an email address to identify the person who placed the order and a bigint field to store the total order amount in cents:
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  email TEXT NOT NULL,
  total BIGINT NOT NULL
);
Next we need to define a function which returns a trigger:
CREATE OR REPLACE FUNCTION notify_event() RETURNS TRIGGER AS $$
  DECLARE
    record RECORD;
    payload JSON;
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      record = OLD;
    ELSE
      record = NEW;
    END IF;

    payload = json_build_object('table', TG_TABLE_NAME,
                                'action', TG_OP,
                                'data', row_to_json(record));

    PERFORM pg_notify('events', payload::text);

    RETURN NULL;
  END;
$$ LANGUAGE plpgsql;
The above is pretty straightforward:
  1. Declare some variables for later use.
  2. Switch on the TG_OP special variable to decide which version of the row we want to serialize.
  3. Use json_build_object and row_to_json to generate the notification payload.
  4. Use pg_notify to broadcast a message on the events channel.
  5. Return NULL since this is an AFTER trigger.
Now we can create a notify_order_event trigger, which will call this function after every INSERT, UPDATE or DELETE on the orders table:
CREATE TRIGGER notify_order_event
AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW EXECUTE PROCEDURE notify_event();
With this in place we should now be able to receive events. Let’s inform Postgres that we’re interested in notifications on the events channel:
LISTEN events;
Now whenever we insert, update or delete a record we will receive a notification:
INSERT into orders (email, total) VALUES ('test@example.com', 10000);
INSERT 0 1
Asynchronous notification "events" with payload "{"table" : "orders", "action" : "INSERT", "data" : {"id":1,"email":"test@example.com","total":10000}}" received from server process with PID 5315.
Great, we just received our first asynchronous notification, though admittedly that’s not particularly useful within the same psql session, so let’s add another listener.

Listening from another process

For the following example we’ll once again use Jeremy Evans’s excellent Sequel gem:
require 'sequel'

DB = Sequel.connect('postgres://user@localhost/notify-test')

puts 'Listening for DB events...'
DB.listen(:events, loop: true) do |_channel, _pid, payload|
  puts payload
end
The above code first connects to the database and then uses Sequel::Postgres::Database#listen to listen for events in a loop.
If we start this script and insert a record in our database the JSON payload will get output to the console:
→ ruby test.rb
Listening for DB events...
{"table" : "orders", "action" : "INSERT", "data" : {"id":2,"email":"test@example.com","total":10000}}
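
Once payloads arrive as lines of text like the one above, a consumer can branch on the action. The following is a rough shell sketch, assuming one JSON payload per line on stdin; payload_field and route_events are hypothetical helper names, and the sed-based matching is a shortcut that only works for simple string fields. A real consumer should use a proper JSON parser.

```sh
# payload_field NAME LINE: print the string value of NAME (plain text match, not a JSON parser).
payload_field() {
  printf '%s\n' "$2" |
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# route_events: read payload lines from stdin and describe each event.
route_events() {
  while IFS= read -r line; do
    table=$(payload_field table "$line")
    case "$(payload_field action "$line")" in
      INSERT) echo "row added to $table" ;;
      UPDATE) echo "row changed in $table" ;;
      DELETE) echo "row removed from $table" ;;
      *)      echo "unrecognised payload" ;;
    esac
  done
}

# Feed the sample payload from above through the router.
route_events <<'EOF'
{"table" : "orders", "action" : "INSERT", "data" : {"id":2,"email":"test@example.com","total":10000}}
EOF
```

The action values are the TG_OP values the trigger function puts into the payload, so the three branches line up with the three operations the trigger fires on.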

Sunday 16 October 2016

ElasticSearch Snapshot and Backup onto GCE

Hello, everyone. As the title suggests, this post describes how to back up an ElasticSearch snapshot to GCE (Google Compute Engine).

Why GCE and not Amazon S3?
Because our entire stack is hosted on Google Compute. :)

Anyway, before I begin.

SPOILER ALERT: Backing up to GCE is only compatible with ElasticSearch version 5.0+. So if you are using a lower version of ElasticSearch, this post will probably not help you much. Perhaps the easiest solution in that case would be to back up the snapshot locally and then move it over to Google Compute Engine using the gcloud command-line utility.

Ok, here we go.

First a little background.

We have been using ELK to monitor our application logs. The application logging is so heavy that by the end of the month we usually run low on disk space. The only thing we can do when this happens, without affecting our operation, is delete the old indices to free up some space. But deleting indices is not a good solution (at least not without a backup), since we could never recover the old state if we ever wanted to.

And btw, I have to admit that I procrastinated on this task for quite a while (mainly due to other commitments). That lasted until the day our ELK stack went down due to low disk space and fixing it became the order of the day.

Ok, for those who don't know, ElasticSearch provides out-of-the-box snapshot support (which is quite amazing), plus a way to back it up with versioning.

First,

- Download ElasticSearch: You need to have ElasticSearch 5.0+ (download it from here).

- Download GCE Plugin: The next step is to install the Google repository plugin, i.e. repository-gcs. Just run the command below.

./bin/elasticsearch-plugin install repository-gcs

- Creating a Bucket: Assuming that you already have a Google account set up, the next step is to create a bucket (where the ElasticSearch snapshot will be backed up).

  1. Connect to the Google Cloud Platform Console.
  2. Select your project.
  3. Go to the Storage Browser.
  4. Click the "Create Bucket" button.
  5. Enter the bucket name.
  6. Select a storage class.
  7. Select a location.
  8. Click the "Create" button.
The plugin supports a couple of authentication modes:

Compute Engine authentication: This mode is recommended if your Elasticsearch node is running on a Compute Engine virtual machine.

Service Account: authenticate with the credentials of a Google service account (a JSON key file).

For the sake of this post, we will cover the Service Account mode. But if you are interested in Compute Engine authentication, you can read more about it here.

To work with the Service Account mode, we first need to create a service account in Google Compute.

You can create the Service Account under the IAM & ADMIN section -> Service Account.

After creating the Service Account, download the generated JSON file and move it into the config directory (I named the file service-acc.json).

- Repository: Before we can start the backup (to GCE), we need to create a snapshot repository.

curl -XPUT 'localhost:9200/_snapshot/GceRepository?pretty' -d '
{
    "type": "gcs",
    "settings": {
      "bucket": "elkp",
      "service_account": "service-acc.json"   
    }
}'
{"acknowledged": true}

Confirming the same.

curl -XGET 'localhost:9200/_snapshot/_all?pretty' 
{
  "GceRepository" : {
    "type" : "gcs",
    "settings" : {
      "bucket" : "elkp",
      "service_account" : "service-acc.json"
    }
  }
}

- Snapshot(ting) & Backup: With all that done, we are now ready to back up the snapshot onto GCE.
curl -XPUT 'localhost:9200/_snapshot/GceRepository/snapshot_1?wait_for_completion=true'

A note on wait_for_completion extracted from here

“The wait_for_completion parameter specifies whether or not the request should return immediately after snapshot initialization (default) or wait for snapshot completion. During snapshot initialization, information about all previous snapshots is loaded into the memory, which means that in large repositories it may take several seconds (or even minutes) for this command to return even if the wait_for_completion parameter is set to false.” - Straight from ElasticSearch.
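
Since the whole point of this exercise is to take backups regularly before deleting old indices, it may help to give each snapshot a dated name instead of snapshot_1. Here is a small sketch of how that request could be built; the snapshot_YYYY.MM.DD naming scheme, the ES_URL and REPO variables, and the snapshot_url helper are my assumptions, not something the plugin requires. The script only prints the curl command, so it is safe to try without a running cluster.

```sh
# Assumed defaults, taken from the examples in this post.
ES_URL="${ES_URL:-localhost:9200}"
REPO="${REPO:-GceRepository}"

# snapshot_url [STAMP]: build the snapshot endpoint; STAMP defaults to today's date.
snapshot_url() {
  stamp="${1:-$(date +%Y.%m.%d)}"
  printf '%s/_snapshot/%s/snapshot_%s?wait_for_completion=true\n' "$ES_URL" "$REPO" "$stamp"
}

# Print the command rather than executing it.
echo "curl -XPUT '$(snapshot_url)'"
```

Dropping the echo (and the surrounding quotes) would turn this into the actual request, for example from a daily cron job.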

- Restore(ing): Finally, a note on restoring the snapshot. Well, that is quite easy as well.

curl -XPOST 'http://localhost:9200/_snapshot/GceRepository/snapshot_1/_restore'

Note: As mentioned in the Elasticsearch guide:
- A snapshot of an index created in 2.x can be restored to 5.x.

But I think the reverse is not true, at least when I tested it (correct me if I'm wrong).

- Other Useful Commands: There are a few other commands that are good to know.

## status for a currently running snapshot
GET /_snapshot/_status

## status for a given repository
GET /_snapshot/GceRepository/_status

## status for a given snapshot id.
GET /_snapshot/GceRepository/snapshot_1/_status

## deleting a snapshot
DELETE /_snapshot/GceRepository/snapshot_1

I encourage you to go through the ElasticSearch guide on Repository and Backup for more information.

And btw, if I haven't mentioned this yet, ElasticSearch has seriously amazing documentation. You must check it out; it's spot on.

Hope that helped. See you later.

Thanks.


Friday 8 July 2016

File test operators in Bash

Following are the file test operators in Bash.

-e  -> file exists
-a -> file exists

This is identical in effect to -e. It has been "deprecated," [1] and its use is discouraged.

-f -> file is a regular file (not a directory or device file)
-s -> file is not zero size
-d -> file is a directory
-b -> file is a block device
-c -> file is a character device

device0="/dev/sda2"    # /   (root directory)
if [ -b "$device0" ]
then
  echo "$device0 is a block device."
fi

# /dev/sda2 is a block device.



device1="/dev/ttyS1"   # PCMCIA modem card.
if [ -c "$device1" ]
then
  echo "$device1 is a character device."
fi

# /dev/ttyS1 is a character device.
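
Block and character devices differ from machine to machine, so the more common -e, -d, -f and -s tests are easier to try on throwaway files. A minimal sketch, where describe is a hypothetical helper name and the files live in a temporary directory created by mktemp:

```sh
# describe PATH: print what kind of thing PATH is, using the file test operators.
describe() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -d "$1" ]; then
    echo "directory"
  elif [ -f "$1" ] && [ -s "$1" ]; then
    echo "non-empty file"
  elif [ -f "$1" ]; then
    echo "empty file"
  else
    echo "other"
  fi
}

tmp=$(mktemp -d)
: > "$tmp/empty"            # zero-size regular file
echo "data" > "$tmp/full"   # regular file with content

describe "$tmp"             # prints directory
describe "$tmp/empty"       # prints empty file
describe "$tmp/full"        # prints non-empty file
describe "$tmp/nope"        # prints missing
rm -rf "$tmp"
```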

-p -> file is a pipe

function show_input_type()
{
   [ -p /dev/fd/0 ] && echo PIPE || echo STDIN
}

show_input_type "Input"                           # STDIN
echo "Input" | show_input_type                    # PIPE

# This example courtesy of Carl Anderson.

-h -> file is a symbolic link
-L -> file is a symbolic link
-S -> file is a socket
-t -> file (descriptor) is associated with a terminal device

This test option may be used to check whether the stdin [ -t 0 ] or stdout [ -t 1 ] in a given script is a terminal.
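
A typical use of -t is to decide whether to prompt the user or to colour the output. A small sketch (input_kind and output_kind are hypothetical helper names):

```sh
# Report where stdin comes from.
input_kind() {
  if [ -t 0 ]; then echo "terminal"; else echo "not a terminal"; fi
}

# Report where stdout goes to.
output_kind() {
  if [ -t 1 ]; then echo "terminal"; else echo "not a terminal"; fi
}

echo "hello" | input_kind      # prints "not a terminal": stdin is a pipe
output_kind | cat              # prints "not a terminal": stdout is a pipe
```

Called directly from an interactive shell, without any pipe or redirection, both functions would report a terminal instead.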

-r -> file has read permission (for the user running the test)
-w -> file has write permission (for the user running the test)
-x -> file has execute permission (for the user running the test)
-g -> set-group-id (sgid) flag set on file or directory

If a directory has the sgid flag set, then a file created within that directory belongs to the group that owns the directory, not necessarily to the group of the user who created the file. This may be useful for a directory shared by a workgroup.
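
The -r, -w and -x tests above are handy before touching a file. A small sketch, assuming a scratch file from mktemp; perms is a hypothetical helper name. Note that the tests answer for the current user, so root may see r and w even on files whose mode bits say otherwise.

```sh
# perms FILE: print which of r, w, x the current user has on FILE.
perms() {
  out=""
  if [ -r "$1" ]; then out="${out}r"; fi
  if [ -w "$1" ]; then out="${out}w"; fi
  if [ -x "$1" ]; then out="${out}x"; fi
  echo "${out:-none}"
}

f=$(mktemp)       # mktemp creates the file readable and writable by its owner only
perms "$f"        # prints rw
chmod u+x "$f"
perms "$f"        # now includes x, unless the filesystem is mounted noexec
rm -f "$f"
```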

-u -> set-user-id (suid) flag set on file

A binary owned by root with set-user-id flag set runs with root privileges, even when an ordinary user invokes it. [2] This is useful for executables (such as pppd and cdrecord) that need to access system hardware. Lacking the suid flag, these binaries could not be invoked by a non-root user.

          -rwsr-xr-t    1 root       178236 Oct  2  2000 /usr/sbin/pppd

A file with the suid flag set shows an s in its permissions.

-k -> sticky bit set

Commonly known as the sticky bit, the save-text-mode flag is a special type of file permission. If a file has this flag set, that file will be kept in cache memory, for quicker access. [3] If set on a directory, it restricts write permission. Setting the sticky bit adds a t to the permissions on the file or directory listing. This restricts altering or deleting specific files in that directory to the owner of those files.

          drwxrwxrwt    7 root         1024 May 19 21:26 tmp/

If a user does not own a directory that has the sticky bit set, but has write permission in that directory, she can only delete those files that she owns in it. This keeps users from inadvertently overwriting or deleting each other's files in a publicly accessible directory, such as /tmp. (The owner of the directory or root can, of course, delete or rename files there.)
-O -> you are owner of file
-G -> group-id of file same as yours
-N -> file modified since it was last read
f1 -nt f2 -> file f1 is newer than f2
f1 -ot f2 -> file f1 is older than f2
f1 -ef f2 -> files f1 and f2 are hard links to the same file
! -> "not" -- reverses the sense of the tests above (returns true if condition absent).

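Two of these operators combine nicely with ! in everyday scripts. The sketch below shows a make-style staleness check with -nt and a dangling-symlink check with -L and -e; needs_rebuild and is_dangling are hypothetical helper names, and the timestamps are set explicitly with touch -t so that the comparison is predictable.

```sh
# needs_rebuild SRC TARGET: true if TARGET is missing or SRC is newer than it.
needs_rebuild() {
  [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# is_dangling PATH: true if PATH is a symlink whose target does not exist.
is_dangling() {
  [ -L "$1" ] && [ ! -e "$1" ]
}

tmp=$(mktemp -d)
touch -t 202001010000 "$tmp/out"     # older target
touch -t 202101010000 "$tmp/src"     # newer source
ln -s "$tmp/missing" "$tmp/link"     # symlink to a file that does not exist

needs_rebuild "$tmp/src" "$tmp/out" && echo "rebuild needed"
is_dangling "$tmp/link" && echo "dangling symlink"
rm -rf "$tmp"
```

The dangling check works because -L looks at the link itself, while -e follows the link and fails when the target is gone.
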
Friday 24 June 2016

Working with logstash



Logstash is a centralized tool to collect and aggregate logs. It is so intuitive, and its configuration is so easy to understand, that you will just love it.

This post describes how to work with Logstash and its configuration.

In a nutshell, Logstash is composed of three main components:

  1. Input
  2. Filter
  3. Output


- Input: the medium/source through which Logstash receives your log events.

A valid input source could be stdin, tcp, udp, zeromq, etc. In fact, Logstash has a wide range of input plugins to choose from (for the full list of input plugins, click here).

The input block essentially looks like this.

input {
   stdin {
      codec => 'plain'
    }
}



- Output: the destination or medium to which Logstash sends or stores its events.

Just like inputs, Logstash provides a wide range of output plugins as well.

The vanilla output block looks like this -

output {
   stdout {
      codec => 'rubydebug'
   }
}

If you don't plan to perform any filtering on the data or log messages you receive, the above blocks (input and output) are usually sufficient to get started with Logstash.

Note: We are making a minor adjustment in our working example. Instead of stdin, we will use tcp as the input plugin.

A final look at our configuration 

## logstash.conf
 input {
   tcp {
      port => '5300'
   }
}

output {
   stdout {
      codec => 'rubydebug'
   }
}

Testing Configuration -

logstash -f logstash.conf --configtest


Starting Logstash -

logstash -f logstash.conf


The screenshots below should help you understand what the Logstash output looks like.


Note: I used Telnet to send the logs to Logstash.

@timestamp: an ISO 8601 timestamp.
message: the event's message.
@version: the version of the event format. The current version is 1.
host: the host from which the message/event was sent.
port: the port of the client.

- Filter: Filter plugins are used to massage (filter) the logs, if needed, so that one can modify the received log message before outputting it via an output plugin.

A simple filter block looks like this (we will explore it in our next example).

filter {
   grok {
     ## grok filter plugin 
   }
}

To explain the power of Logstash, let us just work with a demo example.

Here we have an application which generates logs of various types

  •  Custom debugging logs.
  •  SQL logs etc.
Example.

[20-JUN-2016 14:00:23 UTC] Received Message

[20-JUN-2016 14:00:24 UTC] Before query the IP Address
(1.0ms)  SELECT "ip_addresses"."address" FROM "ip_addresses" WHERE "ip_addresses"."resporg_accnt_id" = 3
[20-JUN-2016 14:00:24 UTC] After query the IP Address
[20-JUN-2016 14:00:24 UTC] The Ip address found is X.X.X.X

[20-JUN-2016 14:00:27 UTC] Quering ResporgID
ResporgAccountId Load (2.0ms)  SELECT resporg_account_ids.*, tfxc_fees.fee as fee FROM "resporg_account_ids" AS resporg_account_ids LEFT JOIN ip_addresses ON resporg_account_ids.id = ip_addresses.resporg_accnt_id LEFT JOIN tfxc_fees ON resporg_account_ids.id = tfxc_fees.resporg_account_id_id WHERE "resporg_account_ids"."active" = 't' AND (((ip_addresses.address = 'x.x.x.x' AND ip_addresses.reserve = 't') AND ('x.x.x.x' = ANY (origin_sip_trunk_ip))) OR (resporg_account_ids.resporg_account_id = 'XXXX') OR (resporg_account_ids.resporg_account_id = 'XXXX'))
[20-JUN-2016 14:00:27] Resporg ID is TIN

[20-JUN-2016 14:00:29 UTC] Querying Freeswitchinstance 
FreeswitchInstance Load (1.0ms)  SELECT  "freeswitch_instances".* FROM "freeswitch_instances" WHERE "freeswitch_instances"."state" = 'active'  ORDER BY "freeswitch_instances"."calls_count" ASC, "freeswitch_instances"."average_system_load" ASC LIMIT 1
[20-JUN-2016 14:00:29 UTC] FreeswitchInstance is IronMan.

[20-JUN-2016 14:00:29 UTC] Get the individual rate
IndividualCeilingRate Load (0.0ms)  SELECT  "individual_ceiling_rates".* FROM "individual_ceiling_rates" WHERE "individual_ceiling_rates"."resporg_account_id_id" = 7 AND "individual_ceiling_rates"."originating_resporg_id" = 3 LIMIT 1
[20-JUN-2016 14:00:29 UTC] The individual rate is 20

[20-JUN-2016 14:00:30 UTC] Query the individual rate
Rate Load (1.0ms)  SELECT  "rates".* FROM "rates" WHERE "rates"."resporg_account_id_id" = 3 LIMIT 1
[20-JUN-2016 14:00:30 UTC] The Selected rate is 40 


Now, we need our system to output (or store) the logs based on their type (SQL or custom).

This is where the power of the filter plugin shines.


GROK filter plugin

A closer look at the grok filter plugin shows that one can apply a regex to the incoming log events (for filtering).

Note: Grok ships with a wide range of regex patterns (120+) that you can choose from. But its power is not limited to the predefined patterns. In fact, one can provide a custom regex pattern as well (like in our case).

In our case, we can apply a regex to either the SQL or the custom logs (we are choosing the SQL messages) and then segregate them.
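
To make the idea of segregating by regex concrete, here is a rough shell sketch that sorts log lines like the samples above into sql and custom. The regular expression is my own stand-in (a duration in parentheses followed by a SQL verb); it is not the ARSQL pattern used in the Logstash configuration, and classify is a hypothetical helper name.

```sh
# classify LINE: print "sql" if LINE looks like a SQL log entry, otherwise "custom".
classify() {
  if printf '%s\n' "$1" |
     grep -Eq '\([0-9.]+ms\)[[:space:]]+(SELECT|INSERT|UPDATE|DELETE)'; then
    echo "sql"
  else
    echo "custom"
  fi
}

classify '[20-JUN-2016 14:00:24 UTC] Before query the IP Address'          # prints custom
classify '(1.0ms)  SELECT "ip_addresses"."address" FROM "ip_addresses"'    # prints sql
classify 'Rate Load (1.0ms)  SELECT  "rates".* FROM "rates" LIMIT 1'       # prints sql
```

Logstash does the same kind of thing in two steps: grok marks the events that match, and a second filter rewrites their type.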

Note: If you need help building patterns to match your logs, you will find the grokdebug and grokconstructor applications quite useful.

The Regex -




Let's define our configuration now.

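## input the log event via TCP.
input {
   tcp {
      port => '5300'
      ## assumption: tag incoming events as custom so that the filter below applies to them
      type => 'custom'
   }
}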

filter {
  ## apply this filter only to log events of type custom
  if ([type] == "custom") {
    grok {
      ## load your custom regex patterns
      patterns_dir => "./pattern"
      ## compare the message with the custom regex
      match => { "message" => "%{ARSQL:sql}" }
      ## if the message matches the regex, add a field called "grok" with the value "match"
      add_field => { "grok" => "match" }
    }

    ## if the event has grok == "match", the regex above matched
    if ([grok] == 'match') {
      ## apply the mutate filter plugin to replace the type from custom to sql
      mutate {
        replace => { "type" => "sql" }
        ## remove the grok field that was added by the earlier filter
        remove_field => ["grok"]
      }
    }
  }
}

## output plugin. For now we are using rubydebug, but we could just as easily use any other output plugin
output {
   stdout {
      codec => 'rubydebug'
   }
}


Let's examine the output


{
       "message" => "Received Message",
      "@version" => "1",
    "@timestamp" => "2016-06-20T14:00:23.320Z",
          "host" => "werain",
          "type" => "custom" ## custom tag
}

{
       "message" => "(1.0ms)  SELECT \"ip_addresses\".\"address\" FROM \"ip_addresses\" WHERE \"ip_addresses\".\"resporg_accnt_id\" = 3",
      "@version" => "1",
    "@timestamp" => "2016-06-20T14:00:24.520Z",
          "host" => "werain",
          "type" => "sql" ## we have successfully managed to change the type to sql (from custom) based
                          ## on the grok regex filtering
}


Notice that the type has been mutated (replaced) from custom to sql.

Note: If that is not enough, you can ask Logstash to filter the event through an external program. If you want, you can simply try my demo example and Logstash configuration defined over here and here.


That's all, folks. I hope I managed to do justice to the amazing library called Logstash, which has made my log-management tasks so much easier.

Thanks.

