Sunday 22 September 2019

AWS S3 upload files | Upload file to S3 using Python | Lambda upload files to S3 bucket

S3 is object storage provided by AWS. It stores data inside buckets.
We can upload data to S3 using the boto3 library. Here is code that also works inside AWS Lambda functions.

Below is sample code to upload files to S3 using Python:

import boto3

access_key = 'your_access_key'
secret_access = 'your_secret_access'
region = 'your_region'

s3 = boto3.resource('s3', region_name=region,
                    aws_access_key_id=access_key,
                    aws_secret_access_key=secret_access)

def s3_upload(bucket, domain, content):
    # Key of the object inside the bucket: sub-folder/file name.
    file_path = '{}/{}'.format(domain, 'name_of_file')
    obj = s3.Object(bucket, file_path)
    obj.put(Body=content)

def main(job):
    try:
        bucket = 'your_bucket_name'
        domain = 'sub-folder_inside_bucket'
        # Content to store in the object; to upload a local file, read it first,
        # e.g. content = open('local_location_of_file', 'rb').read()
        content = 'content_to_upload'
        s3_upload(bucket, domain, content)
    except Exception as e:
        print(e)

def handler(event, context):
    main(event)

# Local test invocation; not needed when the code runs as a Lambda.
if __name__ == '__main__':
    handler('a', 'a')

Notes:

  1. Replace the required parameters like keys and bucket name with your own values.
  2. domain: used when we have a folder inside the bucket.
  3. If the path doesn't exist inside the bucket, S3 creates it as part of the object key.
  4. You can skip the handler part if you are not using the code for AWS Lambda functions.
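
If what you want to upload is an existing local file, boto3 can also send it directly without reading it into memory first. A minimal sketch, assuming the bucket name, key and local path below are placeholders for your own values:

import boto3

s3 = boto3.resource('s3')  # falls back to the default AWS credential chain
# Bucket name, key and local path below are placeholders.
s3.Object('your_bucket_name', 'sub-folder_inside_bucket/name_of_file') \
  .upload_file('/path/to/local_file')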
You can comment below if you face any issues here.


Saturday 14 September 2019

Add Crawlera proxy to Python script | Using Crawlera proxy

Crawlera is a proxy service used to rotate our IPs. Proxies are beneficial when we are doing web extraction kind of work.

Below is a small example of how to use the Crawlera proxy with a Python script, which also works for AWS Lambda functions:

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# Suppress warnings for the unverified HTTPS requests made through the proxy.
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

class JobCreator():
    def getProxies(self):
        proxy_host = "proxy.crawlera.com"
        proxy_port = "8010"
        proxy_auth = "yourauthkey"
        proxies = {"https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
                   "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)}
        return proxies

    def call(self):
        proxies = self.getProxies()
        html = requests.get('https://api.myip.com', verify=False, proxies=proxies)
        print(html.content)

# Lambda Handler
def handler(event, context):
    print("Job Creation Initiated.")
    obj = JobCreator()
    obj.call()
    print("Jobs Created Successfully.")

# Local test invocation; not needed when the code runs as a Lambda.
# handler('test', 'test')
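
In a Lambda deployment you would normally not hard-code the key. Here is a minimal sketch of the same proxies dict, assuming the key is supplied through a hypothetical CRAWLERA_APIKEY environment variable:

import os
import requests

# CRAWLERA_APIKEY is an assumed variable name; set it to your Crawlera key.
proxy_auth = os.environ.get('CRAWLERA_APIKEY', 'yourauthkey')
proxies = {"https": "https://{}@proxy.crawlera.com:8010/".format(proxy_auth),
           "http": "http://{}@proxy.crawlera.com:8010/".format(proxy_auth)}

# Any requests call can reuse the same proxies dict.
print(requests.get('https://api.myip.com', verify=False, proxies=proxies).text)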

To know more about how to use AWS Lambda functions, you can leave a message in the comment box below.



Friday 13 September 2019

Postgres Load large file into database | Load large CSV to postgres database

We can load large files like CSV into our Postgres database using the COPY command. Below is an example of how to do so:

export PGPASSWORD='your_password'; psql -h hostname -p port -U username -d database -c "\copy table_name FROM 'location of file' WITH DELIMITER '|' CSV HEADER;"

Just put in the required parameters. You can adjust the delimiter as needed, for example ',' or ';'.
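
The same load can also be done from Python if you prefer to keep everything in one script. A minimal sketch, assuming psycopg2 is installed and the connection details and file path below are placeholders:

import psycopg2

conn = psycopg2.connect(host='hostname', port=5432, user='username',
                        password='your_password', dbname='database')
with conn, conn.cursor() as cur, open('location_of_file') as f:
    # COPY ... FROM STDIN streams the local file through the client,
    # using the same pipe-delimited, header-row format as the \copy command above.
    cur.copy_expert("COPY table_name FROM STDIN WITH DELIMITER '|' CSV HEADER", f)
conn.close()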


Wednesday 11 September 2019

Postgresql split column into rows | Postgres split delimited values into rows

Problem:

Table structure:


Roll No | Name | Hobbies
1       | Ram  | Dance,Playing
2       | Alex | Music,Movies,Internet

Now we want to split Hobbies into rows based on the comma (,) to get results like this:

1 | Ram  | Dance
1 | Ram  | Playing
2 | Alex | Music
2 | Alex | Movies
2 | Alex | Internet


To get this output you can use the query below in Postgres:

SELECT
    "Roll No",
    Name,
    regexp_split_to_table(Hobbies, E',') AS Hobby
FROM Tablename;

Use your own delimiter and replace Tablename with your table name. A column name that contains a space, like "Roll No", needs to be double-quoted.