Blog

Categorical and Numeric Data in Scikit-Learn Pipelines

I like to organize every step of my experiments with tools as handy as Pipeline. However, continuous variables should not be passed into a OneHotEncoder, nor categorical variables into a scaler. The solution is to split the columns by type, process each group in its own pipeline, and then merge the results again. Inspired by the scikit-learn examples.
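As a rough sketch of that split-then-merge idea, scikit-learn's ColumnTransformer can route columns to separate pipelines and concatenate the outputs. The toy data, column positions, and step names below are made up for illustration, and the full post may structure things differently.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (made up): columns 0 and 1 are numeric, column 2 is categorical.
X = np.array(
    [
        [25, 50000.0, "dhaka"],
        [32, 64000.0, "khulna"],
        [47, 81000.0, "dhaka"],
        [51, 58000.0, "khulna"],
    ],
    dtype=object,
)

# One pipeline per column type, so each kind of data gets its own treatment.
numeric_pipeline = Pipeline([("scale", StandardScaler())])
categorical_pipeline = Pipeline([("onehot", OneHotEncoder(handle_unknown="ignore"))])

# ColumnTransformer applies each pipeline to its columns and merges the results.
preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, [0, 1]),
        ("cat", categorical_pipeline, [2]),
    ]
)

X_transformed = preprocess.fit_transform(X)
```

The result has two scaled numeric columns followed by one column per category, and `preprocess` can itself be used as the first step of a larger Pipeline that ends in an estimator.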

Counting Weekends between Two Dates in PostgreSQL

I found myself facing the problem of counting the occurrences of specific days of the week between two dates, for generating features in a predictive analysis task, of course. For example, the number of Fridays and Saturdays between 2019-01-01 and 2019-01-15. Thankfully, good old PostgreSQL came to the rescue!
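To make the target number concrete, here is a small Python sketch of the same count done outside the database. It is not the PostgreSQL query from the post, only a way to check the expected answer; the function name `count_weekdays` is hypothetical and both end dates are assumed to be inclusive.

```python
from datetime import date, timedelta


def count_weekdays(start, end, weekdays):
    """Count days in [start, end] whose ISO weekday (Mon=1 .. Sun=7) is in `weekdays`."""
    total_days = (end - start).days + 1  # inclusive range; zero or negative yields 0
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).isoweekday() in weekdays
    )


# Fridays (5) and Saturdays (6) between 2019-01-01 and 2019-01-15.
fridays_and_saturdays = count_weekdays(date(2019, 1, 1), date(2019, 1, 15), {5, 6})
```

For this range the matching dates are January 4, 5, 11 and 12, so the count is 4.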

Collecting Documents for Transferring from NSU

Transferring your credits from North South University might seem like a huge bureaucratic mess, but it is not. This post will guide you through collecting the necessary papers.

Making your ID “0”

Your NSU ID has a few parts.

181 0000 6 42

The 8th digit is our digit of interest. If it is 6, you need to make it 0 before proceeding with anything else. Visit the Office of Records (Admin, Level-3) and ask for the ID change form. You'll need to have:

  • SSC Certificate (Original + 1 Photocopy)
  • HSC Certificate (Original + 1 Photocopy)
  • SSC Transcript (Original + 1 Photocopy)
  • HSC Transcript (Original + 1 Photocopy)

The officer will return the originals after verification. Your ID will be updated in 5 to 7 days, depending on the workload at the office.

Requesting Documents

Typically you’ll need the “Official Transcript” and the “Medium of Instruction Certificate”. They cost 300 and 100 BDT respectively.

  • You can make the payment online from the RDS. Payments -> Online Payment.
  • Or, you can collect the necessary forms from the Office of the Controller of Examinations. (Level-3, Admin Building)

Make sure that your ID has been ‘0’-ed before you proceed with this step. Attach a photocopy of your SSC Certificate with each of the forms.

Congratulations and Good luck!

GP launched 013: Update your RegExp!

Grameenphone quietly rolled out its 013 series of numbers, and it wrecked everything. Hardly any web service, Google included, seems to accept the new 013 numbers yet. It's time for developers to fix this mess and update whatever logic they have for validating mobile numbers or MSISDNs.

Here is a RegExp to ease the terror, even if only a bit.

01[35-9]\d{8}
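As a quick sketch of how the pattern might be wired into validation code, here it is with Python's re module. The helper name `is_valid_msisdn` and the sample numbers are made up, and the pattern only covers the local 11-digit form, not a +880 prefix.

```python
import re

# Pattern from above. re.ASCII restricts \d to 0-9, so digits from other
# scripts (Bengali numerals, for example) are not accepted.
BD_MOBILE = re.compile(r"01[35-9]\d{8}", re.ASCII)


def is_valid_msisdn(number: str) -> bool:
    # fullmatch anchors both ends, so extra leading or trailing characters fail.
    return BD_MOBILE.fullmatch(number) is not None


checks = {
    "01312345678": is_valid_msisdn("01312345678"),    # new 013 series
    "01712345678": is_valid_msisdn("01712345678"),    # existing 017 series
    "01212345678": is_valid_msisdn("01212345678"),    # 012 is not in the pattern
    "013123456789": is_valid_msisdn("013123456789"),  # one digit too long
}
```

Using `fullmatch` instead of `search` matters here: an unanchored search would happily find a valid-looking number inside a longer string.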

Enough reading, update your RegExps now!

Flask-like “global” request context in Sanic (asyncio)

Although something like Flask's globally accessible request object is considered a terrible way of writing code (explicit is better than implicit), sometimes it makes sense to use it. One example is passing a Correlation ID to track a request's lifecycle through your microservices.

You can carry the Correlation ID throughout the lifecycle of a request without explicitly passing it around like juggling balls. This is actually a good approach, as the Correlation ID is not part of the core business logic and would only be a distraction there. We'll see how to implement such a request-bound "global" context in Sanic, and how to set up a simple Correlation ID implementation.

aiotask-context

This nifty Python module maintains a distinct context for each asyncio Task, which means each request can have an associated context that we can use to store and pass around incidental details.

$ pip install aiotask-context

from sanic import Sanic
from sanic.response import json

# import aiotask-context
import aiotask_context as context

# recent Sanic releases require an app name; this one is arbitrary
app = Sanic("correlation_demo")

# hook aiotask-context: every new task gets its own context
@app.listener('after_server_start')
async def hook_context(app, loop):
    loop.set_task_factory(context.task_factory)

@app.route("/")
async def test(request):
    return json({"hello": "world"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

Now we have a simple Sanic app with the context hooked up.

Implementing Correlation ID Generation and Passing

Now, let's grab the Correlation ID if it comes with the request, or generate our own otherwise. Afterwards, we need to save that value to the context for future use in other parts of the code (e.g. logging, requests to other microservices and so on).

from uuid import uuid4
import aiotask_context as context

@app.middleware('request')
async def handle_correlation_id(request):
    cid = request.headers.get('X-Correlation-ID') or str(uuid4())
    context.set('cid', cid)

And throughout the code, if you need the cid or any context value you have set, simply use context.get(key).
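To see the "read it anywhere" part in isolation, here is a minimal sketch that runs without Sanic or aiotask-context. It swaps in the standard library's contextvars, which is a different mechanism but gives the same per-task isolation; the names `cid_var`, `log` and `handle_request` are hypothetical.

```python
import asyncio
import contextvars
from uuid import uuid4

# Stand-in for the request-bound context (hypothetical name).
cid_var = contextvars.ContextVar("cid", default=None)


def log(message):
    # Reads the Correlation ID without it being passed in as an argument.
    return f"[{cid_var.get()}] {message}"


async def handle_request(incoming_cid=None):
    # Mirrors the middleware: reuse the incoming ID or generate a new one.
    cid_var.set(incoming_cid or str(uuid4()))
    await asyncio.sleep(0)  # yield control so the two tasks interleave
    return log("handled")


async def main():
    # gather wraps each coroutine in its own task, each with its own context copy.
    return await asyncio.gather(handle_request("abc"), handle_request("xyz"))


results = asyncio.run(main())
```

Even though both tasks set the same variable and interleave, each one reads back its own ID, which is the behaviour we rely on per request.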

The last step is to respond with the Correlation ID. We'll stick to middleware for this too: we just need to set the response header from the context.

@app.middleware('response')
async def insert_correlation_id(request, response):
    response.headers["X-Correlation-ID"] = context.get('cid')

Great! Now we don't have to write spaghetti code and get lost passing CIDs from function to function. Cheers!

pyenv build dependencies on Ubuntu

pyenv can get particularly annoying when it builds Python. Dependencies are often missing, resulting in failed builds. This short post outlines the required dependencies to reduce headaches for future me, and anyone else reading this.

You have probably come here seeing errors like:

zipimport.ZipImportError: can't decompress data; zlib not available
WARNING: The Python readline extension was not compiled. Missing the GNU readline lib?
WARNING: The Python bz2 extension was not compiled. Missing the bzip2 lib?
WARNING: The Python sqlite3 extension was not compiled. Missing the SQLite3 lib?
ERROR: The Python ssl extension was not compiled. Missing the OpenSSL lib?

Well, just install these and you’ll be good to go.

$ sudo apt-get build-dep python3
$ sudo apt-get install libreadline-dev libsqlite3-dev libbz2-dev libssl-dev zlib1g-dev libffi-dev

You’re welcome.

Viewing OpenCV matrices with matplotlib (w/ Jupyter Notebook)

While working with OpenCV, constantly popping up a new window to view the results might not be the most effective way to work. If we are working in a Jupyter Notebook, however, matplotlib can ease the burden. Essentially, we plot the matrices with matplotlib.pyplot.imshow.

Reading Bangladeshi NID and Smart Cards with ZXing

Bangladeshi NID and Smart Cards come printed with a 2D barcode in the PDF417 format. The information on the cards can be extracted through the barcode, without using an OCR solution. We'll be using Google's ZXing library and learning the basics of using it.