Displaying CJK Characters in Matplotlib Plots

By default, Matplotlib does not display Chinese, Japanese and Korean (CJK) characters correctly, because its default font lacks glyphs for them. This post introduces two methods for showing these characters in graphs. The fix is to configure Matplotlib to use fonts that support the characters we want to display. To configur...

Generating N-grams from Sentences in Python

N-grams are contiguous sequences of n items in a sentence. N can be 1, 2 or any other positive integer, although we usually do not consider very large N because such n-grams rarely appear in many different places. When performing machine learning tasks related to natural language processing, we usually need to generate n-grams from input sen...
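
To make the definition concrete, here is a minimal sketch that generates n-grams from a sentence. It assumes simple lowercasing and whitespace tokenization, and the function name generate_ngrams is hypothetical; the full post may take a different approach.

```python
def generate_ngrams(sentence, n):
    """Return all contiguous n-token sequences in `sentence` as strings."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: lowercase the text and split on whitespace to get tokens.
    tokens = sentence.lower().split()
    # Slide a window of width n over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = generate_ngrams("natural language processing is fun", 2)
print(bigrams)
```

A sentence with fewer than n tokens yields an empty list, which is usually the behaviour you want when building features.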

Making pandas Operations Faster

pandas is one of the most commonly used Python libraries in data analysis and machine learning. It is versatile and can handle many different types of data. Before feeding a model with training data, one would most probably pre-process the data and perform feature extraction on data stored in a pandas DataFrame. I have been using pandas e...

