Gated Linear Units (GLU) and Gated CNN | 03-09-2020 | blog | 6 minutes read (About 826 words) | Gated Linear Units and Gated CNN Explained | Deep Learning, Machine Learning, Natural Language Processing
Evolved Transformer Explained | 03-07-2020 | blog | 5 minutes read (About 774 words) | Evolution Search on Transformer Neural Network Architectures | Deep Learning, Machine Learning, Natural Language Processing, Neural Architecture Search, Evolution Algorithm
Bilingual Evaluation Understudy (BLEU) | 11-17-2019 | blog | 11 minutes read (About 1644 words) | Elucidating the Machine Translation BLEU Scoring Mechanism | Natural Language Processing, Machine Translation
Word2Vec Models Revisited | 08-23-2019 | article | 23 minutes read (About 3524 words) | A Review on the Classic Word2Vec Models | Machine Learning, Natural Language Processing, Optimization
Hierarchical Softmax | 08-17-2019 | article | 8 minutes read (About 1191 words) | Orchestrated Softmax for Fast Language Model Training | Machine Learning, Natural Language Processing, Optimization
Noise Contrastive Estimation | 07-26-2019 | article | 30 minutes read (About 4509 words) | Accelerate the Training of Neural Language Models | Machine Learning, Natural Language Processing, Optimization
Byte Pair Encoding | 07-19-2019 | blog | 16 minutes read (About 2377 words) | Common Tokenization Method for Natural Language Processing | Natural Language Processing, Information Theory, Byte Pair Encoding
Transformer Explained in One Single Page | 06-01-2019 | blog | 21 minutes read (About 3119 words) | Attention is All You Need Mathematically | Deep Learning, Natural Language Processing, Transformer, Machine Translation
Maximum Likelihood Estimation of N-Gram Model Parameters | 06-09-2018 | blog | 8 minutes read (About 1220 words) | Mathematical Proof of the Maximum Likelihood Estimation of N-Gram Model Parameters | Natural Language Processing, Probability