
analytics, automation, things that stand out

A hassle-free way to create a song lyrics dataset for training generative language models

Photo by Brandon Erlinger-Ford on Unsplash

Introduction

Building datasets for language model training and fine-tuning can be very tedious. I learned this the hard way while trying to gather a conversational text dataset and a niche song lyrics dataset, both for training a single GPT-2 model. Only after several hours of semi-automated scraping and manual cleaning did…


Make sense of unstructured text data by applying machine learning principles.

Introduction

I recently completed my first machine learning project at work and decided to apply its methods to a personal project. The work project revolved around automatically classifying textual data using Latent Dirichlet Allocation (LDA).

LDA is an unsupervised machine learning model…
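The teaser does not show the project's code, so what follows is only a rough illustration of how LDA can be fit. It is a minimal sketch of collapsed Gibbs sampling in plain Python, assuming the documents are already tokenized into word lists. The function names `lda_gibbs` and `dominant_topic`, the hyperparameters `alpha` and `beta`, and the toy corpus are hypothetical. "Classifying" a document is approximated here by picking its most probable topic.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper, a sketch).

    docs: list of token lists. Returns (theta, topic_word, vocab), where
    theta[d][t] is the estimated proportion of topic t in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta, topic_word, vocab


def dominant_topic(theta_row):
    """Label a document with its most probable topic (hypothetical helper)."""
    return max(range(len(theta_row)), key=theta_row.__getitem__)


# Hypothetical toy corpus with two rough themes.
docs = [
    "hero comic villain hero".split(),
    "comic villain hero power".split(),
    "invoice payment refund invoice".split(),
    "payment refund account invoice".split(),
]
theta, topic_word, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
for doc, row in zip(docs, theta):
    print(" ".join(doc), "-> topic", dominant_topic(row))
```

Because LDA is unsupervised, the topic numbers carry no built-in meaning. Turning them into class labels still requires inspecting the top words per topic (the rows of `topic_word`) and naming them by hand.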



@marvelbeings is a simple Twitter bot I wrote in early 2019 during a vacation. The bot scrapes Marvel’s official characters site and tweets a character’s URL and the character’s photo from the site, along with a boilerplate string and a few hashtags. …
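The teaser does not include the bot's source. Here is a minimal sketch of only the text-composition step it describes: joining a boilerplate string, a character URL, and hashtags, then dropping trailing hashtags until the text fits. The function name `compose_tweet`, the sample boilerplate, and the `example.com` URL are hypothetical, and scraping and photo upload are left out. The length check is a plain character count, which is an assumption. Twitter's own counting differs; for example, it shortens links.

```python
def compose_tweet(character_url, boilerplate, hashtags, limit=280):
    """Build tweet text, dropping trailing hashtags until it fits (hypothetical helper)."""
    # Normalize tags so both "Marvel" and "#Marvel" become "#Marvel".
    tags = ["#" + tag.lstrip("#") for tag in hashtags]
    while True:
        parts = [boilerplate, character_url] + tags
        text = " ".join(p for p in parts if p)
        if len(text) <= limit:
            return text
        if not tags:
            raise ValueError("boilerplate and URL alone exceed the limit")
        tags.pop()  # drop the last hashtag and try again


# Hypothetical usage with placeholder values.
tweet = compose_tweet(
    "https://example.com/characters/storm",
    "Character of the day:",
    ["Marvel", "#MarvelComics"],
)
print(tweet)
```

Dropping hashtags first keeps the boilerplate and the link intact, since those are the parts a reader actually needs.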

Ekene A.
