Here you will find an analysis of fine-tuning RoBERTa for content moderation on Reddit.

Let’s take this step by step.

First step: Data

  • Data is scraped from the private moderator logs of /r/nba, filtering specifically for comments that were either removed or approved
  • The dataset consists of the full text of 20,000 user comments (10,000 approvals and 10,000 removals), with the binary labels “approvecomment” and “removecomment” which are translated into 0 and 1, respectively.

Second step: Model

  • BiLSTM: Baseline model
    A BiLSTM uses two LSTMs to learn each token of the sequence based on both the past and the future context of the token. One LSTM processes the sequence from left to right, the other one from right to left.
    Achieved accuracy of 0.7466 on validation dataset

  • RoBERTa: Experimental model
    RoBERTa is a robustly optimized method for pretraining NLP systems that improves upon the BERT model. RoBERTa builds on BERT’s language masking strategy, wherein the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples and produced state-of-the-art results on a range of NLP tasks.
    Achieved accuracy of 0.7720 on validation dataset

Third step: Evaluation of Results

  • As you may have noticed, the experimental RoBERTa model does not show a very large improvement in accuracy over the baseline BiLSTM model. There are two likely explanations for this:
  • First, that the performance may be approaching a limit of human “accuracy” in labeling. There will be discrepancies with respect to the moderators’ judgement that could lead to noisy labeling.
  • Second, that this work has not taken advantage of the full capabilities of RoBERTa such as pretraining with unlabeled data to gain a more accurate understanding of the platform’s domain specific jargon.

Another result that is worth talking about is the high recall that RoBERTa achieved on the validation dataset. This value indicates that this model would have capabilities of allowing quicker removal of illicit content, as the model could process all comments immediately upon posting instead of waiting for a user to see it and decide to report.

So that completes my analysis. I have successfully implemented an automatic comment-flagging system using RoBERTa fine-tuned on a labeled dataset for a particular subreddit.

Thank you for reading!

P.S. You can find the paper describing this implementation in detail in the “Projects” section.