Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution

Enrique Manjavacas & Jeroen De Gussem & Walter Daelemans & Mike Kestemont

08/09/2017

EMNLP17 (Workshop on Stylistic Variation) - Copenhagen

https://emanjavacas.github.com/slides-content/copenhagen-emnlp-17

1 Introduction

RNNs are a powerful tool for text generation

  • Neurally generated text makes an authentic impression on readers
  • Basis for renewed interest in NLG
  • Recent interest in the information encoded by RNNs

Evaluation of neural synthetic text through Authorship Attribution

  • To what extent does the generative model retain author-related stylistic properties of the source?
  • How does it compare to traditional n-gram models?

Experiment idea

  • Fit Language Models (one per author)
  • Generate a collection of docs per author through sampling
  • Apply Authorship Attribution (AA) to the resulting synthetic dataset

Assumption

“Algorithmic” definition of style

Style is whatever textual properties an algorithm might use in order to successfully attribute a text to its actual author, without considering whether the properties discovered by the algorithm are stylistic in a narrower sense.

2 Summary of the presentation

  • Describe the models used for text generation
  • Describe the experimental setup
  • Discuss the results

3 Text Generation

Character-level Text Generation with Language Models

\(P(w_1, w_2, \ldots, w_n) = P(w_1 \mid \langle bos \rangle) \prod_{i=1}^{n-1} P(w_{i+1} \mid w_1, \ldots, w_i)\)
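As a concrete illustration (not from the slides), the factorization of the three-character string "cat" reads: \(P(c, a, t) = P(c \mid \langle bos \rangle)\, P(a \mid c)\, P(t \mid c, a)\). Text is generated by sampling one character at a time from these conditionals, feeding each sampled character back in as context.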

Models

NGLM

\(P(w_t \mid w_1, \ldots, w_{t-1}) \approx P(w_t \mid w_{t-(n-1)}, \ldots, w_{t-1})\)

RNNLM

[Figure: RNNLM architecture diagram]

4 Experiment

Experimental Setup

Difficulty

Maximize comparability of authentic and generated text

  • Unequal training size per author for LMs
  • Unequal training and test size per author for the attributor (important in AA)
  • Authentic text has doc-level structure, LM-generated text does not

Proposed method

  • Random even doc-level split (referred to as \(\alpha\) and \(\omega\) for simplicity)
  • Create 20 fixed-size (5,000-word) docs per split by sampling sentences (sketched below)
  • Sample a third set (\(\bar{\alpha}\)) from each author's LM trained on \(\alpha\)
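A minimal sketch of the document-construction step, assuming sentences are drawn with replacement (the slides do not specify the exact sampling scheme; function and variable names are illustrative):

    import random

    def make_fixed_size_docs(sentences, n_docs=20, doc_size=5000, seed=1):
        """Assemble fixed-size documents by sampling sentences until each
        document reaches roughly `doc_size` words."""
        rng = random.Random(seed)
        docs = []
        for _ in range(n_docs):
            words, doc = 0, []
            while words < doc_size:
                sentence = rng.choice(sentences)
                doc.append(sentence)
                words += len(sentence.split())
            docs.append(" ".join(doc))
        return docs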

Attribution Experiments

Dataset

  • Patrologia Latina (≈ 113M words)
  • Ecclesiastical Latin texts spanning 1,000 years
  • Homogeneous institutionalized literary language (L2 language)

Language Model Fitting

NGLM

  • N-gram order is set to 6
  • Parameters estimated through MLE (no smoothing, since we are only interested in generation)
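A minimal sketch of such a character-level MLE model with order 6, as an illustrative reimplementation rather than the authors' code:

    import random
    from collections import Counter, defaultdict

    def fit_nglm(text, order=6):
        """Count-based character n-gram LM (MLE, no smoothing)."""
        counts = defaultdict(Counter)
        padded = "~" * (order - 1) + text            # '~' marks the left boundary
        for i in range(order - 1, len(padded)):
            counts[padded[i - order + 1:i]][padded[i]] += 1
        return counts

    def sample_nglm(counts, n_chars=200, order=6, seed=0):
        """Generate text by sampling from the conditional MLE distributions."""
        rng = random.Random(seed)
        out = "~" * (order - 1)
        for _ in range(n_chars):
            dist = counts.get(out[-(order - 1):])
            if not dist:                             # unseen history: stop early
                break
            chars, freqs = zip(*dist.items())
            out += rng.choices(chars, weights=freqs, k=1)[0]
        return out[order - 1:]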

RNNLM

Model definition

Parameter Value
Embedding size 24
RNN Cell LSTM
Hidden size 200
Hidden Layers 2

Training

Parameter Value
Batch size 50
Optimizer Adam (default params)
Learning rate 0.001
Gradient norm clipping 5.0
Dropout 0.3 (RNN output)
Epochs 50

Validation perplexity: 4.015 (± 0.183)
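For illustration, a comparable character-level LSTM LM and training step could look as follows, e.g. in PyTorch (the slides do not name the framework; this is a sketch under that assumption):

    import torch
    import torch.nn as nn

    class RNNLM(nn.Module):
        """Character-level LSTM LM with the hyperparameters listed above."""
        def __init__(self, vocab_size, emb_dim=24, hidden_dim=200, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                                batch_first=True)
            self.drop = nn.Dropout(0.3)              # dropout on the RNN output
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, hidden=None):
            output, hidden = self.lstm(self.embed(x), hidden)
            return self.proj(self.drop(output)), hidden

    def train_step(model, x, y, optimizer, clip=5.0):
        """One optimization step with gradient-norm clipping at 5.0."""
        logits, _ = model(x)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        return loss.item()

    # optimizer = torch.optim.Adam(model.parameters(), lr=0.001)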

Attributor

  • Linear SVM (scikit-learn) on tf-idf character {2,3,4}-grams
  • No word-level features, since the RNNLM can produce unseen words
  • Grid-search over parameters
Parameter Grid values
Max-features 5,000; 10,000; 15,000; 30,000
C 1; 10; 100; 1,000
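A sketch of the attributor in scikit-learn; whether the character n-grams cross word boundaries and which scoring metric drives the grid search are assumptions here:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # tf-idf over character {2,3,4}-grams feeding a linear SVM
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
        ("svm", LinearSVC()),
    ])
    param_grid = {
        "tfidf__max_features": [5000, 10000, 15000, 30000],
        "svm__C": [1, 10, 100, 1000],
    }
    grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
    # grid.fit(train_docs, train_authors)   # docs: list of str, authors: list of str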

5 Results

5-fold cross-validated scores per experiment

Numbers


Source  Experiment   F1     P      R
Real    <α, ω>       0.833  0.818  0.869
        <ω, α>       0.811  0.795  0.853
NGLM    <ᾱ, ω>       0.706  0.744  0.750
        <ω, ᾱ>       0.837  0.811  0.881
RNNLM   <ᾱ, ω>       0.635  0.701  0.658
        <ω, ᾱ>       0.724  0.778  0.775

Discussion

Why does the NGLM outperform the RNNLM in both setups (\(<\bar{\alpha},\omega>\) and \(<\omega,\bar{\alpha}>\))?

The SVM relies on very local features (character n-grams), and the NGLM reproduces the local character distribution very faithfully.

[Figure] Lexical overlap: mean-normalized character n-gram Jaccard similarity across authors.
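For reference, a minimal version of the pairwise overlap measure (how the scores are mean-normalized and averaged across authors is not spelled out here):

    def char_ngram_jaccard(text_a, text_b, n=3):
        """Jaccard similarity between the character n-gram sets of two texts."""
        grams_a = {text_a[i:i + n] for i in range(len(text_a) - n + 1)}
        grams_b = {text_b[i:i + n] for i in range(len(text_b) - n + 1)}
        return len(grams_a & grams_b) / len(grams_a | grams_b)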

Why does the NGLM setup \(<\omega, \bar{\alpha}>\) outperform the Real setup \(<\omega, \alpha>\)?

  • Pruning effect? Generation may eliminate "distracting" features and enhance those that are more relevant
  • This might prove beneficial for actual AA

6 Self-learning (Data-augmentation) Experiments

  • Is there still some authorial signal in the RNNLM-generated data?
  • Is there an effect of the long-term dependencies learned by the RNNLM on the stylistic properties of the generated data?
  • If so, augmenting the authentic training data with RNNLM-generated data could yield attribution improvements

Experiment

\(<\alpha+\bar{\alpha}, \omega>\)
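In terms of the attributor sketched earlier, this experiment amounts to concatenating authentic and RNNLM-generated training documents before fitting; the variable names below are hypothetical:

    # α + ᾱ: authentic training docs plus LM-generated docs for the same authors
    train_docs = alpha_docs + alpha_bar_docs
    train_labels = alpha_authors + alpha_bar_authors
    grid.fit(train_docs, train_labels)        # reuse the GridSearchCV from above
    predictions = grid.predict(omega_docs)    # evaluate on the held-out ω split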

Numbers


Source  Experiment   F1     P      R
Real    <α, ω>       0.833  0.818  0.869
        <ω, α>       0.811  0.795  0.853
NGLM    <α+ᾱ, ω>     0.814  0.809  0.850
        <ᾱ, ω>       0.706  0.744  0.750
        <ω, ᾱ>       0.837  0.811  0.881
RNNLM   <α+ᾱ, ω>     0.872  0.878  0.892
        <ᾱ, ω>       0.635  0.701  0.658
        <ω, ᾱ>       0.724  0.778  0.775

Discussion

-> The long-term dependencies learned by the RNNLM prove beneficial (not redundant with the authentic data)

-> (Evidence that) the RNNLM better models the stylistic variation in the original distribution

7 Conclusion

  • LMs seem to capture stylistic properties to a certain extent
  • Arguably RNNLMs are richer models (based on the augmentation experiments)
  • More global attributors are still needed; the stylistic evaluation is still too local
  • Further exploration of data pruning (NGLM) & data augmentation (RNNLM) for AA

8 Thank you for your attention!

9 Plots

[Figure] Doc-level PCA: NGLM (documents represented by their 150 most frequent character n-grams)

[Figure] Doc-level PCA: RNNLM (documents represented by their 150 most frequent character n-grams)