Gender differences in language use

Newmann et alii (2009)

http://emanjavacas.github.io/slides/

1. General research question
2. Methodology
3. Results
4. Conclusions
5. Bibliography

1 General research question

-> To what extent do women and men differ in their language use

-> Try to analyze gender differences across multiple linguistic categories and multiple contexts

1.1 Spot the difference!

1.2 What we now know: previous findings (since the 80s)

1.2.1 Turn-taking conversation

(Mulac, Bradac, Gibbons 1988)

Women used more questions (“Does anyone want to get some food?”)
Men used more directives (“Let’s go get some food”)
longer sentences among women
more and longer turns taken among men

1.2.2 Specific phrase level considerations of female language

(Robin Lakoff 1975)

hedges and tag questions (“it seems like”, “…, aren’t you?”)
Uncertainity verb phrases (“I wonder if…”)

1.2.3 Corpus-based research

(Biber, Conrad, Reppen 1998)

Females

– more intensive adverbs

– more conjunctions (“but”)

– more modal and auxiliary verbs (“could”)

– more first person plural

Males

– longer words

– more articles

– more references to locations

1.3 Shortcomings

1.3.1 Non replicability, contradictory reports

Thomson and Murachver’s (2001),

– e-mail communication found that men and women were equally likely to ask questions

(Mulac, Seibold, & Farris, 2000), role play,

– men used more negations and asked more questions – women posed more directives

Explanations:

– studies do not take into account context

coding schemes are not always consistent across studies

“One researcher’s uncertainty verb phrase is another’s hedge”

1.3.2 Non significant results (due to the small effect sizes)

Analysis of the correlation between gender and sentence length:

-> significant (p < .05) effect size (d) of .12

-> sample of over 2,000 people

-> mean words-per-sentence was 23.4 for men and 19.1 for women

-> with a standard deviation of 35.1

Pennebaker and Stone (2003)

2 Methodology

2.1 Corpus

2.1.1 Description

-> 11,609 participants, -> approximately 45,700,000 words -> 14,324 final text files, with 5,971 written by men and 8,353 written by women.

2.1.2 Taking context into account

2.2 LIWC (Linguistic Inquiry and Word Count)

-> Used as text preprocessing to filter out relevant categories (or features) which might be relevant for gender differentiation

–> LIWC defines a total of 74 categories

–> LIWC takes into account both function words and content words

–> but also more abstract entities like POS.

–> 54 language dimensions.

2.2.1 Snapshot of the official documentation

Click hier –> Link to the categories

Link to the LIWC website

2.2.2 Example output of LIWC

	Women	Men	Example
Linguistic Processes
Total Function Words	37	26
Adverbs	5	3	“very”, “really”
Article	2	2	“a”, the”
Prepositions	3	10	“to”, “with”
Personal Pronoun	11	7	“us”, “them”
First Pers. Pronoun	8	7	“I”
Total Verbs	10	3
Present tense verbs	5	3	“is”, “does”
Auxiliary Verbs	7	3	“am”, “will”
Psychological Processes
Affective processes	3	1	“cried”, “abandon”
Positive emotions	2	1	“love”, “nice”
Social processes	8	1	“mate”, “talk”
Cognitive processes	9	10	“cause”, “know”
Relativity	4	3	“exit”, “area”
Total	134	96

3 Results

3.1 Multivariate analysis

-> MANOVA (Multivariate Analysis Of VAriance)

–> F(53; 14,270) = 30.66 –> p < 0.05

Friendly explanation of MANOVA

3.2 Summary of interesting results

Females	Males	Null effect	Effect size (d)
pronouns			.36
present-tense verbs			.18
	word-legth		.24
	numbers		.15
	articles		.24
	prepositions		.17
social words			.21
positive feelings			.15
anxiety			.16
	swear words		.22
	occupation		.12
		word count
		question marks
		sex

3.3 Function words are better discriminants

-> comparation of the average effect size for function words categories with the average effect size for the content words categories:

– Content words: verbs (feeling, hearing, insight…) => d = .10

– Content words: nouns (friends, family, occupation, money, metaphysical…) => d = .11

– Function words (articles, prepositions, pronouns…) => d = .20

3.4 Taking context into account (Crossover)

3.4.1 Crossover effects

-> When considering single registers effect sizes can change their sign

		Emotion	Time M.	SoC	Fiction	TAT	Exams	Conversation
pronouns	.36	.25	.20	.33	.79	.30	.46	.03
present verbs	.18	.14	.04	.17	.28	.09	.41	-.21
word-legth	-.24	-.16	-.04	-.09	-.32	-.17	-.45	-.44
numbers	-.15	-.09	-.15	-.13	-.37	-.10	-.09	14
articles	-.24	-.21	-.07	-.33	-.70	-.22	-.05	-.77
prepositions	-.17	-.11	-.09	-.12	-.26	-.09	-.11	-.74
anxiety	.16	.13	.05	-.09	.04	.12	-.30	.16
swear words	-.22	-.14	-.10	-.24	-.28	-.12	-.06	-.43

3.4.2 Discriminant contexts/registers

Register	Effect size (d)
Fiction	.31
Conversation	.26
Exams	.22
SoC	.11
TAT	.09
Emotion	.08
Time Management	.08

4 Conclusions

-> Small and subtle differences but consistent and systematic

-> Function words are found to be the best discriminators

-> Mainly, the following variables: word length, articles, swear words, social words and pronouns

-> Further work is needed (the article was published 2009!) in reavealing the “why”

4.1 Discussion question

The phenomenon analized consists on a entire range of variables that:

-> act together, holistically (as opposed to the markers analyzed before)

-> are rather abstract and non-salient (frequencies of function words, etc…)

-> to what extent can this be considered to be relevant to active construction of identity?

-> could we explain this as a result of a unconsciously inherited language style?

5 Bibliography

Check the paper!

“Gender differences in language use” Newmann et alii (2009)