"place":{ "country":"United States", "place_type":"city", "country_code":"US", "bounding_box":{ "type":"Polygon", "coordinates":[[[-84.647561,37.031352],[-84.564839,37.031352], [-84.564839,37.117608],[-84.647561,37.117608]]] }, "full_name":"Somerset, KY", "name":"Somerset", "id":"0c610ec760ff6a57" }
-> Amsterdam: highly international
-> Antwerp: geographically small but highly multilingual
-> Berlin: large spread and decentralized
-> Brussels: bilingualism
Mining since December 2014
City | Number of tweets | Box size |
---|---|---|
Amsterdam | 679205 | 23,44 |
Antwerp | 415813 | 83,03 |
Berlin | 691998 | 45,01 |
Brussels | 497667 | 46,94 |
Total | 2284683 | |
#hashtags, @usernames, and http://links.co
langid.py | 97 languages |
CLD2 | > 80 languages |
LangDetect | 53 languages |
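A minimal sketch of stripping hashtags, usernames, and links from a tweet before passing it to a language identifier. The regular expressions and the name `clean_tweet` are assumptions for illustration; the slides do not show the actual preprocessing code.

```python
import re

# Hypothetical patterns; real tweets may need more careful handling.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")


def clean_tweet(text):
    """Drop links, @usernames, and #hashtags, then normalize whitespace."""
    for pattern in (URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


cleaned = clean_tweet("Goedemorgen @jan #amsterdam http://links.co wat een dag")
```

Links are removed first so that a `#fragment` or `@` inside a URL is not treated as a hashtag or mention.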
Only known available system is Bot or Not? (no API)
Double inference
Overrepresentation of prolific users/languages
Underrepresentation of certain languages
(from urbanmovements.co.uk)
"Nationality","District","Migratory_background","Foreigners" "fr","01 Mitte",782,3262 "fr","0101 Zentrum",385,1478 "fr","010111 Tiergarten Süd",63,218 "fr","01011101 Stülerstraße",9,30 "fr","01011102 Großer Tiergarten",0,3 "fr","01011103 Lützowstraße",35,94 "fr","01011104 Körnerstraße",16,79 "fr","01011105 Nördlicher Landwehrkanal",3,12 "fr","010112 Regierungsviertel",22,103
\(H = \sum\limits_{j=1}^J \frac{t_j}{TE}(E - E_j)\)
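The formula above has the shape of the multigroup entropy (Theil) index. The sketch below assumes the usual reading of the symbols, which the slide does not spell out: \(t_j\) is the population of unit \(j\), \(T\) the total population, \(E\) the entropy of the overall group mix, and \(E_j\) the entropy within unit \(j\). The function names are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of group counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def theil_h(units):
    """Entropy index H for a list of per-unit group counts.

    units: e.g. [[n_en, n_de], [n_en, n_de], ...], one inner list per district.
    """
    n_groups = len(units[0])
    overall = [sum(unit[g] for unit in units) for g in range(n_groups)]
    T = sum(overall)       # total population
    E = entropy(overall)   # entropy of the whole area
    if T == 0 or E == 0:
        return 0.0         # H is undefined without diversity; report 0
    return sum(sum(unit) / (T * E) * (E - entropy(unit)) for unit in units)


segregated = theil_h([[10, 0], [0, 10]])  # each unit holds a single group
mixed = theil_h([[5, 5], [5, 5]])         # each unit mirrors the overall mix
```

Under these assumptions H ranges from 0 (every unit has the same mix as the whole area) to 1 (every unit contains only one group).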
\(R^2 = 0.87\)
Can we establish mobility patterns based on the language of tweets?
Language | en | ar | es | ru | pt | tr | de |
---|---|---|---|---|---|---|---|
# Users | 592 | 60 | 64 | 42 | 37 | 111 | 747 |
\(\sqrt{\frac{1}{n}\sum\limits_{i=1}^n \operatorname{haversin}(a_i - \bar{a})^2}\)
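The expression above resembles a radius of gyration: the root mean square haversine distance between a user's tweet locations \(a_i\) and their mean location. The sketch below assumes that reading, takes points as `(lat, lon)` in degrees, and approximates the mean location by averaging coordinates, which is reasonable at city scale. All names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def radius_of_gyration_km(points):
    """RMS haversine distance of the points from their mean location."""
    n = len(points)
    center = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return math.sqrt(sum(haversine_km(p, center) ** 2 for p in points) / n)


# Two points one degree of latitude apart on the same meridian.
spread = radius_of_gyration_km([(52.0, 13.4), (53.0, 13.4)])
# A user who always tweets from the same spot has no spread.
stationary = radius_of_gyration_km([(52.5, 13.4)] * 3)
```

In the first example each point lies half a degree of latitude from the center, so the result is about 55.6 km.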
One-way ANOVA
:             Df Sum Sq Mean Sq F value   Pr(>F)
: lang         6  164.6  27.437   4.138 0.000516 ***
: Residuals  308 2042.4   6.631
: ---
: Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
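To make the columns of the ANOVA table concrete, here is a minimal pure-Python sketch of how the degrees of freedom, sums of squares, mean squares, and F value are derived from grouped observations. The function name and the toy data are illustrative and are not the data behind the slide; the p-value is omitted because it requires the F distribution.

```python
def one_way_anova(groups):
    """Return the between/within decomposition for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "sum_sq": (ss_between, ss_within),
        "mean_sq": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The same relation holds for the output above: 27.437 / 6.631 ≈ 4.138, the reported F value.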
Single T-test
language pair | p-value |
---|---|
tr de | 0.0002 |
ar de | 0.0003 |
tr ar | 0.9903 |
pt es | 0.9518 |
Thank you for your attention