------------------
Assignment 2 - Twitter stream recorder with emoticon tokenizer
Assignment 2 Python 1.0 08 May 2017
------------------
License Information
------------------
Copyright (c) School of Geography.
University of Leeds, Leeds, West Yorkshire, UK. LS2 9JT.
All rights reserved.
This code is licensed under the Common Development and Distribution License (CDDL-1.0).
For details, please see https://opensource.org/licenses/CDDL-1.0
-------------------
Contact
-------------------
Author = Rowan Gill
Email = [email protected]
------------------
Basic Information
------------------
This project connects to the Twitter API streaming service to capture live tweets on any topic the user chooses. The tweets are
saved to a csv file named data_(year/month/day/hour/minute/second), so that no two runs share a file name. Each row has the format
screen_name - text (tweet) - coordinates (if available). The code first sets up a counter that tracks the number of tweets recorded
against a maximum defined by the user, and creates the csv with the designated column names. Whenever a tweet containing the search
words is posted, the code opens the file and writes the tweet to it, and it continues until the maximum number of tweets is reached.
The tokeniser then runs through the text column row by row and creates tokens for all the words, hashtags, hyperlinks and emoticons.
Because the emoticons are stored as utf-8 byte sequences and are therefore unreadable, a conversion step follows: a conversion csv
was made, and the code replaces each utf-8 emoticon code with a literal string of its meaning.
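The emoticon conversion step can be illustrated with a minimal sketch. It is not the project's own code: the conversion csv is
not shown in this README, so the column names (code, meaning), the two sample rows and the function names below are all
assumptions, as is the idea that the codes appear in the text as literal backslash escapes.

```python
import csv
import io

# Hypothetical conversion table. The real conversion csv and its column
# names are not shown in this README; these two rows are only examples.
CONVERSION_CSV = """code,meaning
\\xf0\\x9f\\x98\\x82,face with tears of joy
\\xe2\\x9d\\xa4,red heart
"""


def load_conversion_table(handle):
    """Read code -> meaning pairs from an open csv file or file-like object."""
    return {row["code"]: row["meaning"] for row in csv.DictReader(handle)}


def convert_emoticons(text, table):
    """Replace every known emoticon code in text with its meaning."""
    # Longest codes first, so a short code never replaces part of a longer one.
    for code in sorted(table, key=len, reverse=True):
        text = text.replace(code, table[code])
    return text


table = load_conversion_table(io.StringIO(CONVERSION_CSV))
print(convert_emoticons("so funny \\xf0\\x9f\\x98\\x82", table))
```

In practice the table would be loaded from the conversion csv on disk rather than from an in-memory string.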
---------------------
Instructions For Use
---------------------
Stream Grabber
**************
To use the code, the user must first obtain their own Twitter consumer key and secret and access token and secret in order to
connect to the Twitter stream. These are issued when you register an app on the Twitter apps page. Once you have them, place them
in the areas designated at the top of the onlineTweep Python file.
The next step is to define how many tweets you want stored per run of the code. By default this is self.m = 30, which stops the run
at 30 tweets; that value was used to test the functions of the code. I am not sure of the maximum number of tweets that Tweepy can
record from the free Twitter stream access, so it is up to the user to choose this number.
The user then only needs to enter the words to search Twitter for in:
TweetStream.filter(track=[''])
These are the terms the code looks for in Twitter posts; when it finds a match it records the tweet in the csv. Simply put your
terms between the '' and run the code.
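The recording logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not
the contents of the onlineTweep file: the class name, the exact timestamp pattern and the column names are hypothetical, and
the class is written without Tweepy so that it runs on its own. With Tweepy 3.x it would subclass tweepy.StreamListener, where
returning False from on_status asks the stream to disconnect.

```python
import csv
from datetime import datetime


def make_filename(now=None):
    """Build a name like data_20170508123005.csv (exact pattern is assumed)."""
    now = now or datetime.now()
    return now.strftime("data_%Y%m%d%H%M%S.csv")


class CappedRecorder:
    """Hypothetical stand-in for the stream listener in this project."""

    def __init__(self, path, m=30):
        self.path = path
        self.m = m        # maximum number of tweets to store per run
        self.count = 0    # tweets written so far
        # Create the csv and write the column names once, up front.
        with open(self.path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(["screen_name", "text", "coordinates"])

    def on_status(self, status):
        """Append one tweet; return False once the cap has been reached."""
        with open(self.path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(
                [status.user.screen_name, status.text, status.coordinates or ""]
            )
        self.count += 1
        return self.count < self.m


# Wiring under Tweepy 3.x (an assumption about the installed version):
#   auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
#   auth.set_access_token(access_token, access_token_secret)
#   TweetStream = tweepy.Stream(auth, listener)
#   TweetStream.filter(track=["your", "terms"])
```

Reopening the file for every tweet mirrors the behaviour described in Basic Information; it is slower than keeping the file
open, but it means each row is on disk as soon as it arrives.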
Tokenizer
*********
To tokenise the tweets, the user needs to specify the file they want to tokenise in:
bar = pd.read_csv("data_()")
This step was intended to iterate through the directory, but that was not completed in this time frame, so for now the user needs
to hardcode the name of the file they want to tokenise.
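One possible way to finish the unfinished directory iteration is sketched below. It is only an illustration: it uses the
standard library csv and re modules as stand-ins for Pandas and the NLTK tokeniser that the project actually depends on, and
the function names, the data_*.csv pattern and the "text" column name are assumptions.

```python
import csv
import glob
import os
import re

# Stand-in token pattern: URLs, then hashtags and mentions, then words,
# then any other run of symbols (which catches emoticons such as ":(").
TOKEN_RE = re.compile(r"https?://\S+|[#@]\w+|\w+(?:'\w+)?|[^\w\s]+")


def tokenize(text):
    """Split one tweet into words, hashtags, hyperlinks and emoticons."""
    return TOKEN_RE.findall(text)


def find_data_files(directory="."):
    """Return every data_*.csv file in the directory, in name order."""
    return sorted(glob.glob(os.path.join(directory, "data_*.csv")))


def tokenize_file(path, column="text"):
    """Tokenise the text column of one recorded csv, row by row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [tokenize(row[column]) for row in csv.DictReader(f)]


def tokenize_all(directory="."):
    """Map each data file found in the directory to its tokenised rows."""
    return {path: tokenize_file(path) for path in find_data_files(directory)}
```

Because the file names carry a timestamp, the sorted order from find_data_files is also the order in which the runs were made.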
------------------
GitHub
------------------
If you want to see the whole code then go to:
https://github.com/Rohoolio/TwitterDense
------------------
Dependencies
------------------
This project depends on the following Python libraries:
Pandas - http://pandas.pydata.org/
Tweepy - https://github.com/tweepy/tweepy
NLTK - http://www.nltk.org/