- Clean.py takes a CSV file and performs the text cleaning outlined in section 1.2.2 of the report.
- choose.py selects the subset of tweets we analyze and sets aside the test tweets used for prediction later on.
- betterDate.py converts each tweet's publish date into a Python datetime object.
- Visualize.py takes a CSV file and constructs a word cloud from the corpus of tweets.
- analysisk.py, where k = 1, 2, 3, are the scripts that answer questions 1, 2, and 3, respectively. All three construct T matrices that are used to compare tweets via dot products.
- results.py quantifies the quality of our predictions with confusion-matrix statistics.
- timePlot.py produces a histogram of consecutively similar tweets posted within a range of time frames.
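
To make the dot-product comparison in the analysis scripts concrete, here is a minimal sketch. It assumes a T matrix is a term-count matrix with one row per tweet and one column per vocabulary word. This README does not define T, so that reading, the helper names `build_term_matrix` and `dot`, and the sample tweets are all hypothetical and are not taken from the project's code or data.

```python
from collections import Counter


def build_term_matrix(tweets):
    """Return (vocab, rows): one row of word counts per tweet.

    Assumption: tweets are already cleaned and whitespace-tokenizable.
    """
    vocab = sorted({word for tweet in tweets for word in tweet.split()})
    rows = []
    for tweet in tweets:
        counts = Counter(tweet.split())
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def dot(u, v):
    """Plain dot product of two equal-length count vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical cleaned tweets, used only for illustration.
tweets = ["vote early vote often", "vote today", "cats are great"]
vocab, T = build_term_matrix(tweets)

# similarity[i][j] is the dot product of tweet i's row with tweet j's row.
similarity = [[dot(r, s) for s in T] for r in T]
```

With these sample tweets, the first two share the word "vote", so their dot product is nonzero, while the first and third share no words and score zero. Raw dot products grow with tweet length; whether the analysis scripts normalize for that is not stated here.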
JoetheManHowie/TwitterData