Welcome to my series of scripts for analysis!

An overview of the workflow:

Clean.py takes in a CSV file and performs the text cleaning outlined in section 1.2.2 of the report.
choose.py selects the subset of tweets we are going to analyze, and it also selects the test tweets used for prediction later on.
betterDate.py converts the publish date of each tweet into a Python datetime object.
Visualize.py takes a CSV file and constructs a word cloud from the corpus of tweets.
analysisk.py, where k = 1, 2, 3, are the scripts that answer questions 1, 2, and 3, respectively. All three construct T matrices that are used to compare tweets via dot products.
results.py quantifies the quality of our predictions with confusion matrix statistics.
timePlot.py produces the histogram of consecutively similar tweets posted in a range of time frames.
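
To give a feel for the two core ideas above (comparing tweets via dot products and scoring predictions with confusion matrix statistics), here is a minimal sketch. It is not the code in analysisk.py or results.py. It assumes the T matrix is a tweet-by-term count matrix, which may differ from the construction in the report, and the names `build_term_matrix`, `dot`, and `confusion_stats` are hypothetical helpers made up for illustration.

```python
from collections import Counter


def build_term_matrix(tweets):
    """Return (vocab, matrix), where matrix[i][j] counts term j in tweet i.

    Assumption: tweets are already cleaned and whitespace-tokenizable.
    """
    vocab = sorted({word for tweet in tweets for word in tweet.split()})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for tweet in tweets:
        row = [0] * len(vocab)
        for word, count in Counter(tweet.split()).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


def dot(u, v):
    """Dot product of two equal-length term vectors."""
    return sum(a * b for a, b in zip(u, v))


def confusion_stats(actual, predicted):
    """Compute basic confusion matrix statistics for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    total = len(pairs)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        # Guard against division by zero on empty or one-sided inputs.
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    tweets = ["fake news today", "news today", "hello world"]  # made-up examples
    vocab, T = build_term_matrix(tweets)
    print(dot(T[0], T[1]))  # shared terms give a larger dot product
    print(dot(T[0], T[2]))  # no shared terms give zero
    print(confusion_stats([True, True, False, False],
                          [True, False, True, False]))
```

A raw dot product favors longer tweets, so a real comparison might normalize the rows first (for example, cosine similarity). Whether the scripts here do so is not stated in this README.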
