While I’ve been finishing the final data visualizations and interface for the August show at Gray Area, I’ve also been trying to figure out how to find media that is relevant to neighborhood culture and the urban environment.
Classification and Relevance Score
I’m doing bag-of-words analysis and algorithmically scoring each photo and tweet based on whether it is relevant or interesting to one of the three theme maps: where people are going, what people are doing, and how people are feeling.
The algorithm scores each piece of media on the following attributes:
- Number of likes
- Number of relevant keywords, e.g. airbnb, drought, bus
- Number of retweets or comments
- Strongly positive or negative sentiment
- Occurrence of certain parts of speech, such as gerunds that represent activity, e.g. eating
- Twitter Relevance score
- Objects detected in the photo by OpenCV
- Nearness to other relevant media
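A minimal sketch of how a score like this could be assembled, in Python. It is only an illustration, not the actual algorithm: the weights, the `MediaItem` fields, and the function names are all hypothetical, the sentiment value is assumed to be computed elsewhere on a -1 to 1 scale, and the gerund check is a crude "-ing" suffix test standing in for real part-of-speech tagging. The Twitter relevance score, OpenCV objects, and nearness to other media are left out.

```python
import math
import re
from dataclasses import dataclass

# Example keywords taken from the list above; a real list would be longer.
RELEVANT_KEYWORDS = {"airbnb", "drought", "bus"}

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "likes": 0.5,
    "shares": 0.5,
    "keywords": 2.0,
    "sentiment": 1.5,
    "gerunds": 1.0,
}


@dataclass
class MediaItem:
    text: str
    likes: int = 0
    shares: int = 0         # retweets or comments
    sentiment: float = 0.0  # assumed range -1.0 to 1.0, computed elsewhere


def tokenize(text):
    """Bag-of-words tokenizer: lowercase words, punctuation dropped."""
    return re.findall(r"[a-z']+", text.lower())


def relevance_score(item):
    words = tokenize(item.text)
    keyword_hits = sum(1 for w in words if w in RELEVANT_KEYWORDS)
    # Crude stand-in for part-of-speech tagging; it also matches
    # non-gerunds such as "during" or "morning".
    gerund_hits = sum(1 for w in words if len(w) > 4 and w.endswith("ing"))
    return (
        WEIGHTS["likes"] * math.log1p(item.likes)      # dampen very popular posts
        + WEIGHTS["shares"] * math.log1p(item.shares)
        + WEIGHTS["keywords"] * keyword_hits
        + WEIGHTS["sentiment"] * abs(item.sentiment)   # strong in either direction
        + WEIGHTS["gerunds"] * gerund_hits
    )


# Made-up example items, not real data.
busy = MediaItem("Waiting for the bus in this drought", likes=3, shares=1, sentiment=-0.6)
plain = MediaItem("Hello")
print(relevance_score(busy) > relevance_score(plain))
```

Taking the logarithm of likes and shares is one way to keep a single viral post from drowning out everything else; whether that matters depends on how the scores are used.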
I’d also like to include color in this analysis, perhaps by considering color in relation to sentiment. I also want to explore ways to find less prominent voices and media in other languages.
Here are some examples of media that score high in my algorithm and also seem interesting:
I wrote some code to capture emoji characters in tweets. These were the most used emoji in the Mission District last week (ordered by usage):
😀 💙 😎 😭 😊 👓 😩 ✨ 😀 🌊 🎉 👌 💕 🌞
🌲 🌴 📷 😉 ✌ 🏄 🙏
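Capturing and ranking emoji like this can be sketched in a few lines of Python. This is an assumption about the approach, not the code used for the list above: the character ranges are an approximate subset of the Unicode emoji blocks, `count_emoji` is a hypothetical name, and the sample tweets are made up.

```python
import re
from collections import Counter

# Approximate emoji ranges (an assumption, not the full Unicode emoji set):
# pictographs and emoticons, plus miscellaneous symbols and dingbats.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def count_emoji(texts):
    """Count every emoji character found across a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(EMOJI_PATTERN.findall(text))
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Surf was great today 🌊🏄",
    "Picnic in the park 🌞🎉🌊",
    "Missed the bus again 😭",
]

emoji_counts = count_emoji(sample_tweets)
for emoji, n in emoji_counts.most_common(3):
    print(emoji, n)
```

A single-character pattern like this counts each code point separately, so emoji built from several code points (flags, skin-tone variants) would need extra handling.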
When the sentiment of the post was negative, these were popular:
😭 💩 😩 😾 😔 😐
And here are the emoji that were popular when the post was positive:
💙 😍 💜 🎉 ✨ 🌉 😉 😌
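Splitting emoji by the sentiment of the post could look something like the Python sketch below. It assumes each post already has a sentiment score between -1 and 1 from some other step; the `threshold` value, the function name, and the sample posts are hypothetical, and the emoji ranges are only approximate.

```python
import re
from collections import Counter

# Approximate emoji ranges (an assumption, not the full Unicode emoji set).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_by_sentiment(posts, threshold=0.2):
    """Group emoji counts by post sentiment.

    posts: iterable of (text, sentiment) pairs, sentiment in [-1, 1].
    Posts whose sentiment falls between -threshold and threshold are skipped.
    """
    groups = {"positive": Counter(), "negative": Counter()}
    for text, sentiment in posts:
        if sentiment >= threshold:
            label = "positive"
        elif sentiment <= -threshold:
            label = "negative"
        else:
            continue  # roughly neutral, not counted in either group
        groups[label].update(EMOJI_PATTERN.findall(text))
    return groups


# Made-up sample posts, for illustration only.
sample_posts = [
    ("Great day 🎉✨", 0.8),
    ("Ugh 😭", -0.7),
    ("Meh 😐", 0.0),
]
groups = emoji_by_sentiment(sample_posts)
print(groups["positive"].most_common())
print(groups["negative"].most_common())
```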
Of particular interest to this analysis are words and sentiments that cut across the theme maps and different topics. Here are some examples of words that match these criteria: