Birdy News -- RWET Final Project

IDEA INTRODUCTION

My final project for Reading and Writing Electronic Text basically followed my mid-term project idea, which focused on the relationship between the individual and the community: personal digital heritage versus the internet phenomenon as a whole (which is composed of individuals' digital creative activities).

I am really obsessed with one question: what is happening while I am typing this post? I think every one of you should ask yourself this question, since we are "connected" (to the internet) and "disconnected" (from reality) at the same time.

In the digital era, especially after the advent of the Internet, individualization has been highlighted ever more strongly. That is one of the reasons why the amount of digital information created on the internet has already far surpassed all the information recorded in human history before the digital era. However, every coin has two sides. A huge amount of "trash" information has been created, and is being created all the time, which was hardly the case earlier in human history. In my opinion, the "trash" is not trash at all; it is just, somehow, too personalized.

HOW IT WORKS

In my final project, I tried to explore a way to remind people that the world keeps changing and making meanings unceasingly while we change and publish our own meanings onto the Internet.

Basically, what I did in my final project was play around with the Twitter API and the New York Times API. The first part of the project takes as input the Twitter account to pull tweets from and how many tweets you want. The program fetches the tweets through the Twitter API, then extracts the noun phrases from them using TextBlob's blob.noun_phrases. The second part uses each extracted noun phrase as a keyword to search for news through the New York Times API, and the output is the headlines of the articles that contain that noun phrase somewhere in their text (which can be a little confusing sometimes, because the keyword usually appears in the tweet and in the article body but not in the headline itself. I will talk about this later).
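In code, the whole chain looks roughly like this (a minimal sketch that assumes working Twython credentials are already in api_key, api_secret, access_token, and token_secret; the full version is at the end of this post):

from twython import Twython
from textblob import TextBlob

twitter = Twython(api_key, api_secret, access_token, token_secret)
tweets = twitter.get_user_timeline(screen_name='PrfJocular', count=5)

keywords = []
for t in tweets:
    blob = TextBlob(t['text'])
    keywords.extend(blob.noun_phrases)   # e.g. [u'final project', ...]

# every keyword then becomes the 'fq' filter of an Article Search query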

PROBLEMS & DIFFICULTIES

The first problem was the limitations of the New York Times API. At first, I hesitated between "The Article Search API" and "The Most Popular API". Both of them can return headlines. With the Article Search API, you can get articles from any time period you want (from Sept. 18, 1851 to today), though many of them might not be that popular. With the Most Popular API, you get the most-viewed news titles, but only from the past 30 days. After weighing the two for a while, I decided to use the Article Search API, because I think it is fun to find some old news that might have talked about what you are talking about right now...
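For reference, this is roughly what an Article Search request with a date range looks like (a sketch against the same v2 endpoint my code uses; YOUR_NYT_KEY stands in for a real API key):

import urllib
import urllib2
import json

params = urllib.urlencode({
    'fq': 'digital heritage',      # keyword to filter articles by
    'begin_date': '19990101',      # the archive reaches back to 1851
    'end_date': '20130101',
    'api-key': 'YOUR_NYT_KEY',
})
url = 'http://api.nytimes.com/svc/search/v2/articlesearch.json?' + params
docs = json.load(urllib2.urlopen(url))['response']['docs']
for doc in docs:
    print doc['headline']['main'], doc['pub_date']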

The second problem I ran into was how to remove part of the string from each tweet. Every tweet in one of the accounts I follow has the same format: [xxx joke] RT .xxx(author): "tweet". The "tweet" part is the only part I need, so what I needed to do was delete everything before the ":". However, a new problem was waiting for me ahead: unicode. Each tweet_text from the Twitter API is unicode instead of a string, which means I had to convert it first. That seemed fine, because our instructor had shown us a way to do it, but it did not work. So I googled for more methods of dealing with the unicode stuff. You can see the different methods I used at different stages of the unicode-string conversion problem in my code below.
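One way to do both at once looks like this (a small sketch of the approach newtweet uses below; the example string is made up):

import unicodedata

def strip_prefix(tweet_text):
    # flatten the unicode first, then cut off everything up to the colon
    detweet = unicodedata.normalize('NFKD', tweet_text)
    if detweet.find(':') != -1:
        return detweet[detweet.index(':') + 2:]   # skip the ": " as well
    return detweet

print strip_prefix(u'[knock knock joke] RT .someone: the actual tweet')
# -> the actual tweet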

In addition, after testing with PrfJocular, I found that the program could not handle other Twitter accounts, because tweets in other accounts usually do not follow that format, so I changed the logic in def newtweet. Even now, the program does not actually work for every Twitter account, which is the part I am still working on. The reason could be anything: tweets starting with strange symbols like # * ^, upper/lower case problems, and so on (one direction I am considering is sketched below).
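For example, stripping the leading symbols and lowercasing before any matching might help (just a sketch of an idea, not something the final code does yet):

import re

def clean_tweet(text):
    text = re.sub(ur'^[#*^\s]+', '', text)   # drop leading # * ^ and spaces
    return text.lower()

print clean_tweet(u'#*^ Some Tweet Text')    # -> some tweet text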

Other difficulties included indentation problems (an impressive one)... for loops inside for loops (I admit that I am not rigorous enough and do not have strong logical thinking)... and class stuff (I am clearer about this now).

Last but not least, I wanted to create a kind of chatting atmosphere in this project, but it turned out to be almost impossible. In my opinion, there are a couple of reasons. First of all, a conversation cannot exist without making sense, which is not the case for randomly generated poetry (that is more about making fun, I think); secondly, the limitations of the APIs restrict how accurate the retrieved content can be.

FOR PRESENTATION & PERFORMANCE

Because the program outputs multiple headlines for each keyword (noun phrase), I wanted to keep the gap between the tweet and the news headline within a reasonable and understandable range (ideally, the gap is not too huge to bridge with prior experience, while still leaving the audience some space for imagination). So after the program generated its output, I manually picked one of the headlines to pair with the tweet. Then, as you can see, I did some graphic design and output it as a PDF.

CONTINUED FROM THE END OF "HOW IT WORKS"

After the performance, I asked some of my friends about the two different kinds of content from Twitter and the NY Times. Most of them thought the Twitter side was more fun, which makes sense because the NY Times, as a public media outlet, is more serious. As for the "gap", some of them could bridge it while others could not.

[Attached PDFs: RWET_final, RWET_final (1)]

THE CODE

import sys
import twython
import urllib
import urllib2
import json
import unicodedata
from textblob import TextBlob

# usage: python birdy_news.py <screen_name> <number_of_tweets>
print 'Arguments:', sys.argv[1]

# slice a "YYYY-MM-DD..." pub_date string into pieces (currently unused)
def year(d):
    return d[0:4]

def month(d):
    return d[5:7]

def day(d):
    return d[8:10]

allNouns = list()
headlines = set()
dates = list()

results = dict()
nounssss = list()
name = sys.argv[1]       # Twitter account to pull tweets from
num = int(sys.argv[2])   # how many tweets to fetch

api_key = "cbQOYIm4QKOZFfU0udeecg" 
api_secret = "0xfLYKgOpLDfmYqLOujfWdeNBaG4K7AxJKm8e1huM"
access_token = "441793842-SG93JvCP4426PpU9nSRt8yef8cliRY6gRFnMZmjE"
token_secret = "BURRQAN5wZqzfQu9ly7TFiuE5412QtjSTb0G0c5LnPwoE"

# authenticate and pull the most recent tweets from the given account
twitter = twython.Twython(api_key, api_secret, access_token, token_secret)
response = twitter.get_user_timeline(screen_name=name, count=num)

tweets_to_be_printed = []

class tweet(object):

    def __init__(self, result, nounss, insideResponse):
        self.results = result
        self.allMyNouns = nounss
        self.insideResponse = insideResponse

    # leftover helpers from an earlier stage; neither is called anywhere
    def isNoun(self, word):
        pass

    def isAdjective(self, word):
        # relies on an `adjectives` word list that is no longer built
        if word == "":
            return False
        return word.lower() in adjectives

    def newtweet(self, response):
        self.insideResponse = response
        self.results = list()

        for tweet in self.insideResponse:
            tweet_text = tweet['text']
            # flatten the unicode before doing any string surgery
            detweet = unicodedata.normalize('NFKD', tweet_text)

            # drop the "[xxx joke] RT .author:" prefix when there is one
            if detweet.find(":") != -1:
                real_tweet = detweet[detweet.index(':') + 2:]
            else:
                real_tweet = detweet
            tweets_to_be_printed.append(real_tweet)

            # only mine noun phrases from original (non-RT) tweets
            if tweet['retweeted'] != True and tweet['text'][0:2] != "RT":
                real_tweet = real_tweet.replace("\"", "\'")
                blob = TextBlob(real_tweet)
                for word in blob.noun_phrases:
                    self.allMyNouns.append(word)

        return self.allMyNouns

    def getTweet(self, listOfNouns):
        # an earlier, unfinished attempt at the Article Search lookup;
        # the working version lives in the main loop below
        for wd in self.allMyNouns:
            searchterm = urllib.quote(wd.encode('utf-8'))
            request_string = 'http://api.nytimes.com/svc/search/v2/articlesearch.json?fq=' + searchterm + '&facet_field=source&begin_date=19990101&end_date=20130101&api-key=e12e33fafba643a896576df64ba79eeb:18:69211686'
            urlresponse = urllib2.urlopen(request_string)
            docs = json.load(urlresponse)["response"]["docs"]

            for doc in docs:
                headlines.add(doc["headline"]["main"])
                dates.append(doc["pub_date"])

            for headline in headlines:
                print headline

a = tweet(results, nounssss, response)
results = a.newtweet(response)
# yet another stage of the unicode fight: encode the noun phrases as UTF-8
results_unicode = [x.encode('UTF8') for x in results]

for keyword in results_unicode:

    # print the first fetched tweet that actually contains this noun phrase
    for mytweet in tweets_to_be_printed:
        if keyword in mytweet.lower().encode('utf-8'):
            print "\n\n" + mytweet + "\n"
            break

    # look the keyword up in the NYT Article Search API
    headlines = set()
    print "\n\n" + keyword + "\n"

    # one more unicode treatment: force every query parameter through
    # an explicit UTF-8 encode before urlencoding it
    params_dict = {"fq": keyword, "facet_field": "source", "begin_date": "19990101", "end_date": "20130101", "api-key": "e12e33fafba643a896576df64ba79eeb:18:69211686"}
    new_param_dict = dict()
    for key, value in params_dict.iteritems():
        new_param_dict[key] = unicode(value, 'utf-8').encode('utf-8')

    params = urllib.urlencode(new_param_dict)

    request_string = 'http://api.nytimes.com/svc/search/v2/articlesearch.json?' + params
    urlresponse = urllib2.urlopen(request_string)
    tResult = json.load(urlresponse)
    docs = tResult["response"]["docs"]

    # print every distinct headline (and remember the dates) that came back
    for doc in docs:
        headlines.add(doc["headline"]["main"])
        dates.append(doc["pub_date"])

    for headline in headlines:
        print headline