Get tweets stats from profile
Tags: #twitter #tweets #scrap #snippet
Author: Tannia Dubon

Input

Import library

import os
import re
import pandas as pd

#install the development version of snscrape via the command line
os.system("pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git")

Variables

#criteria for searching by username
username = "JupyterNaas"
tweet_count = 500

Model

Scrape and save results in JSON

#search by username using command line
os.system("snscrape --jsonl --max-results {} twitter-search from:{} > user-tweets.json".format(tweet_count, username))
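
As an optional alternative to shelling out to the CLI, snscrape also exposes a Python API. The sketch below is not part of the original template; it assumes the installed snscrape version provides snscrape.modules.twitter.TwitterSearchScraper and reuses the username and tweet_count variables defined above.

#optional sketch (assumption: snscrape.modules.twitter.TwitterSearchScraper is available)
import itertools
import snscrape.modules.twitter as sntwitter

#run the same "from:username" search without a shell call
scraper = sntwitter.TwitterSearchScraper("from:{}".format(username))

#take only the first tweet_count items from the generator
tweets = list(itertools.islice(scraper.get_items(), tweet_count))
print(len(tweets), "tweets scraped")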

Read JSON

# Reads the json generated from the CLI command above and creates a pandas dataframe
df = pd.read_json('user-tweets.json', lines=True, convert_dates=True, keep_default_dates=True)
df

Clean dataframe to keep only the necessary columns

  • URL
  • CONTENT
  • HASHTAGS
  • DATE
  • LIKECOUNT
  • RETWEETCOUNT
#copy dataframe
df1 = df.copy()

#keep only the columns needed
df1 = df1[['url','content','hashtags','date','likeCount','retweetCount']]

#convert column names to upper case to follow naas df convention
df1.columns = df1.columns.str.upper()

#convert date to ISO format to follow naas date convention
df1.DATE = pd.to_datetime(df1.DATE).dt.strftime("%Y-%m-%d")

#clean HASHTAGS column to provide searchable items
df1.HASHTAGS = df1.HASHTAGS.fillna("[]")
df1.HASHTAGS = df1.apply(lambda row: ", ".join(list(row.HASHTAGS)) if row.HASHTAGS != '[]' else "", axis=1)

#display results
df1
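
The template stops at the cleaned dataframe, but since the goal is tweet stats, a quick aggregation can be useful. The snippet below is a minimal sketch built only on the columns created above; the stats variable name is ours.

#minimal sketch (not part of the original template): simple per-profile stats
stats = {
    "TWEETS": len(df1),
    "TOTAL_LIKES": int(df1.LIKECOUNT.sum()),
    "TOTAL_RETWEETS": int(df1.RETWEETCOUNT.sum()),
    "AVG_LIKES_PER_TWEET": round(df1.LIKECOUNT.mean(), 2),
    "AVG_RETWEETS_PER_TWEET": round(df1.RETWEETCOUNT.mean(), 2),
}
stats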

Output

Save dataframe to CSV

df1.to_csv("tweets_from_URL.csv", index=False)
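
If this notebook runs inside a Naas workspace, the CSV could also be exposed as a shareable asset. The line below is an optional sketch that assumes the naas library is installed and that its asset.add function is available.

#optional sketch (assumption: running in Naas with the naas library installed)
import naas

#expose the CSV as a downloadable asset and get a shareable link
naas.asset.add("tweets_from_URL.csv")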