Chat about my latest profile posts
Tags: #linkedin #profile #post #stats #naas_drivers #content #automation #csv
Last update: 2023-07-26 (Created: 2023-07-24)
Description: This notebook enables you to converse inside MyChatGPT about your most recent LinkedIn posts using a CSV file stored in your Naas Lab and a JSON plugin asset. Data is updated and replaced with each run.
Disclaimer:
This code is in no way affiliated with, authorized, maintained, sponsored or endorsed by LinkedIn or any of its affiliates or subsidiaries. It uses an independent and unofficial API. Use at your own risk.
This project violates LinkedIn's User Agreement Section 8.2, and because of this, LinkedIn may (and likely will) temporarily or permanently ban your account. We are not responsible for your account being banned.
from naas_drivers import linkedin
import pandas as pd
from datetime import datetime
import naas
import os
import json
try:
    from wordcloud import WordCloud
except ImportError:
    !pip install wordcloud --user
    from wordcloud import WordCloud
import matplotlib.pyplot as plt
try:
    import tiktoken
except ImportError:
    !pip install tiktoken --user
    import tiktoken
Mandatory
- li_at: Cookie used to authenticate Members and API clients, stored as a Naas secret (see the sketch after this list).
- JSESSIONID: Cookie used for Cross Site Request Forgery (CSRF) protection and URL signature validation, also stored as a Naas secret.
- linkedin_url: The LinkedIn profile URL used as input for the script.

Optional
- limit: Number of posts retrieved.
- refresh_interval: Time in minutes between two updates of the CSV file, which helps prevent excessive calls to the LinkedIn API.
- cron: Cron parameters for the naas scheduler; see https://crontab.guru/ for the required CRON syntax.
- plugin_name: The name of the plugin.
- plugin_model: The model to be used by the plugin.
- plugin_temperature: The creativity level of the generated content; higher values produce more diverse outputs.
- plugin_max_tokens: The maximum number of tokens to be used by the plugin.
- system_prompt_max_tokens: Indicative limit on the number of tokens allowed in the system prompt.
- output_dir: The name of the output directory.
- csv_file_name: The name of the CSV file that will contain the latest posts.
- image_file_name: The name of the image file that will display the word cloud.
- plugin_file_name: The name of the JSON plugin file that will analyze the posts.
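Before running the cells below, your LinkedIn cookies must be available as Naas secrets. If you have not stored them yet, here is a minimal one-time sketch (run it in a separate cell, then delete that cell so the raw cookie values do not stay in your notebook); the placeholder values are yours to fill in:
import naas

# One-time setup (illustrative): store your LinkedIn cookies as Naas secrets.
# Replace the placeholders with the cookie values from your browser session.
naas.secret.add("LINKEDIN_LI_AT", "<your li_at cookie value>")
naas.secret.add("LINKEDIN_JSESSIONID", "<your JSESSIONID cookie value>")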
# Mandatory
li_at = naas.secret.get("LINKEDIN_LI_AT")
JSESSIONID = naas.secret.get("LINKEDIN_JSESSIONID")
linkedin_url = "https://www.linkedin.com/in/florent-ravenel/" # EXAMPLE "https://www.linkedin.com/in/myprofile/"
# Optional
limit = 5
refresh_interval = 60
cron = "0 8 * * *"
plugin_name = "LinkedIn posts analyzer"
plugin_model = "gpt-4"
plugin_temperature = 0
plugin_max_tokens = 8192
system_prompt_max_tokens = 2084
output_dir = "linkedin_outputs/latest_posts/"
csv_file_name = "posts_data.csv"
image_file_name = "wordcloud.png"
plugin_file_name = "posts_analyzer_plugin.json"
Create the output directory and define paths for the output files.
# Check if directory exists and create it if not
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
# Generate outputs files path
csv_file_path = os.path.join(output_dir, csv_file_name)
image_file_path = os.path.join(output_dir, image_file_name)
plugin_file_path = os.path.join(output_dir, plugin_file_name)
print('📂 CSV file path:', csv_file_path)
print('📂 Image file path:', image_file_path)
print('📂 Plugin file path:', plugin_file_path)
Retrieve the most recent posts from LinkedIn, establishing a limit to prevent transferring an overwhelming amount of data to the LLM.
def get_last_posts(
    li_at,
    JSESSIONID,
    linkedin_url,
    limit,
    file_path,
    refresh_interval
):
    # Init
    df = pd.DataFrame()
    update_data = True

    # Check if output already exists
    if os.path.exists(file_path):
        # Read file
        df = pd.read_csv(file_path)
        # Assess whether the LinkedIn API can be invoked based on the last call.
        # To emulate human interaction, we must avoid making excessive calls to
        # the LinkedIn API. Overdoing this could result in being banned.
        if len(df) > 0:
            if "DATE_EXTRACT" in df.columns:
                # Manage calls to API
                last_update_date = df.loc[0, "DATE_EXTRACT"]
                time_last_update = datetime.now() - datetime.strptime(last_update_date, "%Y-%m-%d %H:%M:%S")
                minute_last_update = time_last_update.total_seconds() / 60
                if minute_last_update < refresh_interval:
                    update_data = False
                    print(f"🛑 Nothing to update. Last update done {int(minute_last_update)} minutes ago.")

    if update_data:
        # Get last posts
        df = linkedin.connect(li_at, JSESSIONID).profile.get_posts_feed(linkedin_url, limit=limit)
        # Save last posts in CSV
        df.to_csv(file_path, index=False)
        print("💾 Dataframe successfully saved:", file_path)
    return df

df_posts = get_last_posts(li_at, JSESSIONID, linkedin_url, limit, csv_file_path, refresh_interval)
print("✅ Rows fetched:", len(df_posts))
df_posts.head(limit)
# Share output with naas
data_link = naas.asset.add(csv_file_path)
Creating a word cloud is useful as it visually represents the frequency or importance of words in a text, providing a quick and insightful overview of the content.
# Creating the text variable
text = " ".join(text for text in df_posts.astype(str).TEXT)
# Creating word_cloud with text as argument in .generate() method
word_cloud = WordCloud(
    collocations=False,
    background_color="white",
    width=1200,
    height=600
).generate(text)
# Display the generated Word Cloud
plt.imshow(word_cloud, interpolation='bilinear')
plt.axis("off")
plt.show()
# Save your image in PNG
word_cloud.to_file(image_file_path)
print("💾 Image successfully saved:", image_file_path)
# Share output with naas
image_link = naas.asset.add(image_file_path, params={"inline": True})
Prepare data
Refine the dataframe for use in the plugin to prevent passing excessive data and tokens to the LLM.
def create_plugin_data(df):
    # Keep columns
    to_keep = [
        "POST_URL",
        "AUTHOR_NAME",
        "PUBLISHED_DATE",
        "TITLE",
        "TEXT",
        "VIEWS",
        "LIKES",
        "COMMENTS",
        "SHARES",
        "ENGAGEMENT_SCORE"
    ]
    df = df[to_keep]

    # Filter out posts without views
    df = df[df["VIEWS"].astype(int) > 0]

    # Multiply ENGAGEMENT_SCORE by 100 and drop the original column
    df["ENGAGEMENT_%"] = df["ENGAGEMENT_SCORE"] * 100
    df = df.drop(columns=["ENGAGEMENT_SCORE"])
    return df.reset_index(drop=True)
data = create_plugin_data(df_posts)
data
Engineer system prompt
We used the OpenAI Playground to refine it: https://platform.openai.com/playground?mode=chat&model=gpt-4
system_prompt = f"""Act as a Social Media Analyst Assistant. Your job is to help the user unravel the story behind their LinkedIn posts' performance.
You can dive deep into the data and gather insights that can help boost the user LinkedIn strategy.
You can help the user understand which posts are getting the most views, likes, comments, and shares.
You can also analyze the engagement of each post and see how it correlates with different factors.
When you refer to a post, create an href in markdown format so that the user can open the post you mention in a new tab by clicking the link.
But that's not all! You can also help identify trends over time, find the best time to post, suggest the best post to publish next, understand the impact of different post types, and much more.
The possibilities are endless, be creative!
Now, let's get started. Here's the data from the user's latest LinkedIn posts:
{data.to_string()}.
Let's dive in and discover the stories the data is waiting to tell!
The first message should present yourself in a maximum of 5 bullet points and display the word cloud: {image_link}
Then, wait for the user's first answer, and start with a high-level analysis.
Here is the link to download the data in csv: {data_link}
"""
Check token count
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

system_prompt_tokens = num_tokens_from_string(system_prompt, "cl100k_base")
if system_prompt_tokens > system_prompt_max_tokens:
    print("⚠️ Be careful, your system prompt looks too big. Tokens:", system_prompt_tokens)
else:
    print("✅ System prompt tokens count OK:", system_prompt_tokens)
Generate plugin
The plugin must be a JSON file with the mandatory keys name, model, temperature, max_tokens, and prompt.
# Create json
plugin = {
    "name": plugin_name,
    "model": plugin_model,
    "temperature": plugin_temperature,
    "max_tokens": plugin_max_tokens,
    "prompt": system_prompt,
}

# Save dict to JSON file
with open(plugin_file_path, "w") as f:
    json.dump(plugin, f)
print("💾 Plugin successfully saved:", plugin_file_path)
You can now use the plugin in MyChatGPT by copy/pasting the asset URL generated below after the command
/use
naas.asset.add(plugin_file_path, params={"inline": True})
Schedule your notebook with the naas scheduler feature
naas.scheduler.add(cron=cron)
# to de-schedule this notebook, simply run the following command:
# naas.scheduler.delete()