Input: list all the variables and credentials that need to be set up.
Model: list the functions applied to the data.
Output: list the assets delivered to the user and their distribution channels, if any.
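As a minimal sketch, the Input / Model / Output sections above could be captured in a small spec per data product. All field names and values here are illustrative, not our actual template:

```python
# Hypothetical spec for one data product, mirroring the
# Input / Model / Output structure (all values are made up).
spec = {
    "input": ["API_KEY credential", "raw_sales.parquet"],
    "model": ["clean_dates", "aggregate_by_product"],
    "output": {"assets": ["sales_report.csv"], "channels": ["email"]},
}

# Quick sanity check: every section is declared.
assert set(spec) == {"input", "model", "output"}
```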
When there is a large amount of structured data to handle, we use Parquet files + https://duckdb.org/ to query the datasets used in our data products.
What's nice with that solution is that if your dataset ever becomes too big, you can switch to storing everything in AWS S3 and query your Parquet files with AWS Athena, for example.
Don't use emoji, spaces, or weird characters in names: this just leads to errors. Do you really want to discover, after 10 hours of debugging, that the name `$t0️⃣ck of W/-\ll $street.csv` was causing your bug?
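One way to enforce this is to sanitize names before writing any file. `safe_name` below is a hypothetical helper, not part of our tooling:

```python
import re

def safe_name(name: str) -> str:
    """Replace any run of characters outside [A-Za-z0-9._-] with a single
    underscore, so file names stay shell- and URL-friendly."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    return cleaned.strip("_")

safe_name("stock of Wall $street.csv")  # -> "stock_of_Wall_street.csv"
```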
Don't create your own scheduler within the scheduler, e.g. scheduling a script every minute that then decides which action to run. It will create a ton of output files, max out your file storage very fast, and debugging will be a crazy hell!
Don't put passwords or sensitive information in your notebook; for many reasons, this is not the right place to store them. Use our secret system instead: it stores them encoded on your machine. Not perfect, but much better!
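The internal secret system isn't shown here; as a generic illustration of the same idea, read credentials from the environment instead of hardcoding them in the notebook. The variable name `DB_PASSWORD` is a hypothetical example:

```python
import os

def get_secret(name: str) -> str:
    """Fetch a credential from the environment rather than from a
    hardcoded string in the notebook. Fails loudly if it is missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; configure it in your secret store")
    return value

# Usage in a notebook cell (value never appears in the .ipynb file):
# password = get_secret("DB_PASSWORD")
```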