StorageUtils
What it is
- A small helper class that reads/writes common file formats to an object storage backend.
- Wraps an IObjectStorageDomain implementation and provides convenience methods for text, images, CSV/Excel, JSON/YAML, RDF triples, and PowerPoint presentations.
- Optionally creates a timestamped copy of the stored content on each save.
Public API
Class: StorageUtils
Constructor
StorageUtils(storage_service: IObjectStorageDomain)
- Binds the utility to an object storage service.
Read methods
get_text(dir_path: str, file_name: str, encoding: str = "utf-8") -> str | None
- Fetches bytes from storage and decodes them to text. Returns None on error.
get_image(dir_path: str, file_name: str) -> bytes | None
- Fetches raw bytes. Returns None on error.
get_csv(dir_path: str, file_name: str, sep: str = ";", decimal: str = ",", encoding: str = "utf-8") -> pandas.DataFrame
- Reads CSV content into a DataFrame. Returns an empty DataFrame() on error.
get_excel(dir_path: str, file_name: str, sheet_name: str, skiprows: int = 0, usecols: list | None = None) -> pandas.DataFrame
- Reads an Excel sheet into a DataFrame. Returns an empty DataFrame() on error.
get_json(dir_path: str, file_name: str) -> Dict
- Loads JSON into a Python object (typed as Dict in the signature). Returns {} on error.
get_yaml(dir_path: str, file_name: str) -> Dict
- Loads YAML via yaml.safe_load. Returns {} on error or if the YAML content is empty/null.
get_triples(dir_path: str, file_name: str, format: str = "turtle") -> rdflib.Graph
- Parses RDF content into an rdflib.Graph. Returns an empty Graph() on error.
get_powerpoint_presentation(dir_path: str, file_name: str) -> io.BytesIO
- Returns a BytesIO stream for the presentation content. Returns an empty BytesIO() on error.
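The read methods above share a "fetch bytes, parse, fall back to an empty value" pattern. The following is a minimal sketch of that pattern for JSON, not the library's actual implementation. The helper name read_json_or_default, the fetch callable, and the fixed UTF-8 decoding are assumptions made for illustration.

```python
import json
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger(__name__)


def read_json_or_default(
    fetch: Callable[[str, str], bytes], dir_path: str, file_name: str
) -> Any:
    """Hypothetical helper: parse stored JSON, return {} on any failure."""
    try:
        raw = fetch(dir_path, file_name)
        # Assumption: stored JSON is UTF-8 encoded.
        return json.loads(raw.decode("utf-8"))
    except Exception as exc:
        # Non-throwing behaviour: log the problem and return an empty dict.
        logger.error("Could not read %s/%s: %s", dir_path, file_name, exc)
        return {}


# Tiny in-memory stand-in for a storage backend (illustration only).
objects: Dict[Tuple[str, str], bytes] = {
    ("mydir", "config.json"): b'{"a": 1}',
    ("mydir", "items.json"): b"[1, 2]",
}


def fetch(dir_path: str, file_name: str) -> bytes:
    return objects[(dir_path, file_name)]


loaded = read_json_or_default(fetch, "mydir", "config.json")
missing = read_json_or_default(fetch, "mydir", "absent.json")
as_list = read_json_or_default(fetch, "mydir", "items.json")
```

In this sketch a missing object and an empty JSON object are indistinguishable to the caller, and a stored JSON array comes back as a list even though the fallback value is a dict.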
Write methods
save_text(text: str, dir_path: str, file_name: str, encoding: str = "utf-8", copy: bool = True) -> tuple[str, str]
- Encodes and stores text. Optionally writes a timestamped copy.
save_image(image: bytes, dir_path: str, file_name: str, copy: bool = True) -> tuple[str, str]
- Stores raw image bytes. Optionally writes a timestamped copy.
save_csv(data: pandas.DataFrame, dir_path: str, file_name: str, sep: str = ";", decimal: str = ",", encoding: str = "utf-8", copy: bool = True) -> tuple[str, str]
- Stores a DataFrame as CSV bytes. Optionally writes a timestamped copy.
save_excel(data: pandas.DataFrame, dir_path: str, file_name: str, sheet_name: str, copy: bool = True) -> tuple[str, str]
- Stores a DataFrame as an Excel file (in-memory). Optionally writes a timestamped copy.
save_json(data: dict | list, dir_path: str, file_name: str, copy: bool = True) -> tuple[str, str]
- Stores JSON (pretty-printed, ensure_ascii=False). Optionally writes a timestamped copy.
save_yaml(data: dict | list, dir_path: str, file_name: str, copy: bool = True) -> tuple[str, str]
- Stores YAML via yaml.dump(..., allow_unicode=True, sort_keys=False). Optionally writes a timestamped copy.
save_triples(graph: rdflib.Graph, dir_path: str, file_name: str, format: str = "turtle", copy: bool = True) -> tuple[str, str]
- Serializes an RDF graph and stores it. Optionally writes a timestamped copy.
save_powerpoint_presentation(presentation, dir_path: str, file_name: str, copy: bool = True) -> tuple[str, str]
- Saves a presentation into an in-memory stream using presentation.save(stream) and stores it. Optionally writes a timestamped copy.
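Every write method follows the same "store, then optionally store a timestamped copy" shape. The sketch below illustrates that shape under stated assumptions: the helper save_with_copy, the put callable, and the now parameter are hypothetical, and the copy name follows the YYYYMMDDTHHMMSS_<original_name> pattern noted under Caveats. Whether the real class uses local time or UTC is not stated in this document, so local time is assumed here.

```python
from datetime import datetime
from typing import Callable, Dict, Optional, Tuple


def save_with_copy(
    put: Callable[[str, str, bytes], None],
    content: bytes,
    dir_path: str,
    file_name: str,
    copy: bool = True,
    now: Optional[datetime] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: store content, plus a timestamped copy if requested."""
    put(dir_path, file_name, content)
    if copy:
        # Assumption: local time; `now` is injectable only to make the demo repeatable.
        stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
        put(dir_path, f"{stamp}_{file_name}", content)
    return dir_path, file_name


# In-memory stand-in for put_object (illustration only).
stored: Dict[Tuple[str, str], bytes] = {}


def put(prefix: str, key: str, content: bytes) -> None:
    stored[(prefix, key)] = content


result = save_with_copy(
    put, b"hello", "mydir", "hello.txt", now=datetime(2024, 1, 2, 3, 4, 5)
)
```

With copy=True each save writes two objects, so repeated saves grow the directory by one timestamped object per call; the usage examples in this document pass copy=False to avoid that.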
Configuration/Dependencies
- Requires an object implementing naas_abi_core.services.object_storage.ObjectStoragePort.IObjectStorageDomain with:
  - get_object(prefix_or_dir_path, key_or_file_name) -> bytes
  - put_object(prefix: str, key: str, content: bytes) -> None
- External libraries used:
  - pandas (CSV/Excel)
  - PyYAML (yaml)
  - rdflib (Graph)
- Uses naas_abi_core.logger for debug/warning/error logs.
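For local experiments, a dictionary-backed object exposing the two methods listed above can stand in for a real backend. This is a sketch only: the class name InMemoryObjectStorage is hypothetical, the real IObjectStorageDomain interface may declare further methods or require subclassing, and raising FileNotFoundError for a missing key is an assumption rather than documented behaviour.

```python
from typing import Dict, Tuple


class InMemoryObjectStorage:
    """Hypothetical in-memory stand-in exposing get_object and put_object."""

    def __init__(self) -> None:
        # Objects are keyed by (prefix, key), mirroring (dir_path, file_name).
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, prefix: str, key: str, content: bytes) -> None:
        self._objects[(prefix, key)] = content

    def get_object(self, prefix: str, key: str) -> bytes:
        try:
            return self._objects[(prefix, key)]
        except KeyError:
            # Assumption: a missing object is reported as FileNotFoundError.
            raise FileNotFoundError(f"{prefix}/{key}") from None


backend = InMemoryObjectStorage()
backend.put_object("mydir", "hello.txt", "hello".encode("utf-8"))
restored = backend.get_object("mydir", "hello.txt").decode("utf-8")
```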
Usage
import pandas as pd
from naas_abi_core.utils.StorageUtils import StorageUtils
# storage_service must implement IObjectStorageDomain
storage = StorageUtils(storage_service)
# Text
storage.save_text("hello", dir_path="mydir", file_name="hello.txt", copy=False)
print(storage.get_text("mydir", "hello.txt"))
# CSV
df = pd.DataFrame([{"a": 1, "b": 2}])
storage.save_csv(df, dir_path="mydir", file_name="data.csv", copy=False)
print(storage.get_csv("mydir", "data.csv"))
Caveats
- Error handling is non-throwing:
  - Many get_* methods return None, {}, BytesIO(), or an empty DataFrame() on failure.
  - save_* methods return (dir_path, file_name) even on failure (and log errors).
- copy=True creates an additional object named YYYYMMDDTHHMMSS_<original_name> in the same dir_path.
- get_json is annotated to return Dict but json.loads may return a list depending on stored content.
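Because failures surface as empty values rather than exceptions, callers that need a hard failure have to check results themselves. The helpers below are one possible way to do that; require_text and require_json_object are hypothetical names and not part of StorageUtils.

```python
from typing import Any, Dict, Optional


def require_text(value: Optional[str], what: str) -> str:
    """Turn the None-on-error convention of get_text into an exception."""
    if value is None:
        raise LookupError(f"could not read {what}")
    return value


def require_json_object(value: Any, what: str) -> Dict[str, Any]:
    """Check a get_json result: reject lists and empty results."""
    if isinstance(value, list):
        # get_json is annotated as Dict, but the stored document may be an array.
        raise TypeError(f"{what} contains a JSON array, expected an object")
    if not value:
        # Ambiguous by design: {} may mean a read failure or a genuinely empty object.
        raise LookupError(f"{what} is empty or could not be read")
    return value


checked_text = require_text("hello", "mydir/hello.txt")
checked_json = require_json_object({"a": 1}, "mydir/config.json")
```

Treating {} as a failure is a deliberate choice in this sketch; code that legitimately stores empty objects would need a different check.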