StorageUtils
What it is
StorageUtils is a thin helper around an IObjectStorageDomain implementation. It reads and writes common file formats (text, images, CSV, Excel, JSON, YAML, RDF triples, PowerPoint) in object storage and can optionally create a timestamped copy on each write.
Public API
class StorageUtils(storage_service: IObjectStorageDomain)
- Wraps an object storage service to provide format-specific convenience methods.
Read operations
get_text(dir_path, file_name, encoding="utf-8") -> str | None
- Fetches a text object and decodes it with the given encoding.
get_image(dir_path, file_name) -> bytes | None
- Fetches the raw bytes of an object (intended for images).
get_csv(dir_path, file_name, sep=";", decimal=",", encoding="utf-8") -> pandas.DataFrame
- Fetches the object and parses its bytes as CSV.
get_excel(dir_path, file_name, sheet_name, skiprows=0, usecols=None) -> pandas.DataFrame
- Fetches the object and reads the given Excel sheet from its bytes.
get_json(dir_path, file_name) -> dict
- Fetches the object, decodes it as UTF-8, and parses it as JSON.
get_yaml(dir_path, file_name) -> dict
- Fetches the object, decodes it as UTF-8, and parses it with yaml.safe_load.
get_triples(dir_path, file_name, format="turtle") -> rdflib.Graph
- Fetches the object and parses it as RDF (in the given format) into an rdflib.Graph.
get_powerpoint_presentation(dir_path, file_name) -> io.BytesIO
- Fetches the object's bytes and returns them as a BytesIO stream positioned at the start.
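The decoding steps described for the read methods can be sketched with the standard library alone. This is a minimal sketch, not the StorageUtils source: parse_json_bytes and as_stream are hypothetical helpers that assume the object's bytes were already fetched with get_object, and the empty-dict fallback mirrors the best-effort behavior described under Caveats.

```python
import io
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_json_bytes(content: Optional[bytes]) -> dict:
    """Hypothetical helper: decode UTF-8 bytes and parse JSON, falling back to {}."""
    try:
        return json.loads(content.decode("utf-8"))
    except Exception as exc:  # best-effort: log and return an empty dict instead of raising
        logger.error("Could not parse JSON: %s", exc)
        return {}


def as_stream(content: bytes) -> io.BytesIO:
    """Hypothetical helper: wrap raw bytes in a BytesIO positioned at the start."""
    stream = io.BytesIO(content)
    stream.seek(0)  # BytesIO(content) already starts at 0; seek(0) makes it explicit
    return stream


print(parse_json_bytes(b'{"a": 1}'))  # {'a': 1}
print(parse_json_bytes(b"not json"))  # {} (the error is logged)
```

Note that a None payload also falls back to {}, because calling decode on None raises an exception that the handler catches.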
Write operations (optionally create a timestamped copy)
All save methods return (dir_path, file_name) and accept copy: bool = True. When copy is true, they also write an additional object named YYYYmmddTHHMMSS_<file_name> under the same prefix.
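The name of the extra copy can be sketched as a strftime pattern. timestamped_name is a hypothetical helper, not part of StorageUtils; the source does not say whether the timestamp uses local time or UTC, so this sketch assumes local time via datetime.now().

```python
from datetime import datetime
from typing import Optional


def timestamped_name(file_name: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build YYYYmmddTHHMMSS_<file_name> for the timestamped copy."""
    now = now or datetime.now()  # assumption: local time; the real class may differ
    return f"{now.strftime('%Y%m%dT%H%M%S')}_{file_name}"


print(timestamped_name("data.csv", datetime(2024, 3, 5, 14, 7, 9)))  # 20240305T140709_data.csv
```

Because the prefix sorts lexicographically by time, listing the copies of one file under the same prefix also lists them chronologically.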
save_text(text, dir_path, file_name, encoding="utf-8", copy=True) -> (str, str)
save_image(image: bytes, dir_path, file_name, copy=True) -> (str, str)
save_csv(data: pandas.DataFrame, dir_path, file_name, sep=";", decimal=",", encoding="utf-8", copy=True) -> (str, str)
save_excel(data: pandas.DataFrame, dir_path, file_name, sheet_name, copy=True) -> (str, str)
save_json(data: dict | list, dir_path, file_name, copy=True) -> (str, str)
- Serializes with json.dumps(..., indent=4, ensure_ascii=False) and encodes the result as UTF-8.
save_yaml(data: dict | list, dir_path, file_name, copy=True) -> (str, str)
- Serializes with yaml.dump(..., allow_unicode=True, sort_keys=False) and encodes the result as UTF-8.
save_triples(graph: rdflib.Graph, dir_path, file_name, format="turtle", copy=True) -> (str, str)
- Serializes with graph.serialize(format=format, sort=False) and encodes the result as UTF-8.
save_powerpoint_presentation(presentation, dir_path, file_name, copy=True) -> (str, str)
- Saves the presentation into an in-memory BytesIO buffer with presentation.save(...) and stores the resulting bytes.
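The serialization done by save_json and save_powerpoint_presentation before put_object can be sketched as below. json_to_bytes and presentation_to_bytes are hypothetical helpers, and FakePresentation is a hypothetical stand-in with a save(file_like) method. The source does not name the presentation library; python-pptx's Presentation.save accepts a file-like object in the same way, but treat that pairing as an assumption.

```python
import io
import json


def json_to_bytes(data) -> bytes:
    """Hypothetical helper: serialize as save_json is described to (indent=4, non-ASCII kept, UTF-8)."""
    return json.dumps(data, indent=4, ensure_ascii=False).encode("utf-8")


def presentation_to_bytes(presentation) -> bytes:
    """Hypothetical helper: save any object exposing save(file_like) into memory, return the bytes."""
    buffer = io.BytesIO()
    presentation.save(buffer)
    return buffer.getvalue()


class FakePresentation:
    """Hypothetical stand-in for a presentation object; writes fixed bytes."""

    def save(self, stream) -> None:
        stream.write(b"fake-presentation-bytes")


payload = json_to_bytes({"name": "café"})
# "é" is stored as the UTF-8 bytes b"\xc3\xa9", not as a \u00e9 escape, because ensure_ascii=False
print(payload.decode("utf-8"))
print(presentation_to_bytes(FakePresentation()))  # b'fake-presentation-bytes'
```

With ensure_ascii=True (the json default), the same dict would be written as pure ASCII with escape sequences; the documented setting keeps stored JSON human-readable for non-English text.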
Configuration/Dependencies
- Requires an object that implements naas_abi_core.services.object_storage.ObjectStoragePort.IObjectStorageDomain with:
  - get_object(prefix: str, key: str) -> bytes
  - put_object(prefix: str, key: str, content: bytes) -> None
- External libraries used: pandas (CSV/Excel), PyYAML (yaml), rdflib (Graph).
- Logging via naas_abi_core.logger.
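For local experiments or tests, a dict-backed object exposing the two methods listed above can stand in for a real storage backend. InMemoryObjectStorage is hypothetical: it does not subclass IObjectStorageDomain (the real interface may declare more methods), and it assumes StorageUtils only calls get_object and put_object without an isinstance check.

```python
from typing import Dict, Tuple


class InMemoryObjectStorage:
    """Hypothetical dict-backed stand-in with the two methods StorageUtils relies on."""

    def __init__(self) -> None:
        # Objects are keyed by (prefix, key), mirroring the get_object/put_object arguments
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def get_object(self, prefix: str, key: str) -> bytes:
        # Missing objects raise KeyError here; the real adapter's exception type is not documented
        return self._objects[(prefix, key)]

    def put_object(self, prefix: str, key: str, content: bytes) -> None:
        self._objects[(prefix, key)] = content


store = InMemoryObjectStorage()
store.put_object("my/prefix", "hello.txt", b"hello")
print(store.get_object("my/prefix", "hello.txt"))  # b'hello'
```

Under those assumptions, StorageUtils(InMemoryObjectStorage()) would give a storage-free setup for the usage example.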
Usage
```python
import pandas as pd
from naas_abi_core.utils.StorageUtils import StorageUtils

# storage_service must implement IObjectStorageDomain (get_object/put_object).
storage = StorageUtils(storage_service)

# Save/load text
storage.save_text("hello", "my/prefix", "hello.txt", copy=True)
txt = storage.get_text("my/prefix", "hello.txt")

# Save/load CSV
df = pd.DataFrame([{"a": 1}, {"a": 2}])
storage.save_csv(df, "my/prefix", "data.csv")
df2 = storage.get_csv("my/prefix", "data.csv")
```
Caveats
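- Error handling is best-effort:
  - Most get_* methods log the error and return an empty value on failure (None, {}, pd.DataFrame(), Graph(), or BytesIO()) rather than raising, so an empty result does not by itself tell you whether the read failed.
  - save_* methods return (dir_path, file_name) even on failure (errors are only logged), so the return value does not confirm that the write succeeded.
- The copy=True option creates an additional timestamped object under the same storage prefix; it does not touch the local filesystem.
- CSV defaults follow a semicolon/decimal-comma convention (sep=";", decimal=","), which does not match standard comma-separated files; pass sep="," and decimal="." for those.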
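Because failures surface as empty values, callers that need to fail loudly can check results explicitly. is_empty_result is a hypothetical helper covering the fallback types listed above; it cannot distinguish a genuinely empty file from a failed read, which is inherent to the best-effort design.

```python
import io
from typing import Any


def is_empty_result(value: Any) -> bool:
    """Hypothetical helper: detect the empty fallbacks the get_* methods are described to return."""
    if value is None:                   # get_text / get_image fallback
        return True
    if hasattr(value, "empty"):         # pandas.DataFrame exposes a boolean .empty property
        return bool(value.empty)
    if isinstance(value, io.BytesIO):   # get_powerpoint_presentation fallback
        return len(value.getvalue()) == 0
    try:
        return len(value) == 0          # dict, str, and rdflib.Graph all support len()
    except TypeError:
        return False                    # objects without a length are treated as non-empty


print(is_empty_result({}))                # True
print(is_empty_result({"a": 1}))          # False
print(is_empty_result(io.BytesIO(b"x")))  # False
```

A caller might then write, for example, data = storage.get_json("my/prefix", "config.json") followed by raising an error if is_empty_result(data) is true.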