Interface to ‘Metaflow’ https://metaflow.org/, a framework for constructing and managing data science workflows. ‘Metaflow’ implements a unified API for the entire data science project lifecycle, from initial prototyping all the way to production deployment. Key features encompass version control, scalability, and seamless integration with popular cloud orchestration tools. This R package enables data scientists to harness ‘Metaflow’s’ capabilities within their preferred R environment, facilitating efficient development and deployment of data science projects.
Installation
You can install the development version of metaflow from GitHub with:
devtools::install_github("bcgalvin/metaflow-r")Implemented Features
The metaflow package offers functionality for managing Metaflow named profiles and provides integration with the Metaflow S3 client. Here’s an overview of the main features:
Profile Management
The package provides functions to manage Metaflow profiles, allowing you to handle different configurations for your workflows. It respects METAFLOW_HOME and METAFLOW_PROFILE environment variables.
# List all profiles in Metaflow home directory
list_profiles()
# Get Active Metaflow Profile
get_active_profile()
# Update the Metaflow profile by name or path
update_profile(name = "my_profile")
# or
update_profile(path = "/path/to/profile.json")S3 Client
The metaflow S3 client is now fully implemented with R6 classes but there will be a more R-friendly interface for this soon. Check out the vignettes for the R equivalents to the metaflow s3 client python docs.
library(metaflow)
s3 <- S3$new(s3root='s3://metaflow-r-s3/tmp/s3demo/')
s3obj <- s3$get('fruit')
# > s3obj
# S3 Object:
# URL: s3://metaflow-r-s3/tmp/s3demo/fruit
# Key: fruit
# Prefix: s3://metaflow-r-s3/tmp/s3demo
# Size: 9 bytes
# Exists: TRUE
# Downloaded: TRUE
# Content Type: binary/octet-stream
# Last Modified: 2024-10-15 16:23:46
# Has Info: TRUE
# Encryption: AES256
cat('location', s3obj$url, '\n')
cat('key', s3obj$key, '\n')
cat('size', s3obj$size, '\n')
cat('local path', s3obj$path, '\n')
cat("bytes", as.character(s3obj$blob), "\n") # return is really a python bytes object but can't cat it
cat('unicode', s3obj$text, '\n')
cat('metadata', s3obj$metadata, '\n')
cat('content-type', s3obj$content_type, '\n')
cat('downloaded', s3obj$downloaded, '\n')
# location s3://metaflow-r-s3/tmp/s3demo/fruit
# key fruit
# size 9
# local path metaflow.s3.dcn4wxq6/metaflow.s3.one_file.o34_2d3z
# bytes pineapple
# unicode pineapple
# metadata NULL
# content-type binary/octet-stream
# downloaded TRUE
s3 <- S3$new()
res <- s3$get('s3://metaflow-r-s3/tmp/external_data')
# > res$text
# [1] "I know nothing about Metaflow"