Sample Code for Advanced Applications
If you are familiar with common operations, the MoXing Framework API document, and common Python code, you can refer to this section to use advanced MoXing Framework functions.
Closing a File After File Reading Is Completed
When an OBS file is read, an HTTP connection is opened to read the network stream, so you need to close the file after reading it. To avoid forgetting to close a file, use the with statement: when the with block exits, the close() function of the mox.file.File object is called automatically.
import moxing as mox
with mox.file.File('obs://bucket_name/obs_file.txt', 'r') as f:
    data = f.readlines()
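To illustrate why the with statement is the safer choice, the following minimal sketch uses a hypothetical stand-in class, FakeRemoteFile, in place of mox.file.File so that it runs without MoXing or an OBS bucket. It is not part of MoXing. It only assumes that the real object follows the context-manager behavior described above, that is, close() is called when the with block exits, even if an exception is raised.

```python
class FakeRemoteFile:
    """Hypothetical stand-in for mox.file.File; not part of MoXing."""

    def __init__(self, lines):
        self._lines = list(lines)
        self.closed = False

    def readlines(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return list(self._lines)

    def close(self):
        # A real implementation would release the HTTP connection here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block exits, whether or not an exception occurred.
        self.close()
        return False  # do not suppress exceptions


def read_all(remote_file):
    """Read every line and guarantee that the file is closed afterwards."""
    with remote_file as f:
        return f.readlines()


demo_file = FakeRemoteFile(["a\n", "b\n"])
demo_data = read_all(demo_file)
print(demo_data, demo_file.closed)
```

Because the cleanup lives in `__exit__`, the caller does not need a separate try/finally block to release the connection.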
Reading or Writing an OBS File Using pandas
- Use pandas to read an OBS file.
import pandas as pd
import moxing as mox
with mox.file.File("obs://bucket_name/b.txt", "r") as f:
    csv = pd.read_csv(f)
- Use pandas to write an OBS file.
import pandas as pd
import moxing as mox
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
with mox.file.File("obs://bucket_name/b.txt", "w") as f:
    df.to_csv(f)
Reading an Image Using a File Object
OpenCV cannot open an image directly from an OBS path. The image content must be read through MoXing first and then decoded. The following code cannot read the image:
import cv2
cv2.imread('obs://bucket_name/xxx.jpg', cv2.IMREAD_COLOR)
Modify the code as follows:
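import cv2
import numpy as np
import moxing as mox
# Read the raw bytes from OBS, then decode them in memory.
# np.frombuffer replaces the deprecated np.fromstring.
img_bytes = mox.file.read('obs://bucket_name/xxx.jpg', binary=True)
img = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)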
Using an Existing API to Implement an API That Is Not Supported by mox.file
import os
import moxing as mox
_origin_isfile = os.path.isfile
def _patch_isfile(path):
    # Treat a path as a file only if it exists and is not a directory.
    return mox.file.exists(path) and not mox.file.isdir(path)
setattr(os.path, 'isfile', _patch_isfile)
Adapting an API That Does Not Support OBS Paths to Support OBS Paths
In pandas, to_hdf and read_hdf, which are used to write and read H5 files, support neither OBS paths nor file objects as input. The following code may cause errors:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df.to_hdf('obs://wolfros-net/hdftest.h5', key='df', mode='w')
pd.read_hdf('obs://wolfros-net/hdftest.h5')
You can override these pandas APIs so that they support OBS paths. The approach is as follows:
- Write H5 to OBS = Write H5 to the local cache + Upload the local cache to OBS + Delete the local cache
- Read H5 from OBS = Download H5 to the local cache + Read the local cache + Delete the local cache
To do so, add the following code at the beginning of the script so that to_hdf and read_hdf support OBS paths:
import os
import moxing as mox
import pandas as pd
from pandas.io import pytables
from pandas.core.generic import NDFrame
# Keep references to the original implementations.
to_hdf_origin = getattr(NDFrame, 'to_hdf')
read_hdf_origin = getattr(pytables, 'read_hdf')
def to_hdf_override(self, path_or_buf, key, **kwargs):
    tmp_dir = '/cache/hdf_tmp'
    file_name = os.path.basename(path_or_buf)
    mox.file.make_dirs(tmp_dir)
    local_file = os.path.join(tmp_dir, file_name)
    # Write to the local cache, upload it to OBS, then delete the local cache.
    to_hdf_origin(self, local_file, key=key, **kwargs)
    mox.file.copy(local_file, path_or_buf)
    mox.file.remove(local_file)
def read_hdf_override(path_or_buf, key=None, mode='r', **kwargs):
    tmp_dir = '/cache/hdf_tmp'
    file_name = os.path.basename(path_or_buf)
    mox.file.make_dirs(tmp_dir)
    local_file = os.path.join(tmp_dir, file_name)
    # Download to the local cache, read it, then delete the local cache.
    mox.file.copy(path_or_buf, local_file)
    result = read_hdf_origin(local_file, key=key, mode=mode, **kwargs)
    mox.file.remove(local_file)
    return result
# Replace the pandas entry points with the OBS-aware versions.
setattr(NDFrame, 'to_hdf', to_hdf_override)
setattr(pytables, 'read_hdf', read_hdf_override)
setattr(pd, 'read_hdf', read_hdf_override)
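The write and read steps described above (local cache, transfer, cleanup) are not specific to H5 files. The following sketch generalizes them into two helper functions and adds try/finally so that the local cache is deleted even if a step fails. The function names write_via_local_cache and read_via_local_cache and the copy_fn parameter are hypothetical and introduced only for illustration. To stay runnable without MoXing, the sketch uses shutil.copy and a temporary local directory as a stand-in for OBS.

```python
import os
import shutil
import tempfile


def write_via_local_cache(write_fn, remote_path, copy_fn=shutil.copy):
    """Write with write_fn to a local cache file, upload it, then clean up."""
    tmp_dir = tempfile.mkdtemp(prefix="hdf_tmp_")
    local_file = os.path.join(tmp_dir, os.path.basename(remote_path))
    try:
        write_fn(local_file)               # 1. write to the local cache
        copy_fn(local_file, remote_path)   # 2. upload the local cache
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)  # 3. delete the local cache


def read_via_local_cache(read_fn, remote_path, copy_fn=shutil.copy):
    """Download to a local cache file, read it with read_fn, then clean up."""
    tmp_dir = tempfile.mkdtemp(prefix="hdf_tmp_")
    local_file = os.path.join(tmp_dir, os.path.basename(remote_path))
    try:
        copy_fn(remote_path, local_file)   # 1. download to the local cache
        return read_fn(local_file)         # 2. read the local cache
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)  # 3. delete the local cache


def _write_text(path):
    with open(path, "w") as f:
        f.write("hello")


def _read_text(path):
    with open(path) as f:
        return f.read()


# A temporary directory plays the role of the remote store in this demo.
remote_dir = tempfile.mkdtemp(prefix="fake_obs_")
remote_path = os.path.join(remote_dir, "hdftest.h5")
write_via_local_cache(_write_text, remote_path)
round_trip = read_via_local_cache(_read_text, remote_path)
print(round_trip)
shutil.rmtree(remote_dir)
```

In a MoXing environment, mox.file.copy, which the code above already uses for transfers, could presumably be passed as copy_fn, with an obs:// path as remote_path. That combination is untested here.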
Using MoXing to Enable h5py.File to Support OBS
import os
import h5py
import numpy as np
import moxing as mox
h5py_File_class = h5py.File
class OBSFile(h5py_File_class):
    def __init__(self, name, *args, **kwargs):
        self._tmp_name = None
        self._target_name = name
        if name.startswith('obs://'):
            # Map the OBS path to a local cache file and open that file instead.
            tmp_dir = os.path.join('cache', 'h5py_tmp')
            mox.file.make_dirs(tmp_dir)
            self._tmp_name = os.path.join(tmp_dir, name.replace('/', '_'))
            if mox.file.exists(name):
                mox.file.copy(name, self._tmp_name)
            name = self._tmp_name
        super(OBSFile, self).__init__(name, *args, **kwargs)
    def close(self):
        # Close first so that all data is flushed to the local cache file,
        # then upload the cache file to OBS.
        super(OBSFile, self).close()
        if self._tmp_name:
            mox.file.copy(self._tmp_name, self._target_name)
setattr(h5py, 'File', OBSFile)
arr = np.random.randn(1000)
# Open in write mode to create the dataset.
with h5py.File('obs://bucket/random.hdf5', 'w') as f:
    f.create_dataset("default", data=arr)
# Reopen in read mode to check the dataset.
with h5py.File('obs://bucket/random.hdf5', 'r') as f:
    print(f.require_dataset("default", dtype=np.float32, shape=(1000,)))