Sample Code for Advanced Applications
If you are familiar with the common MoXing operations, the MoXing Framework API reference, and general Python programming, refer to this section for examples of advanced MoXing Framework usage.
Closing a File After Reading Is Complete
When an OBS file is read, an HTTP connection is used to read the network stream, so the file must be closed after it has been read. To avoid forgetting this, use a with statement: when the with block exits, the close() method of the mox.file.File object is called automatically.
```python
import moxing as mox

# close() is called automatically when the with block exits
with mox.file.File('obs://bucket_name/obs_file.txt', 'r') as f:
    data = f.readlines()
```
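The with statement is shorthand for a try/finally block that calls close(). The sketch below shows the equivalent explicit form. Here io.StringIO is only a stand-in for mox.file.File so that the example runs without OBS access; with MoXing you would open mox.file.File(path, 'r') instead.

```python
import io

# Stand-in for mox.file.File('obs://bucket_name/obs_file.txt', 'r')
f = io.StringIO('line 1\nline 2\n')
try:
    data = f.readlines()
finally:
    # Runs even if readlines() raises, so the underlying resource is always released
    f.close()

print(data)      # ['line 1\n', 'line 2\n']
print(f.closed)  # True
```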
Reading or Writing an OBS File Using pandas
- Use pandas to read an OBS file.
```python
import pandas as pd
import moxing as mox

# pandas reads directly from the file object returned by MoXing
with mox.file.File("obs://bucket_name/b.txt", "r") as f:
    df = pd.read_csv(f)
```
- Use pandas to write an OBS file.
```python
import pandas as pd
import moxing as mox

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
# pandas writes directly to the file object returned by MoXing
with mox.file.File("obs://bucket_name/b.txt", "w") as f:
    df.to_csv(f)
```
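Both examples work because pd.read_csv and DataFrame.to_csv accept any file-like object, not only paths. The round trip below illustrates this with io.StringIO standing in for mox.file.File, so it runs without OBS access. It also shows a detail worth knowing: by default to_csv writes the row index as an extra column, which read_csv then returns as an "Unnamed: 0" column; passing index=False avoids that.

```python
import io

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# io.StringIO stands in for mox.file.File("obs://bucket_name/b.txt", "w")
buf = io.StringIO()
df.to_csv(buf, index=False)  # do not write the row index as an extra column

# Rewind and read back, as pd.read_csv would with mox.file.File(..., "r")
buf.seek(0)
df_read = pd.read_csv(buf)
print(df_read.equals(df))  # True
```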
Reading an Image Using a File Object
OpenCV cannot open an image from an OBS path, so the image content must first be read with MoXing and then decoded in memory. The following code cannot read the image:
```python
import cv2

# Does not work: cv2.imread accepts only local paths and returns None here
img = cv2.imread('obs://bucket_name/xxx.jpg', cv2.IMREAD_COLOR)
```
Modify the code as follows:
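```python
import cv2
import numpy as np
import moxing as mox

# Read the raw bytes of the image from OBS
img_bytes = mox.file.read('obs://bucket_name/xxx.jpg', binary=True)
# Wrap the bytes in a uint8 array (np.fromstring is deprecated for binary data) and decode in memory
img = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
```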
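The only step between mox.file.read and cv2.imdecode is turning a bytes object into a one-dimensional uint8 array. The sketch below isolates that step; the four bytes are just the start of the PNG file signature used as stand-in data, not a real image. np.frombuffer shares memory with the bytes object, so the resulting array is read-only, which is fine for an input buffer.

```python
import numpy as np

# Stand-in for the bytes returned by mox.file.read(..., binary=True):
# the first four bytes of the PNG file signature
raw = b'\x89PNG'
buf = np.frombuffer(raw, dtype=np.uint8)

print(buf.tolist())         # [137, 80, 78, 71]
print(buf.flags.writeable)  # False
```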
Adapting an API That Does Not Support OBS Paths
In pandas, to_hdf and read_hdf, which write and read H5 files, accept neither OBS paths nor file objects. The following code therefore does not work:
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df.to_hdf('obs://wolfros-net/hdftest.h5', key='df', mode='w')
pd.read_hdf('obs://wolfros-net/hdftest.h5')
```
To add OBS support, wrap the original pandas functions so that each call goes through a local cache:
- Write H5 to OBS = Write H5 to the local cache + Upload the local cache to OBS + Delete the local cache
- Read H5 from OBS = Download H5 to the local cache + Read the local cache + Delete the local cache
Add the following code at the beginning of the script so that to_hdf and read_hdf accept OBS paths:
```python
import os

import moxing as mox
import pandas as pd
from pandas.io import pytables
from pandas.core.generic import NDFrame

# Keep references to the original pandas implementations
to_hdf_origin = getattr(NDFrame, 'to_hdf')
read_hdf_origin = getattr(pytables, 'read_hdf')


def _is_obs_path(path):
    return isinstance(path, str) and path.startswith('obs://')


def to_hdf_override(self, path_or_buf, key, **kwargs):
    if not _is_obs_path(path_or_buf):
        return to_hdf_origin(self, path_or_buf, key=key, **kwargs)
    # Write H5 to the local cache, upload it to OBS, then delete the local cache
    tmp_dir = '/cache/hdf_tmp'
    file_name = os.path.basename(path_or_buf)
    mox.file.make_dirs(tmp_dir)
    local_file = os.path.join(tmp_dir, file_name)
    try:
        to_hdf_origin(self, local_file, key=key, **kwargs)
        mox.file.copy(local_file, path_or_buf)
    finally:
        if os.path.exists(local_file):
            os.remove(local_file)


def read_hdf_override(path_or_buf, key=None, mode='r', **kwargs):
    if not _is_obs_path(path_or_buf):
        return read_hdf_origin(path_or_buf, key=key, mode=mode, **kwargs)
    # Download H5 to the local cache, read it, then delete the local cache
    tmp_dir = '/cache/hdf_tmp'
    file_name = os.path.basename(path_or_buf)
    mox.file.make_dirs(tmp_dir)
    local_file = os.path.join(tmp_dir, file_name)
    mox.file.copy(path_or_buf, local_file)
    try:
        return read_hdf_origin(local_file, key=key, mode=mode, **kwargs)
    finally:
        os.remove(local_file)


# Replace the pandas entry points (pd.read_hdf is a separate reference to the same function)
setattr(NDFrame, 'to_hdf', to_hdf_override)
setattr(pytables, 'read_hdf', read_hdf_override)
setattr(pd, 'read_hdf', read_hdf_override)
```
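The download/operate/upload pattern above can also be factored into reusable context managers for other local-only APIs. The sketch below is one possible shape, not part of MoXing: local_copy_for_write and local_copy_for_read are hypothetical helper names, and the upload/download callbacks are assumptions (with MoXing you would typically pass mox.file.copy). A fresh temporary directory per call avoids the name clash that a shared cache directory can cause when two OBS files have the same base name.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def local_copy_for_write(remote_path, upload):
    """Yield a local path; upload it to remote_path if the body succeeds, then clean up."""
    tmp_dir = tempfile.mkdtemp()
    local_path = os.path.join(tmp_dir, os.path.basename(remote_path))
    try:
        yield local_path
        upload(local_path, remote_path)  # skipped if the with body raised
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


@contextmanager
def local_copy_for_read(remote_path, download):
    """Download remote_path to a local path, yield it, then clean up."""
    tmp_dir = tempfile.mkdtemp()
    local_path = os.path.join(tmp_dir, os.path.basename(remote_path))
    try:
        download(remote_path, local_path)
        yield local_path
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Usage sketch: a second local directory stands in for the OBS bucket,
# and shutil.copy stands in for mox.file.copy
remote_dir = tempfile.mkdtemp()
remote_file = os.path.join(remote_dir, 'data.txt')

with local_copy_for_write(remote_file, shutil.copy) as path:
    with open(path, 'w') as f:
        f.write('hello')

with local_copy_for_read(remote_file, shutil.copy) as path:
    with open(path) as f:
        content = f.read()

print(content)  # hello
```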
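Using MoXing to Enable h5py.File to Support OBS
h5py.File does not accept OBS paths. The following code subclasses h5py.File so that an obs:// file is first copied to a local cache file, opened locally, and copied back to OBS when it is closed.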
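```python
import os

import h5py
import numpy as np
import moxing as mox

h5py_File_class = h5py.File


class OBSFile(h5py_File_class):
    """h5py.File that accepts obs:// paths by working on a local cache copy."""

    def __init__(self, name, *args, **kwargs):
        self._tmp_name = None
        self._target_name = name
        if isinstance(name, str) and name.startswith('obs://'):
            tmp_dir = '/cache/h5py_tmp'
            mox.file.make_dirs(tmp_dir)
            # Map the OBS path to a local file name inside the cache directory
            self._tmp_name = os.path.join(tmp_dir, name.replace('/', '_'))
            # Download the existing object so that read and append modes see its content
            if mox.file.exists(name):
                mox.file.copy(name, self._tmp_name)
            name = self._tmp_name
        super(OBSFile, self).__init__(name, *args, **kwargs)

    def close(self):
        # Record the access mode before closing; it cannot be queried afterwards
        writable = self.id.valid and self.mode != 'r'
        super(OBSFile, self).close()
        # Upload only after the HDF5 file has been flushed and closed locally
        if self._tmp_name and writable:
            mox.file.copy(self._tmp_name, self._target_name)


setattr(h5py, 'File', OBSFile)

arr = np.random.randn(1000)
# Write mode: the data is written locally and uploaded to OBS on close
with h5py.File('obs://bucket/random.hdf5', 'w') as f:
    f.create_dataset("default", data=arr)

# Read mode: the object is downloaded to the local cache before it is opened
with h5py.File('obs://bucket/random.hdf5', 'r') as f:
    print(f.require_dataset("default", dtype=np.float32, shape=(1000,)))
```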