Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive

Common Operations of Alluxio

Updated on 2024-07-19 GMT+08:00

Preparations

  1. Create a cluster with Alluxio installed.
  2. Log in to the active Master node in a cluster as user root using the password set during cluster creation.
  3. Run the following command to configure environment variables:

    source /opt/client/bigdata_env

Using the Alluxio Shell

The Alluxio shell contains multiple command line operations that interact with Alluxio.

  • View a file system operation command list:

    alluxio fs

  • Run the ls command to list the files in Alluxio. For example, list all files in the root directory:

    alluxio fs ls /

  • Run the copyFromLocal command to copy local files to Alluxio:

    alluxio fs copyFromLocal /home/test_input.txt /test_input.txt

    Command output:
    Copied file:///home/test_input.txt to /test_input.txt
  • Run the ls command again to list the files in Alluxio. The copied test_input.txt file is listed:

    alluxio fs ls /

    Command output:
    12       PERSISTED 11-28-2019 17:10:17:449 100% /test_input.txt

    The test_input.txt file is displayed in Alluxio. The parameters in the file indicate the file size, whether the file is persistent, creation date, cache ratio of the file in Alluxio, and file name.

  • Run the cat command to print file content:

    alluxio fs cat /test_input.txt

    Command output:
    Test Alluxio

Mounting Function of Alluxio

Alluxio uses a unified namespace feature to unify the access to storage systems. For details, see https://docs.alluxio.io/os/user/2.0/en/advanced/Namespace-Management.html.

This feature allows users to mount different storage systems to an Alluxio namespace and seamlessly access files across storage systems through the Alluxio namespace.

  1. Create a directory as a mount point in Alluxio.
    alluxio fs mkdir /mnt
    Successfully created directory /mnt
  2. Mount an existing OBS file system to Alluxio. (Prerequisite: An agency with the OBS OperateAccess permission has been configured for the cluster.) The obs-mrstest file system is used as an example. Replace the file system name with the actual one.
    alluxio fs mount /mnt/obs obs://obs-mrstest/data
    Mounted obs://obs-mrstest/data at /mnt/obs
  3. List files in the OBS file system using the Alluxio namespace. Run the ls command to list the files in the OBS mount directory.
    alluxio fs ls /mnt/obs
    38       PERSISTED 11-28-2019 17:42:54:554   0% /mnt/obs/hive_load.txt
    12       PERSISTED 11-28-2019 17:43:07:743   0% /mnt/obs/test_input.txt

    You can also view the newly mounted files and directories on the Alluxio web UI.

  4. After the mounting is complete, you can seamlessly exchange data between different storage systems through the unified namespace of Alluxio. For example, run the ls -R command to list all files in a directory recursively:
    alluxio fs ls -R /
            0       PERSISTED 11-28-2019 11:15:19:719  DIR /app-logs
            1       PERSISTED 11-28-2019 11:18:36:885  DIR /apps
            1       PERSISTED 11-28-2019 11:18:40:209  DIR /apps/templeton
    239440292       PERSISTED 11-28-2019 11:18:40:209   0% /apps/templeton/hive.tar.gz
    .....
            1       PERSISTED 11-28-2019 19:00:23:879  DIR /mnt
            2       PERSISTED 11-28-2019 19:00:23:879  DIR /mnt/obs
           38       PERSISTED 11-28-2019 17:42:54:554   0% /mnt/obs/hive_load.txt
           12       PERSISTED 11-28-2019 17:43:07:743   0% /mnt/obs/test_input.txt
    .....

    The command output shows all files that are from the mounted storage system in the root directory of the Alluxio file system (the default directory is the HDFS root directory, that is, hdfs://hacluster/). The /app-logs and /apps directories are in HDFS, and the /mnt/obs/ directory is in OBS.

Using Alluxio to Accelerate Data Access

Alluxio can accelerate data access, because it uses memory to store data. Example commands are provided as follows:

  1. Upload the test_data.csv file (a sample that records recipes) to the /data directory of the obs-mrstest file system. Run the ls command to display the file status.
    alluxio fs ls /mnt/obs/test_data.csv
    294520189       PERSISTED 11-28-2019 19:38:55:000   0% /mnt/obs/test_data.csv

    The output indicates that the cache percentage of the file in Alluxio is 0%, that is, the file is not in Alluxio memory.

  2. Count the occurrence times of the word "milk" in the file, and calculate the time consumed.
    time alluxio fs cat /mnt/obs/test_data.csv | grep -c milk
    52180
    
    real    0m10.765s
    user    0m5.540s
    sys     0m0.696s
  3. Data is stored in memory after being read for the first time. When Alluxio reads data again, the data access speed is increased. For example, after running the cat command to obtain a file, run the ls command to check the file status.
    alluxio fs ls /mnt/obs/test_data.csv
    294520189       PERSISTED 11-28-2019 19:38:55:000 100% /mnt/obs/test_data.csv

    The output shows that the file has been fully loaded to Alluxio.

  4. Access the file again, count the occurrence times of the word "eggs", and calculate the time consumed.
    time alluxio fs cat /mnt/obs/test_data.csv | grep -c eggs
    59510
    
    real    0m5.777s
    user    0m5.992s
    sys     0m0.592s

    According to the comparison of the two time consumption records, the time consumed for accessing data stored in Alluxio memory is significantly reduced.

We use cookies to improve our site and your experience. By continuing to browse our site you accept our cookie policy. Find out more

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback