Migrating GitLab Intranet Repositories to CodeArts Repo in Batches
Background
Currently, CodeArts Repo supports repository migration only between public networks; there is no quick way to migrate repositories from a platform built on a customer's intranet to CodeArts Repo. This section therefore provides a script for migrating repositories from an intranet platform to CodeArts Repo in batches.
Configuring the SSH Public Key for Accessing CodeArts Repo
To migrate GitLab code repositories to CodeArts Repo in batches, you must install the Git Bash client and configure the locally generated SSH public key to CodeArts Repo. The procedure is as follows:
- Run Git Bash to check whether an SSH key has been generated locally.
If you select the RSA algorithm, run the following command in Git Bash:
cat ~/.ssh/id_rsa.pub
If you select the ED25519 algorithm, run the following command in Git Bash:
cat ~/.ssh/id_ed25519.pub
- If No such file or directory is displayed, no SSH key has been generated on your computer. Go to 2.
- If a character string starting with ssh-rsa or ssh-ed25519 is returned, an SSH key has been generated on your computer. If you want to use the generated key, go to 3. If you want to generate a new key, go to 2.
- Generate an SSH key. If you select the RSA algorithm, run the following command to generate a key in Git Bash:
ssh-keygen -t rsa -b 4096 -C your_email@example.com
In the preceding command, -t rsa indicates that an RSA key is generated, -b 4096 indicates the key length (which is more secure), and -C your_email@example.com indicates that comments are added to the generated public key file to help identify the purpose of the key pair.
If you select the ED25519 algorithm, run the following command to generate a key in Git Bash:
ssh-keygen -t ed25519 -C your_email@example.com
In the preceding command, -t ed25519 indicates that an ED25519 key is generated (the key length is fixed, so the -b option is not needed), and -C your_email@example.com adds a comment to the generated public key file to help identify the purpose of the key pair.
Press Enter. The key is stored in ~/.ssh/id_rsa or ~/.ssh/id_ed25519 by default, and the corresponding public key file is ~/.ssh/id_rsa.pub or ~/.ssh/id_ed25519.pub.
- Copy the SSH public key to the clipboard. Run the corresponding command based on your operating system to copy the SSH public key to your clipboard.
- Windows:
clip < ~/.ssh/id_rsa.pub
- macOS:
pbcopy < ~/.ssh/id_rsa.pub
- Linux (xclip required):
xclip -sel clip < ~/.ssh/id_rsa.pub
- Log in to Repo and go to the code repository list page. Click the alias in the upper right corner and choose This Account Settings > Repo > SSH Keys. The SSH Keys page is displayed.
You can also click Set SSH Keys in the upper right corner of the code repository list page. The SSH Keys page is displayed.
- In Key Name, enter a name for your new key. Paste the SSH public key copied in 3 into Key and click OK. A message is displayed indicating that the key has been set successfully.
Migrating GitLab Intranet Repositories to CodeArts Repo in Batches
- Go to the official Python website to download and install Python 3.
- Log in to GitLab and obtain a private_token: in User settings, choose Access Tokens > Add new token.
- You need to generate an SSH public key locally and configure it in GitLab and CodeArts Repo. For details about how to configure it in CodeArts Repo, see Configuring the SSH Public Key for Accessing CodeArts Repo.
- Call the API for obtaining a user token (using a password) with the password of your account. On the API debugging page, click the request example button on the right, set the parameters, and click the debug button. Copy the obtained user token and save it locally.
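The token request in the preceding step follows the Huawei Cloud IAM v3 (Keystone-style) password authentication format. As a rough sketch of what the request body looks like (the helper and parameter names are illustrative; check the API reference for the exact endpoint and fields, and note that the token is returned in the X-Subject-Token response header, not in the body):

```python
def build_token_request(domain_name, user_name, password, project_name):
    """Build the JSON body for the IAM v3 'obtain user token (password)' API.

    This is a sketch of the Keystone-style password authentication body;
    all argument names are placeholders for your own account values.
    """
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user_name,
                        "password": password,
                        "domain": {"name": domain_name}
                    }
                }
            },
            "scope": {"project": {"name": project_name}}
        }
    }
```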
- Use the obtained user token to configure the config.json file. source_host_url indicates the GitLab API address on your intranet, and repo_api_prefix indicates the open API address of CodeArts Repo.
{
    "source_host_url": "http://{source_host}/api/v4/projects?simple=true",
    "private_token": "private_token obtained from GitLab",
    "repo_api_prefix": "https://${open_api}",
    "x_auth_token": "User Token"
}
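A missing or empty field in config.json makes the migration script fail partway through, so it may help to sanity-check the file before running it. A small sketch (the function name is illustrative, not part of the migration script):

```python
# The four keys the migration script reads from config.json.
REQUIRED_CONFIG_KEYS = ("source_host_url", "private_token",
                        "repo_api_prefix", "x_auth_token")


def validate_config(config):
    """Return the list of config.json keys that are missing or empty."""
    return [k for k in REQUIRED_CONFIG_KEYS
            if not str(config.get(k, "")).strip()]
```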
- Log in to the CodeArts console, select a region, and click Access Service.
- On the CodeArts homepage, click Create Project, and select Scrum. If there is no project on the homepage, click Select on the Scrum card. After creating a project, save the project ID.
- Use the obtained project ID to configure the plan.json file. The following example shows the migration configuration of two code repositories. You can configure the file as required. g1/g2/g3 indicates the repository group path. If the path is not pre-created, it will be automatically generated according to the configuration.
[
    ["path_with_namespace", "Project ID", "g1/g2/g3/Target repository name 1"],
    ["path_with_namespace", "Project ID", "g1/g2/g3/Target repository name 2"]
]
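Each target path in plan.json is split into up to three group levels plus a repository name; the script's read_migrate_plan function rejects anything else. The parsing rule can be illustrated as (the helper name is for illustration only):

```python
def split_target(target):
    """Split a plan.json target path like 'g1/g2/g3/repo' into
    (group list, repository name).

    The script supports 0 to 3 group levels, so the full path must
    have 1 to 4 slash-separated segments.
    """
    parts = target.split("/")
    if not 1 <= len(parts) <= 4:
        raise ValueError("group level support 0 to 3")
    return parts[:-1], parts[-1]
```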
- To create a repository group, go to the CodeArts Repo homepage, click the drop-down list box next to New Repository, and select New Repository Group.
- Repository name: must start with a letter, digit, or underscore (_); may contain letters, digits, hyphens (-), underscores (_), and periods (.); and must not end with .git, .atom, or a period (.).
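The naming rule above can be expressed as a quick local check before writing plan.json, so invalid names are caught before any API call. A sketch (the regex encodes the rule as stated; the function name is illustrative):

```python
import re

# Starts with a letter, digit, or underscore; then letters, digits,
# hyphens, underscores, and periods are allowed.
_NAME_RE = re.compile(r'^[A-Za-z0-9_][A-Za-z0-9._-]*$')


def is_valid_repo_name(name):
    """Check a target repository name against the naming rule."""
    if not _NAME_RE.match(name):
        return False
    # Must not end with .git, .atom, or a period.
    return not (name.endswith('.git') or name.endswith('.atom')
                or name.endswith('.'))
```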
- On your local PC, create a migrate_to_repo.py file with the following content:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import json
import logging
import os
import subprocess
import time
import urllib.parse
import urllib.request
from logging import handlers

# Skip creating a repository with the same name.
SKIP_SAME_NAME_REPO = True

STATUS_OK = 200
STATUS_CREATED = 201
STATUS_INTERNAL_SERVER_ERROR = 500
STATUS_NOT_FOUND = 404
HTTP_METHOD_POST = "POST"
CODE_UTF8 = 'utf-8'
FILE_SOURCE_REPO_INFO = 'source_repos.json'
FILE_TARGET_REPO_INFO = 'target_repos.json'
FILE_CONFIG = 'config.json'
FILE_PLAN = 'plan.json'
FILE_LOG = 'migrate.log'
X_AUTH_TOKEN = 'x-auth-token'


class Logger(object):
    def __init__(self, filename):
        format_str = logging.Formatter('%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s: %(message)s')
        self.logger = logging.getLogger(filename)
        self.logger.setLevel(logging.INFO)
        sh = logging.StreamHandler()
        sh.setFormatter(format_str)
        th = handlers.TimedRotatingFileHandler(filename=filename, when='D', backupCount=3, encoding=CODE_UTF8)
        th.setFormatter(format_str)
        self.logger.addHandler(sh)
        self.logger.addHandler(th)


log = Logger(FILE_LOG)


def make_request(url, data=None, headers=None, method='GET'):
    if data is None:
        data = {}
    if headers is None:
        headers = {}
    headers["Content-Type"] = 'application/json'
    headers['Accept-Charset'] = CODE_UTF8
    params = json.dumps(data)
    params = bytes(params, 'utf8')
    try:
        import ssl
        ssl._create_default_https_context = ssl._create_unverified_context
        request = urllib.request.Request(url, data=params, headers=headers, method=method)
        r = urllib.request.urlopen(request)
        if r.status != STATUS_OK and r.status != STATUS_CREATED:
            log.logger.error('request error: ' + str(r.status))
            return r.status, ""
    except urllib.request.HTTPError as e:
        log.logger.error('request with code: ' + str(e.code))
        msg = str(e.read().decode(CODE_UTF8))
        log.logger.error('request error: ' + msg)
        return STATUS_INTERNAL_SERVER_ERROR, msg
    content = r.read().decode(CODE_UTF8)
    return STATUS_OK, content


def read_migrate_plan():
    log.logger.info('read_migrate_plan start')
    with open(FILE_PLAN, 'r') as f:
        migrate_plans = json.load(f)
    plans = []
    for m_plan in migrate_plans:
        if len(m_plan) != 3:
            log.logger.error("line format not match \"source_path_with_namespace\",\"project_id\",\"target_namespace\"")
            return STATUS_INTERNAL_SERVER_ERROR, []
        namespace = m_plan[2].split("/")
        if len(namespace) < 1 or len(namespace) > 4:
            log.logger.error("group level support 0 to 3")
            return STATUS_INTERNAL_SERVER_ERROR, []
        l = len(namespace)
        plan = {
            "path_with_namespace": m_plan[0],
            "project_id": m_plan[1],
            "groups": namespace[0:l - 1],
            "repo_name": namespace[l - 1]
        }
        plans.append(plan)
    return STATUS_OK, plans


def get_repo_by_plan(namespace, repos):
    if namespace not in repos:
        log.logger.info("%s not found in gitlab, skip" % namespace)
        return STATUS_NOT_FOUND, {}
    repo = repos[namespace]
    return STATUS_OK, repo


def repo_info_from_source(config):
    if os.path.exists(FILE_SOURCE_REPO_INFO):
        log.logger.info('get_repos skip: %s already exist' % FILE_SOURCE_REPO_INFO)
        return STATUS_OK
    log.logger.info('get_repos start')
    headers = {'PRIVATE-TOKEN': config['private_token']}
    url = config['source_host_url']
    per_page = 100
    page = 1
    data = {}
    while True:
        url_with_page = "%s&page=%s&per_page=%s" % (url, page, per_page)
        status, content = make_request(url_with_page, headers=headers)
        if status != STATUS_OK:
            return status
        repos = json.loads(content)
        for repo in repos:
            namespace = repo['path_with_namespace']
            repo_info = {'name': repo['name'], 'id': repo['id'],
                         'path_with_namespace': namespace,
                         'ssh_url': repo['ssh_url_to_repo']}
            data[namespace] = repo_info
        if len(repos) < per_page:
            break
        page = page + 1
    with open(FILE_SOURCE_REPO_INFO, 'w') as f:
        json.dump(data, f, indent=4)
    log.logger.info('get_repos end with %s' % len(data))
    return STATUS_OK


def get_repo_dir(repo):
    return "repo_%s" % repo['id']


def exec_cmd(cmd, ssh_url, dir_name):
    log.logger.info("will exec %s %s" % (cmd, ssh_url))
    pr = subprocess.Popen(cmd + " " + ssh_url, cwd=dir_name, shell=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (out, error) = pr.communicate()
    log.logger.info("stdout of %s is:%s" % (cmd, str(out)))
    log.logger.info("stderr of %s is:%s" % (cmd, str(error)))
    if "Error" in str(error) or "err" in str(error) or "failed" in str(error):
        log.logger.error("%s failed" % cmd)
        return STATUS_INTERNAL_SERVER_ERROR
    return STATUS_OK


def clone_from_source(config, plans):
    log.logger.info('clone_repos start')
    with open(FILE_SOURCE_REPO_INFO, 'r') as f:
        repos = json.load(f)
    for plan in plans:
        status, repo = get_repo_by_plan(plan["path_with_namespace"], repos)
        if status == STATUS_NOT_FOUND:
            return status
        name = repo["name"]
        dir_name = get_repo_dir(repo)
        folder = os.path.exists(dir_name)
        if folder:
            log.logger.info("skip clone " + name)
            continue
        os.makedirs(dir_name)
        status = exec_cmd("git clone --mirror", repo['ssh_url'], dir_name)
        if status != STATUS_OK:
            return status
    log.logger.info('clone_repos end')
    return STATUS_OK


def get_groups(config, project_id):
    log.logger.info('get_groups start')
    headers = {X_AUTH_TOKEN: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    limit = 100
    offset = 0
    data = {}
    while True:
        url_with_page = "%s/v4/%s/manageable-groups?offset=%s&limit=%s" % (api_prefix, project_id, offset, limit)
        status, content = make_request(url_with_page, headers=headers)
        if status != STATUS_OK:
            return status, dict()
        rows = json.loads(content)
        for row in rows:
            full_name = row['full_name']
            data[full_name] = row
        if len(rows) < limit:
            break
        offset = offset + len(rows)
    log.logger.info('get_groups end with %s' % len(data))
    return STATUS_OK, data


def create_group(config, project_id, name, parent, has_parent):
    log.logger.info('create_group start')
    headers = {X_AUTH_TOKEN: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    data = {
        'name': name,
        'visibility': 'private',
        'description': ''
    }
    if has_parent:
        data['parent_id'] = parent['id']
    url = "%s/v4/%s/groups" % (api_prefix, project_id)
    status, content = make_request(url, data=data, headers=headers, method='POST')
    if status != STATUS_OK:
        log.logger.error('create_group error: %s', str(status))
        return status
    return STATUS_OK


# Create a repository in the specified repository group.
def create_repo(config, project_id, name, parent, has_parent):
    log.logger.info('create_repo start')
    headers = {X_AUTH_TOKEN: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    data = {
        'name': name,
        'project_uuid': project_id,
        'enable_readme': 0
    }
    if has_parent:
        data['group_id'] = parent['id']
    url = "%s/v1/repositories" % api_prefix
    status, content = make_request(url, data=data, headers=headers, method='POST')
    if "repository or repository group with the same name" in content:
        log.logger.info("repo %s already exist. %s" % (name, content))
        log.logger.info("skip same name repo %s: %s" % (name, SKIP_SAME_NAME_REPO))
        return check_repo_conflict(config, project_id, parent, name)
    elif status != STATUS_OK:
        log.logger.error('create_repo error: %s', str(status))
        return status, ""
    response = json.loads(content)
    repo_uuid = response["result"]["repository_uuid"]
    # Check the repository details after the creation.
    for retry in range(1, 4):
        status, ssh_url = get_repo_detail(config, repo_uuid)
        if status != STATUS_OK:
            if retry == 3:
                return status, ""
            time.sleep(retry * 2)
            continue
        break
    return STATUS_OK, ssh_url


def check_repo_conflict(config, project_id, group, name):
    if not SKIP_SAME_NAME_REPO:
        return STATUS_INTERNAL_SERVER_ERROR, ""
    log.logger.info('check_repo_conflict start')
    headers = {X_AUTH_TOKEN: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    url_with_page = "%s/v2/projects/%s/repositories?search=%s" % (api_prefix, project_id, name)
    status, content = make_request(url_with_page, headers=headers)
    if status != STATUS_OK:
        return status, ""
    rows = json.loads(content)
    for row in rows["result"]["repositories"]:
        if "full_name" in group and "group_name" in row:
            g = group["full_name"].replace(" ", "")
            if row["group_name"].endswith(g):
                return STATUS_OK, row["ssh_url"]
        elif "full_name" not in group and name == row['repository_name']:
            # For scenarios with no repository group.
            return STATUS_OK, row["ssh_url"]
    log.logger.info('check_repo_conflict end, failed to find: %s' % name)
    return STATUS_INTERNAL_SERVER_ERROR, ""


def get_repo_detail(config, repo_uuid):
    log.logger.info('get_repo_detail start')
    headers = {X_AUTH_TOKEN: config['x_auth_token']}
    api_prefix = config['repo_api_prefix']
    url_with_page = "%s/v2/repositories/%s" % (api_prefix, repo_uuid)
    status, content = make_request(url_with_page, headers=headers)
    if status != STATUS_OK:
        return status, ""
    rows = json.loads(content)
    log.logger.info('get_repo_detail end')
    return STATUS_OK, rows["result"]["ssh_url"]


def process_plan(config, plan):
    # Obtain the repository group list of a project.
    project_id = plan["project_id"]
    status, group_dict = get_groups(config, project_id)
    if status != STATUS_OK:
        return status, ""
    group = ""
    last_group = {}
    has_group = False
    for g in plan["groups"]:
        # Check the target repository group. If it exists, check the next layer.
        if group == "":
            group = " %s" % g
        else:
            group = "%s / %s" % (group, g)
        if group in group_dict:
            last_group = group_dict[group]
            has_group = True
            continue
        # If the group does not exist, create it and refresh the group list.
        status = create_group(config, project_id, g, last_group, has_group)
        if status != STATUS_OK:
            return status, ""
        status, group_dict = get_groups(config, project_id)
        if status != STATUS_OK:
            return status, ""
        last_group = group_dict[group]
        has_group = True
    status, ssh_url = create_repo(config, project_id, plan["repo_name"], last_group, has_group)
    if status != STATUS_OK:
        return status, ""
    return status, ssh_url


def create_group_and_repos(config, plans):
    if os.path.exists(FILE_TARGET_REPO_INFO):
        log.logger.info('create_group_and_repos skip: %s already exist' % FILE_TARGET_REPO_INFO)
        return STATUS_OK
    log.logger.info('create_group_and_repos start')
    with open(FILE_SOURCE_REPO_INFO, 'r') as f:
        repos = json.load(f)
    target_repo_info = {}
    for plan in plans:
        status, ssh_url = process_plan(config, plan)
        if status != STATUS_OK:
            return status
        status, repo = get_repo_by_plan(plan["path_with_namespace"], repos)
        if status == STATUS_NOT_FOUND:
            return
        repo['codehub_sshUrl'] = ssh_url
        target_repo_info[repo['path_with_namespace']] = repo
    with open(FILE_TARGET_REPO_INFO, 'w') as f:
        json.dump(target_repo_info, f, indent=4)
    log.logger.info('create_group_and_repos end')
    return STATUS_OK


def push_to_target(config, plans):
    log.logger.info('push_repos start')
    with open(FILE_TARGET_REPO_INFO, 'r') as f:
        repos = json.load(f)
    for r in repos:
        repo = repos[r]
        name = repo["name"]
        dir_name = get_repo_dir(repo)
        status = exec_cmd("git config remote.origin.url", repo['codehub_sshUrl'], dir_name + "/" + name + ".git")
        if status != STATUS_OK:
            log.logger.error("%s git config failed" % name)
            return
        status = exec_cmd("git push --mirror -f", "", dir_name + "/" + name + ".git")
        if status != STATUS_OK:
            log.logger.error("%s git push failed" % name)
            return
    log.logger.info('push_repos end')


def main():
    with open(FILE_CONFIG, 'r') as f:
        config = json.load(f)
    # Read the migration plan.
    status, plans = read_migrate_plan()
    if status != STATUS_OK:
        return
    # Obtain the list of self-built GitLab repositories and export the result to the FILE_SOURCE_REPO_INFO file.
    if repo_info_from_source(config) != STATUS_OK:
        return
    # Clone the repositories to the local host.
    status = clone_from_source(config, plans)
    if status != STATUS_OK:
        return
    # Call the CodeArts Repo API to create repositories and record their addresses in the FILE_TARGET_REPO_INFO file.
    if create_group_and_repos(config, plans) != STATUS_OK:
        return
    # Push code using SSH. Configure the SSH key in CodeArts Repo first.
    push_to_target(config, plans)


if __name__ == '__main__':
    main()
- Run the following command to start the script and migrate code repositories in batches:
python migrate_to_repo.py