
【Cloud Infrastructure Engineer (SRE/Datastore)】Large SaaS Company | Hybrid Work / JLPT N2 level

G Talent
Posted 23 hours ago
Visa sponsorship
Japan
Engineering & Development

Support summary

Relocation support

No relocation support identified.

Visa sponsorship

Explicitly identified in the job description.

About this role

★Cloud Infrastructure Engineer (SRE/Data Store) | Featured Venture Company

  • Business Level Japanese Required (N2 Level)


◆ Location: Tokyo / Osaka

◆ Flextime & Hybrid Work

◆ Global Team

◆ Annual salary: 6 million yen - 15 million yen


-------------【About the company】-------------


The company develops groupware, tools designed to foster teamwork through information sharing. It provides end-to-end service, handling everything from product planning and development to sales, operation, and support in-house.


The mission is to improve teamwork in organizations around the world. They develop groupware that supports thorough information sharing and smooth communication while improving individual learning and satisfaction. Their products are used by a wide variety of organizations, including companies, local governments, universities, hospitals, and NPOs, to manage schedules, customer information, and other crucial business data. To date, they have contributed to improving teamwork for over 12 million users worldwide.


In addition to nine locations across Japan, the company has been expanding globally since it first entered Shanghai, China, in 2007. It has since expanded into the United States, Southeast Asia, and Oceania, and now operates from seven international locations. The challenge to become "world number one" is truly underway.


■Groupware Business

The company develops, provides, and supports tools that foster teamwork through information sharing and communication.


■Main Products

・Cloud business application builder

・Groupware for small-to-medium enterprises (SMEs)

・Groupware for large and medium-sized enterprises

・Shared email management system


-------------【Job Description】-------------


【Overview】

The company's in-house cloud service handles petabyte-scale user data using OSS (Open Source Software) and proprietary middleware. In recent years, growth in user numbers and the diversification of usage have increased data volume and raised the demand for higher performance. This requires storage that offers lower management costs, higher scalability, and superior performance.

The company is recruiting members to develop and operate the storage for a new platform, as well as to educate internal users.


The Data Store Team is divided into three sub-teams, based primarily on the middleware they handle.


・Log Infrastructure Team

The Log Infrastructure Team develops and operates the data aggregation platform used for handling access logs, service usage statistics, and customer contract information. They support various data utilization needs, including internal business intelligence, support operations, and providing analysis data to partners. They also collect application and system software logs, which are used for investigation and troubleshooting.


Currently, the company provides cloud services using a VM infrastructure on bare-metal servers. As it plans to migrate to a container-based platform using Kubernetes, where processes are created dynamically and at finer granularity, the team is also advancing observability by combining logs with metrics and tracing.


・DBRE (Database Reliability Engineering) Team

The DBRE Team manages the large-scale database clusters that support the in-house cloud service and develops/operates various components that enable database utilization.


The team will be actively involved in migrating to a new database platform. This new platform will use a proprietary Kubernetes MySQL Operator to build MySQL clusters and automate their operation.


・Storage Team

The Storage Team builds storage on the on-premise Kubernetes cluster using Rook, Ceph, and an open-source CSI plugin developed in-house. They are active contributors to the OSS they use, regularly submitting issues and pull requests. Additionally, they are using Ceph to develop the backend storage service utilized by the company's cloud applications. The team will collaborate with application teams to migrate the existing storage infrastructure to the new platform.


*Note: The company is currently hiring for positions within the Log Infrastructure Team and the DBRE Team.


【Responsibilities】

Log Infrastructure Team

・Developing and operating a log collection and analysis platform to support data utilization across the company.

・Developing and operating an observability platform to ensure high site reliability.


DBRE (Database Reliability Engineering) Team

・Operating the database clusters used by the company's cloud service.

・Designing and developing monitoring systems and SLOs (Service Level Objectives) for the database clusters.

・Developing a deployment pipeline to manage and operate a large number of database instances.

・Developing microservices used to distribute the company's cloud service data across a large number of database instances.

・Verifying and implementing MySQL version upgrades and new features.


Storage Team

・Developing and operating storage infrastructure using Ceph and Rook.

・Developing the company's proprietary Kubernetes CSI plugin.

・Working with TopoLVM.

・Designing and supporting the migration from existing infrastructure to the new platform.


【Development Environment】

・Log Infrastructure Team

Go

Python

・DBRE Team

Go

Python

・Storage Team

Go

Python

C++


-------------【Requirements】-------------


【Required】

Basic knowledge of Linux server operations

At least one of the preferred skills listed below

Alignment with the company's philosophy


【Preferred】

・Log Infrastructure Team

Experience developing and operating data platforms built with Kafka and Hadoop

Experience developing and operating observability components such as VictoriaMetrics, Loki, and Tempo

・DBRE Team

Experience operating MySQL in large-scale environments

Experience developing API servers in Go

Experience practicing Infrastructure as Code (IaC) and GitOps

Experience participating in incident response and on-call rotations

Proficiency in at least one prominent DBMS or storage-related technology

・Storage Team

Experience building and operating large-scale storage systems

Experience building systems designed with system failures and business continuity planning (BCP) in mind

Experience developing web applications designed to handle heavy traffic and large data volumes

Experience using Kubernetes

Experience operating Ceph

Deep knowledge of Linux kernel file systems and block device subsystems

Experience implementing Kubernetes CSI Plugins


--------------------------------------------------


【Working Time】

Flextime System


【Welfare/Holidays】

Remote work allowance (5,000 yen/month)

Employee stock ownership plan (company matches purchase of up to 10% of paycheck)

Skill development support/Language Learning Support (120,000 yen a year)

New tech gear

Book Purchasing

Commuter allowance

Comprehensive health and social insurance

Parental Leave

Relocation and Visa Sponsorship to Japan offered

20 days of paid leave

5 days of Proactive Personal Leave

5 days of Family Care / Personal Sick Leave

Two days off per week (Saturday and Sunday), Public holidays in Japan

-------------------------------------------------------------------------
