Max Garmash

AI/ML & Data Engineering — consulting and development

I eliminate work that doesn't deserve human attention. Raw data and manual processes become predictable AI services and automation — in weeks, not months.

Approach
If an action repeats without thinking — it’s a defect, not work.

Data shouldn’t be moved by hand. Reports shouldn’t be assembled manually. Routine questions shouldn’t cost a specialist’s time.

In 22 years the tools have changed — ETL pipelines, orchestrators, AI agents — but the principle hasn’t: every process that repeats without thinking can and should be handed to a system. The sharper this line is drawn, the more people work on what actually requires a human mind.

What I do
Data & ETL audit

Fixed 2-week sprint. I identify pipeline bottlenecks, failure points, and data quality risks. You get a prioritized roadmap and target architecture: what to fix first and how long it takes.

Kafka · Airflow · NiFi · Hadoop · PostgreSQL · Data Governance · ClickHouse · DataOps · SQL
AI/ML MVP

4–6 weeks from hypothesis to a pilot in production. Scoring, recommendations, forecasting, a RAG-powered documentation assistant: I take one problem and deliver a measurable result you can show to the business.

OpenAI API · Llama · RAG · MLOps · Python · Vector DB · Fine-tuning · Prompt Engineering · Embeddings · NLP · FastAPI
Integration & pipelines

End-to-end data flows between CRM, ERP, SAP, analytics, and external services. Normalization, cleansing, routing — so data doesn’t get lost in transit, reports add up, and decisions come faster.

NiFi · Kafka · Airflow · SAP · REST API · SOAP · PostgreSQL · MSSQL · n8n · RabbitMQ · Redis · 1C · CRM · ERP · GraphQL
AI process automation

Internal AI tools for your team: support chatbots, ticket classification, document summarization, automated reports. Each of these processes repeats without thinking — which means a human shouldn’t be doing it.

n8n · LLM · AI Agents · RAG · OpenAI API · Python · Whisper
Projects

ETL platform for an industrial conglomerate

NDA · industrial sector

Context & Challenge

A major industrial group with multiple manufacturing divisions across Russian regions. Data from SAP, MSSQL, MySQL, and PostgreSQL existed in silos, reporting was assembled manually, and feeding parameters into the SCADA system required complex business logic for transformation and validation.

Solution

Designed and deployed a centralized ETL platform: Apache NiFi for orchestrating flows between sources, Airflow for scheduling batch jobs, and Kafka for real-time event streaming. Built a normalization and cleansing layer, plus an experimental transformation engine that prepares data for SCADA, with complex routing and validation at every stage.
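
For illustration, a minimal sketch of what one scheduled batch job on such a platform might look like in Airflow. Every name here (DAG id, tasks, topic) is hypothetical; the real flows are under NDA.

```python
# Hypothetical Airflow DAG: pull a SAP extract, normalize and validate it,
# then publish verified records to Kafka for the SCADA-facing consumers.
# All identifiers are illustrative, not the real ones.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_sap_batch(**context):
    """Pull the latest batch from the SAP staging source (stub)."""
    ...


def normalize_and_validate(**context):
    """Apply cleansing rules; reject rows that fail validation (stub)."""
    ...


def publish_to_kafka(**context):
    """Send validated records to a 'scada.input'-style topic (stub)."""
    ...


with DAG(
    dag_id="sap_to_scada_batch",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_sap", python_callable=extract_sap_batch)
    normalize = PythonOperator(task_id="normalize_validate", python_callable=normalize_and_validate)
    publish = PythonOperator(task_id="publish_kafka", python_callable=publish_to_kafka)

    # Linear flow; validation sits between source and SCADA, never after.
    extract >> normalize >> publish
```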

Result

Unified data flow across all divisions. Reporting time — from days to hours. Manual reconciliation eliminated entirely. The SCADA subsystem now receives verified data automatically.

Stack

Apache NiFi · Airflow · Kafka · SAP · MSSQL · MySQL · PostgreSQL

Social graph analysis and opinion leader detection

NDA · government agency (CIS)

Context & Challenge

A government agency in a CIS country needed continuous social media monitoring: collecting and storing graph data (user connections, communities, audience overlaps), identifying influence clusters, and tracking the emergence of new opinion leaders in near real-time.

Solution

Deployed a Cloudera Hadoop cluster (HDFS, YARN, Hive, HBase, Spark) for storing and processing large data volumes. Crawling was implemented in Python (Scrapy) with distributed task scheduling. Graph structures were stored in Neo4j; cluster analysis and opinion-leader ranking ran on Spark GraphX. Analytical reports were generated automatically on a schedule.
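
The ranking idea itself is simple. The production system computed it with Spark GraphX over millions of nodes; the sketch below shows the same principle in plain Python with networkx (a stand-in, not the production stack) on a toy graph.

```python
# Illustration only: PageRank as an influence score over a follower graph.
import networkx as nx

# Directed graph: an edge u -> v means "u follows / reposts v".
G = nx.DiGraph()
G.add_edges_from([
    ("alice", "blogger_1"), ("bob", "blogger_1"),
    ("carol", "blogger_1"), ("carol", "blogger_2"),
    ("dave", "blogger_2"),
])

# Accounts that many (well-connected) accounts point to rank higher.
scores = nx.pagerank(G, alpha=0.85)

# An emerging "opinion leader" is flagged when its score crosses a threshold;
# in production this check ran on a schedule against the previous snapshot.
THRESHOLD = 0.2
leaders = [node for node, s in scores.items() if s >= THRESHOLD]
print(sorted(leaders))
```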

Result

The system processes millions of nodes and edges. New opinion leaders are detected automatically within 24 hours of reaching critical mass. Analytical reports are generated without manual intervention.

Stack

Hadoop · HDFS · Spark · Spark GraphX · HBase · Hive · Neo4j · Scrapy · Python

Distributed crawling and data pipeline

NDA · commercial client

Context & Challenge

The client needed regular data collection from dozens of external web sources with aggressive anti-bot protection. Data had to be normalized, deduplicated, loaded into Hadoop and the client's internal systems, and made available through management dashboards.

Solution

Built a distributed crawler with rotation through a pool of proxy providers, adaptive algorithms for bypassing rate limits and CAPTCHA, and intelligent request throttling. ETL pipeline: cleansing and normalization → loading into HDFS/Hive → data marts for BI. Management dashboards in Superset. Legal and logistical aspects of data collection were addressed.
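As a hedged illustration of the rotation mechanism: in Scrapy, proxy assignment fits naturally into a downloader middleware. The endpoints and priority below are placeholders; the real pool, provider APIs, and adaptive throttling logic are under NDA.

```python
# Sketch of proxy rotation as a Scrapy downloader middleware.
import random

PROXY_POOL = [
    "http://proxy-a.example:8080",  # placeholder endpoints,
    "http://proxy-b.example:8080",  # not real providers
    "http://proxy-c.example:8080",
]


class RotatingProxyMiddleware:
    """Assign each outgoing request a proxy from the pool."""

    def process_request(self, request, spider):
        # Scrapy routes the request through whatever the 'proxy' meta key says.
        request.meta["proxy"] = random.choice(PROXY_POOL)


# Enabled in settings.py (priority is illustrative):
# DOWNLOADER_MIDDLEWARES = {"mybot.middlewares.RotatingProxyMiddleware": 543}
```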

Result

Stable automated collection from 50+ sources — zero manual steps. Data available to analysts via dashboards and to ML engineers via Hive/Spark within hours of appearing at the source.

Stack

Python · Scrapy · Hadoop · HDFS · Hive · Spark · Superset

RAG platform for employee training and support

NDA · industrial client

Context & Challenge

The client's technical documentation consisted of thousands of PDF pages in English: regulations, equipment specifications, and operating manuals. Field engineers and operators work in Russian and spent hours searching for the right sections. Critical requirement: zero hallucinations — answers strictly from the documentation text, with no free interpretation allowed due to the domain specifics.

Solution

Built a RAG platform: PDF document loading and parsing with structure preservation (sections, tables, diagrams), chunking with semantic boundary awareness, and indexing into a vector database. The retrieval layer uses hybrid search (semantic + keyword) for precise extraction of relevant fragments. Response generation via GPT-4 with strict prompt engineering: the model answers only based on retrieved fragments, every claim includes a reference to the specific document, section, and page. Cross-lingual capability: question in Russian → search across English documents → answer in Russian with citations from the original.
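
To make the "answers strictly from the documentation" constraint concrete, here is a simplified sketch of the generation step. The `search` callable stands in for the hybrid retrieval layer, the prompt is condensed, and the chunk metadata fields (`doc`, `section`, `page`) are illustrative, not the production schema.

```python
# Hedged sketch: answer only from retrieved fragments, cite every claim.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "Answer ONLY from the provided fragments. If the answer is not in them, "
    "say so explicitly. Cite document, section, and page for every claim. "
    "The question may be in Russian and the fragments in English; "
    "answer in the language of the question."
)


def answer(question: str, search) -> str:
    # 'search' is assumed to return chunks with source metadata attached.
    chunks = search(question, top_k=5)
    context = "\n\n".join(
        f"[{c['doc']} / {c['section']} / p.{c['page']}]\n{c['text']}" for c in chunks
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic: no creative paraphrasing of the docs
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Fragments:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```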

Result

Information lookup time dropped from hours to seconds. Employees receive precise answers with direct source references; hallucinations are eliminated at the architecture level. The platform is used daily by multiple departments.

Stack

GPT-4 · RAG · Vector DB · Python · PDF parsing · Hybrid search

Backend for Gosuslugi (Russian Federal e-Government Portal)

gosuslugi.ru

Context & Challenge

Russia's unified federal e-government portal. The engagement covered development of several high-load backend subsystems with strict security requirements (including GOST cryptography) and integration with dozens of government agency systems.

Solution

Led the development team, which I built from scratch and grew to 15 people. Stack: Spring, MyBatis, Oracle, RabbitMQ, CXF (SOAP integrations with government agencies), CryptoPro for GOST encryption. Introduced Scrum and CI processes.

Result

Subsystems launched in production, serving tens of millions of citizens. The team continued operating after the handover.

Stack

Java · Spring · Oracle · RabbitMQ · CryptoPro · JBoss
How I work
1

Discovery

1–2 weeks. Data, ETL, and infrastructure audit. You get a current-state map and clear priorities.

2

Solution design

Target architecture, risk assessment, cost and timeline estimates. You know what you’re paying for.

3

PoC / MVP

Rapid prototype on real data. A result you can show to the business.

4

Production

Deployment, monitoring, documentation. Your team can operate independently.

5

Support

Iterations, evolution, knowledge transfer. Dependency on me decreases every month.

Ready to discuss your project?

30–40 minute call → we identify bottlenecks → proposal with timeline and cost estimate. Even if we don't start, you get a fresh perspective on your processes.

Discuss your project
Experience — 22+ years
2022 — present
CTO · Octocode
AI solutions, data pipelines, product and team management. Client projects, proprietary products, and AI/Data consulting.
2017 — present
CEO · Self-employed
Full cycle: from architecture to delivery. Strategic and operational project management, hiring and mentoring.
2015 — 2017
Deputy CTO · I-Sys
BigData platform (Cloudera Hadoop), social media analytics system, FASI grant. Building out the BigData technology practice.
2014 — 2015
Head of Web Development · Softline
Managing a 50-person development office. Introduced CI, code review, and workload management systems.
2011 — 2014
Team / Department Lead · I-Sys
Backend for gosuslugi.ru, RTB platform, grew the team from scratch to 15 people.

Early career: 2004 — 2011

2010 — 2011 — CTO, Sinertech — electronic gradebook system, Scrum adoption
2008 — 2010 — Java Developer, Comments (Moscow) — integrations for NCR, QIWI, Biblio Globus
2006 — 2009 — Software Developer, 36.6 — enterprise information systems
2004 — 2006 — C++/C# Developer, SystemSoft / VITA PLUS — industrial controllers, MFC, .NET
From C++ and industrial systems to first CTO role in 6 years.
Tech stack
AI / ML
Llama · RAG · MLOps · Fine-tuning · Prompt Engineering · Embeddings · Vector DB · PyTorch · scikit-learn · NLP · Whisper · AI Agents · Computer Vision
Data / ETL
Spark · Kafka · Airflow · NiFi · Hadoop · HDFS · Hive · HBase · Oozie · ClickHouse · YARN · MapReduce · Flume · Data Governance
Databases
PostgreSQL · MongoDB · Oracle · MSSQL · MySQL · Elasticsearch · Redis · ClickHouse · Qdrant · SQLite · CouchDB · MariaDB
Code
Java · Python · JavaScript · TypeScript · Groovy · SQL · Bash
DevOps
Docker · Ansible · Jenkins · GitLab CI · GitHub Actions · Kubernetes · Terraform · Helm · Nginx · Linux · Git · Prometheus · Grafana · AWS · GCP · Azure · ArgoCD · Vault · Consul
Languages
English (C1, Advanced) · Russian (native)
FAQ
Who will I be working with?
With me directly. When needed, I bring in my team from Octocode for a specific project — developers, data analysts, DevOps engineers.
How does the engagement start?
A 30–40 minute call to understand the problem → brief audit of the current situation → proposal with timeline, cost, and risk estimates.
What engagement models do you offer?
Fixed sprint (2–6 weeks) with clear deliverables, Time & Material for longer projects, or fractional CTO / Head of Data — N hours per month on a recurring basis.
Do you work with remote teams?
Yes. Over 15 years of experience managing distributed teams. Available for on-site visits when needed.
How do you handle confidentiality?
I work under NDA. All data stays on your infrastructure. Access is minimal and revoked after project completion.
What do you need from our team to start?
2–3 hours of a key person’s time for context. After that — a point of contact and system access. I do the heavy lifting.
What’s the ballpark budget?
Depends on the format. A fixed sprint has a clear price upfront; T&M comes with a monthly estimate and spend controls. In both cases you get a transparent breakdown with no hidden costs. Specific numbers come after a 30-minute task review.
Contact
max@garmash.org · Telegram · GitHub · LinkedIn