# Nati Shenkute

> This file is optimized for LLM consumption. It contains the complete professional and intellectual profile of Nati Shenkute, synthesized from all 12 views of cv.nati.sh.
>
> **What is cv.nati.sh?** An audience-adaptive CV — a single-page application that renders differently depending on who's reading. Twelve views: commercial, academic, crypto, developer, executive, investor, recruiter, HR, designer, friend, LLM, and default. Same person, different lenses. The system itself is part of the portfolio.

## Identity

- **Name**: Nati Shenkute (ናቲ ሸንቁጤ)
- **Email**: shenkuten@gmail.com
- **Website**: [nati.sh](https://nati.sh)
- **LinkedIn**: [linkedin.com/in/natitaw](https://linkedin.com/in/natitaw)
- **Location**: Remote (Bangkok / EU)
- **Nationality**: Ethiopian-Swedish
- **Languages**: English (native), Dutch (fluent), Amharic (fluent), Swedish (basic)

## Professional Summary

Intelligence infrastructure engineer. I build predictive systems that give organisations a measurable competitive advantage — models that predict, decompose signals into executable actions, and directly impact revenue. I operate as a sole analytics function within GTM organisations, designing the predictive layer underneath commercial strategy.

At Synthesia (Series E, $4.1B), I built 10 production intelligence modules as sole commercial data scientist. My algorithms beat commercially available alternatives by 60% on the metrics that matter — translating to >€6M in annual revenue impact. The mathematics underneath is not standard: a category-theoretic framework for commercial intelligence, where different data signals unite in geometric and statistical cohesion.

Currently at Jupiter Exchange (Solana DeFi), applying the same intelligence infrastructure to on-chain data — wallet analytics, regime detection, competitive intelligence, and campaign measurement.

## In His Own Words (from /friend)

"I help companies understand their customers using data. Companies have tons of information about how people use their product, whether they're happy, and how much they're paying. I build systems that turn all of that into useful predictions — like which customers might leave, which ones might buy more, and how much money the company will make next quarter. Basically, I'm the person who helps the sales and business teams make smarter decisions by giving them numbers they can actually trust, instead of guessing."

"Here's the part most people in my field wouldn't say out loud: I think customer behavior is geometry. Not as a metaphor — I mean it literally. When someone uses a product, they're tracing a path through a kind of space, and the shape of that space tells you things. Where is it curved? Where is it flat? Where does someone's trajectory bend toward leaving vs. staying?"

"I don't really think in straight lines. I'll have five problems going at once and let them talk to each other until the underlying structure reveals itself."

"The thing I genuinely enjoy is when a real business problem turns out to have beautiful structure underneath. Like: we needed to predict which customers would cancel. Boring question on the surface. But when you look at the math, users aren't just 'active' or 'gone' — they cycle. They leave and come back. So I built a model from survival analysis and Markov chains that predicted the system would settle at about 18% active users at any given time. Eighteen months later, the actual number was 20%. That kind of thing — where first-principles reasoning just... works — is what keeps me doing this."

"My dad told me 'knowledge is like a spring — it's a fountain.' Don't restrict what you share to control who gets credit. The underground reservoir is the asset, not the water on the surface. I think about that a lot when deciding what to build and how to build it. I'd rather make the system that keeps producing insight than the one-off analysis that gets a meeting."

### In Amharic (አማርኛ)

ናቲ ሸንቁጤ ስባል፣ Data Scientist ነኝ ማለት ኩባንያዎች ደንበኞቻቸውን እንዲረዱ ስርዓቶችን የምሰራ ሰው ነኝ። ደንበኞቻቸው ሊሄዱ እንደሚችሉ፣ ተጨማሪ ሊገዙ እንደሚችሉ ወይም በሚቀጥለው ሩብ ዓመት ምን ያህል ገንዘብ እንደሚያስገኙ መተንበይ፣ ይሄ ነው ስራዬ። ሁለት ማስተርስ ዲግሪዎች አሉኝ ከ University of Twente ኔዘርላንድ: Computer Science እና Chemical Engineering። ለእኔ አስደሳች የሚሆነው ተራ የንግድ ጥያቄዎች ከስር ቆንጆ የሂሳብ ቅርጽ ሲኖራቸው ነው።

### In Dutch (Nederlands)

Ik ben een data scientist die systemen bouwt om bedrijven te helpen hun klanten te begrijpen — voorspellen welke klanten dreigen te vertrekken, welke klaar zijn om meer te kopen, en hoeveel omzet er volgend kwartaal binnenkomt. Twee masters van de Universiteit Twente (informatica en scheikundige technologie). Wat mij drijft is wanneer gewone zakelijke problemen wiskundig mooi blijken te zijn.

## Education

All degrees from the University of Twente, Netherlands:

- **MSc Computer Science** — Machine Learning, Statistics & Probability
- **MSc Chemical Engineering** — Differential Equations, Process Technology
- **BSc Advanced Technology** — Engineering Mathematics and Physics

The combination is the point: engineering teaches systems and dynamics thinking, CS provides the tools to build with. The math underneath both fields is the same math.

## Career

### Jupiter Exchange — Data Scientist, 2026–Present
Solana's largest DEX aggregator. On-chain intelligence infrastructure: wallet health scoring, behavioral clustering, market regime detection, competitive intelligence, campaign measurement with causal A/B testing.

### Synthesia — Commercial Data Scientist, 2025–2026
Series E, $4.1B (Google, Nvidia). Sole intelligence infrastructure for the GTM organisation. 10 production modules built from scratch in 10 months. Weekly reporting to CRO & VP Sales. Algorithms beat market alternatives by 60%, translating to >€6M annual revenue impact. The customer success organisation was restructured around the Health Score.

### Normative — Commercial Data Scientist / BI Lead, 2023–2025
Series A, Google-backed carbon accounting platform. Sole commercial intelligence. Deal forecasting >90% accuracy. Launched PLG channel from scratch. Data architecture for Carbon Network product. Weekly 1:1 with CRO.

### Red Bull — Data Scientist, 2022–2023
Global Data Science team. Real-time production batch quality ML on live sensor data. Flavor-compound identification tool for R&D. Supply chain DS across logistics, production, marketing.

### Diet Doctor — Data Scientist (first hire), 2021–2022
4M+ user subscription health platform. Reported directly to COO. Real-time CLV model contributing to $4M investment round. Churn models and personalised recommendations produced a 3–4% retention lift, measured via A/B testing.

### Additional
- **HelloHistory.ai / Humy.ai** — Data Scientist & Engineer. Grew AI coaching app to 20,000 MAU, 20M+ message exchanges.
- **NORTHE (NDA)** — Contract Data Engineer, Series A startup. Full BigQuery/DataFlow infrastructure from inception.

## Production Systems (Built at Synthesia)

### 1. Bayesian Churn Intelligence System
- **Result**: 77% accuracy, 4–5 month advance warning, >€2M ARR saved
- **Scope**: >3,500 enterprise accounts, deployed to CS compensation
- **Method**: Multi-factor Bayesian model with mutual information-weighted evidence, Beta-distributed posteriors, segment-specific learning rates. One behavioral signal carries 1–2 orders of magnitude more mutual information than all others — the approximate minimal sufficient statistic for churn prediction. Temporal anchoring reduces false positives by 30–40%.

### 2. Account Expansion Scoring
- **Result**: 96.6% AUC, 83% accuracy
- **Scope**: Directly integrated into quota allocation & territory planning
- **Method**: Dual-head probabilistic model: P(expansion) × E[ΔARR | expansion]. Product usage velocity, feature breadth, category adoption signals, CRM metadata.
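
As a minimal sketch of how the two heads combine, the snippet below ranks accounts by P(expansion) × E[ΔARR | expansion]. The class name, account IDs, and numbers are hypothetical illustrations, not the production model or real data.

```python
from dataclasses import dataclass


@dataclass
class ExpansionScore:
    p_expansion: float         # head 1: P(expansion)
    expected_delta_arr: float  # head 2: E[delta ARR | expansion]

    @property
    def expected_value(self) -> float:
        # Expected ARR uplift = probability times conditional size.
        return self.p_expansion * self.expected_delta_arr


def rank_accounts(scores):
    """Return account IDs sorted by expected uplift, highest first."""
    return sorted(scores, key=lambda a: scores[a].expected_value, reverse=True)


# Hypothetical accounts for illustration only.
scores = {
    "acct_a": ExpansionScore(0.80, 10_000.0),
    "acct_b": ExpansionScore(0.20, 100_000.0),
    "acct_c": ExpansionScore(0.50, 12_000.0),
}
ranking = rank_accounts(scores)
print(ranking)
```

Ranking on the product rather than on probability alone lets a low-probability, high-value account (`acct_b`) outrank a likely but small one.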

### 3. Intent Signal Model
- **Result**: 83.9% AUC, 4× recall over baseline, 58% faster detection
- **Scope**: >300K contacts, 22-dimensional feature space
- **Method**: Gradient boosting replacing a 1-dimensional cumulative score threshold with 22-dimensional behavioral pattern recognition. Catches 74% of converters vs 18% for the existing system.

### 4. Health Score Architecture
- **Result**: 60% better than commercial alternatives, >€6M annual impact
- **Scope**: Deployed to Planhat, integrated into QBRs, forecasts, incentive structures
- **Method**: Bayesian health scoring with Beta-distributed confidence tracking. Mutual information weighting discovers optimal evidence combination per segment. Decomposes risk signals into simple actions CS teams can execute. Became the operating system for the customer success organisation.

### 5. Opportunity Attribution (DAG)
- **Result**: 99.97% value conservation accuracy
- **Scope**: Complex Salesforce merge topology
- **Method**: Directed Acyclic Graph construction from opportunity merges. Proportional attribution based on contribution weights for fair commission calculation.
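
The idea of proportional attribution over a merge DAG can be sketched as follows. This is an illustrative toy, assuming each merged opportunity passes its full value down to its source opportunities in proportion to a contribution weight. The function name, opportunity IDs, and weights are hypothetical; the production logic is not described here.

```python
def attribute(merges, node, value, credit):
    """Push `value` from `node` down to the original (unmerged) opportunities.

    merges maps a merged opportunity to {source_opportunity: weight}.
    The graph is assumed to be acyclic.
    """
    sources = merges.get(node)
    if not sources:
        # Leaf: an original opportunity receives the credit.
        credit[node] = credit.get(node, 0.0) + value
        return
    total_weight = sum(sources.values())
    for source, weight in sources.items():
        attribute(merges, source, value * weight / total_weight, credit)


# Hypothetical merge topology: opp_a and opp_b merged into opp_x,
# then opp_x and opp_y merged into opp_final.
merges = {
    "opp_final": {"opp_x": 3.0, "opp_y": 1.0},
    "opp_x": {"opp_a": 1.0, "opp_b": 1.0},
}
credit = {}
attribute(merges, "opp_final", 100_000.0, credit)

# Value conservation check: attributed credit should sum to the closed value.
conservation = sum(credit.values()) / 100_000.0
print(credit, conservation)
```

Because every split is a normalised share of the parent's value, the credits sum back to the closed amount, which is the conservation property the result metric refers to.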

### 6. Territory Optimisation
- **Result**: 18% efficiency increase
- **Method**: Constrained optimisation balancing territory coverage expansion with sales velocity maintenance. Segment-fair ranking to prevent gaming through account mix manipulation.

### 7. User Intelligence (GMM Clustering)
- **Result**: Behavioral archetypes without survey data
- **Method**: Gaussian Mixture Model soft-clustering from product telemetry. Entropy quantification. Influence mapping identifies users whose behavior predicts account-level outcomes.
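
To make "soft clustering" and "entropy quantification" concrete, here is a small sketch that computes membership probabilities (responsibilities) and their Shannon entropy for a one-dimensional mixture. It assumes the mixture parameters have already been fitted (for example by EM); the two components and the feature values are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, components):
    """Posterior probability of each component given x.

    components is a list of (weight, mean, std) tuples.
    """
    densities = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(densities)
    return [d / total for d in densities]


def membership_entropy(resp):
    """Shannon entropy (nats) of a soft assignment; 0 means fully certain."""
    return -sum(r * math.log(r) for r in resp if r > 0.0)


# Hypothetical fitted mixture: two behavioral archetypes on one usage feature.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

ambiguous = membership_entropy(responsibilities(2.0, components))  # midway
clear_cut = membership_entropy(responsibilities(0.0, components))  # at a centre
print(ambiguous, clear_cut)
```

A user sitting between archetypes has high entropy, while a user near a cluster centre has entropy close to zero.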

### 8. Revenue Forecasting
- **Result**: 15% accuracy improvement over baseline
- **Method**: Risk-adjusted renewal probability forecasting. Two forecast vectors: retention (renewal_prob × opp_arr) and growth (expansion_prob × expected_arr). Probabilistic bands, not point estimates.
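
A rough sketch of the two forecast vectors and a probabilistic band follows. The field names `renewal_prob`, `opp_arr`, `expansion_prob`, and `expected_arr` come from the description above; the account values, the independence assumption, the Bernoulli variance, and the normal-approximation band are illustrative assumptions rather than the production method.

```python
import math

# Hypothetical accounts for illustration only.
accounts = [
    {"renewal_prob": 0.9, "opp_arr": 100_000.0,
     "expansion_prob": 0.3, "expected_arr": 20_000.0},
    {"renewal_prob": 0.5, "opp_arr": 40_000.0,
     "expansion_prob": 0.1, "expected_arr": 10_000.0},
]


def forecast(accounts, z=1.645):
    """Expected retention and growth ARR with an approximate 90% band."""
    retention = sum(a["renewal_prob"] * a["opp_arr"] for a in accounts)
    growth = sum(a["expansion_prob"] * a["expected_arr"] for a in accounts)

    # Assumption: each renewal and expansion is an independent Bernoulli event.
    variance = sum(
        a["renewal_prob"] * (1 - a["renewal_prob"]) * a["opp_arr"] ** 2
        + a["expansion_prob"] * (1 - a["expansion_prob"]) * a["expected_arr"] ** 2
        for a in accounts
    )
    expected = retention + growth
    half_width = z * math.sqrt(variance)
    return {
        "retention": retention,
        "growth": growth,
        "expected": expected,
        "low": expected - half_width,
        "high": expected + half_width,
    }


result = forecast(accounts)
print(result)
```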

### 9. Marketing Funnel Forecasting
- **Result**: Multi-quarter forward projection for annual planning
- **Method**: 6-component ensemble (components include seasonal, trend, momentum, and growth) with tunable business weights. 50+ forecast series.

### 10. Commercial Data Infrastructure
- **Result**: Full stack: ingestion → analytics → action
- **Stack**: Snowflake, dbt, Census, Airflow, Omni, Salesforce
- **Scope**: All nine other intelligence modules are deployed on this substrate.

## Capability Ontology

### Mathematical Methods
- **Bayesian inference**: Beta-conjugate priors, sequential posterior updating, credible intervals
- **Survival analysis**: Cox proportional hazards, Kaplan-Meier estimation, cure models, Gamma frailty
- **Semi-Markov processes**: two-state sojourn distributions, renewal equations, prevalence modelling
- **Information theory**: mutual information for feature selection, information-theoretic evidence weighting
- **Gaussian Mixture Models**: soft clustering, entropy quantification, behavioral archetype discovery
- **Gradient boosting**: pseudo-residual fitting, feature importance, high-dimensional classification
- **Causal inference**: A/B testing with holdout groups, collider bias identification, selection bias correction
- **Optimisation**: constrained territory allocation, precision-recall calibration

### Infrastructure
- **Warehousing**: Snowflake, BigQuery, ClickHouse
- **Transformation**: dbt (DAGs, materialisation strategies, incremental models)
- **Orchestration**: Airflow (DAG scheduling, S3 artifact management)
- **Reverse ETL**: Census, Workato (warehouse to CRM sync)
- **Languages**: Python (pandas, scikit-learn, PyTorch, scipy), SQL, R
- **Visualisation**: Omni, Looker, Dune Analytics
- **CRM**: Salesforce (flows, custom objects, reporting, merge topology)
- **Cloud**: GCP, AWS, Azure, Docker
- **On-chain**: Solana instruction parsing, Dune Analytics (Trino SQL)

### Domains
Churn prediction and intervention, revenue and deal forecasting, account expansion scoring, intent signal modelling, opportunity attribution, territory optimisation, user behavioral clustering, customer lifetime value, marketing funnel forecasting, product-led growth analytics, wallet intelligence, market regime detection, DeFi competitive intelligence, on-chain campaign measurement.

## Novel Research Contributions

### Semi-Markov User Activity Model (The Lambda Model)
Two-state semi-Markov process (Active/Inactive). Kaplan-Meier survival estimation, first-return density via discrete convolution, coupled renewal equations. Predicted ~18% steady-state activity from first principles; validated at ~20% over 18 months. A gap of ~47 percentage points separates cumulative return from point-in-time activity. A naive Markov model overpredicts steady-state activity (0.43 vs the observed 0.20).
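
For readers unfamiliar with the Kaplan-Meier step, a minimal standard-library sketch of the estimator is shown below. The sojourn durations and censoring flags are invented for illustration and are not data from the model described above.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as [(time, S(time))] at each event time.

    durations: sojourn lengths; observed: 1 if the transition was seen,
    0 if the sojourn was right-censored.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        leaving = 0
        # Group all records that share the same time.
        while i < len(records) and records[i][0] == t:
            events += records[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical sojourn data: five users, two right-censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
```

Censored sojourns never trigger a drop in the curve, but they do shrink the at-risk set for later event times.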

### Collider Bias in Conditional Survival Analysis
Identified a collider structure that explains why consistency dominates retention prediction yet shows a concordance index of only C ≈ 0.50 (no better than chance) in Cox regression for return from inactivity. Empirical phase transition at the consistency threshold: transition probability shifts from 0.68 to 0.18.

### Information-Theoretic Bayesian Churn Prediction
Beta-conjugate sequential learning with segment-specific learning rates. One behavioral signal carries 1–2 orders of magnitude more mutual information than remaining signals — the approximate minimal sufficient statistic.
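
As an illustration of how mutual information can separate an informative signal from an uninformative one, the sketch below computes MI (in bits) between a binary signal and a binary churn label. The toy arrays are hypothetical and chosen so the answers are exact; they do not represent the real signals.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data.
churned = [1, 1, 0, 0]
informative_signal = [1, 1, 0, 0]    # perfectly aligned with the label
uninformative_signal = [1, 0, 1, 0]  # independent of the label

mi_informative = mutual_information(informative_signal, churned)
mi_uninformative = mutual_information(uninformative_signal, churned)
print(mi_informative, mi_uninformative)
```

Scores like these could then serve as the evidence weights in a Bayesian update, though how the weights are normalised in practice is not specified here.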

### Micro-to-Macro Causal Aggregation
User-level return probability causally determines account-level retention through linear aggregation. Semi-Markov dynamics compose linearly into the commercial quantity.

### Category Theory of Commercial Intelligence
A theoretical framework where commercial data signals are unified through category-theoretic structure — functors between data categories, natural transformations as model updates, cohomological modules for health/expansion/intent. The framework is isomorphic to algebraic geometry: descent = Bayesian updating, sites = tier-specific learning rates, cohomologies = intelligence modules.

## Mathematical Formulations (from /academic)

### I. Semi-Markov User Activity Model
Two-state process S = {Active, Inactive}. Kaplan-Meier survival estimator for sojourn distributions. First-return density via discrete convolution: f_return(t) = Σ_k f_AI(t-k) · f_IA(k). Coupled renewal equations in matrix form. Stationary distribution with cure model decomposition: fraction c never returns, fraction (1-c) follows the renewal process. Predicted ~18% steady-state activity from first principles; validated at ~20%.
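
The convolution for f_return and the long-run active fraction can be sketched in a few lines. This is a simplified illustration, assuming an alternating renewal process with discrete sojourn distributions and no cure fraction, so it omits the cure-model decomposition. The probability mass functions are invented, and the helper names are hypothetical.

```python
def convolve(f_ai, f_ia):
    """First-return density: f_return[t] = sum_k f_ai[t - k] * f_ia[k].

    f_ai[t] is the probability an active spell lasts t periods,
    f_ia[t] the probability an inactive spell lasts t periods.
    """
    out = [0.0] * (len(f_ai) + len(f_ia) - 1)
    for i, a in enumerate(f_ai):
        for k, b in enumerate(f_ia):
            out[i + k] += a * b
    return out


def mean_sojourn(pmf):
    return sum(t * p for t, p in enumerate(pmf))


def long_run_active_fraction(f_ai, f_ia):
    """Alternating-renewal result: mean active time over mean cycle length."""
    mu_active = mean_sojourn(f_ai)
    mu_inactive = mean_sojourn(f_ia)
    return mu_active / (mu_active + mu_inactive)


# Hypothetical sojourn distributions (index = number of periods).
f_ai = [0.0, 0.5, 0.5]
f_ia = [0.0, 0.25, 0.25, 0.5]

f_return = convolve(f_ai, f_ia)
active_fraction = long_run_active_fraction(f_ai, f_ia)
print(f_return, active_fraction)
```

The point of the semi-Markov treatment is that the full sojourn distributions, not a single constant transition rate, determine the steady state.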

### II. Collider Bias in Conditional Survival
Conditioning on "returned from inactivity" induces collider structure. Consistency dominates retention prediction (high concordance) but shows C ≈ 0.50, i.e. chance-level concordance, in Cox regression for return. Empirical phase transition at consistency threshold: p_AI shifts from 0.68 to 0.18.
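
The general mechanism of collider bias can be shown with a short simulation. This is a generic textbook-style illustration, not the actual causal graph behind the finding above: two independent drivers both raise the chance of being selected, and conditioning on selection makes them appear negatively related. The variable names, threshold, and sample size are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)  # fixed seed for reproducibility
n_users = 20_000
consistency = [rng.gauss(0.0, 1.0) for _ in range(n_users)]
other_driver = [rng.gauss(0.0, 1.0) for _ in range(n_users)]

# Collider: a user is "selected" when the combined drivers are high enough.
selected = [(c, o) for c, o in zip(consistency, other_driver) if c + o > 1.0]

corr_all = pearson(consistency, other_driver)
corr_selected = pearson([c for c, _ in selected], [o for _, o in selected])
print(round(corr_all, 3), round(corr_selected, 3))
```

In the full population the two drivers are uncorrelated; within the selected subgroup a strong negative correlation appears, purely as an artefact of conditioning.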

### III. Bayesian Sequential Learning
Beta-conjugate prior Beta(alpha_0, beta_0). Weekly posterior update: alpha_t = alpha_{t-1} + eta_s * w_i * x_i, where eta_s = segment-specific learning rate, w_i = mutual information weight for signal i. One signal carries 1–2 orders of magnitude more MI than all others combined.
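
A minimal sketch of the weighted update is given below. The alpha rule follows the formula above; the matching beta rule, beta_t = beta_{t-1} + eta_s * w_i * (1 - x_i), is an assumption added here for symmetry and is not stated in the source. The function names and all numbers are hypothetical.

```python
def update_beta(alpha, beta, signals, eta_s):
    """One weekly update of a Beta(alpha, beta) posterior.

    signals: list of (w_i, x_i) pairs, where w_i is the mutual information
    weight and x_i in [0, 1] is the observed evidence for the event.
    eta_s: segment-specific learning rate.
    """
    for w_i, x_i in signals:
        alpha += eta_s * w_i * x_i
        beta += eta_s * w_i * (1.0 - x_i)  # assumed counterpart, see lead-in
    return alpha, beta


def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)


# Hypothetical week: one high-weight signal fires, one low-weight signal does not.
alpha, beta = update_beta(1.0, 1.0, [(2.0, 1.0), (0.2, 0.0)], eta_s=0.5)
risk = posterior_mean(alpha, beta)
print(alpha, beta, risk)
```

Because the high-weight signal contributes ten times the pseudo-count of the low-weight one, it dominates the posterior, which mirrors the "one signal carries most of the information" observation.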

### IV. Cox Proportional Hazards with Gamma Frailty
h(t|x, z) = h_0(t) * z * exp(beta' * x). Multiplicative Gamma frailty z ~ Gamma(k, k) for user-level heterogeneity. Hazard ratios live in the multiplicative group (R+, ×), which is isomorphic to (R, +) via the logarithm, with the exponential map as its inverse.
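
One standard consequence of this specification is worth writing out: the population-level survival function after integrating out the frailty. The derivation below assumes the shape-rate parameterisation of Gamma(k, k), so that z has mean 1 and variance 1/k, and writes H_0(t) for the cumulative baseline hazard. Both are notational assumptions not stated above.

```latex
\begin{aligned}
S(t \mid x, z) &= \exp\!\bigl(-z\,H_0(t)\,e^{\beta^\top x}\bigr),
  \qquad H_0(t) = \int_0^t h_0(u)\,du, \\
S(t \mid x) &= \mathbb{E}_z\bigl[S(t \mid x, z)\bigr]
  = \Bigl(1 + \tfrac{1}{k}\,H_0(t)\,e^{\beta^\top x}\Bigr)^{-k}.
\end{aligned}
```

The second line uses the Laplace transform of the Gamma distribution. As k grows, the frailty variance 1/k shrinks and the expression approaches the ordinary Cox survival function exp(-H_0(t) e^{β'x}).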

### V. High-Dimensional Intent Classification
Gradient boosting ensemble F(x) = Σ_m f_m(x), initialized at population log-odds. Top 2 features account for ~59% of total importance. Conversion signal concentrates on a low-dimensional submanifold of the full feature space.
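
The initialisation and the first fitting target can be sketched briefly. The snippet assumes log-loss as the objective, in which case the pseudo-residual each new tree is fitted to is simply the label minus the current predicted probability. The labels are a made-up toy sample, and the tree-fitting step itself is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def initial_log_odds(labels):
    """Constant starting score: log-odds of the population conversion rate."""
    p = sum(labels) / len(labels)
    return math.log(p / (1.0 - p))


def pseudo_residuals(labels, scores):
    """Negative gradient of log-loss with respect to the score: y - p."""
    return [y - sigmoid(f) for y, f in zip(labels, scores)]


# Hypothetical labels: one converter in four contacts.
labels = [1, 0, 0, 0]
f0 = initial_log_odds(labels)
residuals = pseudo_residuals(labels, [f0] * len(labels))
print(f0, residuals)
```

Starting at the population log-odds makes the initial residuals sum to zero, so the first tree only has to learn deviations from the base rate.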

### VI. Micro-to-Macro Causal Aggregation
User-level return probability causally determines account-level retention through linear aggregation. Bottom-up revenue forecasting: retention vector (p_renewal × ARR) + growth vector (p_expand × E[ΔARR]).

## SaaS → DeFi Transfer (from /crypto)

| Capability | SaaS Evidence | DeFi Application |
|---|---|---|
| Churn / Attrition | 77% accuracy, 4–5mo warning | Wallet frequency decay → dormancy prediction |
| Growth Scoring | 96.6% AUC expansion model | Wallet volume growth propensity |
| User Segmentation | GMM soft persona clustering | Trade metadata clustering (whale/retail/bot) |
| Revenue Models | 15% forecast improvement | Fee revenue = health × mix × regime |
| Attribution | 99.97% value conservation | Ultra routing — true market share |
| A/B Testing | Holdout-based causal lift | On-chain campaign ROI measurement |

## DeFi Capabilities

### Wallet Intelligence
Health scoring, churn prediction, and behavioral clustering (GMM) from on-chain metadata. Transaction timing, size, and frequency reveal behavioral patterns without requiring KYC data.

### On-Chain Data Engineering
Raw Solana instruction parsing, decoded tables, cross-platform joins. Built pipelines where no decoded tables existed.

### Market Regime Detection
Volume trend, volatility, SOL momentum, net flow. Every downstream model conditions on regime state.

### Competitive Intelligence
Per-wallet loyalty scoring across protocols. Cross-platform attribution showing true market share vs naive calculations.

### Campaign Measurement
A/B testing with holdout groups for on-chain campaigns. Causal lift measurement, not correlation.
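
A holdout comparison of this kind can be sketched with a two-proportion z-test. The wallet counts below are hypothetical, and the pooled-variance z-test is one common choice for measuring lift, not necessarily the method used in practice.

```python
import math


def holdout_lift(conv_treated, n_treated, conv_holdout, n_holdout):
    """Absolute lift, z statistic, and two-sided p-value for two proportions."""
    p_treated = conv_treated / n_treated
    p_holdout = conv_holdout / n_holdout
    lift = p_treated - p_holdout

    # Pooled standard error under the null hypothesis of no difference.
    pooled = (conv_treated + conv_holdout) / (n_treated + n_holdout)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_treated + 1.0 / n_holdout))
    z = lift / se

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return lift, z, p_value


# Hypothetical campaign: 1,000 exposed wallets vs 1,000 held out.
lift, z, p_value = holdout_lift(120, 1000, 90, 1000)
print(lift, z, p_value)
```

Randomly assigning wallets to the holdout group is what allows the difference to be read as a causal effect rather than a correlation.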

### Revenue Forecasting
Active wallets × frequency × fee × regime. Probabilistic bands. Same framework as SaaS, different data.

## Technical Stack

- **ML/Stats**: Bayesian inference (Beta-conjugate priors, sequential updating), survival analysis (Cox PH, Kaplan-Meier, cure models, Gamma frailty), semi-Markov processes, mutual information, GMM, gradient boosting, SHAP, A/B testing, PyTorch, scikit-learn
- **Data Engineering**: Snowflake, BigQuery, ClickHouse, dbt, Airflow, Census, Workato, Docker
- **Visualisation**: Omni, Looker, Dune Analytics
- **CRM**: Salesforce (flows, custom objects, reporting, merge topology), Planhat, HubSpot
- **Cloud**: GCP, AWS, Azure
- **On-Chain**: Solana instruction parsing, Dune Analytics (Trino SQL), DeFi protocol analytics
- **Languages**: Python, SQL, R

## Key Metrics Summary

| Metric | Value | Context |
|--------|-------|---------|
| Churn prediction accuracy | 77% | 4–5 month advance warning |
| Expansion scoring AUC | 96.6% | Integrated into quota planning |
| Intent model AUC | 83.9% | 4× recall over baseline |
| Deal forecasting accuracy | >90% | Used in board-level calls |
| Health Score vs market | 60% better | >€6M annual revenue impact |
| Attribution value conservation | 99.97% | Complex merge topology |
| Territory efficiency gain | 18% | Constrained optimisation |
| Retention lift (A/B tested) | 3–4% | Personalised recommendation |
| Production modules (10 months) | 10 | Sole data scientist |

## Sites

| URL | Description |
|-----|-------------|
| [cv.nati.sh](https://cv.nati.sh) | Audience-adaptive CV — renders differently depending on who's reading |
| [cv.nati.sh/commercial](https://cv.nati.sh/commercial) | GTM / intelligence infrastructure view |
| [cv.nati.sh/academic](https://cv.nati.sh/academic) | Research paper format with mathematical notation |
| [cv.nati.sh/crypto](https://cv.nati.sh/crypto) | DeFi / on-chain intelligence view |
| [cv.nati.sh/developer](https://cv.nati.sh/developer) | Terminal-style technical view |
| [cv.nati.sh/executive](https://cv.nati.sh/executive) | Executive brief format |
| [cv.nati.sh/investor](https://cv.nati.sh/investor) | Metrics-forward investor view |
| [cv.nati.sh/recruiter](https://cv.nati.sh/recruiter) | ATS-optimized recruiter view |
| [cv.nati.sh/hr](https://cv.nati.sh/hr) | Traditional HR format |
| [cv.nati.sh/designer](https://cv.nati.sh/designer) | Minimalist design view |
| [cv.nati.sh/friend](https://cv.nati.sh/friend) | Conversational view (includes Amharic and Dutch) |
| [cv.nati.sh/llm](https://cv.nati.sh/llm) | Structured data view optimized for LLM parsing |
| [chain.nati.sh](https://chain.nati.sh) | On-chain intelligence architecture for DeFi |
| [math.nati.sh](https://math.nati.sh) | Interactive mathematical constellation — geometry meets poetry |
| [o.nati.sh](https://o.nati.sh) | Autonomous reasoning visualisation — animated agent lattice |
| [graph.nati.sh](https://graph.nati.sh) | Graph visualisation tool |
| [moon.nati.sh](https://moon.nati.sh) | Lunar terminal — phases, position, timing |
| [map.nati.sh](https://map.nati.sh) | Network graph of the full nati.sh domain |

## Executive Summary (from /executive)

Commercial data scientist who builds the intelligence infrastructure between product data and revenue decisions. Operates as a sole analytics function within GTM organisations, designing systems that directly shape commercial strategy — not dashboards, but the predictive layer underneath them. Built 10 interconnected production modules at Synthesia ($4.1B) in 10 months: churn prediction driving over €2M in saved ARR, expansion models shaping territory planning, intent scoring replacing manual SDR prioritisation, and forecasting systems used in board-level revenue calls. Dual MSc (Computer Science + Chemical Engineering) with a bias toward mathematical rigour applied to business-critical decisions.

## Investor TL;DR (from /investor)

Solo commercial data scientist building predictive intelligence infrastructure at scale. Track record of measurable business impact: 77% churn prediction accuracy (Synthesia, $4.1B), 96.6% AUC expansion scoring, >90% deal forecasting accuracy (Normative), and CLV models directly contributing to a $4M investment round (Diet Doctor). Operates as a one-person analytics team reporting to C-suite.

## Side Projects

| Site | Description |
|---|---|
| cv.nati.sh | This site — audience-adaptive CV, 12 views from the same data |
| math.nati.sh | Interactive mathematical constellation — geometry meets poetry |
| o.nati.sh | Autonomous reasoning visualisation — animated lattice of agents |
| chain.nati.sh | On-chain intelligence architecture for DeFi |
| graph.nati.sh | Graph visualisation tool — exploring data as connected nodes |
| moon.nati.sh | Lunar terminal — phases, position, timing |
| map.nati.sh | Network graph of the full nati.sh domain |

## Intellectual Framework

Customer behavior is geometry — not as metaphor, literally. When someone uses a product, they trace a path through a space. The shape of that space (curvature, flatness, trajectory bends) determines outcomes. The dual MSc combination (CS + Chemical Engineering) converges on this: engineering teaches systems and dynamics thinking, CS provides the implementation tools. The math underneath both fields is identical.

The Category Theory of Commercial Intelligence is a theoretical framework formalizing how different commercial data signals (churn, expansion, intent, attribution) unite through categorical structure. The framework is isomorphic to algebraic geometry: Bayesian updating = descent, tier-specific learning rates = sites, health/expansion/intent modules = cohomologies. This mapping is literal, not metaphorical — discovered retroactively, not designed.

---

*This document is served at cv.nati.sh/bot.md for LLM discovery. For the interactive human experience, visit [cv.nati.sh](https://cv.nati.sh).*