Available for Data / AI infrastructure roles

DATA ENGINEER

AI, data, and cloud enthusiast building large-scale data pipelines and agentic AI systems, with 4+ years of data engineering experience across marketing and business analytics and satellite data.

4+
Years in data
Geospatial
Satellite data
Business
Analytics data
4.0
MSc CS GPA
Top 1%
Undergrad rank

About Me

// who I am
Shrey Niraula

I am Shrey Niraula, a data engineer and computer science graduate. Growing up in Japan, I spent countless hours around games and computers, which sparked a curiosity about how software works and eventually inspired me to start building things of my own.

That curiosity guided me through Electronics Engineering and into a career centered on data, software, and AI. Today, I work at the intersection of data engineering and agentic systems, exploring how intelligent applications can combine reliable data foundations with autonomous reasoning and decision-making.

I still enjoy keeping up with the Japanese language and culture. Outside of work, you will usually find me watching films and anime, listening to music, or diving into whatever new curiosity has captured my attention.

Nepali (native) · Japanese (fluent) · English (fluent)

Skills

// my stack
Programming Languages: Python · Scala · SQL
Cloud & Big Data: AWS (EC2, S3, Lambda) · Athena · Glue · DynamoDB · EMR · MSK · Spark · Apache Airflow
Databases & Caching: MySQL · Redshift · Dremio · Iceberg · Postgres · Redis
Software Engineering: Flask · Docker · RabbitMQ · Git · Terraform · RESTful APIs · React · Linux · CI/CD · OOP · Multithreading · Distributed Systems
Data Concepts: Data Lake · Data Warehousing · Data Modeling · SCD · Data Orchestration · Data Governance · ETL / ELT · OLAP
AI / ML: Ollama · LangGraph · RAG · Prompt Engineering · Vector Databases
Data Science: Pandas · NumPy · Matplotlib · Statistical Modeling

Experience

// where I have worked
Aug 2024 to Present

Data Engineer

Huntsville, AL, USA
NASA IMPACT (CSDA Program)
  • Architected an Agentic Cloud Decision Framework using self-hosted LLMs (Ollama) for tool-augmented reasoning over petabyte-scale storage inventories. A code-interpreter agent automates multi-tier storage recommendations between MCP and NGAP, with Human-in-the-Loop (HITL) checkpoints for final governance.
  • Migrated legacy vendor ingestion workflows (Airbus and BlackSky) into a partitioned Airflow pipeline using PySTAC for metadata standardization, with Pytest validation suites covering expansion logic and transformations to keep the pipeline reliable end to end.
  • Automated the manual MAXAR checksum verification workflow with an Airflow-orchestrated solution that exports metadata from DynamoDB and detects missing checksums via Athena, eliminating manual validation entirely.
  • Migrated legacy Airflow DAGs to modern standards and optimized key workflows, reducing pipeline runtime.
Apache Airflow · AWS (Athena, DynamoDB, S3, Glue) · PySTAC · Metadata mapping · Pytest
May 2022 to Jul 2024

Mid-Level Data Engineer

Lalitpur, Nepal
GrowByData Services
  • Owned end-to-end integration of LeadGen and Amazon data sources, from POC, acquisition, and ETL through reporting backend logic.
  • Re-architected and scaled Pinterest, Facebook, TikTok, SEO, Google Ads, Bing Ads, and Shopify ingestion pipelines using multithreading and memory optimization, achieving 2 to 5 times faster acquisition speeds.
  • Migrated a monolithic Redshift architecture into a lakehouse-based system (S3, Glue Catalog, Dremio), cutting compute cost by more than half and eliminating CPU bottlenecks.
  • Refactored data anonymization logic using Python decorators, cutting demo report development time across 100+ reports from 3 months to 2 weeks.
  • Integrated Redis caching into the reporting layer, reducing query latency by around 70 percent.
  • Mentored junior engineers and trained interns in Python, SQL, Spark ETL, dimensional modeling, and lakehouse architecture.
Lakehouse · Datalake · EMR · Glue · Crawler · Redis · Optimization · ETL Orchestration · Dremio
May 2021 to May 2022

Associate Data Engineer

Lalitpur, Nepal
GrowByData Services
  • Built the first production acquisition pipeline for Pinterest Ads with OAuth2 authentication.
  • Scaled the existing Facebook Ads acquisition pipeline to handle throttle limits, keeping acquisition stable.
  • Resolved a production issue in ETL pipelines, gaining hands-on experience with star-schema modeling and Slowly Changing Dimensions (SCD).
  • Migrated Talend ETLs into distributed Spark (Scala and SparkSQL), reducing Redshift CPU load by a factor of 2 to 3 and ETL latency by a factor of about 3.
Data Warehousing · Data Modeling · Talend · PySpark · Scala · SparkSQL · Redshift · Star-Schema Design · OAuth2

Education

// where I studied
GPA 4.0

The University of Alabama in Huntsville

Aug 2024 to 2026 · Huntsville, AL, USA
Master's in Computer Science, Data Science Concentration

Thesis: Agentic Cloud Decision Framework, multi-agent local LLM orchestration and tool-augmented reasoning for large-scale NASA cloud storage.

Advanced Database · Data Structures & Algorithms · Big Data Computing · Cloud Computing · Machine Learning
81.70% (Rank 4th, top 1%)

Tribhuvan University, IOE, Pulchowk Campus

2016 to 2021 · Lalitpur, Nepal
Bachelor's in Electronics and Communication Engineering

Ncell Excellence Cash Award (2019) for ranking first in two consecutive semesters (3rd and 4th).

Object Oriented Programming · Database · Artificial Intelligence · Data Mining · Linear Algebra · Computer Architecture

Projects

// swipe through the work
01 / 05
MS Thesis 2026

Self-Hosted Agentic LLM Framework

A multi-agent framework built on local LLMs that uses typed outputs, deterministic verification, and a self-correction loop to produce auditable results.

Pydantic AI · Ollama · Multi-Agent · Text-to-SQL · Python
02 / 05
Final Year 2020

Visual SLAM in Dynamic Scenes

A visual SLAM system that localizes and navigates a mobile robot in a dynamic environment.

Visual SLAM · OpenVSLAM · ROS · ICNet · PyTorch · Semantic Segmentation
03 / 05
Locus Winner 2019

Precision Livestock Farming

A data-collection and analysis system that monitors broiler chickens to estimate welfare and automate their habitat.

YOLO · SORT · Sound Analysis · IoT · Mobile App · Automation
04 / 05
2019

Traffic Management & Analysis

An intelligent traffic system for Kathmandu Valley that detects flow, analyzes data, and optimizes signal timing across major intersections.

Python · Flask · JavaScript · AJAX / jQuery · PTV Vissim · Data Collection · Simulation · Data Modeling
05 / 05
2019

ABU Robocon 2019 Simulator

A graphics project that simulates ABU Robocon 2019 robots and the arena to find the best robot configuration for the competition.

C++ · OpenGL · Camera Calibration · Lighting Effects · Model Rendering

Participations

// hackathons & programs
International Hackathon 2020

Quantum Hack

First International Digital Hackathon in Nepal
AI Research 2019

Nepal Winter School in AI

AI-NAAMII Winter School, Pokhara
Hardware Systems 2019

Hardware Project Competition

Locus Nepal
Hackathon 2019

Everest Hack

Everest Hack Nepal
Assistive Tech 2018

Disaster Hack

AT-Hackathon Nepal
Innovation 2018

Locus 2018

Locus Nepal

Certifications

// courses I completed
JEES Dec 2023

Japanese-Language Proficiency Test (JLPT) N2

Coursera Aug 2020

Front-End Web Development with React

Coursera Aug 2020

Server-side Development with NodeJS, Express and MongoDB

Coursera Jun 2020

SQL for Data Science

Coursera Jan 2020

Machine Learning (Andrew Ng)

edX Sep 2020

Applied Deep Learning Capstone Project (DL0320EN)


Contacts

// open to work and good conversations