  • LinkedIn
  • GitHub

Why should you hire me?

I am an AWS Certified Solutions Architect Associate with strong technical expertise and hands-on experience in developing ETL data pipelines, building data transformation processes, and creating dashboards. I am skilled in Python (Pandas, NumPy), SQL, Apache Spark (PySpark, SparkSQL), AWS Lambda, AWS Glue, AWS EMR, Snowflake, and various other AWS services, and I am proficient in orchestrating workflows using Apache Airflow and Docker. My experience creating interactive Tableau dashboards on top of data warehousing services means I can drive data-driven decision-making and help answer your business questions.


Ravi Shankar Poosa

Data Engineer | Data Analyst | AWS Certified Solutions Architect Associate

Email:

Address: Overland Park, Kansas, United States.

WORK EXPERIENCE


December 2020 – December 2022

Capgemini

Data Engineer

  • Led the development of ETL data pipelines, optimizing data ingestion and transformation of insurance data using Apache Spark within AWS Glue and AWS EMR, improving processing efficiency by 30% through Spark optimization techniques.

  • Developed Spark scripts using PySpark and SparkSQL for null handling, data masking, standardization, and metric calculations, reducing data processing errors by 25% and enhancing data quality (a minimal sketch of this pattern follows this list).

  • Utilized AWS Glue to discover and catalog data from AWS S3, establishing a central metadata repository for data schemas and locations.

  • Configured AWS EMR to execute PySpark scripts for transforming claims data, employing Spark optimization techniques to efficiently load Parquet-formatted data into Snowflake, leading to a 40% reduction in data loading times and improved query performance.

  • Worked closely with data analysts, business stakeholders, and AWS administrators to document transformation scripts, job configurations, and data handling practices, ensuring alignment across teams.

  • Conducted comprehensive analysis of insurance data in Snowflake and PostgreSQL, identifying key trends, patterns, and outliers. Created dashboards comparing current monthly data with the same month in previous years, driving a 15% improvement in claims management and supporting strategic decisions.

  • Designed interactive dashboards in Tableau to visualize key performance indicators (KPIs) and metrics related to insurance operations. These dashboards facilitated data-driven decision-making and increased stakeholder engagement by 25%.

  • Generated regular and ad-hoc reports, conducting statistical analysis and data modeling. This work derived actionable insights and recommendations for operational improvements, leading to a 10% enhancement in operational efficiency.
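
A minimal PySpark sketch of the null-handling, masking, and standardization pattern described above. The column names (state, claim_amount, ssn) and S3 paths are illustrative assumptions, not the actual project schema.

```python
# Illustrative only: column names and S3 paths are assumed, not the real schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-cleanup").getOrCreate()

claims_df = spark.read.parquet("s3://example-bucket/claims/raw/")  # hypothetical input

cleaned_df = (
    claims_df
    # Standardization: trim whitespace and upper-case state codes
    .withColumn("state", F.upper(F.trim(F.col("state"))))
    # Null handling: default missing claim amounts to 0.0
    .fillna({"claim_amount": 0.0})
    # Data masking: keep only the last four digits of the SSN
    .withColumn("ssn", F.concat(F.lit("***-**-"), F.substring("ssn", -4, 4)))
    # Metric calculation: flag high-value claims for downstream reporting
    .withColumn("high_value", F.col("claim_amount") > 10000)
)

cleaned_df.write.mode("overwrite").parquet("s3://example-bucket/claims/clean/")
```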

PROJECTS

FDA Drug Adverse Events ETL Project

  • GitHub
  • Tableau
[Architecture diagram]
  • Led the end-to-end development of an ETL pipeline to extract, transform, and load FDA drug adverse-event data from the open-source openFDA API using modern data engineering tools and cloud technologies, visualizing the transformed data with Tableau Desktop.

  • Designed and implemented Python scripts leveraging Pandas for data extraction and transformation from the FDA's open-source API into AWS S3.

  • Orchestrated the ETL workflow using Apache Airflow, ensuring robust data pipeline automation and scheduling (see the DAG sketch after this list).

  • Utilized Docker for containerizing the Airflow environment, enhancing scalability and deployment efficiency.

  • Implemented data modeling and storage in Snowflake, loading the transformed data with optimized accessibility and query performance.

  • Developed interactive dashboards by connecting Tableau to Snowflake.

  • Technologies used: Python, Pandas, AWS S3, Snowflake, Apache Airflow, Docker Desktop, Tableau Desktop.
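
A minimal Airflow DAG sketch of the extract-and-load flow described above. The task split, local file path, and S3 bucket name are assumptions for illustration; only the openFDA endpoint is a real public API.

```python
# Sketch only: task breakdown, paths, and bucket name are assumptions.
from datetime import datetime

import pandas as pd
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

FDA_URL = "https://api.fda.gov/drug/event.json?limit=100"  # public openFDA endpoint


def extract_fda_events():
    """Pull one page of adverse-event records and flatten to a local CSV."""
    results = requests.get(FDA_URL, timeout=30).json()["results"]
    pd.json_normalize(results).to_csv("/tmp/fda_events.csv", index=False)


def transform_and_load():
    """Light Pandas cleanup, then write to S3 (assumed bucket; requires s3fs)."""
    df = pd.read_csv("/tmp/fda_events.csv")
    df = df.dropna(axis=1, how="all")  # drop columns with no data at all
    df.to_csv("s3://example-fda-bucket/clean/fda_events.csv", index=False)


with DAG(
    dag_id="fda_adverse_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_fda_events)
    load = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)
    extract >> load
```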


  • GitHub
  • Tableau

Real-Time Insurance Claims Data ETL Pipeline

[Architecture diagram]
  • Led the development of a real-time ETL pipeline to process insurance data sourced from Kaggle, leveraging Apache Airflow for orchestration; Snowflake for data modeling, storage, and change data capture; and Tableau for data visualization.
  • Designed and implemented Python scripts using Pandas for data extraction, cleaning, normalization, and transformation from the Kaggle API into AWS S3.

  • Orchestrated the ETL workflow using Apache Airflow hosted on an EC2 instance for initial data extraction and preprocessing, ensuring scalability and automation.

  • Using AWS S3 as a data lake, implemented Snowflake ingestion techniques such as storage integration objects, external stages, and Snowpipe to load data into staging tables.

  • Implemented change data capture using Snowflake streams and tasks to load transformed data into analytics tables (a sketch of these objects follows this list).

  • Created a Tableau live connection to Snowflake tables for real-time dashboards.

  • Technologies used: Python, Pandas, AWS S3, Snowflake, Apache Airflow, AWS EC2, Tableau Desktop, VS Code.
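
A sketch of the Snowflake ingestion and CDC objects described above, issued through the Python connector so the example stays in one language. All object names, column names, the S3 URL, and the connection parameters are assumptions; the storage integration (my_s3_int) is presumed to already exist.

```python
# Sketch only: object names, columns, S3 URL, and credentials are assumed.
import snowflake.connector

ddl_statements = [
    # External stage over the S3 data lake (storage integration assumed to exist)
    """CREATE STAGE IF NOT EXISTS CLAIMS_STAGE
         URL = 's3://example-claims-bucket/raw/'
         STORAGE_INTEGRATION = my_s3_int
         FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)""",
    # Snowpipe: auto-ingests newly staged files into the staging table
    """CREATE PIPE IF NOT EXISTS CLAIMS_PIPE AUTO_INGEST = TRUE AS
         COPY INTO CLAIMS_RAW FROM @CLAIMS_STAGE""",
    # Stream: records row changes on the staging table for CDC
    "CREATE STREAM IF NOT EXISTS CLAIMS_STREAM ON TABLE CLAIMS_RAW",
    # Serverless task: every 5 minutes, move newly captured inserts onward
    """CREATE TASK IF NOT EXISTS CLAIMS_TASK
         SCHEDULE = '5 MINUTE'
         WHEN SYSTEM$STREAM_HAS_DATA('CLAIMS_STREAM')
         AS INSERT INTO CLAIMS_ANALYTICS (claim_id, claim_amount)
            SELECT claim_id, claim_amount FROM CLAIMS_STREAM
            WHERE METADATA$ACTION = 'INSERT'""",
    "ALTER TASK CLAIMS_TASK RESUME",
]

# Placeholder connection parameters; supply real credentials via a secrets store
conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="...",
    warehouse="COMPUTE_WH",
    database="INSURANCE",
    schema="PUBLIC",
)
with conn.cursor() as cur:
    for stmt in ddl_statements:
        cur.execute(stmt)
conn.close()
```

Consuming the stream inside the task's DML advances the stream offset, so each run only sees rows that arrived since the previous run.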


  • GitHub
  • Tableau

YouTube Data Analysis ETL Using AWS Cloud Services

[Architecture diagram]
  • Developed an ETL pipeline to streamline analysis of structured and semi-structured YouTube data using AWS Cloud Services and to generate insights using Tableau.
  • Designed and implemented ETL processes for data in AWS S3, utilizing AWS Glue for cataloging and transforming CSV data and AWS Lambda for converting JSON data into Parquet format (a minimal handler sketch follows this list).
  • Implemented monitoring and scalability using AWS CloudWatch.
  • Loaded the processed Parquet data into AWS Redshift for analytics and visualized insights from YouTube video metrics across regions using Tableau Desktop.
  • Technologies used: AWS S3, AWS Glue, AWS Lambda, AWS Athena, AWS Redshift, Tableau Desktop, Apache Spark, AWS CloudWatch, Python, Pandas, SQL.
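
A minimal sketch of the Lambda step described above, using awswrangler (the AWS SDK for pandas) to convert an uploaded JSON file to Parquet and register it in the Glue catalog. The bucket, Glue database, and table names are assumptions, as is the newline-delimited JSON layout.

```python
# Sketch only: bucket, Glue database, and table names are assumed.
import urllib.parse

import awswrangler as wr


def lambda_handler(event, context):
    # S3 put event -> bucket and key of the newly uploaded raw JSON file
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])

    # Read the semi-structured JSON into a flat pandas DataFrame
    # (lines=True assumes newline-delimited JSON; adjust for nested documents)
    df = wr.s3.read_json(f"s3://{bucket}/{key}", lines=True)

    # Write Parquet to the cleansed zone and register it in the Glue catalog
    wr.s3.to_parquet(
        df=df,
        path="s3://example-youtube-cleansed/statistics_reference/",
        dataset=True,
        database="example_youtube_db",
        table="raw_statistics_reference",
        mode="append",
    )
    return {"status": "ok", "rows": len(df)}
```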


Washington State Electric Vehicles ETL Data Pipeline

  • GitHub
  • Tableau
[Architecture diagram]
  • Developed an ETL pipeline to process and analyze electric vehicle population data for Washington State using Apache Airflow, AWS Cloud Services, Snowflake, and Tableau.
  • Designed and implemented ETL processes to extract raw CSV data from AWS S3, utilizing Apache Spark on AWS EMR for data cleaning, standardization, and conversion into fact and dimension tables in Parquet format (a PySpark sketch follows this list).
  • Orchestrated the ETL workflow using Apache Airflow, deployed on an AWS EC2 instance with Docker for containerization and scalability.
  • Loaded the processed Parquet data into Snowflake for efficient querying and analytics.
  • Created a Tableau dashboard to visualize insights from the electric vehicle data, highlighting trends in adoption.
  • Technologies used: PySpark, SparkSQL, AWS EMR, AWS S3, Apache Airflow, AWS EC2, Docker, Snowflake, Tableau, SQL.
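
A PySpark sketch of the EMR cleaning-and-modeling step described above. The column names follow the public Washington State EV dataset but should be treated as assumptions here, as should the S3 paths.

```python
# Sketch only: S3 paths are assumed; column names follow the public WA dataset.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wa-ev-etl").getOrCreate()

raw = (
    spark.read.option("header", True).option("inferSchema", True)
    .csv("s3://example-ev-bucket/raw/Electric_Vehicle_Population_Data.csv")
)

# Clean: standardize make names, drop rows without a vehicle ID, dedupe
clean = (
    raw.withColumn("Make", F.upper(F.trim("Make")))
       .dropna(subset=["DOL Vehicle ID"])
       .dropDuplicates(["DOL Vehicle ID"])
)

# Dimension table: one row per distinct vehicle model
dim_vehicle = clean.select(
    "Make", "Model", "Model Year", "Electric Vehicle Type"
).distinct()

# Fact table: one row per registered vehicle
fact_registrations = clean.select(
    "DOL Vehicle ID", "Make", "Model", "County", "City", "Electric Range"
)

dim_vehicle.write.mode("overwrite").parquet("s3://example-ev-bucket/curated/dim_vehicle/")
fact_registrations.write.mode("overwrite").parquet("s3://example-ev-bucket/curated/fact_registrations/")
```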


TABLEAU DASHBOARDS

[Tableau dashboard gallery]
EDUCATION

Jan 2023 - Present

University of Central Missouri, Warrensburg MO.

Master of Science in Computer Science

Jan 2017 - August 2021

Jawaharlal Nehru Technological University - Hyderabad, Telangana, India.

Bachelor of Technology in Computer Science Engineering

SKILLS

Tools & Technologies


CERTIFICATIONS & BADGES

[AWS Certified Solutions Architect Associate badge]

LeetCode Badges
