  • LinkedIn
  • GitHub

Why should you hire me?

I am an AWS Certified Solutions Architect Associate with strong technical expertise and hands-on experience in developing ETL data pipelines, building data transformation processes, and creating dashboards. I am skilled in Python (Pandas, NumPy), SQL, Apache Spark (PySpark, SparkSQL), AWS Lambda, AWS Glue, AWS EMR, Snowflake, and various other AWS services, and I am proficient in orchestrating workflows using Apache Airflow and Docker. My experience creating interactive Tableau dashboards on top of data warehousing services means I can drive data-driven decision-making and help answer your business questions.


Ravi Shankar Poosa

Data Engineer | Data Analyst | AWS Certified Solutions Architect Associate

Email:

Address: Overland Park, Kansas, United States.

WORK EXPERIENCE


December 2020 – December 2022

Capgemini

Data Engineer

  • Led the development of ETL data pipelines, optimizing data ingestion and transformation of insurance data using Apache Spark within AWS Glue and AWS EMR, improving processing efficiency by 30% through Spark optimization techniques.

  • Developed Spark scripts using PySpark and SparkSQL for null handling, data masking, standardization, and metric calculations, reducing data processing errors by 25% and enhancing data quality (a minimal sketch of this pattern follows this list).

  • Utilized AWS Glue to discover and catalog data from AWS S3, establishing a central metadata repository for data schemas and locations.

  • Configured AWS EMR to execute PySpark scripts for transforming claims data, employing Spark optimization techniques to efficiently load Parquet-formatted data into Snowflake, leading to a 40% reduction in data loading times and improved query performance.

  • Worked closely with data analysts, business stakeholders, and AWS administrators to document transformation scripts, job configurations, and data handling practices, ensuring alignment across teams.

  • Conducted comprehensive analysis of insurance data in Snowflake and PostgreSQL, identifying key trends, patterns, and outliers. Created dashboards comparing current monthly data with the same month in previous years, driving a 15% improvement in claims management and supporting strategic decisions.

  • Designed interactive dashboards in Tableau to visualize key performance indicators (KPIs) and metrics related to insurance operations. These dashboards facilitated data-driven decision-making and increased stakeholder engagement by 25%.

  • Generated regular and ad-hoc reports, conducting statistical analysis and data modeling. This work derived actionable insights and recommendations for operational improvements, leading to a 10% enhancement in operational efficiency.
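
A minimal PySpark sketch of the null-handling, masking, and standardization pattern described above. The column names (state, claim_amount, ssn) and S3 paths are illustrative assumptions, not the actual project schema.

```python
# Illustrative only: column names and S3 paths are assumed, not the real schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-cleanup").getOrCreate()

claims_df = spark.read.parquet("s3://example-bucket/claims/raw/")  # hypothetical input

cleaned_df = (
    claims_df
    # Standardization: trim whitespace and upper-case state codes
    .withColumn("state", F.upper(F.trim(F.col("state"))))
    # Null handling: default missing claim amounts to 0.0
    .fillna({"claim_amount": 0.0})
    # Data masking: keep only the last four digits of the SSN
    .withColumn("ssn", F.concat(F.lit("***-**-"), F.substring("ssn", -4, 4)))
    # Metric calculation: flag high-value claims for downstream reporting
    .withColumn("high_value", F.col("claim_amount") > 10000)
)

cleaned_df.write.mode("overwrite").parquet("s3://example-bucket/claims/clean/")
```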

PROJECTS

FDA Drug Adverse Events ETL Project

  • GitHub
  • Tableau
[Architecture diagram]
  • Led the end-to-end development of an ETL pipeline to extract, transform, and load FDA drug adverse-event data from the open-source openFDA API using modern data engineering tools and cloud technologies, visualizing the transformed data with Tableau Desktop.

  • Designed and implemented Python scripts leveraging Pandas for data extraction and transformation from the FDA's open-source API into AWS S3.

  • Orchestrated the ETL workflow using Apache Airflow, ensuring robust data pipeline automation and scheduling (see the DAG sketch after this list).

  • Utilized Docker for containerizing the Airflow environment, enhancing scalability and deployment efficiency.

  • Implemented data modeling and storage in Snowflake, loading the transformed data with optimized accessibility and query performance.

  • Developed interactive dashboards by connecting Tableau to Snowflake.

  • Technologies used: Python, Pandas, AWS S3, Snowflake, Apache Airflow, Docker Desktop, Tableau Desktop.
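
A minimal Airflow DAG sketch of the extract-and-load flow described above. The task split, local file path, and S3 bucket name are assumptions for illustration; only the openFDA endpoint is a real public API.

```python
# Sketch only: task breakdown, paths, and bucket name are assumptions.
from datetime import datetime

import pandas as pd
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

FDA_URL = "https://api.fda.gov/drug/event.json?limit=100"  # public openFDA endpoint


def extract_fda_events():
    """Pull one page of adverse-event records and flatten to a local CSV."""
    results = requests.get(FDA_URL, timeout=30).json()["results"]
    pd.json_normalize(results).to_csv("/tmp/fda_events.csv", index=False)


def transform_and_load():
    """Light Pandas cleanup, then write to S3 (assumed bucket; requires s3fs)."""
    df = pd.read_csv("/tmp/fda_events.csv")
    df = df.dropna(axis=1, how="all")  # drop columns with no data at all
    df.to_csv("s3://example-fda-bucket/clean/fda_events.csv", index=False)


with DAG(
    dag_id="fda_adverse_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_fda_events)
    load = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)
    extract >> load
```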


  • GitHub
  • Tableau

Real-Time Insurance Claims Data ETL Pipeline

[Architecture diagram]
  • Led the development of a real-time ETL pipeline to process insurance data sourced from Kaggle, leveraging Apache Airflow for orchestration; Snowflake for data modeling, storage, and change data capture; and Tableau for data visualization.
  • Designed and implemented Python scripts using Pandas for data extraction, cleaning, normalization, and transformation from the Kaggle API into AWS S3.

  • Orchestrated the ETL workflow using Apache Airflow hosted on an EC2 instance for initial data extraction and preprocessing, ensuring scalability and automation.

  • Using AWS S3 as a data lake, implemented Snowflake ingestion techniques such as storage integration objects, external stages, and Snowpipe to load data into staging tables.

  • Implemented change data capture using Snowflake streams and tasks to load transformed data into analytics tables (a sketch of these objects follows this list).

  • Created a Tableau live connection to Snowflake tables for real-time dashboards.

  • Technologies used: Python, Pandas, AWS S3, Snowflake, Apache Airflow, AWS EC2, Tableau Desktop, VS Code.
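
A sketch of the Snowflake ingestion and CDC objects described above, issued through the Python connector so the example stays in one language. All object names, column names, the S3 URL, and the connection parameters are assumptions; the storage integration (my_s3_int) is presumed to already exist.

```python
# Sketch only: object names, columns, S3 URL, and credentials are assumed.
import snowflake.connector

ddl_statements = [
    # External stage over the S3 data lake (storage integration assumed to exist)
    """CREATE STAGE IF NOT EXISTS CLAIMS_STAGE
         URL = 's3://example-claims-bucket/raw/'
         STORAGE_INTEGRATION = my_s3_int
         FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)""",
    # Snowpipe: auto-ingests newly staged files into the staging table
    """CREATE PIPE IF NOT EXISTS CLAIMS_PIPE AUTO_INGEST = TRUE AS
         COPY INTO CLAIMS_RAW FROM @CLAIMS_STAGE""",
    # Stream: records row changes on the staging table for CDC
    "CREATE STREAM IF NOT EXISTS CLAIMS_STREAM ON TABLE CLAIMS_RAW",
    # Serverless task: every 5 minutes, move newly captured inserts onward
    """CREATE TASK IF NOT EXISTS CLAIMS_TASK
         SCHEDULE = '5 MINUTE'
         WHEN SYSTEM$STREAM_HAS_DATA('CLAIMS_STREAM')
         AS INSERT INTO CLAIMS_ANALYTICS (claim_id, claim_amount)
            SELECT claim_id, claim_amount FROM CLAIMS_STREAM
            WHERE METADATA$ACTION = 'INSERT'""",
    "ALTER TASK CLAIMS_TASK RESUME",
]

# Placeholder connection parameters; supply real credentials via a secrets store
conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="...",
    warehouse="COMPUTE_WH",
    database="INSURANCE",
    schema="PUBLIC",
)
with conn.cursor() as cur:
    for stmt in ddl_statements:
        cur.execute(stmt)
conn.close()
```

Consuming the stream inside the task's DML advances the stream offset, so each run only sees rows that arrived since the previous run.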


  • GitHub
  • Tableau

YouTube Data Analysis ETL Using AWS Cloud Services

[Architecture diagram]
  • Developed an ETL pipeline to streamline analysis of structured and semi-structured YouTube data using AWS Cloud Services and to generate insights using Tableau.
  • Designed and implemented ETL processes for data in AWS S3, utilizing AWS Glue for cataloging and transforming CSV data and AWS Lambda for converting JSON data into Parquet format (a minimal handler sketch follows this list).
  • Implemented monitoring and scalability using AWS CloudWatch.
  • Loaded the processed Parquet data into AWS Redshift for analytics and visualized insights from YouTube video metrics across regions using Tableau Desktop.
  • Technologies used: AWS S3, AWS Glue, AWS Lambda, AWS Athena, AWS Redshift, Tableau Desktop, Apache Spark, AWS CloudWatch, Python, Pandas, SQL.
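
A minimal sketch of the Lambda step described above, using awswrangler (the AWS SDK for pandas) to convert an uploaded JSON file to Parquet and register it in the Glue catalog. The bucket, Glue database, and table names are assumptions, as is the newline-delimited JSON layout.

```python
# Sketch only: bucket, Glue database, and table names are assumed.
import urllib.parse

import awswrangler as wr


def lambda_handler(event, context):
    # S3 put event -> bucket and key of the newly uploaded raw JSON file
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])

    # Read the semi-structured JSON into a flat pandas DataFrame
    # (lines=True assumes newline-delimited JSON; adjust for nested documents)
    df = wr.s3.read_json(f"s3://{bucket}/{key}", lines=True)

    # Write Parquet to the cleansed zone and register it in the Glue catalog
    wr.s3.to_parquet(
        df=df,
        path="s3://example-youtube-cleansed/statistics_reference/",
        dataset=True,
        database="example_youtube_db",
        table="raw_statistics_reference",
        mode="append",
    )
    return {"status": "ok", "rows": len(df)}
```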


Washington State Electric Vehicles ETL Data Pipeline

  • GitHub
  • Tableau
[Architecture diagram]
  • Developed an ETL pipeline to process and analyze electric vehicle population data for Washington State using Apache Airflow, AWS Cloud Services, Snowflake, and Tableau.
  • Designed and implemented ETL processes to extract raw CSV data from AWS S3, utilizing Apache Spark on AWS EMR for data cleaning, standardization, and conversion into fact and dimension tables in Parquet format (a PySpark sketch follows this list).
  • Orchestrated the ETL workflow using Apache Airflow, deployed on an AWS EC2 instance with Docker for containerization and scalability.
  • Loaded the processed Parquet data into Snowflake for efficient querying and analytics.
  • Created a Tableau dashboard to visualize insights from the electric vehicle data, highlighting trends in adoption.
  • Technologies used: PySpark, SparkSQL, AWS EMR, AWS S3, Apache Airflow, AWS EC2, Docker, Snowflake, Tableau, SQL.
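
A PySpark sketch of the EMR cleaning-and-modeling step described above. The column names follow the public Washington State EV dataset but should be treated as assumptions here, as should the S3 paths.

```python
# Sketch only: S3 paths are assumed; column names follow the public WA dataset.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wa-ev-etl").getOrCreate()

raw = (
    spark.read.option("header", True).option("inferSchema", True)
    .csv("s3://example-ev-bucket/raw/Electric_Vehicle_Population_Data.csv")
)

# Clean: standardize make names, drop rows without a vehicle ID, dedupe
clean = (
    raw.withColumn("Make", F.upper(F.trim("Make")))
       .dropna(subset=["DOL Vehicle ID"])
       .dropDuplicates(["DOL Vehicle ID"])
)

# Dimension table: one row per distinct vehicle model
dim_vehicle = clean.select(
    "Make", "Model", "Model Year", "Electric Vehicle Type"
).distinct()

# Fact table: one row per registered vehicle
fact_registrations = clean.select(
    "DOL Vehicle ID", "Make", "Model", "County", "City", "Electric Range"
)

dim_vehicle.write.mode("overwrite").parquet("s3://example-ev-bucket/curated/dim_vehicle/")
fact_registrations.write.mode("overwrite").parquet("s3://example-ev-bucket/curated/fact_registrations/")
```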


TABLEAU DASHBOARDS

[Tableau dashboard gallery]
EDUCATION

Jan 2023 - Present

University of Central Missouri, Warrensburg MO.

Master of Science in Computer Science

Jan 2017 - August 2021

Jawaharlal Nehru Technological University - Hyderabad, Telangana, India.

Bachelor of Technology in Computer Science Engineering

SKILLS

Tools & Technologies


CERTIFICATIONS & BADGES

[AWS Certified Solutions Architect Associate badge]

LeetCode Badges
