
Ravi Shankar Poosa

Data Analyst | Data Engineer | AWS Certified Solutions Architect Associate

📍 United States.

  • LinkedIn
  • GitHub

Why should you hire me?

I am an AWS Certified Solutions Architect Associate with strong technical expertise and hands-on experience in developing ETL data pipelines, building data transformation processes, and creating dashboards. I am skilled in Python (Pandas, NumPy), SQL, Apache Spark (PySpark, SparkSQL), AWS Lambda, AWS Glue, AWS EMR, Snowflake, and other AWS services, and I am proficient in orchestrating workflows with Apache Airflow and Docker. My experience building interactive Power BI and Tableau dashboards on top of data warehousing services means I can drive data-driven decision-making and help answer your business questions.

EXPERIENCE

December 2020 – December 2022

Data Analyst

Capgemini

  • Analyzed claims data from Snowflake and PostgreSQL for a leading U.S. insurance provider, identifying trends and insights that improved claims management by 15% and supported better decision-making.

  • Created Tableau dashboards to visualize key metrics like claims frequency, severity, and loss ratios, helping stakeholders make informed decisions.

  • Generated ad-hoc reports comparing current monthly data with past trends, leading to a 10% improvement in performance by uncovering deviations and areas for improvement.

  • Worked closely with business stakeholders and data teams to ensure reports and dashboards were accurate and met their requirements.

  • Conducted data validation and audits in Snowflake and PostgreSQL using SQL, ensuring data accuracy and consistency for analysis and reporting.

  • Contributed to the Data Engineering team in building ETL pipelines for insurance claims data using Apache Spark within AWS Glue and AWS EMR, improving data processing efficiency by 30%.

  • Developed PySpark and SparkSQL scripts for data standardization, null value handling, and data masking, reducing processing errors by 25% and enhancing data quality (a sketch of this kind of logic follows this list).

  • Used AWS EMR to run PySpark scripts that transformed claims data into Parquet format for loading into Snowflake, reducing data load times by 40% and improving query performance.
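
Below is a minimal PySpark sketch of the kind of standardization, null handling, and masking logic described above. The schema (claim_id, claim_status, claim_amount, policyholder_ssn), the S3 paths, and the masking rule are hypothetical placeholders, not the actual Capgemini pipeline.

# Hypothetical claims-cleaning job; schema, paths, and rules are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_standardization").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/raw/claims/")  # placeholder path

cleaned = (
    claims
    # Standardize text fields: trim whitespace and upper-case status codes.
    .withColumn("claim_status", F.upper(F.trim(F.col("claim_status"))))
    # Handle nulls: default missing amounts to 0.0, drop rows missing the key.
    .fillna({"claim_amount": 0.0})
    .dropna(subset=["claim_id"])
    # Mask PII: keep only the last 4 digits of a hypothetical SSN column.
    .withColumn(
        "policyholder_ssn",
        F.concat(F.lit("***-**-"), F.substring(F.col("policyholder_ssn"), -4, 4)),
    )
)

cleaned.write.mode("overwrite").parquet("s3://example-bucket/clean/claims/")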

PROJECTS

Los Angeles Airbnb Analytics

  • GitHub
  • Power BI Dashboard

 Key Insights captured from this Dashboard.

  • Total listings: Analyzing the scale of Airbnb activity in Los Angeles.

    • This step aims to understand the overall size of the Airbnb market by looking at the total number of active listings. It helps understand the rental activity across different cities in the county.

  • Total hosts: Examining the number of active hosts contributing to the market.

    • By analyzing the number of hosts, we can explore the host ecosystem in Los Angeles, including trends in individual hosts versus larger property managers.

  • Percentage of instant bookable listings: Understanding the proportion of listings that offer instant booking.

    • This metric evaluates how many listings are immediately bookable without requiring host approval. The analysis focuses on guest convenience and host preferences.

  • Percentage of super hosts: Identifying the distribution of experienced and high-performing hosts.

    • This analysis highlights how many hosts qualify as super hosts, providing insights into the quality of hosting services across different cities.

  • Listings and hosts by year: Observing trends in Airbnb growth and changes over time.

    • This step tracks historical data to identify periods of growth or decline in the number of listings and hosts. It provides an understanding of market dynamics over time.

  • Average prices and listings by cities: Mapping pricing trends and listing density across different areas.

    • This analysis uses geographical data to show how listing prices and density vary by city. It helps pinpoint areas with high demand and premium pricing.

  • Top hosts by listings and average ratings: Highlighting hosts with the most listings and highest ratings.

    • This analysis identifies the top-performing hosts in terms of listing volume and guest satisfaction. It provides insights into successful hosting practices.
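
A minimal PySpark sketch of how a few of these metrics could be aggregated before they reach the dashboard. The column names (listing_id, host_id, city, price, instant_bookable, host_is_superhost) and the 't'/'f' flag encoding are assumptions about the listings data, not the dashboard's actual source.

# Illustrative aggregation of dashboard-style metrics; columns are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("airbnb_metrics").getOrCreate()
listings = spark.read.parquet("s3://example-bucket/airbnb/la_listings/")  # placeholder

# Total listings, total hosts, and share of instant-bookable / superhost listings.
summary = listings.agg(
    F.count("listing_id").alias("total_listings"),
    F.countDistinct("host_id").alias("total_hosts"),
    F.avg(F.when(F.col("instant_bookable") == "t", 1).otherwise(0)).alias("pct_instant_bookable"),
    F.avg(F.when(F.col("host_is_superhost") == "t", 1).otherwise(0)).alias("pct_superhost"),
)

# Average price and listing count by city, for the map and density views.
by_city = (
    listings.groupBy("city")
    .agg(F.avg("price").alias("avg_price"), F.count("*").alias("num_listings"))
    .orderBy(F.desc("num_listings"))
)

summary.show()
by_city.show(10)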

Huberman Lab Podcast Analytics: Powered by GPT-4o.

  • docs
  • GitHub
  • Led the development of an analytics platform for Andrew Huberman's podcast videos, enabling personalized video recommendations and real-time insights.

  • Scraped video metadata, channel details, and viewer comments from YouTube using Python and YouTube Data API.

  • Processed and transformed data using PySpark on Amazon EMR.

  • Performed sentiment and emotion analysis on viewer comments with OpenAI's GPT-4o Mini API, identifying specific emotions and sentiments for deeper audience insights (see the sketch after this list).

  • Utilized AWS S3 as a data lake for raw and processed data.

  • Loaded transformed data into Snowflake for efficient analytics and reporting.

  • Built dynamic visualizations in Power BI, showcasing metrics such as view counts, audience sentiment, and engagement trends.

  • Developed a React-based web application to embed the Power BI dashboard, enabling interactive data exploration.

  • Integrated a Flask-based chatbot powered by GPT-4 APIs to assist users in discovering relevant podcast videos and gaining insights.

  • Technologies Used: Python, PySpark, AWS S3, Snowflake, React, Flask, GPT-4o Mini API, Power BI, Amazon EMR.
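
A minimal Python sketch of the two API calls described above: pulling top-level comments with the YouTube Data API and labeling sentiment with GPT-4o Mini. The video ID, prompt, and single-page handling are placeholders; the project's actual batching, retries, and PySpark processing are omitted.

# Illustrative only: fetch comments for one video, then classify each with GPT-4o mini.
import os
from googleapiclient.discovery import build  # google-api-python-client
from openai import OpenAI

youtube = build("youtube", "v3", developerKey=os.environ["YOUTUBE_API_KEY"])
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fetch_comments(video_id, max_results=50):
    """Return top-level comment texts for a video (single page, no pagination)."""
    response = youtube.commentThreads().list(
        part="snippet", videoId=video_id, maxResults=max_results, textFormat="plainText"
    ).execute()
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]

def classify_sentiment(comment):
    """Ask GPT-4o mini for a one-word sentiment label; the prompt is a placeholder."""
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Label the comment as positive, negative, or neutral."},
            {"role": "user", "content": comment},
        ],
    )
    return completion.choices[0].message.content.strip()

for text in fetch_comments("VIDEO_ID_PLACEHOLDER"):
    print(classify_sentiment(text), "|", text[:80])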


FDA Drug Adverse Events ETL Project

  • GitHub
  • Tableau Dashboard
  • Led the end-to-end development of an ETL pipeline to extract, transform, and load FDA drug adverse event data from the open-source openFDA API using modern data engineering tools and cloud technologies, and visualized the transformed data in Tableau Desktop.

  • Designed and implemented Python scripts leveraging Pandas for data extraction and transformation from the FDA's open-source API into AWS S3.

  • Orchestrated the ETL workflow using Apache Airflow, ensuring robust data pipeline automation and scheduling.

  • Utilized Docker for containerizing the Airflow environment, enhancing scalability and deployment efficiency.

  • Implemented data modeling and storage solutions in Snowflake to load transformed data with optimized data accessibility and query performance.

  • Developed interactive dashboards by connecting Tableau to Snowflake.

  • Technologies used: Python, Pandas, AWS S3, Snowflake, Apache Airflow, Docker Desktop, Tableau Desktop.
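
A minimal Airflow DAG sketch of the extract-and-land step described above. The openFDA query parameters, bucket name, file layout, and daily schedule are assumptions; the Snowflake load and Tableau steps are omitted.

# Sketch of an Airflow 2.x DAG that pulls drug adverse-event records from openFDA
# and lands them in S3 as CSV. Endpoint parameters, bucket, and schedule are assumed.
from datetime import datetime
import io

import boto3
import pandas as pd
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_fda_events_to_s3():
    resp = requests.get("https://api.fda.gov/drug/event.json", params={"limit": 100}, timeout=60)
    resp.raise_for_status()
    df = pd.json_normalize(resp.json()["results"])  # flatten nested event records
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    boto3.client("s3").put_object(
        Bucket="example-fda-bucket",  # placeholder bucket
        Key=f"raw/fda_events_{datetime.utcnow():%Y%m%d}.csv",
        Body=buf.getvalue(),
    )

with DAG(
    dag_id="fda_adverse_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_fda_events_to_s3",
        python_callable=extract_fda_events_to_s3,
    )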

​​


Real-Time Insurance Claims Data ETL Pipeline

  • GitHub
  • Tableau Dashboard
  • Led the development of a real-time ETL pipeline to process insurance data sourced from Kaggle, leveraging Apache Airflow for orchestration; Snowflake for data modeling, storage, and change data capture; and Tableau for data visualization.

  • Designed and implemented Python scripts using Pandas for data extraction, cleaning, normalization, and transformation from the Kaggle API into AWS S3.

  • Orchestrated the ETL workflow using Apache Airflow hosted on an EC2 instance for initial data extraction and preprocessing, ensuring scalability and automation.

  • Using AWS S3 as a data lake, implemented advanced data ingestion techniques in Snowflake, such as a storage integration object, external stages, and Snowpipe, to ingest data into staging tables.

  • Implemented change data capture using Snowflake streams and tasks and loaded transformed data into analytics tables (a sketch of these objects follows this list).

  • Created a Tableau live connection to Snowflake tables for real-time dashboards.

  • Technologies used: Python, Pandas, AWS S3, Snowflake, Apache Airflow, AWS EC2, Tableau Desktop, VS Code.
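
A minimal sketch, using the Snowflake Python connector, of the ingestion and change-data-capture objects mentioned above (external stage, Snowpipe, stream, task). The object names, tables, file format, and task schedule are illustrative placeholders, not the project's actual definitions.

# Illustrative Snowflake setup executed from Python; names and DDL are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT", user="YOUR_USER", password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH", database="INSURANCE_DB", schema="RAW",
)
cur = conn.cursor()

statements = [
    # External stage over the S3 data lake (storage integration assumed to exist).
    """CREATE STAGE IF NOT EXISTS claims_stage
           URL = 's3://example-claims-bucket/processed/'
           STORAGE_INTEGRATION = s3_int
           FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)""",
    # Snowpipe auto-ingests new files from the stage into the staging table.
    """CREATE PIPE IF NOT EXISTS claims_pipe AUTO_INGEST = TRUE AS
           COPY INTO staging_claims FROM @claims_stage""",
    # Stream captures changes on the staging table for change data capture.
    "CREATE STREAM IF NOT EXISTS claims_stream ON TABLE staging_claims",
    # Task periodically moves new rows from the stream into the analytics table.
    """CREATE TASK IF NOT EXISTS claims_task
           WAREHOUSE = COMPUTE_WH
           SCHEDULE = '5 MINUTE'
       WHEN SYSTEM$STREAM_HAS_DATA('CLAIMS_STREAM')
       AS INSERT INTO analytics_claims SELECT * FROM claims_stream""",
    "ALTER TASK claims_task RESUME",
]
for sql in statements:
    cur.execute(sql)

cur.close()
conn.close()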

Some of the Key Insights captured from this Dashboard.

  • Distribution of claims by transmission type (manual vs. automatic).

  • Analysis of insurance claims by age group.

  • Relationship between insurance subscription length and claim frequency.

  • Claims distribution across different vehicle fuel types.

  • Correlation between vehicle features and insurance claims.

YouTube Data Analysis ETL Using AWS Cloud Services

  • GitHub
  • Tableau Dashboard
  • Developed an ETL pipeline to streamline analysis of structured and semi-structured YouTube data using AWS Cloud Services and to generate insights using Tableau.
  • Designed and implemented ETL processes for data in AWS S3, utilizing AWS Glue for data cataloging and CSV transformation, and AWS Lambda for converting JSON data into Parquet format (a Lambda sketch follows this list).
  • Implemented monitoring and scalability using AWS CloudWatch.
  • Loaded the processed Parquet data into AWS Redshift for analytics and visualized insights from YouTube video metrics across different regions using Tableau Desktop.
  • Technologies used: AWS S3, AWS Glue, AWS Lambda, AWS Athena, AWS Redshift, Tableau Desktop, Apache Spark, AWS CloudWatch, Python, Pandas, SQL.
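
A minimal sketch of the Lambda piece described above, assuming the function is triggered by an S3 put event and that the AWS SDK for pandas (awswrangler) is packaged as a layer. Bucket names, prefixes, the "items" field, and the Glue database/table names are placeholders.

# Illustrative Lambda handler: read a newly landed JSON file from S3, write it back
# as Parquet, and update a Glue catalog table. All names are placeholders.
import json
import urllib.parse

import awswrangler as wr
import boto3
import pandas as pd

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # The S3 put event supplies the bucket and key of the newly landed JSON file.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    payload = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    # Flatten the nested records; the "items" field is an assumption about the file layout.
    records = payload["items"] if isinstance(payload, dict) and "items" in payload else payload
    df = pd.json_normalize(records)

    # Write Parquet to the cleansed zone and update the Glue catalog table.
    result = wr.s3.to_parquet(
        df=df,
        path="s3://example-youtube-cleansed/reference_data/",  # placeholder target
        dataset=True,
        mode="append",
        database="example_youtube_db",    # placeholder Glue database
        table="cleansed_reference_data",  # placeholder Glue table
    )
    return {"written": result["paths"]}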

 

Some of the Key Insights captured from this Dashboard.

  • Comparison of video metrics across regions: Canada, Great Britain, and the United States.

  • Identification of regions contributing the most to views, likes, dislikes, and comments.

  • Video categories with the highest view counts across regions.

  • Most liked and disliked video categories across different regions.

 

  • GitHub
  • Tableau Dashboard

Washington State Electric Vehicles ETL Data Pipeline

  • Developed an ETL pipeline to process and analyze electric vehicle population data for Washington State using Apache Airflow, AWS Cloud Services, Snowflake, and Tableau.
  • Designed and implemented ETL processes to extract raw CSV data from AWS S3, utilizing Apache Spark on AWS EMR for data cleaning, standardization, and conversion into fact and dimension tables in Parquet format (see the sketch after this list).
  • Orchestrated the ETL workflow using Apache Airflow, deployed on an AWS EC2 instance with Docker for containerization and scalability.
  • Loaded the processed Parquet data into Snowflake for efficient querying and analytics.
  • Created a Tableau dashboard to visualize insights from the electric vehicle data, highlighting trends in adoption.
  • Technologies used: PySpark, SparkSQL, AWS EMR, AWS S3, Apache Airflow, AWS EC2, Docker, Snowflake, Tableau, SQL.
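
A minimal PySpark sketch of the EMR transformation step described above: cleaning the raw CSV and splitting it into fact and dimension tables in Parquet. The column names and the single-dimension design here are assumptions, not the pipeline's actual model.

# Illustrative EMR job: clean the raw EV population CSV and split it into one
# dimension and one fact table in Parquet. Column names are assumed, not verified.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wa_ev_etl").getOrCreate()

raw = (
    spark.read.option("header", True).option("inferSchema", True)
    .csv("s3://example-ev-bucket/raw/electric_vehicle_population.csv")  # placeholder path
)

# Basic cleaning: normalize column names and drop rows missing the vehicle key.
cleaned = raw.toDF(*[c.strip().lower().replace(" ", "_") for c in raw.columns])
cleaned = cleaned.dropna(subset=["dol_vehicle_id"])

# Dimension: one row per make/model/EV type (hypothetical dimension design).
dim_vehicle = (
    cleaned.select("make", "model", "model_year", "electric_vehicle_type")
    .dropDuplicates()
    .withColumn("vehicle_key", F.monotonically_increasing_id())
)

# Fact: registrations joined back to the dimension key, partitioned by county.
fact_registrations = (
    cleaned.join(dim_vehicle, ["make", "model", "model_year", "electric_vehicle_type"])
    .select("dol_vehicle_id", "vehicle_key", "county", "city", "electric_range")
)

dim_vehicle.write.mode("overwrite").parquet("s3://example-ev-bucket/processed/dim_vehicle/")
fact_registrations.write.mode("overwrite").partitionBy("county").parquet(
    "s3://example-ev-bucket/processed/fact_registrations/"
)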

​​

 

Some of the Key Insights captured from this Dashboard.

  • Distribution of hybrid and battery vehicles across demographics in Washington state.

  • Heat map visualization of top EV utility providers.

  • Most popular vehicle models by manufacturer.

  • Leading EV manufacturers by sales volume.

  • Year-over-year vehicle adoption trends by brand.

  • Vehicle type adoption trends over time.

  • Geospatial mapping showing EV adoption density across cities and counties in Washington State

 


EDUCATION

January 2023 – Present

University of Central Missouri, Warrensburg MO.

Master of Science in Computer Science

January 2017 – August 2021

Jawaharlal Nehru Technological University, Hyderabad, Telangana, India.

Bachelor of Technology in Computer Science Engineering

SKILLS

Tools & Technologies

CERTIFICATIONS & BADGES

  • AWS Certified Solutions Architect Associate
  • LeetCode Badges