Data Engineer with an interest in machine learning. Experienced in artificial neural network modeling, web scraping, ETL processes, and automation. Knowledgeable in AWS (S3, EC2, IAM), with B2-level English proficiency. I have developed tools for data extraction and analysis, including web scraping applications that identify potential clients, artificial neural network models for image classification, and datasets built through web scraping.
Final Grade: 9.23/10
Final Grade: 9/10
Purpose: Reduce the time and complexity involved in plankton classification for university students, who previously took up to 5 hours to classify a single image. A convolutional neural network model was trained using open data to classify the students' images.
Results:
Tools and Technologies: TensorFlow, scikit-learn, Google Colab Pro, Selenium, Pandas, Hugging Face (for storage and API), Anvil (for the web application).
Role in Project: Responsible for data collection and cleaning, architecture selection, modeling three proposals (single model, sequential, and hierarchical), and creating the API for predictions.
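The hierarchical proposal mentioned above can be illustrated with a minimal sketch: a coarse classifier picks a broad group, and a group-specific classifier then refines the label. This is not the project's actual code. The function names, the group labels, and the toy rules standing in for trained CNNs are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence

# A "classifier" here is any callable that maps an image
# (represented as a flat sequence of pixel values) to a label.
Classifier = Callable[[Sequence[float]], str]


def make_hierarchical_classifier(
    coarse: Classifier, fine_by_group: Dict[str, Classifier]
) -> Classifier:
    """Route each image through a coarse model, then a per-group fine model."""

    def predict(image: Sequence[float]) -> str:
        group = coarse(image)
        fine = fine_by_group.get(group)
        if fine is None:
            # No specialised model for this group: keep the coarse label.
            return group
        return f"{group}/{fine(image)}"

    return predict


# Toy stand-ins for trained models (hypothetical rules and labels).
def coarse_model(image: Sequence[float]) -> str:
    return "diatom" if sum(image) / len(image) > 0.5 else "ciliate"


def diatom_model(image: Sequence[float]) -> str:
    return "chain" if len(image) > 4 else "single"


classify = make_hierarchical_classifier(coarse_model, {"diatom": diatom_model})
```

In a real setup each callable would wrap a trained model's prediction step; the routing logic stays the same.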
Challenges and Solutions:
Period: June 2024 - November 2024 (6 months).
See more on GitHub
Purpose: Reduce the time and workload of the marketing department by automating the tedious process of manually collecting information about potential hotel clients.
Results:
Tools and Technologies: Python (libraries: requests, BeautifulSoup, pandas, tabulate), openxlsx for exporting data to Excel and CSV.
Role in Project: Designed and implemented the entire solution, including UI design, data extraction and cleaning, and result storage. Also acted as a liaison to gather requirements from the marketing manager.
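As a rough illustration of the extraction and cleaning step, the sketch below parses listing HTML with BeautifulSoup and collects the fields into a pandas table that can be written to CSV. The HTML structure, CSS class names, and field names are hypothetical; in a live run the HTML would come from something like `requests.get(url, timeout=10).text` rather than an inline sample.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline sample standing in for a downloaded page (hypothetical markup).
SAMPLE_HTML = """
<div class="hotel"><h2 class="name"> Hotel Sol </h2><span class="phone">555-0101</span></div>
<div class="hotel"><h2 class="name">Hostal Luna</h2></div>
"""


def extract_hotels(html: str) -> pd.DataFrame:
    """Return one row per hotel card, with missing fields left empty."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.hotel"):
        name = card.select_one(".name")
        phone = card.select_one(".phone")
        rows.append(
            {
                # get_text(strip=True) trims surrounding whitespace.
                "name": name.get_text(strip=True) if name else None,
                "phone": phone.get_text(strip=True) if phone else None,
            }
        )
    return pd.DataFrame(rows, columns=["name", "phone"])


hotels = extract_hotels(SAMPLE_HTML)
# With no path argument, to_csv returns the CSV content as a string.
csv_text = hotels.to_csv(index=False)
```

Keeping missing fields as empty values, instead of dropping the row, lets a reviewer see which listings still need manual follow-up.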
Challenges and Solutions:
Period: July 2023 - August 2023 (1.5 months).
See more on GitHub
Purpose: This project is part of the Plankton Classifier and was developed to complement the original WHOI dataset (2006-2014) with additional images from IFCB Dashboard and PlanktonNet. The goal was to broaden the set of classes and the diversity of images so that the data more closely resembles what is seen in a laboratory context.
Results:
Tools and Technologies: Selenium from Google Colab for image extraction, and pandas for storing, organizing, and concatenating results, grouping images into their respective classes.
Role in Project: Responsible for data extraction, cleaning, and dataset creation.
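The pandas side of this workflow (concatenating results and grouping images into classes) might look like the sketch below. The column names, file names, and class labels are illustrative, and deduplicating on file name is an assumption rather than a documented step of the project.

```python
import pandas as pd

# Index of the base dataset (illustrative rows).
whoi = pd.DataFrame(
    {
        "filename": ["a.png", "b.png"],
        "label": ["Chaetoceros", "Ciliate_mix"],
        "source": "WHOI",
    }
)

# Index of newly scraped images (illustrative rows).
scraped = pd.DataFrame(
    {
        "filename": ["c.png", "a.png"],
        "label": ["Chaetoceros", "Chaetoceros"],
        "source": ["IFCB Dashboard", "PlanktonNet"],
    }
)

# Stack both indexes into one table with a fresh row index.
combined = pd.concat([whoi, scraped], ignore_index=True)

# Assumption: a repeated file name means the same image, so keep the first copy.
combined = combined.drop_duplicates(subset=["filename"], keep="first")

# Map each class label to the list of image files that belong to it.
files_by_class = combined.groupby("label")["filename"].apply(list).to_dict()
```

The resulting mapping can then drive copying files into one folder per class, which is the layout most image-loading utilities expect.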
Challenges and Solutions:
Period: August 2024 (3 weeks).
See more on GitHub