Daniel

@indent4.bsky.social

🤖 ML Engineer → Developing Apps & Sharing ML/AI Insights 🚀 | Always Learning #MachineLearning #DataScience #AI

7 Followers  |  18 Following  |  25 Posts  |  Joined: 31.01.2025  |  1.7427

Latest posts by indent4.bsky.social on Bluesky

Follow me for insights on real-world ML 🚀

If this thread helped, drop a like 📷 or repost 📷

21.02.2025 17:31 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Why use the CPO Pattern?

✅ Scalability – Add new models and components easily.
✅ Maintainability – Components are decoupled, making it easy to change, test, and debug individual parts without breaking the entire system.
✅ Reusability – Reuse components across different ML projects.

21.02.2025 17:31 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

3️⃣ Orchestrators 🤖
The orchestrator is responsible for managing the pipeline execution.
This allows us to easily switch components (e.g., swap XGBoost for RandomForest).
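
Here is a minimal sketch of that swap, using only the standard library. MeanTrainer, MedianTrainer, the TRAINERS registry and the config dict are hypothetical stand-ins (think XGBoost and RandomForest trainers); this is not the code from the original post image.

```python
from statistics import mean, median


class MeanTrainer:
    """Hypothetical trainer: the 'model' is the mean of the targets."""

    def run(self, targets):
        return mean(targets)


class MedianTrainer:
    """Hypothetical trainer: the 'model' is the median of the targets."""

    def run(self, targets):
        return median(targets)


class TrainingPipeline:
    """Tiny pipeline that only knows it has *a* trainer, not which one."""

    def __init__(self, trainer):
        self.trainer = trainer

    def run(self, targets):
        return self.trainer.run(targets)


class Orchestrator:
    """Picks components from config, builds the pipeline and runs it."""

    # Assumed registry: maps a config name to a trainer class.
    TRAINERS = {"mean": MeanTrainer, "median": MedianTrainer}

    def __init__(self, config):
        self.config = config

    def run(self, targets):
        trainer_cls = self.TRAINERS[self.config["trainer"]]
        pipeline = TrainingPipeline(trainer=trainer_cls())
        return pipeline.run(targets)


targets = [1.0, 2.0, 9.0]
# Swapping the model is a one-word config change; no pipeline code is touched.
mean_result = Orchestrator({"trainer": "mean"}).run(targets)
median_result = Orchestrator({"trainer": "median"}).run(targets)
print(mean_result, median_result)
```

Keeping the choice of component in config is one possible design; passing ready-made component instances to the orchestrator would work too.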

21.02.2025 17:31 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

2️⃣ Pipelines 🛤️
A TrainingPipeline ties together components into a single workflow.
Pipelines ensure a clear flow of data, from raw input to final evaluation.
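
A rough sketch of what such a TrainingPipeline could look like. The four toy components, their run() methods and the made-up data are all assumptions for illustration, not the code from the original post image.

```python
from statistics import mean


class DataImporter:
    def run(self):
        # Hypothetical toy data: (feature, target) pairs where target = 2 * feature.
        return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


class Preprocessor:
    def run(self, rows):
        # Hold out the last row for evaluation.
        return rows[:-1], rows[-1:]


class RatioTrainer:
    def run(self, train_rows):
        # The "model" is just the average target / feature ratio.
        return mean(y / x for x, y in train_rows)


class Evaluator:
    def run(self, model, test_rows):
        # Mean absolute error on the held-out rows.
        return mean(abs(model * x - y) for x, y in test_rows)


class TrainingPipeline:
    """Ties the components together: raw input -> split -> model -> score."""

    def __init__(self, importer, preprocessor, trainer, evaluator):
        self.importer = importer
        self.preprocessor = preprocessor
        self.trainer = trainer
        self.evaluator = evaluator

    def run(self):
        rows = self.importer.run()
        train_rows, test_rows = self.preprocessor.run(rows)
        model = self.trainer.run(train_rows)
        error = self.evaluator.run(model, test_rows)
        return model, error


pipeline = TrainingPipeline(DataImporter(), Preprocessor(), RatioTrainer(), Evaluator())
model, error = pipeline.run()
print(model, error)
```

Each step only sees the output of the previous one, which is what keeps the data flow easy to follow.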

21.02.2025 17:31 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

1️⃣ Components 📦
Each key function (data import, preprocessing, training, evaluation) is wrapped in its own class.
This makes each one reusable, testable, and modular.

Example: A DataImporter class to load data!
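
One way such a DataImporter might look, as a standard-library sketch. The run() method, the delimiter option and the in-memory CSV are assumptions made for illustration; the original post image is not reproduced here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DataImporter:
    """Component that loads tabular data from any text stream (file, buffer, ...)."""

    delimiter: str = ","

    def run(self, stream) -> list[dict]:
        # csv.DictReader maps each row to a dict keyed by the header line.
        reader = csv.DictReader(stream, delimiter=self.delimiter)
        return [dict(row) for row in reader]


# In-memory CSV so the sketch runs without any files on disk.
raw = io.StringIO("age,income\n31,52000\n45,61000\n")
rows = DataImporter().run(raw)
print(rows)
```

Because the component takes a stream rather than a hard-coded path, it can be tested with a small buffer like the one above. Note that the csv module returns every value as a string; type conversion would belong in a preprocessing component.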

21.02.2025 17:31 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

🚀 Want to write production-grade ML code that's clean, modular & maintainable?

Here's a simple CPO (components, pipelines, orchestrators) pattern example to structure your code.

Let's break this down with a training pipeline example 👇
#MachineLearning #DataScience #MLOps

21.02.2025 17:31 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

DataFramed, Super Data Science, The Data Scientist Show, Ken's Nearest Neighbours

19.02.2025 15:34 | 👍 2 | 🔁 0 | 💬 1 | 📌 0

Follow me for insights on real-world ML 🚀

If this thread helped, drop a like 📷 or repost 📷🔁

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

EDA isn't just a formality; it's the difference between a strong ML model and garbage results. 🚀

✅ Understand your data
✅ Spot issues before modeling
✅ Find strong predictors

These steps serve as a starting point. Analyse further depending on the needs of your project.

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

💡 Why?

👉 Detect multicollinearity (too many correlated features = model confusion)
👉 Uncover hidden patterns
👉 Reduce dimensionality if needed (PCA can help)

📊 Example: If two features are 99% correlated, you probably don't need both in your model.

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

4️⃣ Multivariate Analysis 🔄 (Relationships Between Features)

Time to go deeper and check how features interact with each other.

🔥 Correlation heatmaps (sns.heatmap(df.corr(), annot=True))
🔥 Pairplots (sns.pairplot(df))
🔥 PCA (if needed) (PCA().fit_transform(df))
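
Alongside the heatmap, the same correlation matrix can be scanned in code. A small sketch on synthetic data; the column names and the 0.95 threshold are assumptions, so pick a cut-off that suits your project.

```python
import numpy as np
import pandas as pd

# Synthetic data: two columns carry the same information in different units.
rng = np.random.default_rng(0)
height = rng.normal(size=200)
df = pd.DataFrame(
    {
        "height_cm": height,
        "height_in": height / 2.54,
        "noise": rng.normal(size=200),
    }
)

corr = df.corr().abs()
# Keep only the upper triangle so each pair is looked at once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
high = pairs[pairs > 0.95]

# Candidate columns to drop: the second member of each highly correlated pair.
to_drop = sorted({second for _, second in high.index})
print(high)
print(to_drop)
```

Which column of a pair to drop is a judgment call; here the later one is flagged purely by column order.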

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

💡 Why?

👉 Find strong predictors 📈
👉 Identify non-linear relationships (these may call for a non-linear model)
👉 Detect leakage (some features might be too correlated with the target!)
📊 Example: A high correlation might mean a strong predictor… or data leakage. Always check!

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

3️⃣ Bivariate Analysis 📊

See how features relate to the target variable.

📈 Numerical:
Correlation heatmaps (df.corr())
Scatter plots

📊 Categorical:
Box plots (sns.boxplot(x='category', y='target', data=df))
Grouped means (df.groupby('category')['target'].mean())
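
The numeric checks above, sketched on a made-up DataFrame (the column names and values are invented for illustration). One caveat: in recent pandas versions df.corr() raises an error when non-numeric columns are present, so numeric_only=True is passed here.

```python
import pandas as pd

# Made-up data: 'rooms' is perfectly linear in the target, 'age_years' is not.
df = pd.DataFrame(
    {
        "rooms": [1, 2, 3, 4, 5, 6],
        "age_years": [30, 5, 22, 8, 15, 12],
        "category": ["flat", "flat", "flat", "house", "house", "house"],
        "target": [100, 150, 200, 250, 300, 350],
    }
)

# Correlation of each numeric feature with the target, strongest first.
target_corr = (
    df.corr(numeric_only=True)["target"]
    .drop("target")
    .sort_values(key=abs, ascending=False)
)

# Average target per category.
group_means = df.groupby("category")["target"].mean()

print(target_corr)
print(group_means)
```

Pearson correlation only captures linear relationships, so a scatter plot is still worth a look for features that score low here.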

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

💡 Why?

👉 Detect skewness (maybe you need log transformations)
👉 Spot outliers
👉 Identify imbalanced categories

📊 Example: If a feature is heavily skewed, your linear models might struggle, so fix it early!
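
A quick sketch of the log-transform idea on synthetic, strictly positive data (the lognormal "income" column is an assumption for illustration). With zeros in the data, np.log1p would be the safer choice.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed feature: lognormal values are strictly positive.
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=500), name="income")

skew_before = income.skew()
skew_after = np.log(income).skew()

print(f"skew before: {skew_before:.2f}, after log: {skew_after:.2f}")
```

A skewness near zero after the transform suggests the feature is now roughly symmetric.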

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

2️⃣ Univariate Analysis 🔍

Let's get to know each feature individually.

🔍 What to check?

📊 Numerical: Histograms, box plots (df['feature'].hist())

📊 Categorical: Value counts, bar plots (df['feature'].value_counts())

📊 Outliers: Box plots (sns.boxplot(x=df['feature']))
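
The plots have numeric counterparts that are handy in scripts. A sketch on made-up data: the columns are invented, and the 1.5 × IQR rule (the same rule a box plot uses for its whiskers) is just one common outlier convention.

```python
import pandas as pd

# Made-up data with one obvious outlier and an imbalanced category.
df = pd.DataFrame(
    {
        "income": [30, 32, 35, 36, 38, 40, 41, 250],
        "segment": ["a", "a", "a", "a", "a", "a", "b", "b"],
    }
)

# Numerical: skewness as a one-number summary of the histogram shape.
skewness = df["income"].skew()

# Outliers: values beyond 1.5 * IQR from the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Categorical: share of each category, to spot imbalance.
shares = df["segment"].value_counts(normalize=True)

print(skewness)
print(outliers)
print(shares)
```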

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

💡 Why? This helps spot early issues (wrong dtypes, missing data, duplicates, incorrect values) before they ruin your model.

📊 Example: Checking for missing values can reveal if you need imputation or feature removal.

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

1️⃣ Understanding Data Structure & Metadata 🏗️

Before diving in, know your data.

🔍 What to check?
✅ Dataset shape (df.shape)
✅ Data types (df.dtypes)
✅ Descriptive statistics (df.describe())
✅ Missing values (df.isnull().sum())
✅ Unique values & duplicates (df.nunique(), df.duplicated().sum())
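
All of these checks fit in a few lines. A sketch on a tiny made-up DataFrame; the data and the summary dict are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataset with a missing value per column and one duplicated row.
df = pd.DataFrame(
    {
        "age": [25, 32, 32, np.nan, 51],
        "city": ["Leeds", "York", "York", "Leeds", None],
    }
)

summary = {
    "shape": df.shape,
    "dtypes": df.dtypes.astype(str).to_dict(),
    "missing": df.isnull().sum().to_dict(),
    "unique": df.nunique().to_dict(),  # NaN / None are not counted
    "duplicate_rows": int(df.duplicated().sum()),
}

print(df.describe())
for name, value in summary.items():
    print(f"{name}: {value}")
```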

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

🚀 Many newbies jump straight to feature engineering and training ML models... then find their models underperform.

Why? They skip Exploratory Data Analysis (EDA).
EDA is an important step for performance and explainability.

Here's how to start 👇
#DataAnalytics #100DaysOfML #MachineLearning

19.02.2025 15:32 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

Should you separate your SQL servers into different resource groups? This depends on your specific use case. If you have a small number of SQL databases and they are tightly integrated with your app, then keeping them in the same resource group as your app might be the way to go.

10.02.2025 00:07 | 👍 1 | 🔁 0 | 💬 0 | 📌 0

It's recommended to keep Data Lakes and Blob Storage in a different resource group than your application or project resource group. This reduces the attack surface and eases resource and cost management.

10.02.2025 00:07 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

It's a good idea to keep data associated with different projects in dedicated Blob storage containers.

10.02.2025 00:06 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

Create dedicated resource groups for each large project or application.

10.02.2025 00:06 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

On cloud platforms there are a number of different ways to store your data. Given that, it can be confusing to know how best to organise your data.

Here are a couple of tips for working with Blob Storage, data lakes, and SQL servers:

#azure #dataengineering

10.02.2025 00:06 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

Personally, I'm not a fan of building things with certain frameworks or patterns just because company X or developer Y does it that way. I prefer to treat them as ideas to try, weigh the pros and cons, and then use them where they make sense rather than following blindly.

10.02.2025 00:04 | 👍 1 | 🔁 0 | 💬 0 | 📌 0

The more I watch or read about successful people, the more I see that:

- They build things they want to use
- They are smart contrarians
- They don't assume something will or won't work; they experiment and adjust.

Also helps if you have rich people in your network 😬

31.01.2025 15:00 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

@indent4 is following 18 prominent accounts