Weekly | Top 10 GitHub Repos | Week 48 - 2024
Noteworthy data-ops & analytics repos that first shipped less than a year ago.
#10. amphi-ai/amphi-etl
Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
Repo topic tags: data, data-pipelines, etl, rag-pipeline, structured-data, unstructured-data
It's up 12 new stars this week and is ranked #1477 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-03-20. Its license was listed as: Other. Its primary language is TypeScript.
#9. bjesus/pipet
A Swiss Army tool for scraping and extracting data from online assets, made for hackers.
Repo topic tags: css, curl, gjson, json, playwright, scraper, scraping
It's up 574 new stars this month and was created 94 days ago. It ranked #48 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2024-08-31. Its license was listed as: MIT License. Its primary language is Go.
#8. BemiHQ/BemiDB
Postgres read replica optimized for analytics
Repo topic tags: analytics, data-lakehouse, data-warehouse, duckdb, iceberg, olap, parquet, postgresql, replication, zero-etl, data-movement
It's up 997 new stars this month and is ranked #34 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-11-06. Its license was listed as: GNU Affero General Public License v3.0. Its primary language is Go.
#7. yolain/ComfyUI-Easy-Use
To make ComfyUI easier to use, I have optimized and integrated some commonly used nodes.
It's up 29 new stars this week and is ranked #580 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2023-12-10. Its license was listed as: GNU General Public License v3.0. Its primary language is Python.
#6. hesamsheikh/ml-goldmine
Machine Learning Journal for Intermediate to Advanced Topics.
Repo topic tags: data-science, documentation, large-language-models, learning-resources, llm, machine-learning, machine-learning-algorithms, study-notes
It's up 412 new stars this month and is ranked #73 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-09-18. Its primary language is Jupyter Notebook.
#5. microsoft/RD-Agent
Research and development (R&D) is crucial for enhancing industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open-source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
Repo topic tags: agent, ai, automation, data-mining, data-science, development, llm, research
It's up 205 new stars this month and was created 116 days ago. It ranked #210 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2024-08-09. Its license was listed as: MIT License. Its primary language is Python.
#4. skfolio/skfolio
Python library for portfolio optimization built on top of scikit-learn
Repo topic tags: asset-allocation, asset-management, convex-optimization, cvar-optimization, cvxpy, efficient-frontier, hierarchical-clustering, machine-learning, portfolio, portfolio-optimization, quantitative-finance, quantitative-investment, risk-parity, scikit-learn, trading-strategies
It's up 64 new stars this month and is ranked #1165 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2023-12-14. Its license was listed as: BSD 3-Clause "New" or "Revised" License. Its primary language is Python.
#3. postgresml/korvus
Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python, JavaScript, Rust and C.
Repo topic tags: ai, embeddings, javascript, llm, ml, python, rag, search, sql
It's up 25 new stars this month and was created 146 days ago. It ranked #1878 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2024-07-10. Its license was listed as: MIT License. Its primary language is Rust.
#2. Multiwoven/multiwoven
🔥 The open-source reverse ETL, data activation platform for modern data teams.
Repo topic tags: data-analysis, data-engineering, data-ingestion, reverse-etl, data-pipeline, data-activation, etl, react, ruby, self-hosted, open-source, dbt
It's up 35 new stars this month and is ranked #1888 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-02-06. Its license was listed as: GNU Affero General Public License v3.0. Its primary language is Mixed/Unspecified.
#1. quarylabs/quary
Transform data together. Model, test and deploy as a team.
Repo topic tags: analytics, business-intelligence, data-modeling, elt
It's up 7 new stars this week and is ranked #2238 out of all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-02-20. Its license was listed as: MIT License. Its primary language is Rust.