Monthly | Top 10 GitHub Repos | December 2024
Noteworthy data-ops & analytics repos that first shipped between one and three years ago.
#10. DAGWorks-Inc/hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Repo topic tags: data-science, python, dag, data-engineering, dataframe, etl, etl-framework, etl-pipeline, feature-engineering, featurization, machine-learning, numpy, pandas, software-engineering, data-analysis, lineage, llmops, mlops, orchestration, hacktoberfest
It's up 54 new stars this month and ranks #1360 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2023-02-23. Its license was listed as: BSD 3-Clause Clear License. Its primary language is Jupyter Notebook.
#9. NannyML/nannyml
nannyml: post-deployment data science in Python
Repo topic tags: machine-learning, ml, mlops, performance-monitoring, data-science, monitoring, python, data-drift, model-monitoring, data-analysis, visualization, deep-learning, jupyter-notebook, machinelearning, postdeploymentdatascience, performance-estimation
It's up 38 new stars this month and ranks #1148 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2022-04-08. Its license was listed as: Apache License 2.0. Its primary language is Python.
#8. edtechre/pybroker
Algorithmic Trading in Python with Machine Learning
Repo topic tags: algotrading, backtesting, machine-learning, python, trading, quantitative-finance, stocks, framework, investment, ai, artificial-intelligence, algorithmic-trading, data-science, finance, trading-strategies, crypto, cryptocurrency
It's up 3 new stars on 2024-11-29 and ranks #1273 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2023-01-17. Its license was listed as: Other. Its primary language is Python.
#7. chdb-io/chdb
chDB is an embedded OLAP SQL Engine 🚀 powered by ClickHouse
Repo topic tags: data-science, database, embedded-database, olap, python, sql, chdb, clickhouse, clickhouse-database, clickhouse-server
It's up 71 new stars this month and ranks #1059 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2023-03-18. Its license was listed as: Apache License 2.0. Its primary language is C++.
#6. mybatis-flex/mybatis-flex
mybatis-flex is an elegant MyBatis enhancement framework
Repo topic tags: java, mybatis, mysql, orm, sql
It's up 60 new stars this week and ranks #288 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2023-02-27. Its license was listed as: Apache License 2.0. Its primary language is Java.
#5. ddz16/TSFpaper
This repository contains a reading list of papers on Time Series Forecasting/Prediction (TSF) and Spatio-Temporal Forecasting/Prediction (STF). These papers are mainly categorized according to the type of model.
Repo topic tags: deep-learning, time-series, time-series-analysis, time-series-forecasting, time-series-prediction, deep-neural-networks, paper-lists, rnn, tcn, time-series-models, transformer, spatial-temporal-forecasting, spatio-temporal, spatio-temporal-data, spatio-temporal-prediction
It's up 101 new stars this month and ranks #421 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2022-06-29. Its primary language is Mixed/Unspecified.
#4. ballerine-io/ballerine
Open-source infrastructure and data orchestration platform for risk decisioning
Repo topic tags: back-office, case-management, compliance, dashboard, flow, fraud, id-card-camera, identity-verification, idv, know-your-customer, kyc, liveliness, ocr, onboarding, orchestration, risk-management, rule-engine, sdk, svelte, kyb
It's up 1 new star on 2024-11-29 and was created 791 days ago. It ranks #1042 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2022-10-04. Its license was listed as: Other. Its primary language is TypeScript.
#3. instill-ai/instill-core
🔮 Instill Core is an open-source no-/low-code data, AI, and pipelines orchestration platform
Repo topic tags: unstructured-data, low-code, developer-tools, etl, no-code, open-source, hacktoberfest, ai, api, cli, generative-ai, golang, gpt, llm, pipeline, python, stable-diffusion, typescript
It's up 51 new stars this month and ranks #872 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2022-01-13. Its license was listed as: Other. Its primary language is Makefile.
#2. HarderThenHarder/transformers_tasks
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT, etc.
Repo topic tags: nlp, text-classification, text-matching, information-extraction, reinforcement-learning, transformers, text-generation
It's up 1 new star on 2024-11-29 and was created 751 days ago. It ranks #1255 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2022-11-13. Its primary language is Jupyter Notebook.
#1. alibaba/EasyNLP
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
Repo topic tags: transformers, bert, nlp, pretrained-models, deep-learning, pytorch, fewshot-learning, knowledge-distillation, knowledge-pretraining, text-image-retrieval, text-to-image-synthesis, machine-learning, text-classification, transfer-learning
It's up 9 new stars this week and ranks #1170 among all GitHub repos that first shipped between one and three years ago.
This repo was first pushed to GitHub on 2022-04-06. Its license was listed as: Apache License 2.0. Its primary language is Python.



