Weekly | Top 9 GitHub Repos | Week 47 - 2024
Noteworthy data-ops & analytics repos that first shipped less than a year ago.
#9. ucbepic/docetl
A system for complex LLM-powered document processing
Repo topic tags: data, etl, llm, python, data-pipelines, elt, workflow
It's up 376 new stars this month and ranked #92 among all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-09-17. Its license is listed as MIT License. Its primary language is Python.
#8. xuchengsheng/wx-dump-4j
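wx-dump-4j is a WeChat data analysis tool built in Java. It shows your number of friends, number of group chats, and the day's total message count, and provides daily message statistics for the past 15 days so you can see how socially active you have been. It also identifies and displays the 10 contacts you interacted with most over the past month. In addition, it supports exporting WeChat chat history, contacts, and group chat information, and can even show Moments history beyond the three-day limit.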
Repo topic tags: java, jna, spring, springboot, wechat
It's up 290 new stars this month and ranked #261 among all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-01-25. Its license is listed as MIT License. Its primary language is Java.
#7. CatchTheTornado/pdf-extract-api
Document (PDF) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
Repo topic tags: api, extract, json, llm, pdf, anonymization, ocr, ocr-python, pii
It's up 1226 new stars this month and was created 33 days ago. It ranked #16 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2024-10-23. Its license is listed as GNU General Public License v3.0. Its primary language is Python.
#6. codelion/optillm
Optimizing inference proxy for LLMs
Repo topic tags: agent, agentic-ai, agentic-workflow, agents, api-gateway, genai, large-language-models, llm, llm-inference, llmapi, mixture-of-experts, moa, openai, openai-api, optimization, proxy-server, agentic-framework
It's up 8 new stars on 2024-11-22 and was created 95 days ago. It ranked #86 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2024-08-22. Its license is listed as Apache License 2.0. Its primary language is Python.
#5. PacktPublishing/LLM-Engineers-Handbook
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
Repo topic tags: genai, llm, llmops, mlops, rag, aws, fine-tuning-llm, llm-evaluation, ml-system-design
It's up 203 new stars this week and ranked #73 among all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-04-09. Its license is listed as MIT License. Its primary language is Python.
#4. zuoyebang/bitalostored
Bitalostored is a high-performance distributed storage system, core engine based on bitalosdb(self-developed), compatible with Redis protocol.
Repo topic tags: database, distributed-storage, high-performance, kvstore, nosql, redis, storage-engine, bitalosdb
It's up 83 new stars this week and ranked #189 among all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-02-26. Its license is listed as Apache License 2.0. Its primary language is Go.
#3. iterative/datachain
DataChain 🔗 Process and curate unstructured data using local ML models and LLM calls
Repo topic tags: ai, cv, data-wrangling, llm, llm-eval, multimodal, data-analytics, embeddings, mlops
It's up 980 new stars this month and was created 125 days ago. It ranked #42 by new stars relative to its age in days.
This repo was first pushed to GitHub on 2024-07-23. Its license is listed as Apache License 2.0. Its primary language is Python.
#2. rio-labs/rio
WebApps in pure Python. No JavaScript, HTML and CSS needed
Repo topic tags: python, ui, webapp, data-analysis, data-science, data-visualization, deep-learning, machine-learning
It's up 345 new stars this week and ranked #40 among all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-04-13. Its license is listed as Apache License 2.0. Its primary language is Python.
#1. ben-nour/SQL-tips-and-tricks
SQL tips and tricks
Repo topic tags: sql, tips-and-tricks, tips, snowflake
It's up 9 new stars on 2024-11-22 and ranked #133 among all GitHub repos that first shipped less than a year ago.
This repo was first pushed to GitHub on 2024-09-19. Its license is listed as MIT License. Its primary language is SQL.



