I noticed that GitHub shows language statistics as percentages for each repository, but it doesn’t show the actual line count. This would be really helpful when I want to understand how big a project is.
For example, if I see a project has 1,000 lines it’s probably simple, but if it has 50,000+ lines then it’s likely complex and large.
Is there a way to get the actual number of code lines for different programming languages in a GitHub repo? I’d prefer not to download the entire repository since some projects are huge and would take forever to clone.
I know there are ways to count lines locally after cloning, but that seems inefficient for just getting a quick overview of project size.
GitHub CLI is probably your best bet. Install gh and run gh api repos/owner/repo/languages
to get byte counts for each language, no downloading needed. Sure, it’s bytes, not lines, but as a rough rule of thumb you can assume somewhere around 25-40 bytes per line including whitespace (this varies by language and code style) and divide to get a ballpark figure. The real win is that you can script this across multiple repos or automate it completely. I use this all the time when I need to quickly evaluate several projects. Plus gh reuses the credentials from gh auth login, so the same command works on private repos you have access to, which is nice.
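If you’d rather script the estimate than eyeball it, here is a rough Python sketch. It calls the same languages endpoint over plain HTTPS and divides each byte count by an assumed average line length. The function names and the 35-bytes-per-line default are my own illustrative choices, not anything GitHub defines, so treat the output as a ballpark only.

```python
import json
import sys
import urllib.request


def fetch_language_bytes(owner, repo, token=None):
    """Return {language: byte_count} from GitHub's repository languages endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/languages"
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "loc-estimate-sketch",  # arbitrary label for this script
    }
    if token:  # a token is needed for private repos; optional for public ones
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def estimate_lines(language_bytes, bytes_per_line=35):
    """Convert byte counts to rough line counts.

    bytes_per_line is an assumption (an illustrative default), not a measured value.
    """
    return {lang: count // bytes_per_line for lang, count in language_bytes.items()}


if __name__ == "__main__":
    # Usage: python estimate_loc.py OWNER REPO
    if len(sys.argv) == 3:
        byte_counts = fetch_language_bytes(sys.argv[1], sys.argv[2])
        for lang, lines in estimate_lines(byte_counts).items():
            print(f"{lang}: ~{lines} lines")
```

Keeping the estimate in its own function means you can swap in a different bytes-per-line guess per language if the default looks off for the repos you care about.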
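I use tokei for this and it works well. Install it through cargo or grab the binary. One caveat: as far as I know, tokei counts files in a local directory and does not fetch a remote URL itself, so do a shallow clone first with git clone --depth 1 https://github.com/username/repo
and then run tokei on the resulting directory. You get exact line counts by language, with comments and blank lines broken out separately. That is far more accurate than guessing from API bytes, since languages vary so much in verbosity. It’s fast and handles big repos without trouble. A shallow clone skips the commit history, so it still saves a lot of time and bandwidth compared with a full clone, and it works for private repos too as long as git can authenticate.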
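If tokei on your machine only accepts local paths (which is my understanding of the CLI), a shallow clone keeps the download small. Here is a minimal Python sketch that clones into a temporary directory and runs tokei there. It assumes git and tokei are both on your PATH, and the function names are hypothetical helpers of mine, not part of either tool.

```python
import subprocess
import sys
import tempfile


def shallow_clone_command(repo_url, dest):
    """Build a git command that fetches only the latest commit (no history)."""
    return ["git", "clone", "--depth", "1", "--quiet", repo_url, dest]


def count_lines_with_tokei(repo_url):
    """Shallow-clone repo_url into a temp dir and return tokei's text report."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(shallow_clone_command(repo_url, workdir), check=True)
        result = subprocess.run(
            ["tokei", workdir],  # assumes the tokei binary is installed
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout  # the temp dir is deleted when the block exits


if __name__ == "__main__":
    # Usage: python tokei_remote.py https://github.com/username/repo
    if len(sys.argv) == 2:
        print(count_lines_with_tokei(sys.argv[1]))
```

The clone is thrown away as soon as the report is captured, so nothing is left on disk afterwards.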
You can use GitHub’s API! Just look at the languages endpoint (GET /repos/{owner}/{repo}/languages) for any repo. It gives you bytes per language, and you can roughly estimate the line count from that. Definitely faster than cloning huge repos, that’s for sure!