DocFetcher review: free, open-source full-text search for your documents
Written by the FileLocator team
DocFetcher answers a simple question: what if indexed full-text search, the thing Copernic and X1 charge yearly for, were completely free and open source? Built in Java with Eclipse's SWT toolkit and powered by Apache Lucene, DocFetcher builds a content index of the folders you choose and then answers content queries in about a second. We put it through our 1.2-million-file test library to find out what "free" costs you in practice.
7.5/10
Verdict: remarkable free content search that asks for patience and a little hand-holding
DocFetcher delivers genuinely fast indexed content search with a powerful Lucene query syntax, runs on Windows, macOS and Linux, and even works portably from a USB stick. The trade-offs are real: a dated Java-based interface, manual index management, no email indexing, and a project whose development has been quiet for years. If you can live with that, it's the best zero-dollar content indexer around.
Who DocFetcher is for
DocFetcher fits researchers, students, writers, lawyers on a budget and archivists — anyone sitting on a big, fairly stable pile of documents they search by content, repeatedly. Index your thesis sources or your contract archive once, and every later query is near-instant and free forever. The portable mode adds a niche superpower: you can carry an entire searchable document archive, index and all, on a USB stick and run it on any machine without installing anything.
It's the wrong tool if you need email search (it doesn't index Outlook — see our Copernic review for that), if your files churn constantly, or if you want something that configures itself. DocFetcher expects you to decide what gets indexed and when the index gets rebuilt.
Key features
Lucene-powered content indexing
You point DocFetcher at folders, it extracts text from the documents inside and stores it in an Apache Lucene index — the same search engine library behind Elasticsearch. Supported formats cover the essentials: Microsoft Office (old and new), OpenOffice/LibreOffice, PDF, HTML, RTF, EPUB and plain text, with source code files treated as text. Once indexed, a content query across hundreds of thousands of documents returns in around a second.
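To see why the upfront indexing pays off, here is a minimal sketch of an inverted index, the general idea behind engines like Lucene. It is a toy under stated assumptions, not DocFetcher's or Lucene's actual data structure: the function names and the three sample documents are hypothetical, and real indexes also store positions, fields and compressed postings.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def search_all(index, query):
    """Return the documents that contain every query term (an AND query)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents standing in for text extracted from files.
documents = {
    "contract.txt": "Liability is limited to the contract value.",
    "thesis.txt": "Grant funding supported this thesis research.",
    "notes.txt": "Ask about grant renewal and contract liability.",
}

index = build_index(documents)
print(sorted(search_all(index, "contract liability")))
```

The expensive step (reading every document) happens once in build_index; each later query only looks up a few sets and intersects them, which is why searches stay fast no matter how long the initial pass took.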
A real query language
Because it's Lucene underneath, DocFetcher speaks a proper query syntax rather than a single naive search box. You get OR and AND boolean operators, wildcards (repor*), fuzzy matching (liability~ catches misspellings), phrase search with quotes, proximity search ("grant funding"~5) and per-field queries like filename:budget. It's a different dialect from regex — if you want true regular expressions over file contents, see grepWin — but for document research it's arguably more useful, and the fuzzy operator has no equivalent in most rivals.
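For readers new to the fuzzy and proximity operators, the sketch below shows roughly what they mean. Lucene's fuzzy matching is based on edit distance, but everything else here is an assumption for illustration: the two-edit threshold, the function names and the simplified proximity rule (which ignores word order, something Lucene treats differently) are ours, and the Lucene version bundled with DocFetcher may score matches in its own way.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]

def fuzzy_match(term, word, max_edits=2):
    """True if word is within max_edits of term (max_edits is an assumed threshold)."""
    return edit_distance(term.lower(), word.lower()) <= max_edits

def within_proximity(text, first, second, slop):
    """True if the two words occur with at most `slop` other words between them."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) - 1 <= slop
               for i in first_positions for j in second_positions)

# Hypothetical sample sentence for the proximity check.
sentence = "The grant provided by the foundation covers funding for two years."

print(fuzzy_match("liability", "liabilty"))                # misspelling, one letter missing
print(fuzzy_match("liability", "library"))                 # a different word entirely
print(within_proximity(sentence, "grant", "funding", 5))   # five words in between
print(within_proximity(sentence, "grant", "funding", 3))   # too far apart for slop 3
```

In the sample sentence, "grant" and "funding" have five words between them, so a query in the style of "grant funding"~5 would accept it while a tighter slop would not.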
Result pane with highlighted previews
Results show in a sortable table, and selecting one renders a text preview with every hit highlighted and next/previous-hit buttons. It's plain text rather than the formatted rendering X1 does, but it's quick and does the essential job: confirming a match without opening the file.
Portable mode
The portable build packages the app, your indexes, and (if you like) the documents themselves into one folder that runs from anywhere — USB stick, external drive, network share — with relative paths kept intact. Combined with the cross-platform Java base, the same searchable archive works on Windows, macOS and Linux.
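The reason relative paths matter is easy to show. The sketch below is not DocFetcher's storage format, just an illustration under our own assumptions: the helper names, the archive folder and the drive letters are hypothetical. It stores a document's location relative to the archive root, then resolves it against whatever root the archive has on the next machine.

```python
from pathlib import PurePosixPath, PureWindowsPath

def to_relative(document_path, archive_root):
    """Store a document's location relative to the archive root, with forward slashes."""
    return document_path.relative_to(archive_root).as_posix()

def resolve(relative_path, archive_root):
    """Rebuild a full path from a stored relative path and the archive's current root."""
    return archive_root / relative_path

# Indexed on a Windows machine where the USB stick was drive E: (hypothetical paths).
stored = to_relative(
    PureWindowsPath("E:/archive/docs/contract.pdf"),
    PureWindowsPath("E:/archive"),
)
print(stored)

# Later the same stick is mounted somewhere else entirely.
print(resolve(stored, PurePosixPath("/media/usb/archive")))
print(resolve(stored, PureWindowsPath("F:/archive")))
```

Because only the part below the archive root is stored, the entry stays valid whether the stick shows up as another drive letter or as a Linux mount point. An absolute path recorded at index time would break on the first move.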
Folder watching
On Windows, DocFetcher's daemon can watch indexed folders and queue changed files for re-indexing, keeping the index reasonably fresh automatically. Elsewhere — and for network shares — you'll be rebuilding or updating indexes manually from the right-click menu.
Performance in our testing
We indexed the document-heavy portion of our 1.2-million-file library (roughly 400,000 Office, PDF, HTML and text files) on our Ryzen 7 / 32 GB / NVMe machine. Initial indexing is the toll booth: it took a few hours, with one CPU core pegged most of the time and PDFs proving by far the slowest format. That's in line with Copernic and X1, since full-text extraction is simply expensive, but DocFetcher gives you less feedback while it grinds, and a handful of malformed PDFs produced errors we had to dismiss.
After indexing, the experience flips. Content queries — boolean, fuzzy, phrase — consistently came back in about a second, on a library where a raw on-demand scan takes the better part of an hour. Memory use was moderate for a Java app: typically a few hundred megabytes while searching, more during indexing. The index itself landed at a few gigabytes. One caveat from extended use: if you forget to update an index, DocFetcher will cheerfully return stale results and miss new files — the freshness burden is on you in a way it isn't with always-on commercial indexers.
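If you want a rough check on how stale an index might be, a small script can list the files modified since you last updated it. This is a sketch under our own assumptions, not a DocFetcher feature: the function name is hypothetical, you have to supply the last-update time yourself, and the demo runs on a throwaway folder with made-up timestamps.

```python
import os
import tempfile

def files_changed_since(folder, last_indexed):
    """List files under folder whose modification time is newer than last_indexed.

    last_indexed is a Unix timestamp (seconds) for the last index update.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_indexed:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(changed)

# Demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    old_file = os.path.join(folder, "old.txt")
    new_file = os.path.join(folder, "new.txt")
    for path in (old_file, new_file):
        with open(path, "w") as handle:
            handle.write("sample")
    last_indexed = 1_700_000_000  # pretend the index was last updated at this time
    os.utime(old_file, (last_indexed - 60, last_indexed - 60))
    os.utime(new_file, (last_indexed + 60, last_indexed + 60))
    stale = files_changed_since(folder, last_indexed)
    print([os.path.basename(p) for p in stale])
```

A non-empty list means the index is behind and worth updating before you trust a "no results" answer. Note that this check only catches new and modified files; documents deleted since the last update would need a separate comparison.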
Ease of use
Functional, dated, learnable: that's the honest arc. The SWT-based interface looks like 2010 and behaves like it too, with dense panes, small icons, and a scope panel where indexes are created and updated by right-clicking. Nothing is hard once you know the workflow (create index → wait → search), but the app does little to teach it, and you'll have to read the manual to learn the more powerful query syntax. You'll also need Java present (bundled in some packages, a separate install in others, depending on platform and version). None of this is a dealbreaker; all of it explains why DocFetcher scores below the commercial tools on polish while matching them on raw search quality.
Development is the other asterisk. The last stable release is years old, and project activity has been largely dormant for some time (a community fork has carried some fixes). The tool still works fine in our testing on current Windows 11 — Lucene doesn't rot — but don't expect new format support or UI modernization.
Pricing
Free. DocFetcher is open source under the Eclipse Public License, with no paid tier, no nag screens, no telemetry, and no account. Download it from docfetcher.sourceforge.io. The only costs are your time (index setup and upkeep) and disk space for the index. For what commercial desktop indexers charge per year, that's a striking amount of capability for nothing — which is why it headlines our best free file search tools list for content search.
What we like
- Completely free and open source — no tiers, no nags
- Fast Lucene-backed content queries once indexed
- Powerful syntax: boolean, wildcard, fuzzy, phrase, proximity
- Portable mode carries a searchable archive on a USB stick
- Cross-platform: Windows, macOS and Linux
What to know
- Java dependency and a visibly dated interface
- Index management is largely manual — stale indexes miss new files
- No email indexing at all (no Outlook, no mbox)
- Development largely dormant; last stable release is years old
- Initial indexing takes hours on large, PDF-heavy libraries
Alternatives to consider
If indexing upkeep puts you off, Agent Ransack takes the opposite approach: free on-demand content search with boolean and regex support and no index to maintain — slower per query, zero maintenance. For find-and-replace jobs with full regular expressions and capture groups, grepWin is the free specialist. And if you'd happily pay to add Outlook email and automatic index freshness to the picture, Copernic Desktop Search is the commercial version of DocFetcher's idea. Our free file search tools roundup puts all the no-cost options side by side, our content search guide explains the indexed-versus-live tradeoff, and PDF-heavy users should read our guide to searching inside PDFs — including what to do about scanned documents DocFetcher can't read. Linux users will find the wider ecosystem (Recoll, ripgrep and friends) in our Linux file search guide.
Frequently asked questions
Is DocFetcher still maintained?
Officially, barely — the last stable release is years old and activity on the original project has been quiet for a long time, though a community fork has picked up some maintenance. In our testing the stable release still runs fine on Windows 11, macOS and mainstream Linux distributions.
Can DocFetcher search scanned PDFs?
No. DocFetcher extracts existing text; a scanned PDF without an OCR text layer is just pictures to it. Run OCR first (our search-inside-PDFs guide covers free ways to do that), then index the OCR'd copies.
Does DocFetcher search filenames too, or only contents?
Both — the filename: field targets names specifically, and plain queries match names as well as contents. But it only sees what you've indexed. For instant filename search across whole drives, a dedicated filename tool is faster and needs no setup.
Final verdict
DocFetcher earns 7.5/10 as the best free content indexer we've tested. The search core is excellent, answering Lucene queries over your document archive in about a second, with fuzzy and proximity operators that most rivals don't expose, and portable mode is unique in this category. The deductions are all about everything around the core: Java baggage, a 2010-vintage UI, manual index care, no email, and a development pulse you have to take on faith. For a stable document archive you search constantly, DocFetcher is an easy recommendation at the unbeatable price of free; for fast-changing files or mailbox search, pick a different tool.
Related reading
Agent Ransack review
Free content search with no index to maintain — the zero-upkeep alternative.
Best free file search tools
Every capable no-cost search tool ranked, from filename finders to content indexers.
How to search inside PDFs
Make every PDF findable — including OCR for scanned documents DocFetcher can't read.