Thursday 23 November 2023

Find Anything In Your Documents - Fast.

Not sure if any of you have this problem: You have a Google Docs account chock full o' documents you've written or collected over a decade, you have a Documents folder in your home directory on your server that's also chocka doccas, you *know* you had one (or wrote one) that was about the exact topic you want to write about - and . . . *blank*

I was desperate enough to forgo all formatting and images if it meant I could just feed all of that into a huge database-based app that let me keep future notes in it as well - even if that meant b&w, dreary reams of text to go through and meant changing my whole workflow. I looked them up. Argh. So few of the features that I wanted.

I had a brainfart and asked ChatGPT. Among other suggestions, Copernic Desktop got thrown at me. But. (And this is my recurring plaint, the song of my pensioner people:) I can't afford an extra monthly fee... But I did go to one of those "apps just like xyzzy" sites and found a heap more. Near the top of the heap was a free open source program named DocFetcher. Installed it just this morning and I don't think I need to look further.

DocFetcher is a bit more tech-fiddly to set up if you've never done this before, but even as it comes right out of the install, all you need to do is read the first page, point it at your Documents folder or whatever (the first page tells you how) and that would answer most of your needs. So don't be scared of it. It's bloody marvellous. 

If you know regexes (REGular EXpressions) then fine-tuning what you want is a piece of cake. I just needed it to ignore MP3s and MP4s because why would I want to search for text in those? And so " .*\.mp[34] " was pretty much all I added to the exclusions list, which sped things up hugely.
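If you'd like to check what a pattern will catch before you trust it with your index, here's a minimal sketch in Python. DocFetcher itself is a Java program, so this is only a stand-in for trying things out; for a pattern this simple the two regex flavours behave the same. The file names are made up, and I'm assuming the pattern has to match the whole file name rather than just part of it.

```python
import re

# Hypothetical file names, just for trying the pattern out.
FILE_NAMES = ["holiday.mp4", "podcast.mp3", "notes.txt", "timber-sizes.docx", "compost.pdf"]

# One way to write "anything ending in .mp3 or .mp4".
EXCLUDE_PATTERN = r".*\.mp[34]"


def excluded(names, pattern):
    """Return the names that the pattern matches in full (assumed whole-name match)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.fullmatch(name)]


print(excluded(FILE_NAMES, EXCLUDE_PATTERN))
# -> ['holiday.mp4', 'podcast.mp3']
```

The bit in square brackets means "a 3 or a 4", so only those two extensions get skipped and everything else still gets indexed.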

My Documents folder has text, Word docs, PDFs, videos, images, spreadsheets - but only the videos take ages for DF to search and are generally not great sources of text anyway. Images - I'm not sure if DF does OCR (Optical Character Recognition) on those, but on the off chance that it does, I'll leave them in and save myself the trouble of writing another one or two dozen regexes to exclude them.
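For anyone who does want the images out, it needn't take dozens of regexes: a single pattern with alternatives separated by "|" can cover a whole list of extensions. A rough sketch in Python of building one, where the function name and the extension list are only illustrative and I'm again assuming the pattern is matched against the whole file name:

```python
import re


def build_exclusion_pattern(extensions):
    """Build one regex matching any file name ending in one of the extensions."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    return r".*\.(" + alternatives + ")"


# Illustrative list; add or remove extensions to suit your own folder.
IMAGE_EXTENSIONS = ["jpg", "jpeg", "png", "gif", "bmp", "tif", "tiff"]

pattern = build_exclusion_pattern(IMAGE_EXTENSIONS)
print(pattern)
# -> .*\.(jpg|jpeg|png|gif|bmp|tif|tiff)
```

The printed pattern is what would go into the exclusions list as a single entry.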

And it's fast enough anyway - PDFs only slow it a bit, and all the other formats seem to get recognised and recorded. 

But what about a way to grab stuff off my Google Docs? A moment's head-scratching and a flash of light: Install Google Drive, let it synchronise locally, and then point DF at that folder, same exclusions - and now I have all my text searchable inside this one app. (For those that don't know, Google Docs stores all your documents in Google Drive but - as far as I know, at this point in time - those documents don't count towards your GB space quota. So every document appears in your Google Drive folder when you install it, with the extension ".gdoc".)

So now I can type in "non-struct" and all documents with non-struct in them will show up for me. ("non-struct" is non-structural and refers to lumber from the timber stores and hardware stores around the place; I have a few pages with dimensions etc noted down.)

I've found that DF opens documents in their default applications, which means your Google Docs will show up in your web browser, docx in your word processor, etc. 

Any of that helpful for you? I hope you found something useful in this short article. And I'm hoping you'll help me by sharing this post and my many others like it to your social and messaging networks please. Also if you want to spread the word just ask them to search for "teds news stand" online and they (and you!) can see my latest twenty or so posts across all my blogs, and sign up for the once-a-week newsletter so you'll always know when my next posts are coming out.

You can also help by donating the cost of a cup of coffee, one-time or monthly. And use the Mastodon link to chat with me. 

Thank you for your attention, hope to see you in the next article!
