Microsoft open-sourced a Python tool for converting files and office documents to Markdown
GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.
FWIW, if you are interested in such tooling, also consider soffice and pandoc, which (as far as I can tell) have similar features but have existed for years now and are not related to Microsoft.
Edit: not related to Microsoft AND Google. It seems the transcription aspect (which IMHO is still weird in that context, but OK) is done via Google servers, cf. https://lemmy.ml/post/23629310/15586865
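For anyone who wants to try the pandoc route from Python, here is a minimal sketch. It assumes pandoc is installed and on your PATH; the helper names `build_pandoc_command` and `convert_to_markdown` are made up for illustration, and `gfm` (GitHub-flavored Markdown) is just one of the Markdown output formats pandoc offers.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pandoc_command(source: str, target: Optional[str] = None) -> List[str]:
    """Build a pandoc command line that converts `source` to Markdown.

    Hypothetical helper: if no target is given, the output path is the
    source path with its suffix replaced by ".md".
    """
    src = Path(source)
    out = Path(target) if target else src.with_suffix(".md")
    # pandoc infers the input format from the file extension;
    # -t selects the output format, -o the output file.
    return ["pandoc", str(src), "-t", "gfm", "-o", str(out)]


def convert_to_markdown(source: str, target: Optional[str] = None) -> Path:
    """Run pandoc and return the path of the generated Markdown file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    cmd = build_pandoc_command(source, target)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Calling `convert_to_markdown("report.docx")` would then write `report.md` next to the input, entirely locally and without any remote service.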
The single exception to this (which is actually buried fairly deep in the feature list) is the audio transcription tool. I didn't take a closer look at what is used to perform this, but at least it's not "just" document conversion like pandoc.
Thanks for the clarification, but I'm a bit confused here: audio transcription, as in STT, done by e.g. Whisper? If so, what's the use case? When I think of Office documents, audio transcription is not something I have in mind.