Discussion
jascha_eng: FWIW TJ is not your average vibe coder imo: https://www.linkedin.com/in/todd-j-green/ In September he burned through $3000 in API credits though, but I think that was before we finally bought Max plans for everyone who wanted one.
gplprotects: > ParadeDB, is guarded behind AGPL
What a wonderful ad for ParadeDB, and clear signal that "TigerData" is a pernicious entity.
tjgreen: Okay then!
shreyssh: Nice work. pg_search has been on my radar for a while, having BM25 natively in Postgres instead of bolting on Elasticsearch is a huge DX win. Curious about the index build time on larger datasets though. I'm working with ~2M row tables and the bottleneck for most Postgres extensions I've tried isn't query speed, it's the initial indexing. Any benchmarks on that?
lsaferite: You:
> "TigerData" is a pernicious entity
TigerData:
> pg_textsearch v1.0 is freely available via open source (Postgres license)
They deemed AGPL untenable for their business and decided to create an OSS solution that used a license they were comfortable with, and they are somehow "pernicious"? Perhaps take a moment to reflect on your characterization of a group that just contributed an alternative OSS project for a specific task. Not only that, but they used a VERY permissive license. I'd argue that they are being a better OSS community member for selecting a more permissive license.
gplprotects: https://malus.sh/
gmassman: Very exciting! Congrats on the release, this will be a huge benefit to all folks building RAG/rerank systems on top of Postgres. Looking forward to testing it out myself.
tjgreen: Yep, there are numbers in the blog post and repo. We are able to index MS-MARCO v2 (138M documents, around 50GB of raw data) in a bit under 18 minutes.
tjgreen: For a 2M-row dataset, you should be able to index in about 1 minute on low-end hardware. See the MS-MARCO v1 (8M documents) numbers, measured on cheap GitHub runners.