<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Metastax - Curated Data Engineering, ML/AI &amp; Analytics</title>
    <link>https://metastax.com</link>
    <description>AI-curated links for data engineering, ML/AI, and analytics practitioners. Updated every 6 hours.</description>
    <language>en-us</language>
    <lastBuildDate>Fri, 08 May 2026 07:27:17 GMT</lastBuildDate>
    <atom:link href="https://metastax.com/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Announcing the Program of DuckCon #7 Amsterdam</title>
      <link>https://duckdb.org/2026/05/08/announcing-duckcon7.html</link>
      <guid isPermaLink="false">metastax-5792</guid>
      <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
      <description>The program for DuckCon #7 Amsterdam, a DuckDB user conference, has been announced. The event will be held on June 24, 2026, and will run from 15:00 to 20:00 CEST.</description>
      <source url="https://duckdb.org/2026/05/08/announcing-duckcon7.html">DuckDB Blog</source>
      <category>duckdb</category>
      <category>analytics</category>
    </item>
    <item>
      <title>What Are Table Formats and Why Were They Needed?</title>
      <link>https://www.dremio.com/blog/what-are-table-formats-and-why-were-they-needed/</link>
      <guid isPermaLink="false">metastax-5759</guid>
      <pubDate>Thu, 07 May 2026 16:00:00 GMT</pubDate>
      <description>This is Part 1 of a 15-part Apache Iceberg Masterclass. This article covers the fundamental question: what problem do table formats solve, and why does the choice between them matter? A data lake without a table format is a collection of files. It has no concept of a transaction, no mechanism to pre</description>
      <source url="https://www.dremio.com/blog/what-are-table-formats-and-why-were-they-needed/">Dremio Blog</source>
      <category>iceberg</category>
      <category>lakehouse</category>
    </item>
    <item>
      <title>Container Design Patterns for Distributed Systems</title>
      <link>https://blog.bytebytego.com/p/container-design-patterns-for-distributed</link>
      <guid isPermaLink="false">metastax-5735</guid>
      <pubDate>Thu, 07 May 2026 15:31:08 GMT</pubDate>
      <description>This article presents container design patterns categorized by their coordination scope, providing a structured overview of common practices for distributed systems.</description>
      <source url="https://blog.bytebytego.com/p/container-design-patterns-for-distributed">ByteByteGo</source>
      <category>architecture</category>
      <category>engineering</category>
    </item>
    <item>
      <title>Most agent reliability problems are data engineering problems</title>
      <link>https://sderosiaux.substack.com/p/from-prompt-engineering-to-data-engineering</link>
      <guid isPermaLink="false">metastax-5748</guid>
      <pubDate>Thu, 07 May 2026 15:29:26 GMT</pubDate>
      <description>The article argues that most agent reliability problems are, at root, data engineering problems.</description>
      <source url="https://sderosiaux.substack.com/p/from-prompt-engineering-to-data-engineering">Hacker News - Quality &amp; Governance</source>
      <category>data-quality</category>
      <category>governance</category>
    </item>
    <item>
      <title>Using ClickHouse as a Kafka sink? Async inserts change the equation</title>
      <link>https://www.reddit.com/r/apachekafka/comments/1t683y2/using_clickhouse_as_a_kafka_sink_async_inserts/</link>
      <guid isPermaLink="false">metastax-5723</guid>
      <pubDate>Thu, 07 May 2026 11:46:51 GMT</pubDate>
      <description>The post discusses using ClickHouse as a Kafka sink, focusing on how async insert mode helps with high message rates but has buffering and dedupe behaviors that aren&apos;t always obvious.</description>
      <source url="https://www.reddit.com/r/apachekafka/comments/1t683y2/using_clickhouse_as_a_kafka_sink_async_inserts/">r/apachekafka</source>
      <category>kafka</category>
      <category>streaming</category>
    </item>
    <item>
      <title>Parloa builds service agents customers want to talk to</title>
      <link>https://openai.com/index/parloa</link>
      <guid isPermaLink="false">metastax-5716</guid>
      <pubDate>Thu, 07 May 2026 11:00:00 GMT</pubDate>
      <description>Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions.</description>
      <source url="https://openai.com/index/parloa">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI agents on Ray Serve: Single to multi-agent architecture</title>
      <link>https://anyscale.com/blog/ai-agents-on-ray-serve-single-to-multi-agent-architecture</link>
      <guid isPermaLink="false">metastax-5787</guid>
      <pubDate>Thu, 07 May 2026 10:00:00 GMT</pubDate>
      <description>This Anyscale blog post covers single-agent and multi-agent architectures on Ray Serve and shows how to build and deploy AI agents using Ray.</description>
      <source url="https://anyscale.com/blog/ai-agents-on-ray-serve-single-to-multi-agent-architecture">Anyscale Blog</source>
      <category>agents</category>
      <category>mlops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Everyone gets faster writes: Turning off FPW on Neon</title>
      <link>https://neon.com/blog/turning-off-fpw-for-faster-writes</link>
      <guid isPermaLink="false">metastax-5734</guid>
      <pubDate>Thu, 07 May 2026 10:00:00 GMT</pubDate>
      <description>Because Neon decouples storage from compute, it can disable full-page writes (FPW), delivering up to a 5x performance increase on write-heavy workloads.</description>
      <source url="https://neon.com/blog/turning-off-fpw-for-faster-writes">Neon Blog</source>
      <category>postgres</category>
      <category>engineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Delta Grows Up: Writes, Unity Catalog and Time Travel</title>
      <link>https://duckdb.org/2026/05/07/delta-uc-updates.html</link>
      <guid isPermaLink="false">metastax-5696</guid>
      <pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate>
      <description>DuckDB&apos;s Delta Lake and Unity Catalog extensions are no longer experimental. The post covers their progress, including writes, Unity Catalog support, and time travel.</description>
      <source url="https://duckdb.org/2026/05/07/delta-uc-updates.html">DuckDB Blog</source>
      <category>duckdb</category>
      <category>delta</category>
      <category>lakehouse</category>
    </item>
    <item>
      <title>Iceberg Default Column Values: Schema Evolution Without the Backfill</title>
      <link>https://www.dremio.com/blog/dremio-iceberg-v3-default-column-values/</link>
      <guid isPermaLink="false">metastax-5672</guid>
      <pubDate>Wed, 06 May 2026 21:13:17 GMT</pubDate>
      <description>Adding a column to a large production table used to require a plan involving migration scripts, maintenance windows, and backfill jobs that rewrite every data file to include the new column. Iceberg default column values eliminate the need for backfills during schema evolution.</description>
      <source url="https://www.dremio.com/blog/dremio-iceberg-v3-default-column-values/">Dremio Blog</source>
      <category>lakehouse</category>
      <category>iceberg</category>
    </item>
    <item>
      <title>vLLM V0 to V1: Correctness Before Corrections in RL</title>
      <link>https://huggingface.co/blog/ServiceNow-AI/correctness-before-corrections</link>
      <guid isPermaLink="false">metastax-5638</guid>
      <pubDate>Wed, 06 May 2026 19:06:55 GMT</pubDate>
      <description>The post covers the move from vLLM V0 to V1 in reinforcement learning, arguing for correctness before corrections.</description>
      <source url="https://huggingface.co/blog/ServiceNow-AI/correctness-before-corrections">Hugging Face Blog</source>
      <category>ml</category>
      <category>llm</category>
      <category>mlops</category>
    </item>
    <item>
      <title>When DNSSEC goes wrong: how we responded to the .de TLD outage</title>
      <link>https://blog.cloudflare.com/de-tld-outage-dnssec/</link>
      <guid isPermaLink="false">metastax-5646</guid>
      <pubDate>Wed, 06 May 2026 17:00:00 GMT</pubDate>
      <description>On May 5, 2026, DENIC published broken DNSSEC signatures for the .de TLD, making millions of domains unreachable. Here&apos;s what 1.1.1.1 saw, how serve stale cushioned the impact, and how we restored resolution.</description>
      <source url="https://blog.cloudflare.com/de-tld-outage-dnssec/">Cloudflare Blog</source>
      <category>engineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Stop guessing in production: Full fidelity tracing at scale with ClickHouse and Odigos</title>
      <link>https://clickhouse.com/blog/odigos-full-fidelity-tracing</link>
      <guid isPermaLink="false">metastax-5755</guid>
      <pubDate>Wed, 06 May 2026 15:09:11 GMT</pubDate>
      <description>How ClickStack and Odigos eliminate observability gaps with zero-code eBPF instrumentation and full-fidelity distributed tracing at scale.</description>
      <source url="https://clickhouse.com/blog/odigos-full-fidelity-tracing">ClickHouse Blog</source>
      <category>clickhouse</category>
      <category>observability</category>
    </item>
    <item>
      <title>DuckLake in Action: Manage Lakehouses, Run SQL &amp; Build Notebooks with Rosetta DBT Studio</title>
      <link>https://www.reddit.com/r/DuckDB/comments/1t58cbz/ducklake_in_action_manage_lakehouses_run_sql/</link>
      <guid isPermaLink="false">metastax-5623</guid>
      <pubDate>Wed, 06 May 2026 09:59:27 GMT</pubDate>
      <description>This post links to a video demonstrating the DuckLake workflow using Rosetta DBT Studio. The presentation covers creating and importing lakehouse instances, exploring metadata, running SQL queries, and building reusable SQL Notebooks.</description>
      <source url="https://www.reddit.com/r/DuckDB/comments/1t58cbz/ducklake_in_action_manage_lakehouses_run_sql/">r/duckdb</source>
      <category>duckdb</category>
      <category>lakehouse</category>
      <category>dbt</category>
    </item>
    <item>
      <title>Agentic analytics starts with query-ready data: the write-side cost of Snowflake vs. ClickHouse</title>
      <link>https://clickhouse.com/blog/write-side-cost-performance-snowflake-clickhouse</link>
      <guid isPermaLink="false">metastax-5754</guid>
      <pubDate>Wed, 06 May 2026 06:46:34 GMT</pubDate>
      <description>Agentic analytics makes query-readiness a write-side cost problem. This post compares Snowflake and ClickHouse under continuous ingest, showing how ClickHouse obtains query-ready data at 22× lower cost and delivers 31× better write-side cost-performance.</description>
      <source url="https://clickhouse.com/blog/write-side-cost-performance-snowflake-clickhouse">ClickHouse Blog</source>
      <category>clickhouse</category>
      <category>snowflake</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Singular Bank helps bankers move fast with ChatGPT and Codex</title>
      <link>https://openai.com/index/singular-bank</link>
      <guid isPermaLink="false">metastax-5670</guid>
      <pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate>
      <description>Singular Bank built Singularity, an internal assistant using ChatGPT and Codex to help bankers save 60–90 minutes daily on meeting prep, portfolio analysis, and follow-up.</description>
      <source url="https://openai.com/index/singular-bank">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Uber uses OpenAI to help people earn smarter and book faster</title>
      <link>https://openai.com/index/uber</link>
      <guid isPermaLink="false">metastax-5671</guid>
      <pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate>
      <description>Uber uses OpenAI to power AI assistants and voice features that help drivers earn smarter and riders book faster across a global real-time marketplace.</description>
      <source url="https://openai.com/index/uber">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Our AI started a cafe in Stockholm</title>
      <link>https://simonwillison.net/2026/May/5/our-ai-started-a-cafe-in-stockholm/#atom-everything</link>
      <guid isPermaLink="false">metastax-5575</guid>
      <pubDate>Tue, 05 May 2026 22:14:21 GMT</pubDate>
      <description>Simon Willison describes how he used AI agents to launch and run a cafe in Stockholm, detailing the architecture and lessons learned.</description>
      <source url="https://simonwillison.net/2026/May/5/our-ai-started-a-cafe-in-stockholm/#atom-everything">Simon Willison</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Integrating AI Into Apache Kafka Architectures: Patterns and Best Practices</title>
      <link>https://www.confluent.io/blog/ai-kafka-integration-patterns/</link>
      <guid isPermaLink="false">metastax-5569</guid>
      <pubDate>Tue, 05 May 2026 19:40:44 GMT</pubDate>
      <description>The article compares three patterns for integrating AI inference with Apache Kafka: external RPC, embedded, and sidecar, focusing on avoiding consumer rebalances, cutting costs, and scaling LLM pipelines.</description>
      <source url="https://www.confluent.io/blog/ai-kafka-integration-patterns/">Confluent Blog</source>
      <category>kafka</category>
      <category>streaming</category>
    </item>
    <item>
      <title>Jikkou 1.0 is out — Iceberg, multi-cluster orchestration, and Confluent Cloud RBAC</title>
      <link>https://www.reddit.com/r/apachekafka/comments/1t4k1ix/jikkou_10_is_out_iceberg_multicluster/</link>
      <guid isPermaLink="false">metastax-5563</guid>
      <pubDate>Tue, 05 May 2026 16:12:13 GMT</pubDate>
      <description>Jikkou 1.0 is out, featuring Apache Iceberg integration for declarative management of namespaces, tables, and views. It also includes multi-cluster orchestration and Confluent Cloud RBAC.</description>
      <source url="https://www.reddit.com/r/apachekafka/comments/1t4k1ix/jikkou_10_is_out_iceberg_multicluster/">r/apachekafka</source>
      <category>kafka</category>
      <category>streaming</category>
      <category>iceberg</category>
    </item>
    <item>
      <title>Benchmark demonstrates 5-37x improved performance for query on Iceberg tables</title>
      <link>https://startree.ai/resources/iceberg-query-benchmark-vs-trino-vs-clickhouse/</link>
      <guid isPermaLink="false">metastax-5522</guid>
      <pubDate>Tue, 05 May 2026 16:07:47 GMT</pubDate>
      <description>StarTree claims a 5-37x performance improvement for queries on Iceberg tables compared to Trino and ClickHouse, based on their benchmark.</description>
      <source url="https://startree.ai/resources/iceberg-query-benchmark-vs-trino-vs-clickhouse/">Hacker News - Data</source>
      <category>iceberg</category>
      <category>trino</category>
      <category>clickhouse</category>
      <category>lakehouse</category>
      <category>benchmarking</category>
    </item>
    <item>
      <title>RAG Hallucinates — I Built a Self-Healing Layer That Fixes It in Real Time</title>
      <link>https://towardsdatascience.com/rag-hallucinates-i-built-a-self-healing-layer-that-fixes-it-in-real-time/</link>
      <guid isPermaLink="false">metastax-5532</guid>
      <pubDate>Tue, 05 May 2026 13:30:00 GMT</pubDate>
      <description>This article presents a self-healing layer designed to detect and correct hallucinations in RAG systems before they reach users by addressing issues in reasoning.</description>
      <source url="https://towardsdatascience.com/rag-hallucinates-i-built-a-self-healing-layer-that-fixes-it-in-real-time/">Towards Data Science</source>
      <category>llm</category>
      <category>ml</category>
    </item>
    <item>
      <title>Comparing ClickHouse versions with clickhousectl</title>
      <link>https://clickhouse.com/blog/clickhousectl-compare-versions</link>
      <guid isPermaLink="false">metastax-5758</guid>
      <pubDate>Tue, 05 May 2026 10:29:49 GMT</pubDate>
      <description>We use clickhousectl to spin up multiple ClickHouse versions side by side and benchmark two recent performance improvements.</description>
      <source url="https://clickhouse.com/blog/clickhousectl-compare-versions">ClickHouse Blog</source>
      <category>clickhouse</category>
      <category>analytics</category>
      <category>engineering</category>
    </item>
    <item>
      <title>Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)</title>
      <link>https://openai.com/index/mrc-supercomputer-networking</link>
      <guid isPermaLink="false">metastax-5603</guid>
      <pubDate>Tue, 05 May 2026 10:00:00 GMT</pubDate>
      <description>OpenAI introduces MRC (Multipath Reliable Connection), a new supercomputer networking protocol released via OCP to improve resilience and performance in large-scale AI training clusters.</description>
      <source url="https://openai.com/index/mrc-supercomputer-networking">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Little&apos;s Law in practice with Cloud Topics</title>
      <link>https://www.redpanda.com/blog/littles-law-in-practice-with-cloud-topics</link>
      <guid isPermaLink="false">metastax-5644</guid>
      <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
      <description>From spinning disks to CPUs to cloud object storage, shifting bottlenecks have shaped Redpanda&apos;s architecture. Here’s what Cloud Topics revealed about today’s demand for high-latency storage.</description>
      <source url="https://www.redpanda.com/blog/littles-law-in-practice-with-cloud-topics">Redpanda Blog</source>
      <category>streaming</category>
      <category>kafka</category>
    </item>
    <item>
      <title>OpenAI and PwC collaborate to reimagine the office of the CFO</title>
      <link>https://openai.com/index/openai-pwc-finance-collaboration</link>
      <guid isPermaLink="false">metastax-5491</guid>
      <pubDate>Mon, 04 May 2026 21:00:00 GMT</pubDate>
      <description>OpenAI and PwC are collaborating to help businesses automate finance workflows using AI agents, with the aim of improving forecasting, strengthening controls, and modernizing the CFO function.</description>
      <source url="https://openai.com/index/openai-pwc-finance-collaboration">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>PGKeeper: Figma&apos;s Postgres connection pooler Renaissance era</title>
      <link>https://www.figma.com/blog/pgkeeper-building-the-bouncer-we-needed-for-postgres/</link>
      <guid isPermaLink="false">metastax-5446</guid>
      <pubDate>Mon, 04 May 2026 16:49:41 GMT</pubDate>
      <description>Figma details the architecture and implementation of PGKeeper, their custom Postgres connection pooler, explaining why existing solutions like PgBouncer didn&apos;t meet their needs.</description>
      <source url="https://www.figma.com/blog/pgkeeper-building-the-bouncer-we-needed-for-postgres/">Hacker News - Data</source>
      <category>postgres</category>
      <category>engineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Goodbye limitations, hello data: How Qonto is rethinking observability with ClickHouse Cloud</title>
      <link>https://clickhouse.com/blog/qonto</link>
      <guid isPermaLink="false">metastax-5757</guid>
      <pubDate>Mon, 04 May 2026 12:35:21 GMT</pubDate>
      <description>How Qonto uses ClickHouse Cloud to power observability at scale — replacing sampling and hour-capped queries with two-week query windows, 99.84% compression on high-cardinality data, and an AI incident companion built on the ClickHouse MCP server.</description>
      <source url="https://clickhouse.com/blog/qonto">ClickHouse Blog</source>
      <category>clickhouse</category>
      <category>analytics</category>
      <category>observability</category>
    </item>
    <item>
      <title>The DuckLake Spec is so Simple, Even a Clanker Can Build One for Dataframes</title>
      <link>https://duckdb.org/2026/05/04/ducklake-dataframe.html</link>
      <guid isPermaLink="false">metastax-5434</guid>
      <pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate>
      <description>This blog post argues that the DuckLake specification is simple enough that an implementation for dataframes is easy to build.</description>
      <source url="https://duckdb.org/2026/05/04/ducklake-dataframe.html">DuckDB Blog</source>
      <category>duckdb</category>
      <category>analytics</category>
    </item>
    <item>
      <title>How OpenAI delivers low-latency voice AI at scale</title>
      <link>https://openai.com/index/delivering-low-latency-voice-ai-at-scale</link>
      <guid isPermaLink="false">metastax-5460</guid>
      <pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate>
      <description>OpenAI describes how it rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.</description>
      <source url="https://openai.com/index/delivering-low-latency-voice-ai-at-scale">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Icestream – enabling efficient streaming writes in Apache Iceberg</title>
      <link>https://github.com/jordepic/Icestream</link>
      <guid isPermaLink="false">metastax-5387</guid>
      <pubDate>Sun, 03 May 2026 20:50:09 GMT</pubDate>
      <description>Icestream is a project enabling efficient streaming writes in Apache Iceberg.</description>
      <source url="https://github.com/jordepic/Icestream">Hacker News - Data</source>
      <category>iceberg</category>
      <category>streaming</category>
    </item>
    <item>
      <title>I&apos;m working on a conversational analytics agent builder with dedicated DuckDB support</title>
      <link>https://www.reddit.com/r/DuckDB/comments/1t2t17p/im_working_on_a_conversational_analytics_agent/</link>
      <guid isPermaLink="false">metastax-5385</guid>
      <pubDate>Sun, 03 May 2026 18:16:46 GMT</pubDate>
      <description>A developer is building a no-code agent builder that uses DuckDB to create conversational analytics agents. These agents can respond to queries with interactive charts and UI, allowing users to query databases more easily.</description>
      <source url="https://www.reddit.com/r/DuckDB/comments/1t2t17p/im_working_on_a_conversational_analytics_agent/">r/duckdb</source>
      <category>duckdb</category>
      <category>agents</category>
      <category>analytics</category>
    </item>
    <item>
      <title>How to Work and Compound with AI</title>
      <link>https://eugeneyan.com/writing/working-with-ai/</link>
      <guid isPermaLink="false">metastax-5490</guid>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
      <description>This post proposes a framework for leveraging AI, emphasizing context as infrastructure, taste as configuration, verification for autonomy, scaling through delegation, and closing feedback loops for continuous improvement.</description>
      <source url="https://eugeneyan.com/writing/working-with-ai/">Eugene Yan</source>
      <category>ml</category>
      <category>llm</category>
    </item>
    <item>
      <title>[Extension] ducksmiles — community extension for chemistry data (SMILES / InChI / PDB)</title>
      <link>https://www.reddit.com/r/DuckDB/comments/1t1pctx/extension_ducksmiles_community_extension_for/</link>
      <guid isPermaLink="false">metastax-5332</guid>
      <pubDate>Sat, 02 May 2026 13:04:35 GMT</pubDate>
      <description>A user shares a community extension for DuckDB called `ducksmiles` that enables processing chemistry data, including SMILES, InChI, and PDB formats, directly within DuckDB. The extension supports functions like `mol_formula` and `mol_weight` and integrates with `read_csv_auto`, `read_text`, and `httpfs` for data ingestion.</description>
      <source url="https://www.reddit.com/r/DuckDB/comments/1t1pctx/extension_ducksmiles_community_extension_for/">r/duckdb</source>
      <category>duckdb</category>
      <category>engineering</category>
    </item>
    <item>
      <title>Code Orange: Fail Small is complete. The result is a stronger Cloudflare network</title>
      <link>https://blog.cloudflare.com/code-orange-fail-small-complete/</link>
      <guid isPermaLink="false">metastax-5311</guid>
      <pubDate>Fri, 01 May 2026 21:07:30 GMT</pubDate>
      <description>Cloudflare completed an engineering effort to make its infrastructure more resilient using tools like Snapstone and the Engineering Codex. They implemented safer configuration changes and automated best practices to prevent future incidents.</description>
      <source url="https://blog.cloudflare.com/code-orange-fail-small-complete/">Cloudflare Blog</source>
      <category>engineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How we accelerated transpilation by compiling SQLGlot with mypyc</title>
      <link>https://www.fivetran.com/blog/how-we-accelerated-transpilation-by-compiling-sqlglot-with-mypyc</link>
      <guid isPermaLink="false">metastax-5272</guid>
      <pubDate>Fri, 01 May 2026 13:00:02 GMT</pubDate>
      <description>Fivetran compiled SQLGlot with mypyc to speed up transpilation between SQL dialects for different query engines.</description>
      <source url="https://www.fivetran.com/blog/how-we-accelerated-transpilation-by-compiling-sqlglot-with-mypyc">Fivetran Blog</source>
      <category>data-engineering</category>
      <category>etl</category>
      <category>sql</category>
    </item>
    <item>
      <title>Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding</title>
      <link>https://arxiv.org/abs/2604.28028</link>
      <guid isPermaLink="false">metastax-5261</guid>
      <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
      <description>This arXiv paper introduces a method to improve Text-to-SQL accuracy by using template-constrained decoding, particularly for recurring questions. It addresses challenges in real-world deployment of Text-to-SQL models, especially in complex or unseen schemas.</description>
      <source url="https://arxiv.org/abs/2604.28028">arXiv Databases</source>
      <category>semantic-layer</category>
      <category>llm</category>
    </item>
    <item>
      <title>DuckDB infers NULL-only columns as JSON type — here&apos;s how we fixed it with a canonical sample</title>
      <link>https://www.reddit.com/r/DuckDB/comments/1t03slk/duckdb_infers_nullonly_columns_as_json_type_heres/</link>
      <guid isPermaLink="false">metastax-5269</guid>
      <pubDate>Thu, 30 Apr 2026 17:57:00 GMT</pubDate>
      <description>DuckDB infers NULL-only columns as the generic JSON type, causing staging issues when real values appear later. The solution involves using a synthetic canonical sample to ensure correct type inference from the outset.</description>
      <source url="https://www.reddit.com/r/DuckDB/comments/1t03slk/duckdb_infers_nullonly_columns_as_json_type_heres/">r/duckdb</source>
      <category>duckdb</category>
      <category>data-quality</category>
    </item>
    <item>
      <title>Why AI Engineers Are Moving Beyond LangChain to Native Agent Architectures</title>
      <link>https://towardsdatascience.com/why-ai-engineers-are-moving-beyond-langchain-to-native-agent-architectures/</link>
      <guid isPermaLink="false">metastax-5245</guid>
      <pubDate>Thu, 30 Apr 2026 12:00:00 GMT</pubDate>
      <description>This post discusses the shift of AI engineers from using frameworks like LangChain to building native agent architectures for production LLM applications.</description>
      <source url="https://towardsdatascience.com/why-ai-engineers-are-moving-beyond-langchain-to-native-agent-architectures/">Towards Data Science</source>
      <category>agents</category>
      <category>llm</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Where the goblins came from</title>
      <link>https://openai.com/index/where-the-goblins-came-from</link>
      <guid isPermaLink="false">metastax-5228</guid>
      <pubDate>Wed, 29 Apr 2026 20:00:00 GMT</pubDate>
      <description>The post discusses the timeline, root cause, and fixes behind &quot;goblin outputs,&quot; which are personality-driven quirks in GPT-5 behavior.</description>
      <source url="https://openai.com/index/where-the-goblins-came-from">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>A DuckDB extension for vector search indexes with pluggable quantization</title>
      <link>https://github.com/Icemap/duckdb-vector-index</link>
      <guid isPermaLink="false">metastax-5115</guid>
      <pubDate>Wed, 29 Apr 2026 19:52:04 GMT</pubDate>
      <description>This post links to a DuckDB extension for vector search indexes with pluggable quantization.</description>
      <source url="https://github.com/Icemap/duckdb-vector-index">Hacker News - Data</source>
      <category>duckdb</category>
      <category>vector-db</category>
    </item>
    <item>
      <title>Giving agents the ability to pay</title>
      <link>https://stripe.com/blog/giving-agents-the-ability-to-pay</link>
      <guid isPermaLink="false">metastax-5138</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description>Stripe introduces Link’s wallet for agents, offering programmatic access to generate one-time-use cards or Shared Payment Tokens, built on Stripe’s new Issuing for agents.</description>
      <source url="https://stripe.com/blog/giving-agents-the-ability-to-pay">Stripe Engineering</source>
      <category>engineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>I turned recurring Kafka production failures into a practical troubleshooting guide</title>
      <link>https://www.reddit.com/r/apachekafka/comments/1syg4qb/i_turned_recurring_kafka_production_failures_into/</link>
      <guid isPermaLink="false">metastax-5170</guid>
      <pubDate>Tue, 28 Apr 2026 22:17:35 GMT</pubDate>
      <description>A user shares a practical troubleshooting guide derived from recurring Kafka production failures, covering issues like consumer lag, producers writing without consumers reading, and duplicate processing after restart due to offset commit problems.</description>
      <source url="https://www.reddit.com/r/apachekafka/comments/1syg4qb/i_turned_recurring_kafka_production_failures_into/">r/apachekafka</source>
      <category>kafka</category>
      <category>streaming</category>
    </item>
    <item>
      <title>How Stripe Detects Fraudulent Transactions Within 100 ms</title>
      <link>https://blog.bytebytego.com/p/how-stripe-detects-fraudulent-transactions</link>
      <guid isPermaLink="false">metastax-5085</guid>
      <pubDate>Tue, 28 Apr 2026 03:03:50 GMT</pubDate>
      <description>This article explores Stripe&apos;s Radar system for detecting fraudulent transactions within 100 ms, detailing the architectural decisions behind its effectiveness.</description>
      <source url="https://blog.bytebytego.com/p/how-stripe-detects-fraudulent-transactions">ByteByteGo</source>
      <category>architecture</category>
      <category>engineering</category>
    </item>
    <item>
      <title>Iceberg Deletion Vectors: The Better Way to Delete Rows</title>
      <link>https://www.dremio.com/blog/dremio-iceberg-v3-deletion-vectors/</link>
      <guid isPermaLink="false">metastax-5033</guid>
      <pubDate>Mon, 27 Apr 2026 13:08:39 GMT</pubDate>
      <description>The post discusses how Iceberg deletion vectors offer a more efficient way to handle row deletions in data lakehouses, where deleting rows can be an expensive operation due to the immutable nature of Parquet files.</description>
      <source url="https://www.dremio.com/blog/dremio-iceberg-v3-deletion-vectors/">Dremio Blog</source>
      <category>lakehouse</category>
      <category>iceberg</category>
    </item>
    <item>
      <title>Choco automates food distribution with AI agents</title>
      <link>https://openai.com/index/choco</link>
      <guid isPermaLink="false">metastax-5032</guid>
      <pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate>
      <description>This customer story describes how Choco used OpenAI APIs to streamline food distribution, boost productivity, and unlock growth.</description>
      <source url="https://openai.com/index/choco">OpenAI Blog</source>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>The Journey from Scattered Data to an Apache Iceberg Lakehouse with Governed Agentic Analytics</title>
      <link>https://www.dremio.com/blog/the-journey-from-scattered-data-to-an-apache-iceberg-lakehouse-with-governed-agentic-analytics/</link>
      <guid isPermaLink="false">metastax-4967</guid>
      <pubDate>Sun, 26 Apr 2026 08:11:40 GMT</pubDate>
      <description>The article outlines a strategy for modernizing data platforms by migrating to an Apache Iceberg lakehouse. It suggests avoiding long ETL pipeline builds and focusing on faster time-to-value for analysts.</description>
      <source url="https://www.dremio.com/blog/the-journey-from-scattered-data-to-an-apache-iceberg-lakehouse-with-governed-agentic-analytics/">Dremio Blog</source>
      <category>lakehouse</category>
      <category>iceberg</category>
    </item>
    <item>
      <title>Pgrx: Build Postgres Extensions with Rust</title>
      <link>https://github.com/pgcentralfoundation/pgrx</link>
      <guid isPermaLink="false">metastax-4939</guid>
      <pubDate>Sat, 25 Apr 2026 08:16:19 GMT</pubDate>
      <description>Pgrx is a framework for building PostgreSQL extensions using Rust, enabling developers to leverage Rust&apos;s safety and performance features within the Postgres environment.</description>
      <source url="https://github.com/pgcentralfoundation/pgrx">Hacker News - Data</source>
      <category>postgres</category>
      <category>engineering</category>
    </item>
    <item>
      <title>What&apos;s New in pg_clickhouse - JSONB Support, SQL value functions, Streaming, and more</title>
      <link>https://clickhouse.com/blog/pg_clickhouse-whats-new-april-2026</link>
      <guid isPermaLink="false">metastax-4997</guid>
      <pubDate>Fri, 24 Apr 2026 17:05:36 GMT</pubDate>
      <description>Recent pg_clickhouse releases introduce JSONB, date/time, and array function pushdown, plus HTTP result set streaming for lower memory usage.</description>
      <source url="https://clickhouse.com/blog/pg_clickhouse-whats-new-april-2026">ClickHouse Blog</source>
      <category>clickhouse</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Use whisper.cpp within DuckDB to translate / transpile speech to text</title>
      <link>https://github.com/tobilg/duckdb-whisper</link>
      <guid isPermaLink="false">metastax-4888</guid>
      <pubDate>Fri, 24 Apr 2026 14:55:22 GMT</pubDate>
      <description>This repository covers using whisper.cpp within DuckDB to transcribe speech to text.</description>
      <source url="https://github.com/tobilg/duckdb-whisper">Hacker News - Data</source>
      <category>duckdb</category>
      <category>ml</category>
    </item>
  </channel>
</rss>
