Weekly CouchDB meeting – summary

  • 1.6.0 release: voting for 1.6 rc3 is open; Erlang R15 and R16 are fine, but several unexpected issues were reported when testing with Erlang R14BX; any help with testing and diagnosing these issues is welcome
  • rcouch merge status: the recent call for testing led to progress on COUCHDB-1994. The Windows build and some other small bugs were fixed. Still, help with testing is very welcome!
  • BigCouch merge status: significant progress was made last week; the next steps will be code review and testing
  • BigCouch branch and the new multirepo design: how to edit individual repos, split the big CouchDB.git into single repos and finally run the merge

Major Discussions

Error when installing CouchDB on Windows 7 (see thread)

Installation of the CouchDB 1.5.1_R16B02 binary on Windows 7 led to an issue caused by a problem with the Windows R16B02 installation file. This file was replaced quickly, and everything works now. The installation file can be downloaded here.

Compaction of a database with the same number of documents is getting slower over time (see thread)

The database in question contains a relatively stable number of documents (in CouchDB: "doc_count"), but the documents themselves change frequently, with many insertions and deletions (CouchDB: "doc_del_count"). The person asking expected the compaction time to stay the same, but the numbers showed that compaction took longer over time, so they asked why this was the case and how it could be overcome.

The reply is that compaction time in CouchDB scales with the sum of "doc_count" and "doc_del_count". As explained by Adam Kocoloski, this can currently be controlled by "1) [Purging] the deleted docs (lots of caveats about replication, potential for view index resets, etc.); 2) [Rotating] over to a new database, either by querying both or by replicating non-deleted docs to the new one. Neither one is particularly palatable. CouchDB currently keeps the tombstones around forever so that replication can always work. Making changes on that front is a pretty subtle thing but maybe not completely impossible. Also, there's a new compactor in the works that is faster and generates smaller files."
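
To get a feeling for how much of the compaction work is caused by tombstones, the two counters from the database info (as returned by GET /{db}) can be summed up. The following is only a minimal sketch in Python: it assumes the info document has already been fetched and parsed into a dict, the function name is made up for illustration, and the numbers are invented.

```python
def compaction_workload(db_info):
    """Return (total, deleted_share) for a CouchDB database info dict.

    total is doc_count + doc_del_count, the quantity compaction time
    scales with according to the reply above; deleted_share is the
    fraction of that total made up of tombstones.
    """
    live = db_info["doc_count"]
    deleted = db_info["doc_del_count"]
    total = live + deleted
    deleted_share = deleted / total if total else 0.0
    return total, deleted_share


# Invented example values, not taken from the thread.
info = {"db_name": "example", "doc_count": 100000, "doc_del_count": 900000}
total, deleted_share = compaction_workload(info)
print(total, deleted_share)
```

A database like the one in this invented example has a stable "doc_count", yet most of what the compactor has to walk through are deleted documents.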

Data replication: replicating only non-deleted documents (same discussion as above; see thread)

The approach could be to either run a filtered replication or block deleted documents with a validate_doc_update function on the target database (see example function here).
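
Both options could look roughly like the sketch below, written in Python that only builds the JSON documents (the functions inside are JavaScript, as CouchDB expects). This is not the example linked above: the design document names, the filter name and the database names are hypothetical. Note that a validate_doc_update function like this one would presumably also reject deletions made directly on the target, so it only fits a target that should never accept tombstones.

```python
import json

# Option 1: filter function on the source; only non-deleted docs pass.
FILTER_FN = "function(doc, req) { return !doc._deleted; }"

# Option 2: validation function on the target; incoming tombstones are rejected.
VALIDATE_FN = (
    "function(newDoc, oldDoc, userCtx) {"
    " if (newDoc._deleted) {"
    " throw({forbidden: 'deleted documents are not accepted'});"
    " } }"
)


def source_design_doc():
    # Hypothetical design document to store in the source database.
    return {"_id": "_design/replication", "filters": {"no_deleted": FILTER_FN}}


def target_design_doc():
    # Hypothetical design document to store in the target database.
    return {"_id": "_design/no_tombstones", "validate_doc_update": VALIDATE_FN}


def replication_request(source, target):
    # Body for POST /_replicate (or a document in the _replicator database);
    # the filter is referenced as "<design doc name>/<filter name>".
    return {"source": source, "target": target, "filter": "replication/no_deleted"}


print(json.dumps(replication_request("old_db", "new_db"), indent=2))
```

Only one of the two options is needed: the filter keeps tombstones from being sent at all, while the validation function lets the target refuse them.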

How to handle terabyte databases (see thread)

This was a request about improving performance, insert time, compaction and replication for terabyte databases with billions of documents (setup: CouchDB 1.2 and CouchDB 1.4). Some approaches to handle a write-heavy workload:

  • Replication: Supply parameters to allocate more resources to a given job (code example here).
  • Insert time: Ensure that e.g. compaction only runs in the background and does not impact the throughput of interactive operations; this is work in progress at the moment.
  • Compaction: A new, significantly faster compactor is in the works; it generates smaller post-compaction files and eliminates the exponential falloff in throughput observed by the inquirer. This problem can be solved by BigCouch as it tries to partition databases.
  • Make sure not to exceed the write capacity of the RAID (this effect is amplified for partitioned databases).
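
For the replication item above, a request that gives one job more resources could be sketched as follows in Python. This is not the code example from the thread. It assumes that the per-job fields "worker_processes", "worker_batch_size" and "http_connections" are accepted in the replication document, mirroring the [replicator] configuration options; please check this against the documentation of your CouchDB version. The helper name, database names and values are purely illustrative.

```python
import json


def tuned_replication(source, target, workers=8, batch_size=1000, connections=40):
    """Build a replication request body with per-job tuning fields.

    The field names are assumed to match the [replicator] config options;
    the default values here are illustrative, not recommendations.
    """
    return {
        "source": source,
        "target": target,
        "worker_processes": workers,       # parallel workers for this job
        "worker_batch_size": batch_size,   # documents handled per worker batch
        "http_connections": connections,   # HTTP connections for this job
    }


# Hypothetical database names.
body = tuned_replication("big_db", "http://example.org:5984/big_db_copy")
print(json.dumps(body, indent=2))
```

Higher values trade memory and network load for throughput, so it is probably best to raise them step by step while watching the write capacity mentioned in the last item.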

Vote: Release Apache CouchDB 1.6.0 rc3 (ongoing testing and discussion; see thread)

We encourage the whole community to download and test these release artefacts so that any critical issues can be resolved before the release is made. Everyone is free to vote on this release, so get stuck in on our dev@couchdb.apache.org mailing list! Find all release artefacts we are voting on in this list. If you want to test, please follow this test procedure. The changes since the last vote round can be found here.

Releases in the CouchDB Universe

Opinions

Use Cases, Questions and Answers

Get involved!

If you want to get into working on CouchDB:

  • rcouch Merge: Erlang hackers and CouchDB users, we need your help with testing and review of the rcouch merge. It's easy! Find the how-to in this post.
  • Here's a list of beginner tickets around our ongoing Fauxton implementation. If you have any questions or need help, don't hesitate to contact us in the couchdb-dev IRC room (#couchdb-dev) – Garren (garren) and Sue (deathbear) are happy to help.
  • Want to join us in updating CouchDB-Python for Python 3? Take a look at issue 231.

We'd be happy to have you!

Events

  • April 26, Delhi, India: CouchDB meetup - Understanding how design documents work
  • June 16, 17, San Francisco, CA: CloudantCON

Job opportunities for people with CouchDB skills

… and also in the news

Posted on behalf of Lena Reinhard.