classification
Title: xz compressor support
Type: feature request
Stage: needs patch
Components: Library (Lib)
Versions: Python 3.2
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Christophe Simonis, Garen, Nikratio, amaury.forgeotdarc, arekm, devurandom, doko, eric.araujo, haypo, jreese, lars.gustaebel, leonov, loewis, nicdumz, ockham-razor, pitrou, proyvind, rcoyner, thedjatclubrock, ysj.ray
Priority: high
Keywords:

Created on 2009-08-17 09:47 by devurandom, last changed 2010-08-19 20:14 by jreese.

Messages (23)
msg91657 - (view) Author: (devurandom) Date: 2009-08-17 09:47
Python currently supports zlib, gzip and bzip2 compressors. What is missing is support 
for xz (http://tukaani.org/xz/). It comes with a C library.
msg91658 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2009-08-17 11:37
Is xz really a C library? I could find a standalone program, but no
shared object.
Actually, it seems that xz is a file format based on the lzma algorithm.
The plan could be to first implement the lzma module (issue5689), then an
xzfile module in pure Python.
msg91660 - (view) Author: Skip Montanaro (skip.montanaro) * Date: 2009-08-17 11:51
What is xz compression and why is it important?

Skip
msg91661 - (view) Author: (devurandom) Date: 2009-08-17 12:13
Yes, xz-utils contains a C library, though it still carries the name
"liblzma.so", probably for historic reasons.
You are right that xz is a file format based around the lzma algorithm.
It just uses a more advanced container format. (lzma-utils had no
container at all.)

xz is the successor of lzma, which provides better compression than
bzip2, while decompression speed is comparable with gzip. It is used by
the GNU project for source tarball compression (replacing bzip2) and
supported by GNU tar. See http://en.wikipedia.org/wiki/Xz,
http://tukaani.org/xz/ and http://tukaani.org/lzma/ for reference.
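
A minimal sketch of the size comparison described above, assuming a later Python (3.3 or newer) whose standard library ships an `lzma` module; no such module existed when this message was written. The payload is a made-up, highly repetitive sample, so the numbers it prints say nothing about real source tarballs.

```python
import bz2
import lzma
import zlib

# Hypothetical sample payload: repetitive text, chosen only for illustration.
data = b"The quick brown fox jumps over the lazy dog. " * 2000

# Compress the same bytes with each codec at its highest standard level.
sizes = {
    "zlib": len(zlib.compress(data, 9)),
    "bz2": len(bz2.compress(data, 9)),
    "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
}

# Print the codecs from smallest to largest output.
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{name:>4}: {size} bytes (from {len(data)})")

# Sanity check: the xz stream round-trips to the original bytes.
assert lzma.decompress(lzma.compress(data)) == data
```

Which codec wins depends heavily on the input, so the sketch prints the sizes rather than assuming an order.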
msg92163 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-09-02 11:02
Are xz and lzma formats compatible with each other? If not, which one is
the most popular?
msg92167 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2009-09-02 14:22
As I understand it from the http://tukaani.org/xz/ page, .lzma and .xz
are two file formats based on the lzma compression method.
- .lzma simply stores the compressed stream.
- .xz is more complex
msg92174 - (view) Author: (devurandom) Date: 2009-09-02 18:10
.lzma is actually not a format. It is just the raw output of the LZMA1
coder. XZ instead is a container format for the LZMA2 coder, which
probably means LZMA+some metadata.
XZ is the official successor to .lzma, and GNU is using it already
(look at coreutils), and GNU tar officially supports it since 1.22.
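
To make the .lzma versus .xz distinction concrete, here is a small sketch, assuming the `lzma` module from later Python versions (3.3+), which did not exist at the time of this thread. It writes the same payload in the legacy .lzma layout (`FORMAT_ALONE`) and in the .xz container (`FORMAT_XZ`), then checks for the six-byte magic number that begins every .xz stream. The payload is hypothetical.

```python
import lzma

# Hypothetical payload, used only for illustration.
payload = b"hello xz " * 100

# Same data, two on-disk formats.
xz_blob = lzma.compress(payload, format=lzma.FORMAT_XZ)        # .xz container
alone_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # legacy .lzma

# Every .xz stream begins with this six-byte magic number.
XZ_MAGIC = b"\xfd7zXZ\x00"
assert xz_blob.startswith(XZ_MAGIC)
assert not alone_blob.startswith(XZ_MAGIC)

# With the default FORMAT_AUTO, decompress() detects either format.
assert lzma.decompress(xz_blob) == payload
assert lzma.decompress(alone_blob) == payload
```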
msg98774 - (view) Author: Garen (Garen) Date: 2010-02-03 06:27
Once Python gets native support for lzma/xz like it does for zlib, bzip2 it could switch to using it for bundles and remote transfers. See:

http://mercurial.selenic.com/bts/issue1463

With lzma/xz being able to compress so much better, it'd be really appreciated by users on especially slow links(!!).
msg98776 - (view) Author: Garen (Garen) Date: 2010-02-03 06:34
Ugh, can't edit previous message. Meant to say:

"Once Python gets native support for lzma/xz like it does for zlib and bzip2, Mercurial could switch to using it for bundles and remote transfers."

For platforms with native support in-kernel (e.g. Linux) that could be used instead of the bundled version.

(Since Python is officially switching to Mercurial, arguably this issue is even more important.)
msg98794 - (view) Author: Arkadiusz Miskiewicz (arekm) Date: 2010-02-03 19:44
About why xz is important. 

gnu.org and tug.org have started publishing sources in xz format. A quick grep:

autoconf/autoconf.spec:Source0: http://ftp.gnu.org/gnu/autoconf/%{name}-%{version}.tar.xz
coreutils/coreutils.spec:Source0:       http://ftp.gnu.org/gnu/coreutils/%{name}-%{version}.tar.xz
libpng12/libpng12.spec:Source0: http://downloads.sourceforge.net/libpng/libpng-%{version}.tar.xz
libpng/libpng.spec:Source0:     http://downloads.sourceforge.net/libpng/%{name}-%{version}.tar.xz
parted/parted.spec:Source0:     http://ftp.gnu.org/gnu/parted/%{name}-%{version}.tar.xz
texlive-texmf/texlive-texmf.spec:Source0:       ftp://tug.org/texlive/historic/%{year}/texlive-%{version}-texmf.tar.xz

xz is also supported by automake as a dist target.
msg98806 - (view) Author: Antoine Pitrou (pitrou) Date: 2010-02-04 00:40
We all agree that lzma/xz is important, what is needed is a patch.
msg98899 - (view) Author: Antoine Pitrou (pitrou) Date: 2010-02-05 19:35
There's a Python binding for some lzma lib here: https://launchpad.net/pyliblzma
msg101941 - (view) Author: tdjacr (thedjatclubrock) Date: 2010-03-30 14:41
Once xz is implemented, xz compatibility should be added to the tarfile library.
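
For reference, a sketch of what tarfile integration can look like, assuming a later Python (3.3+) in which `tarfile` accepts the `w:xz` and `r:xz` modes; none of this was available when the message was posted. The archive lives in memory and the member name `hello.txt` is hypothetical.

```python
import io
import tarfile

buf = io.BytesIO()
content = b"hello from a .tar.xz archive\n"

# Write a one-member tar archive, xz-compressed, into the in-memory buffer.
with tarfile.open(fileobj=buf, mode="w:xz") as tar:
    info = tarfile.TarInfo(name="hello.txt")  # hypothetical member name
    info.size = len(content)
    tar.addfile(info, io.BytesIO(content))

# Rewind and read the member back through the xz layer.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:xz") as tar:
    extracted = tar.extractfile("hello.txt").read()

assert extracted == content
```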
msg106427 - (view) Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-25 11:20
Ooops, I kinda should've commented on this issue here instead, rather than in issue5689, so I'll just copy-paste it here as well:

I'm the author of the pyliblzma module, and if desired, I'd be happy to help out adapting pyliblzma for inclusion with python.
Most of its code is based on bz2module.c, so it shouldn't be very far away from being good 'nuff.
What I see as required is:
* clean out use of C99 types etc.
* clean up the LZMAOptions class (this is the biggest difference from the bz2 module: as the filter supports a wide range of options, everything related, such as parsing, API documentation etc., was placed in its own class. I've yet to receive any feedback on this decision or find any remote equivalents out there to draw inspiration from;)
* While most of the liblzma API has been implemented, support for multiple/alternate filters still remains to be implemented. When done it will also cause some breakage with the current pyliblzma API.

I plan on doing these things sooner or later anyways, it's pretty much just a matter of motivation and priorities standing in the way, actual interest from others would certainly have a positive effect on this. ;)

For other alternatives to the LGPL liblzma, you really don't have any. Keep in mind that LZMA is "merely" the algorithm, while xz (and LZMA_alone, used for '.lzma', now obsolete, but still supported) are the actual formats you want support for. The LZMA SDK does not provide any compatibility for this.
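
As an illustration of the multiple-filter support mentioned in the list above, here is a sketch that uses the `lzma` module from later Python versions (3.3+). It is not pyliblzma's LZMAOptions API, whose interface this thread does not show. The filter chain and the sample data are hypothetical choices.

```python
import lzma

# Hypothetical filter chain: a delta filter feeding the LZMA2 coder.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

# Made-up sample data with a regular byte pattern.
data = bytes(range(256)) * 64

# The .xz container records the filter chain, so the decompressor
# needs no extra arguments to reverse it.
packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
assert lzma.decompress(packed) == data
```

Expressing each filter as a plain dict is only one possible design; a dedicated options class, as in pyliblzma, is another.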
msg106430 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2010-05-25 12:05
I will happily review any implementation, and I can help with inclusion into python trunk.

> ...the LGPL liblzma...

Can you check which licences cover the different parts of the module?  I think that you will have to contribute your code under the Python Contributor Agreement; and I just grabbed some copy of the "xz-utils" source package, and it states that "liblzma is in the public domain".
msg106433 - (view) Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-25 13:06
ah, you're right, I forgot that the license for the library had changed as well (motivated by attempt of pleasing BSD people IIRC;), in the past the library was LGPL while only the 'xz' util was public domain..

For my code, feel free to use your own/any other license you'd like or even public domain (if the license of bz2module.c that much of it's derived from permits of course)!

I guess everyone should be happy now then. :)

Btw. for review, I think the code already available should be pretty much good 'nuff for an initial review. Some feedback on things not derived from bz2module.c would be nice, especially on the LZMAOptions class, as it's where most of the remaining work required for adding additional filters support lies. Would kinda blow if I did the work using an approach that would be dismissed as utterly rubbish. ;)

Oh well, it's out there available for anyone already, I probably won't(/shouldn't;) have time for it in a month at least, do as you please meanwhile. :)
msg106441 - (view) Author: Antoine Pitrou (pitrou) Date: 2010-05-25 15:34
Hello,

> I'm the author of the pyliblzma module, and if desired, I'd be happy
> to help out adapting pyliblzma for inclusion with python.
> Most of its code is based on bz2module.c, so it shouldn't be very far
> away from being good 'nuff.

Well, I wouldn't say bz2module is the best module out there, but as you
say it's probably good enough :) And we can help you fix things if
needed.

Is pyliblzma compatible with Python 3.x? It's too late to incorporate
any new feature in Python 2.x now.

> * While most of the liblzma API has been implemented, support for
> multiple/alternate filters still remains to be implemented. When done
> it will also cause some breakage with the current pyliblzma API.

Hmm, then perhaps you should first fix the current API so that adding
new features doesn't force you to break the API again. There are strict
rules for API breakage in the standard library.

By the way, adding a new module to the stdlib probably requires writing
a PEP (Python Enhancement Proposal). I wouldn't expect this very
proposal to be controversial, but someone has to do it.

Finally, when a module is in the stdlib, it is expected that maintenance
primarily happens in the Python SVN (or Mercurial) tree. We have a
couple of externally-maintained modules, but they're a source of
problems for us.
msg106567 - (view) Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-26 18:48
Yeah, I guess I can just break the current API right away anyway to make it compatible with future changes; I figured out long ago what it should look like. It's not like I have to implement the actual functionality to ensure compatibility, a no-op works like a charm. ;)
msg106572 - (view) Author: Martin v. Löwis (loewis) Date: 2010-05-26 19:54
[Replying to msg106566]

> if you're already looking at issue6715, then I don't get why you're
> asking.. ;)

Can you please submit a contributor form?


> Martin: For LGPL (or even GPL for that matter, disregarding linking
> restrictions) libraries you don't have to distribute the sources of
> those libraries at all (they're already made available by others, so
> that would be quite overly redundant, uh?;). LGPL actually doesn't
> even care at all about the license of your software as long as you
> only dynamically link against it.

Of course you do. Quoting from the LGPL

"You may convey a Combined Work ... if you also do each of the following:
 ...
 d) Do one of the following:
    0) Convey the Minimal Corresponding Source under the terms of this 
       License, and the Corresponding Application Code in a form 
       suitable for, and under terms that permit, the user to recombine 
       or relink the Application with a modified version of the Linked 
       Version to produce a modified Combined Work, in the manner 
       specified by section 6 of the GNU GPL for conveying 
       Corresponding Source.
    1) [not applicable to Windows]
"

> I don't really get what the issue would be even if liblzma were still
> LGPL, it doesn't prohibit you from distributing a dynamically linked
> library along with python either if necessary (which of course would
> be of convenience on win32..)..

Of course I can distribute a copy of an lzma DLL. However, I would have to provide ("convey") a copy of the source code of that DLL as well.
msg106578 - (view) Author: Antoine Pitrou (pitrou) Date: 2010-05-26 20:47
> Of course I can distribute a copy of an lzma DLL. However, I would
> have to provide ("convey") a copy of the source code of that DLL as
> well.

Can you tell me where you are currently providing the source code for
the readline library, or the gdbm library?

Oh, and by the way, you should probably shut down PyPI, since there are
certainly Python wrappers for LGPL'ed libraries there (or even GPL'ed
one), and you aren't offering a link to download those libraries' source
code either.

You seem to have no problem "conveying" copies for the source code of
non-LGPL libraries such as OpenSSL. Why is that?
msg106580 - (view) Author: Martin v. Löwis (loewis) Date: 2010-05-26 21:01
> Can you tell me where you are currently providing the source code for
> the readline library, or the gdbm library?

We don't, as they aren't included in the Windows distribution. The 
readline library doesn't work on Windows, anyway, and instead of gdbm, 
we had traditionally been distributing bsddb instead on Windows.

> Oh, and by the way, you should probably shut down PyPI, since there are
> certainly Python wrappers for LGPL'ed libraries there (or even GPL'ed
> one), and you aren't offering a link to download those libraries' source
> code either.

This is off-topic for this bug tracker. Please remain objective and 
professional if you can manage to.

> You seem to have no problem "conveying" copies for the source code of
> non-LGPL libraries such as OpenSSL. Why is that?

Not sure what you are referring to. We don't provide the sources for the 
OpenSSL libraries along with the Windows installer, because the license 
of OpenSSL doesn't require us to. This is very convenient for our users.
msg106581 - (view) Author: Antoine Pitrou (pitrou) Date: 2010-05-26 21:34
> > Oh, and by the way, you should probably shut down PyPI, since there are
> > certainly Python wrappers for LGPL'ed libraries there (or even GPL'ed
> > one), and you aren't offering a link to download those libraries' source
> > code either.
> 
> This is off-topic for this bug tracker. Please remain objective and 
> professional if you can manage to.

This whole subthread was already off-topic (since it was pointed out,
before your previous message, that the underlying lib is in the public
domain).

Actually, I would argue that the whole idea of promoting a rigorous
interpretation of a license has no place on the bug tracker. It makes no
sense to do this in an ad hoc fashion, especially if you want lawyers to
be involved (they are certainly not reading this).

(of course, you will also have understood that I disagree with such a
rigorous interpretation)

> Not sure what you are referring to. We don't provide the sources for the 
> OpenSSL libraries along with the Windows installer, because the license 
> of OpenSSL doesn't require us to. This is very convenient for our users.

This was not about providing the sources together with the installer
(which even the GPL or the LGPL don't require us to do), but providing them
as a separate bundle on the download site.
We do have a copy of the OpenSSL source tree somewhere, it is used by
the Windows build process.
msg106710 - (view) Author: Per Øyvind Karlsen (proyvind) Date: 2010-05-29 06:48
I've ported pyliblzma to py3k now and also implemented the missing functionality I mentioned earlier. For anyone interested in my progress, the branch is found at:
https://code.launchpad.net/~proyvind/pyliblzma/py3k

I need to fix some memory leaks (a side effect of the new PyUnicode/PyBytes change I'm not 100% with yet;) and various memory errors reported by valgrind etc. though, but things are starting to look quite nice already. :)
History
Date                 User                Action  Args
2010-08-19 20:14:16  jreese              set     nosy: + jreese
2010-06-22 16:08:27  rcoyner             set     nosy: + rcoyner
2010-05-29 06:48:48  proyvind            set     messages: + msg106710
2010-05-26 21:34:48  pitrou              set     messages: + msg106581
2010-05-26 21:15:37  doko                set     nosy: + doko
2010-05-26 21:01:18  loewis              set     messages: + msg106580
2010-05-26 20:47:25  pitrou              set     messages: + msg106578
2010-05-26 19:54:04  loewis              set     nosy: + loewis; messages: + msg106572
2010-05-26 18:48:55  proyvind            set     messages: + msg106567
2010-05-26 04:55:06  lars.gustaebel      set     nosy: + lars.gustaebel
2010-05-25 15:34:23  pitrou              set     messages: + msg106441
2010-05-25 13:18:45  ysj.ray             set     nosy: + ysj.ray
2010-05-25 13:06:35  proyvind            set     messages: + msg106433
2010-05-25 12:05:03  amaury.forgeotdarc  set     messages: + msg106430
2010-05-25 11:20:28  proyvind            set     nosy: + proyvind; messages: + msg106427
2010-05-21 20:38:46  Christophe Simonis  set     nosy: + Christophe Simonis
2010-05-21 15:29:20  haypo               set     nosy: + haypo
2010-05-20 20:41:00  skip.montanaro      set     nosy: - skip.montanaro
2010-05-08 14:30:20  brian.curtin        set     versions: - Python 2.7
2010-05-08 14:18:54  ockham-razor        set     nosy: + ockham-razor
2010-04-09 13:07:44  Nikratio            set     nosy: + Nikratio
2010-04-09 08:06:02  nicdumz             set     nosy: + nicdumz
2010-03-30 14:41:19  thedjatclubrock     set     nosy: + thedjatclubrock; messages: + msg101941
2010-02-05 19:39:41  eric.araujo         set     nosy: + eric.araujo
2010-02-05 19:35:27  pitrou              set     messages: + msg98899
2010-02-04 00:40:56  pitrou              set     messages: + msg98806
2010-02-03 19:44:25  arekm               set     nosy: + arekm; messages: + msg98794
2010-02-03 06:34:59  Garen               set     messages: + msg98776
2010-02-03 06:27:58  Garen               set     nosy: + Garen; messages: + msg98774
2010-01-27 15:58:57  pitrou              link    issue5689 dependencies
2010-01-27 15:58:41  pitrou              set     priority: high; dependencies: - please support lzma compression as an extension and in the tarfile module
2009-09-21 03:17:36  leonov              set     nosy: + leonov
2009-09-02 18:10:39  devurandom          set     messages: + msg92174
2009-09-02 14:22:51  amaury.forgeotdarc  set     messages: + msg92167
2009-09-02 11:02:12  pitrou              set     nosy: + pitrou; messages: + msg92163
2009-08-17 12:13:13  devurandom          set     messages: + msg91661; title: xz compression support -> xz compressor support
2009-08-17 11:51:44  skip.montanaro      set     nosy: + skip.montanaro; messages: + msg91660
2009-08-17 11:37:26  amaury.forgeotdarc  set     versions: - Python 2.6, Python 3.0, Python 3.1; nosy: + amaury.forgeotdarc; messages: + msg91658; dependencies: + please support lzma compression as an extension and in the tarfile module; stage: needs patch
2009-08-17 09:47:22  devurandom          create