PyLZMA
Platform independent python bindings for the LZMA compression library.
Impressed by the spectacular compression ratios of the Inno Setup compiler, I wanted to use the great compression algorithm LZMA in my own Python programs. As the LZMA SDK by Igor Pavlov is Open Source, it was no problem writing some Python wrappers for the C library. They currently run fine both on Windows and Linux, so hopefully, I can provide a tool that enables the user to read and create 7-zip compatible archives on Linux (as this is not supported by the original 7-zip).
Comparison
Here are the compression results of different data files with the zlib, bz2 and pylzma modules:
Description | Original | % | zlib | % | bz2 | % | pylzma | % |
SVN export of version 0.1.0 | 542.720 | 100% | 97.923 | 18.04% | 79.660 | 14.68% | 74.009 | 13.64% |
20 JPEG wallpapers | 7.178.240 | 100% | 6.989.049 | 97.36% | 7.022.040 | 97.82% | 6.980.443 | 97.24% |
libxml2-2.6.22.tar | 34.232.320 | 100% | 4.567.489 | 13.34% | 3.408.457 | 9.96% | 2.475.885 | 7.23% |
Depending on your input data, the differences between zlib/bz2 and pylzma may be even bigger!
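A comparison like the one above can be reproduced on your own data with a few lines of Python. The sketch below is only an approximation of the original measurement: it assumes Python 3 and uses the standard library's lzma module as a stand-in for pylzma, so the LZMA figures will not match the table exactly. The function name compression_ratios and the sample data are made up for this illustration.

```python
import bz2
import lzma
import zlib


def compression_ratios(data):
    """Return {codec name: compressed size as a percentage of len(data)}."""
    codecs = {
        "zlib": lambda d: zlib.compress(d, 9),
        "bz2": lambda d: bz2.compress(d, 9),
        # Assumption: the standard library's lzma module stands in for pylzma.
        "lzma": lambda d: lzma.compress(d, preset=9),
    }
    return {name: 100.0 * len(fn(data)) / len(data)
            for name, fn in codecs.items()}


if __name__ == "__main__":
    # Highly repetitive sample text; real files will compress differently.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000
    for name, pct in compression_ratios(sample).items():
        print("%-5s %6.2f%%" % (name, pct))
```

Already-compressed input such as JPEG files will stay close to 100% with every codec, which matches the second row of the table.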
Features
- Compression / decompression of a single block of data
- Compression from a file-like object (must provide a read method)
- Streaming decompression through multiple calls to decompress
- An initial library that supports reading of 7-zip archives (both solid and non-solid)
- Compiles and runs on Windows, Linux and OSX
- Multithreaded compression on Windows
- Built with LZMA SDK 4.65
Download
You can download the binaries and the source code for the wrappers from the Python Package Index.
For building, simply run:
python setup.py build
Afterwards, you will find a file pylzma.pyd in the directory build/lib.win32-<PythonVersion> that can be imported by Python. On Linux, the file is called pylzma.so and can be found in a directory called build/lib.linux-<arch>-<PythonVersion>.
Compilation has been tested with Microsoft Visual Studio 2003, GCC 3 (Linux, Cygwin), GCC 4 (Linux) but should work with any ANSI C compiler. Please let me know if you encounter any problems.
Installation using Python eggs
If you installed the EasyInstall package, you can install the latest version of pylzma using the following command:
easy_install pylzma
Refer to the EasyInstall documentation for further details about installing Python eggs. EasyInstall queries the Python Package Index and automatically fetches the latest release.
Git repository
To get access to my development repository, point your browser to the following URL: http://github.com/fancycode/pylzma
Third-party ports
You can find a port to FreeBSD on freshports.org.
A MacOS X port is maintained at darwinports.com.
Bugs
Please bring all issues to my Bugzilla bugtracker.
If you like this software, please give me some feedback!
Will there be a port of PyLZMA to Python 3 sometime soon?
I currently don’t have plans for a Python 3 port, but I happily welcome patches you send 😉
Experimental code to support Python 3 has just been committed to Github….
Have fun!
Excellent. Checking it out now.
i’ve got an error under windows :
C:\Temp\fancycode-pylzma-v0.4.2-0-gda25a6e\fancycode-pylzma-f6adfd5>python2 setup.py install >> result.txt
Traceback (most recent call last):
  File "setup.py", line 147, in <module>
    zip_safe = False,
  File "C:\DEV\Python2\lib\distutils\core.py", line 152, in setup
    dist.run_commands()
  File "C:\DEV\Python2\lib\distutils\dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "C:\DEV\Python2\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\install.py", line 76, in run
  File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\install.py", line 85, in do_egg_install
  File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\dist.py", line 395, in get_command_class
  File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\pkg_resources.py", line 1954, in load
  File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\easy_install.py", line 21, in <module>
  File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\package_index.py", line 2, in <module>
  File "C:\DEV\Python2\lib\urllib2.py", line 94, in <module>
    import httplib
  File "C:\DEV\Python2\lib\httplib.py", line 70, in <module>
    import socket
  File "C:\DEV\Python2\lib\socket.py", line 47, in <module>
    import _socket
ImportError: DLL load failed: %1 is not a valid Win32 application.
This looks more like a problem with your local Python installation, as it can’t import the “_socket” module.
Hello,
I tried to compile on CentOS 5.5. The 64-bit version builds fine, but the i386 machine fails with this error:
In file included from src/7zip/C/CpuArch.c:4:
src/7zip/C/CpuArch.h:136: warning: function declaration isn’t a prototype
src/7zip/C/CpuArch.h:137: warning: function declaration isn’t a prototype
src/7zip/C/CpuArch.c:127: warning: function declaration isn’t a prototype
src/7zip/C/CpuArch.c:160: warning: function declaration isn’t a prototype
src/7zip/C/CpuArch.c: In function ‘MyCPUID’:
src/7zip/C/CpuArch.c:75: error: can’t find a register in class ‘BREG’ while reloading ‘asm’
error: command ‘gcc’ failed with exit status 1
Please help.
Many thanks for your support
Could you please try the latest code from Github and see if this fixes the issue?
This is great, just what I am looking for.
But I cannot build it, as it appears I need a C-compiler.
Is there any way I can circumvent this?
As this is a binary extension, you will need a compiler to build it. However if you are running on Windows, you can use the binary egg releases from PyPI.
Hi,
I want to find an unabridged, detailed specification of the LZMA compression algorithm. Where can I find one?
The information on http://7-zip.org/7z.html is not complete.
Contact me
EMAIL: sdfuch@yahoo.com.cn
Thanks
Please contact the original authors of 7-Zip for any questions about the LZMA format. My library is just a wrapper and I don’t know anything about the algorithm itself.
Does it provide the readline interface, just like that in bz2 ?
Not yet, but I happily accept patches 😉
Hey (:
I’ve been trying to compile pylzma for windows 7 /python 2.6.6 amd64 , but I’m definitely out of luck …
Here’s what I’ve tried:
– Already had installed MSVC compiler for AMD64 with Visual Studio 2008 (v9.0)
– Cloned the GIT repo
– tried setup.py install, but got a bunch of compiler errors
– tried changing all .c file extensions to .cpp and modified setup.py accordingly
– Now they compile fine (after a few tweaks in CpuArch and another file which had a switch/case block), but now the linker chokes with these errors (might look really bad in this little text field):
Creating library build\temp.win-amd64-2.6\Release\src/pylzma\pylzma.lib and object build\temp.win-amd64-2.6\Release\src/pylzma\pylzma.exp
pylzma.obj : error LNK2001: unresolved external symbol “char const * const doc_decompress” (?doc_decompress@@3QBDB)
pylzma.obj : error LNK2001: unresolved external symbol “struct _object * __cdecl pylzma_decompress(struct _object *,struct _object *)” (?pylzma_decompress@@YAPEAU_object@@PEAU1@0@Z)
pylzma.obj : error LNK2001: unresolved external symbol “char const * const doc_compress” (?doc_compress@@3QBDB)
Aes.obj : error LNK2019: unresolved external symbol “void __cdecl AesCtr_Code_Intel(unsigned int *,unsigned char *,unsigned __int64)” (?AesCtr_Code_Intel@@YAXPEAIPEAE_K@Z) referenced in function AesGenTables
Aes.obj : error LNK2019: unresolved external symbol “void __cdecl AesCbc_Decode_Intel(unsigned int *,unsigned char *,unsigned __int64)” (?AesCbc_Decode_Intel@@YAXPEAIPEAE_K@Z) referenced in function AesGenTables
Aes.obj : error LNK2019: unresolved external symbol “void __cdecl AesCbc_Encode_Intel(unsigned int *,unsigned char *,unsigned __int64)” (?AesCbc_Encode_Intel@@YAXPEAIPEAE_K@Z) referenced in function AesGenTables
AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesenclast_si128 referenced in function “void __cdecl AesCbc_Encode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Encode_Intel@@YAXPEAT__m128i@@0_K@Z)
AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesenc_si128 referenced in function “void __cdecl AesCbc_Encode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Encode_Intel@@YAXPEAT__m128i@@0_K@Z)
AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesdeclast_si128 referenced in function “void __cdecl AesCbc_Decode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Decode_Intel@@YAXPEAT__m128i@@0_K@Z)
AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesdec_si128 referenced in function “void __cdecl AesCbc_Decode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Decode_Intel@@YAXPEAT__m128i@@0_K@Z)
build\lib.win-amd64-2.6\pylzma.pyd : fatal error LNK1120: 10 unresolved externals
error: command ‘”C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\amd64\link.exe”‘ failed with exit status 1120
…
Any idea how I could make msvc to compile the .c files or how to solve those link issues? :/ I’ve some problem figuring what they mean, even with msdn (and my lack of c/cpp knowledge)
Thanks a lot (: and can’t wait to use the python binding.
Ps: Or even better, is there a way to install pylzma for windows 7 x64 and python 2.6.6?
Replied on SO. And here is a patch http://aura.zartsoft.ru/~zart/pypi/pylzma-fix-msvc-compiling.diff
Thank you very much!
The patch has been applied on Github, thanks!
Also posted on SO:
http://stackoverflow.com/questions/4764972/installing-compiling-pylzma-lzma-python-binding
[…] Hello guys,I’ve already posted this question on the authors website (and still waiting for moderation), but I thought I might ask here as well.I’ve been trying […]
I’ve just successfully built and installed pylzma-0.4.3 on Windows XP Pro with Python 2.7.1 using mingw. No problems. (Python installed with the Windows x86 MSI installer from python.org).
pylzma is being used on (pickle) files that are transferred over slightly unreliable 9600 baud dialup connections. Half the size of zip compression. Reduced transfer time = fewer redials = lower costs and much less frustration. Thank you.
help!
if i try to install with easy_install, it exits with
File “C:\Python26\lib\site-packages\setuptools-0.6c11-py2.6.egg\setuptools\package_index.py”, line 475, in fetch_distribution
AttributeError: ‘NoneType’ object has no attribute ‘clone’
you dont happen to have a simple, pre-built distribution do you? xxx
probably a good idea to mention i am on python 2.6 xx
What the heck?
It says “You can download the binaries and the source code for the wrappers below,” but there is no link anywhere on this page that I can find.
It looks like the best thing to do is go to http://www.joachim-bauch.de/
From there it says “You can get the source tarball from the Python Package Index or github.”
But the current page (projects/pylzma) is still the #1 hit on google for “python lzma” and although there is a link to github, I think the Python Package Index is easier for most people to download from, and certainly feels less scary than trying to fetch what may or may not be an experimental bleeding edge version in the source repository.
Well, by using “easy_install” as described above, the Python Package Index is queried for the latest released version. The text above was copied from my old page and surely is a bit unclear. I’ll update it to refer to the Python Package Index.
Hi Joachim, I’d like to install PyLZMA and I’ve got error like this:
setup.py:92: UnsupportedPlatformWarning: Multithreading is not supported on the platform “linux2”,
please contact mail@joachim-bauch.de for more informations.
please contact mail@joachim-bauch.de for more informations.””” % (sys.platform), UnsupportedPlatformWarning)
building ‘pylzma’ extension
creating build/temp.linux-i686-2.6
creating build/temp.linux-i686-2.6/src
creating build/temp.linux-i686-2.6/src/pylzma
creating build/temp.linux-i686-2.6/src/sdk
creating build/temp.linux-i686-2.6/src/7zip
creating build/temp.linux-i686-2.6/src/7zip/C
creating build/temp.linux-i686-2.6/src/compat
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DWITH_COMPAT=1 -Isrc/sdk -I/usr/include/python2.6 -c src/pylzma/pylzma.c -o build/temp.linux-i686-2.6/src/pylzma/pylzma.o
src/pylzma/pylzma.c:26: fatal error: Python.h: No such file or directory
compilation terminated.
error: command ‘gcc’ failed with exit status 1
Thank you for your nice wrapper
You will have to install the Python development package containing the header files and libraries to link against.
Usually this package is called “python-dev” on linux platforms.
Thank you Joachim, python-dev solved the problem
Is there API documentation available?
I can’t seem to find any hosted online. I’ve tried running help(pylzma), but many of the doc strings are marked as “todo” or do not explain any of the parameters. Some of the doc strings invite you to instead run “help(type(x))” where I assume x is the class/module/function. This is not helpful either.
How do I convert a file’s lastwritetime to a Python datetime?
i’m looking for this too!
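The 7z format stores timestamps as Windows FILETIME values, that is, the number of 100-nanosecond ticks since 1601-01-01 (UTC). The sketch below assumes you already have such a raw integer; whether your py7zlib version hands back lastwritetime as a plain integer or wrapped in some other object is something to check first. The helper name filetime_to_datetime is made up for this example.

```python
from datetime import datetime, timedelta

# FILETIME counts 100-nanosecond ticks from this instant (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(filetime):
    """Convert a raw Windows FILETIME integer to a naive UTC datetime.

    Sub-microsecond precision is dropped because datetime cannot hold it.
    """
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


if __name__ == "__main__":
    # 116444736000000000 ticks is the FILETIME of the Unix epoch.
    print(filetime_to_datetime(116444736000000000))
```

The result is in UTC; convert it to local time separately if needed.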
Has anyone tried decompressing data generated from this JS lZMA port?
http://nmrugg.github.com/LZMA-JS/LZMA-JS_demo_simple/LZMADemo_simple.html
I’ve been unable to get it to work, though it seems like it should be compatible
Not that it’s the answer, but
from subprocess import Popen, PIPE
p = Popen(['/opt/local/bin/unlzma'], stdout=PIPE, stdin=PIPE)
payload = p.communicate(input=data)[0]
Works perfectly decoding data from LZMA-JS, so I’m guessing its some difference between the command line version and the JS version.
Hi Joachim, is there any pylzma egg package for Linux? I am browsing through pylzma repository and I’ve only found egg package for Windows.
Thank you
There is no binary egg for Linux as the dependencies might vary between the different distributions. However, it’s a lot easier to install a compiler and build the modules on Linux than on Windows.
Just get the source tarball and execute
python setup.py bdist_egg
to build the egg after extracting.
Hi,
is there an example or documentation in how to use this library?
e.g. compressing Files?
I found no clues in the source code or around the web.
Thank you.
You can check out the unittests included in the distribution. The API is pretty similar to what the “zlib” module provides.
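To illustrate the zlib comparison, here is a minimal sketch of a round trip that works with any module offering zlib-style compress() and decompress() functions. It assumes pylzma.compress and pylzma.decompress take and return byte strings the same way their zlib counterparts do; the helper name roundtrip is made up for this example, and the pylzma half only runs if the package is installed.

```python
import zlib

try:
    import pylzma  # third-party; only needed for the pylzma half of the demo
except ImportError:
    pylzma = None


def roundtrip(codec, data):
    """Compress and decompress data with a zlib-style module.

    Returns the compressed size in bytes and raises ValueError if the
    decompressed result differs from the input.
    """
    packed = codec.compress(data)
    if codec.decompress(packed) != data:
        raise ValueError("round trip did not reproduce the input")
    return len(packed)


if __name__ == "__main__":
    sample = b"hello pylzma " * 1000
    print("zlib  :", roundtrip(zlib, sample))
    if pylzma is not None:
        # Assumption: same call shape as zlib for single blocks of data.
        print("pylzma:", roundtrip(pylzma, sample))
```

For anything beyond single blocks, the unit tests mentioned above remain the best reference.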
Does anyone have a clear example of how to decompress a 7zip file?
[…] for quite a while and couldn’t find a python implementation of ppmz, but I did find another method ported to python with lzma, the compressor behind 7zip. Lzma uses a different implementation of Lempel-Ziv, […]
Hello,
Sorry…I am looking for documentation…
Install OK
I just try to decompress a 7z file… without success
Tx
Hey Joachim – you might never read this but thanks for writing up this post and showing off these libraries. I was trying to find a solution to use in Windows so I’ll download that package from PyPI and see what I can do.
Cheers,
-FB
Hi,
I’m quite new to handling binary files, I’m looking for a python implementation to decompress a 7zip and save the result as a new file.
Any chance of pointing me to a small example illustrating this? (I understand this goes beyond the scope of pylzma, but it would still be very helpful for me to understand how to achieve it.)
Thanks
L
Below is code I use to convert 7z to zip. Should be a good start for you.
import os
import zipfile
from py7zlib import Archive7z

# archive_rom_name, file_name and compression are assumed to be defined elsewhere
# open the archive file
fp = open(archive_rom_name, 'rb')
archive = Archive7z(fp)
filenames = list(archive.getnames())
z = zipfile.ZipFile(file_name + ".zip", "w", compression)
# loop through all files in archive and write to zip
for archive_file_name in filenames:
    try:
        cf = archive.getmember(archive_file_name)
        z.writestr(archive_file_name, cf.read())
    except Exception:
        # give up on the first failure and delete the incomplete zip
        z.close()
        os.remove(file_name + ".zip")
        break
z.close()
fp.close()
How in the world does one create a 7z with multiple files in it? My google fu is failing me and I just can’t get it to work. Below is the method I’m using to 7z one file and it works fine. Do I write header/data/header/data, header/header/data/data, or something else entirely? I’ve tried them both.
import struct
# import compression mods
import pylzma
from py7zlib import Archive7z
from StringIO import StringIO
# open file and create 7z
iconfile = open("testbin.col", "rb")
file_data = iconfile.read()
fin = open(“test_encode.7z”, “wb”)
archive_data = StringIO()
comp_data = pylzma.compressfile(StringIO(file_data))
# LZMA header
result = comp_data.read(5)
# size of uncompressed data
result += struct.pack('<Q', len(file_data))
# compressed data
archive_data.write(result + comp_data.read())
fin.write(archive_data.getvalue())
archive_data.close()
fin.close()
iconfile.close()
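The file written by the snippet above follows the classic single-stream .lzma layout: 5 bytes of LZMA properties (one byte encoding lc, lp and pb, then a 4-byte little-endian dictionary size) followed by the 8-byte little-endian uncompressed size and the compressed data. As a rough sketch, such a 13-byte header can be inspected with struct alone; the function name parse_lzma_header and the returned dictionary keys are made up for this illustration.

```python
import struct

# An all-ones size field means "size unknown, stream ends with a marker".
UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_lzma_header(header):
    """Decode the 13-byte header of a single-stream .lzma file."""
    if len(header) < 13:
        raise ValueError("need at least 13 header bytes")
    # 1 byte properties, 4 bytes dictionary size, 8 bytes uncompressed size
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid LZMA properties byte")
    return {
        "lc": props % 9,          # literal context bits
        "lp": (props // 9) % 5,   # literal position bits
        "pb": props // 45,        # position bits
        "dict_size": dict_size,
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }


if __name__ == "__main__":
    # Hand-built example header: props 0x5D, 8 MiB dictionary, 1000 bytes.
    example = struct.pack("<BIQ", 0x5D, 1 << 23, 1000)
    print(parse_lzma_header(example))
```

This layout describes one compressed stream only; it has no notion of file names or multiple entries, which is handled by the separate 7z container format.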
PyLZMA doesn’t work with regular 7z files, only single-file LZMA archives. (Also not for LZMA2/XZ.) So quick answer: Pipe through tar before you compress. If you need to work with regular 7z files, you need to use 7z.dll or 7z.so.
How to Use pylzma?
I have a file l2.exe.zinn that needs to be decompressed. How do I do it?
http://cdn.inn.ru/ncsoft/lineage2/patch/live_nb/system/l2.exe.zinn
Size l2.exe 3130696
I see, I guess that’s why I couldn’t get it to work. Guess I’ll just end up adding full 7z as a dependency of my current project. Thanks.
would you please add a win64 binary package? it’s a PITA to compile it
I get the message: “DeprecationWarning: matchfinder selection is deprecated and will be ignored”.
Why is it deprecated?
How can i use 2 bytes hashing?
I’m using IronPython 2.7. Unfortunately, IronPython does not have setuptools.
How would I go about manually compiling the library?
T
IronPython does support distutils.
Just wanted to say thanks for releasing this. It worked like a charm in my project. I took the python 3.2 fork off github, and it compiled and worked without a hitch for python 2.6, 2.7 and 3.2.
Hi!
Can anyone help me write two little Python scripts based on pylzma?
1. 7zcompress input.ext
Compress with 7z WITHOUT 7z HEADER to input.ext.7z
No need to be 7z compatible! 7z format contains header, file informations, and I no need it! I want a pure stream/string compressor based on 7z.
I will use it from Linux CLI and need a SMALLEST file size.
2. 7zdecompress input.ext.7z
Output will be input.ext
Thanks,
Alain
Hi,
What are the pros/cons of using pylzma over pyliblzma?
Hi Joachim,
thanks very much for your port of LZMA. I find it very useful for filtering archives of network data. However, I have a problem decompressing archives which are generated incrementally by adding two or more files simultaneously. I have found that the assumption in Archive7z.__init__ that “every file has it’s own folder” does not hold. I appreciate this may be a case of “Don’t Do That!”. I am sure I should raise this on your Bugzilla, but you do not seem to have a category for py7zlib bugs?
In order to work around the problem I found that (1) SubstreamsInfo.__init__ contained a bug:
Where id == PROPERTY_SIZE, the sum must be set to zero before the inner loop; otherwise the total is carried across and incorrect sizes (often negative!) result.
(2) In order to more easily process the unpacking info, I changed the loop and its preamble in Archive7z.__init__ to:
self.solid = packinfo.numstreams == 1
if self.solid:
    # the files are stored in substreams
    if hasattr(subinfo, 'unpacksizes'):
        unpacksizes = subinfo.unpacksizes
    else:
        unpacksizes = [x.unpacksizes[0] for x in folders]
else:
    # check every file has its own folder with compressed data
    if unpackinfo.numfolders == files.numfiles:
        unpacksizes = [x.unpacksizes[0] for x in folders]
    else:
        unpacksizes = subinfo.unpacksizes
src_pos = self.afterheader
maxsize = (self.solid and packinfo.packsizes[0]) or None
idx2 = 0
for fidx in range(unpackinfo.numfolders):
    folder = folders[fidx]
    pos = 0
    old_src_pos = src_pos
    numps = subinfo.numunpackstreams[fidx]
    for ssidx in range(numps):
        info = files.files[idx2]
        if info['emptystream']:
            continue
        info['compressed'] = (not self.solid and packsizes[fidx]) or None
        filesize = unpacksizes[idx2]
        info['uncompressed'] = filesize
        file = ArchiveFile(info, pos, src_pos,
                           # unpacksizes[fidx],
                           filesize,
                           folder, self, maxsize=maxsize)
        if subinfo.digestsdefined[idx2]:
            file.digest = subinfo.digests[idx2]
        self.files.append(file)
        pos += unpacksizes[idx2]
        idx2 += 1
    src_pos = old_src_pos
    if not self.solid:
        src_pos += packsizes[fidx]
My apologies for the length of this post, and any poor code formatting. I should also add that I have not conducted exhaustive testing on my workaround. It just works for the kind of archives I am encountering.
Regards,
Hamish
Is there a chance to get py7zlib.Archive7z working with subdirectories inside an archive on Windows? I get “IndexError: list index out of range” if the opened archive contains directories.
[…] want to use PyLZMA to extract a file from an archive (e.g. test.7z) and extract it to the same […]
[…] I check the documentation on the authors page http://www.joachim-bauch.de/projects/pylzma/ and go to the designated folder I see the following […]
Please add that a python-pylzma package exists for openSUSE:
https://software.opensuse.org/package/python-pylzma?search_term=python-pylzma
https://build.opensuse.org/package/show/devel:languages:python/python-pylzma
[…] I’m sure that there might be some more obscure formats with better compression, but lzma is the best, of those that are well supported. There are some python bindings here. […]