Open Source Projects for Extracting Data and Metadata from Files & the Web

I’ve been looking for open-source libraries (preferably in Java, though that isn’t a requirement) that can extract data and metadata from common file formats and Web formats. One project that looks very promising is Aperture. Do you know of any others that are ready, or almost ready, for prime-time use? Please let me know in the comments! Thanks.