balditommaso/BloomFilter-MapReduce

 
 

BloomFilter-MapReduce

Project developed for the Cloud Computing course of the Master's degree in Artificial Intelligence and Data Engineering at the University of Pisa.

This project covers the design and implementation of a Bloom filter for IMDb datasets using MapReduce, on both the Hadoop and Spark frameworks.
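For readers new to the data structure, the following is a minimal single-machine sketch of a Bloom filter in Python. It is not the code from this repository. The class name `BloomFilter`, the helper `optimal_parameters`, and the SHA-256 double-hashing scheme are assumptions chosen for illustration, and the sample keys in the usage note are hypothetical. The sizing formulas are the standard ones: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n items and a target false-positive rate p. The `merge` method shows why the structure suits MapReduce: filters built independently on separate partitions with the same m and k can be combined with a bitwise OR.

```python
import hashlib
import math


def optimal_parameters(n, p):
    """Return (m, k): bit-array size and hash count for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Illustrative Bloom filter; a Python int serves as the bit array."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = 0

    def _positions(self, item):
        # Double hashing (an assumption of this sketch): derive k positions
        # from two 64-bit slices of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, hence non-zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        """Combine two filters built with the same m and k using bitwise OR."""
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k to be merged")
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged
```

As a usage example, two partial filters could be built from hypothetical keys such as `"tt0000001"` and `"tt0000002"`, one per partition, and then merged; the merged filter reports both keys as possibly present. A membership test never yields a false negative, only false positives at a rate governed by m and k.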

Repository

The repository is organized as follows:

  • dataset/ contains the IMDb dataset, stored in film_ratings.txt
  • docs/ contains the report and the assignment
  • hadoop/ contains the Hadoop implementation and tests
  • results/ contains the test results and their analysis
  • spark/ contains the Spark implementation and tests

Languages

  • Java 48.2%
  • Jupyter Notebook 45.2%
  • Python 6.6%