
Spark File Format Conversion Utility

  • Writer: Vishakh Rameshan
  • Jan 3, 2021
  • 1 min read

I am a Data Engineer who runs business operations on Big Data files stored on HDFS in various formats, such as Avro, Parquet, CSV, and text. The processed data is then made available to Data Scientists or Data Stewards.


Writing a Spark job is easy compared with testing it and creating mock test data. It becomes even harder when the input and output files need to be encrypted and decrypted. As you all know, Avro and Parquet files are not human-readable, so mock data is typically created in CSV or text and then converted to Avro or Parquet. If the test cases fail, the input files have to be converted back, or the CSV files edited and converted to Avro again, which is a tedious job.


So, to make my work and my colleagues' work easier, I have created a utility that runs independently as a Spring Boot application with Spark integrated, and that provides a Swagger UI to interact with.



It currently supports the following conversions:

  • PARQUET Conversion

    • TEXT to PARQUET

    • CSV to PARQUET

    • AVRO to PARQUET

  • CSV Conversion

    • TEXT to CSV

    • AVRO to CSV

    • PARQUET to CSV

  • AVRO Conversion

    • TEXT to AVRO

    • CSV to AVRO

    • PARQUET to AVRO

  • TEXT Conversion

    • CSV to TEXT

    • AVRO to TEXT

    • PARQUET to TEXT
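
The list above amounts to every pairing of two different formats out of four, which gives twelve conversions. As a rough illustration only, and not the utility's actual code, the matrix could be modelled as below. The class name `FormatConversions`, the enum, and its `sparkName` field are all hypothetical names introduced for this sketch. The strings `text`, `csv`, `avro`, and `parquet` are the short names Spark uses for its data sources; the `avro` source additionally needs the spark-avro module on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the conversion matrix; not the utility's actual code. */
final class FormatConversions {

    /** The four formats from the list, each with the short name of its Spark data source. */
    enum Format {
        TEXT("text"), CSV("csv"), AVRO("avro"), PARQUET("parquet");

        final String sparkName;

        Format(String sparkName) {
            this.sparkName = sparkName;
        }
    }

    /** Assumption for this sketch: a conversion is supported whenever source and target differ. */
    static boolean isSupported(Format source, Format target) {
        return source != target;
    }

    /** Lists every supported pair as "SOURCE to TARGET", grouped by target format. */
    static List<String> supportedConversions() {
        List<String> pairs = new ArrayList<>();
        for (Format target : Format.values()) {
            for (Format source : Format.values()) {
                if (isSupported(source, target)) {
                    pairs.add(source + " to " + target);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // With Spark on the classpath, a single conversion would look roughly like:
        //   spark.read().format(source.sparkName).load(inputPath)
        //        .write().format(target.sparkName).save(outputPath);
        // (inputPath and outputPath are placeholders; options such as CSV headers are omitted.)
        supportedConversions().forEach(System.out::println);
    }
}
```

Running `main` prints the twelve supported pairs. Keeping the Spark source name next to each format means a single read-then-write routine can, in principle, serve every pair instead of one method per conversion.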

Interact with Swagger UI


Note: this utility does not support complex Avro schemas.

1 comment


proofofcinceptoncloud
Jan 3, 2021

Something that I was looking for. Thank you
